Optimizing Hash Table Storage for Efficient Record Retrieval

  • Comp Sci
  • Thread starter JonnyG
  • Start date
  • Tags
    Table
In summary, the conversation discusses the implementation of a database with a hash table data structure. The question arises on where to store the hash table, which is required to hold about 100 records that are 128 bytes each. The suggestion of storing the hash table as an array of references to files in main memory is brought up, but it is deemed not feasible due to the program needing to keep track of multiple files and not following the given requirements. It is suggested that the hash table and data should be stored as a single file, with the use of the fseek() command for efficient modification and saving.
  • #1
JonnyG
234
45
Homework Statement
Implement a database stored on disk using bucket hashing. Define records to
be 128 bytes long with a 4-byte key and 120 bytes of data. The remaining
4 bytes are available for you to store necessary information to support the
hash table. A bucket in the hash table will be 1024 bytes long, so each bucket
has space for 8 records. The hash table should consist of 27 buckets (total
space for 216 records with slots indexed by positions 0 to 215) followed by
the overflow bucket at record position 216 in the file. The hash function for
key value K should be K mod 213. (Note that this means the last three
slots in the table will not be home positions for any record.) The collision
resolution function should be linear probing with wrap-around within the
bucket. For example, if a record is hashed to slot 5, the collision resolution
process will attempt to insert the record into the table in the order 5, 6, 7, 0,
1, 2, 3, and finally 4. If a bucket is full, the record should be placed in the
overflow section at the end of the file.

Your hash table should implement the dictionary ADT of Section 4.4. When
you do your testing, assume that the system is meant to store about 100 or so
records at a time.
Relevant Equations
No relevant equations. I want to implement this hash table in c++
I understand hashing (at least in theory), but I am struggling with actual implementation. I was wondering what the best way to go about this problem was, and this just led to more questions. Now I am confused.

1) The database has to be stored on disk. Let's say it's stored as a folder of individual files, with each file containing a record. But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 MB in size. I can fit this in main memory but the question asks me to store the hash table on disk. But then every time I want to search for a record, I would have to make a disk access, which would be relatively slow.

Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead? (maybe store a string that contains the path and filename). A string would be 32 bytes (I used the sizeof operator in c++ to check this) and so an array of 100 strings would easily fit into main memory. This would mean that there are no disk accesses when searching in the hash table and so the program should be faster and would take up less space than if I store the entire actual records in the array (where each record is 128 byes).

Am I thinking about this the wrong way?
 
Last edited:
Physics news on Phys.org
  • #2
The hash table can be stored on disk when it is modified and saved, but it can be loaded into memory for interactive use.
 
  • #3
JonnyG said:
Let's say it's stored as a folder of individual files, with each file containing a record.
That's not reasonable -- it should be stored as a single file that contains the 27 buckets with their 216 records.
JonnyG said:
But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 MB in size.
It's stored on disk -- that's one of the requirements. The idea here is to have an idea of how to work with data that could conceivably be too large to fit into main memory.
JonnyG said:
Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead?
No. For one thing your program would need to keep track of the multitude of files. For another thing, your program really needs to follow the requirements that are given.
 
  • Like
Likes FactChecker
  • #4
The hash table and data should be stored as a single file that is read in, modified in memory, and saved when appropriate.
 
  • Like
Likes JonnyG
  • #5
Hint: fseek() command.
 

FAQ: Optimizing Hash Table Storage for Efficient Record Retrieval

What is a hash table?

A hash table is a data structure that stores data in key-value pairs. It uses a hashing function to map keys to indices in an array, making it efficient for retrieving and storing data.

What are the advantages of using a hash table?

Hash tables have a fast lookup time, making it efficient for retrieving data. They also have a flexible size, as the size of the array can be adjusted as needed. Additionally, hash tables can handle large amounts of data and are useful for implementing caching and indexing.

How do you implement a hash table?

To implement a hash table, you first need to choose a hashing function that will map keys to indices in the array. Then, you need to create an array with the desired size and initialize it with null values. Next, you can insert data into the hash table by hashing the key and using the resulting index to store the value in the array. Lastly, you can retrieve data by hashing the key and using the resulting index to access the value in the array.

What is collision resolution in a hash table?

Collision resolution is the process of handling situations where two or more keys map to the same index in the array. There are various methods for collision resolution, such as separate chaining and open addressing, that determine how to handle these collisions and ensure that data is not lost.

What is the time complexity for operations on a hash table?

The time complexity for operations on a hash table, such as insertion, retrieval, and deletion, is O(1) on average. This means that the time it takes to perform these operations does not depend on the size of the data stored in the hash table, making it a fast and efficient data structure.

Similar threads

Replies
3
Views
926
Replies
2
Views
1K
Replies
21
Views
2K
Replies
17
Views
5K
Replies
5
Views
2K
Replies
7
Views
1K
Back
Top