- #1
JonnyG
- 234
- 45
- Homework Statement
- Implement a database stored on disk using bucket hashing. Define records to
be 128 bytes long with a 4-byte key and 120 bytes of data. The remaining
4 bytes are available for you to store necessary information to support the
hash table. A bucket in the hash table will be 1024 bytes long, so each bucket
has space for 8 records. The hash table should consist of 27 buckets (total
space for 216 records with slots indexed by positions 0 to 215) followed by
the overflow bucket at record position 216 in the file. The hash function for
key value K should be K mod 213. (Note that this means the last three
slots in the table will not be home positions for any record.) The collision
resolution function should be linear probing with wrap-around within the
bucket. For example, if a record is hashed to slot 5, the collision resolution
process will attempt to insert the record into the table in the order 5, 6, 7, 0,
1, 2, 3, and finally 4. If a bucket is full, the record should be placed in the
overflow section at the end of the file.
Your hash table should implement the dictionary ADT of Section 4.4. When
you do your testing, assume that the system is meant to store about 100 or so
records at a time.
- Relevant Equations
- No relevant equations. I want to implement this hash table in c++
I understand hashing (at least in theory), but I am struggling with actual implementation. I was wondering what the best way to go about this problem was, and this just led to more questions. Now I am confused.
1) The database has to be stored on disk. Let's say it's stored as a folder of individual files, with each file containing a record. But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 MB in size. I can fit this in main memory but the question asks me to store the hash table on disk. But then every time I want to search for a record, I would have to make a disk access, which would be relatively slow.
Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead? (maybe store a string that contains the path and filename). A string would be 32 bytes (I used the sizeof operator in c++ to check this) and so an array of 100 strings would easily fit into main memory. This would mean that there are no disk accesses when searching in the hash table and so the program should be faster and would take up less space than if I store the entire actual records in the array (where each record is 128 byes).
Am I thinking about this the wrong way?
1) The database has to be stored on disk. Let's say it's stored as a folder of individual files, with each file containing a record. But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 MB in size. I can fit this in main memory but the question asks me to store the hash table on disk. But then every time I want to search for a record, I would have to make a disk access, which would be relatively slow.
Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead? (maybe store a string that contains the path and filename). A string would be 32 bytes (I used the sizeof operator in c++ to check this) and so an array of 100 strings would easily fit into main memory. This would mean that there are no disk accesses when searching in the hash table and so the program should be faster and would take up less space than if I store the entire actual records in the array (where each record is 128 byes).
Am I thinking about this the wrong way?
Last edited: