Skip to content

Hashing

Static Hashing

  • A bucket is a unit of storage with one or more entries. (typically a disk block).
    • A bucket's adress is the output of a hash function, with the input being the index.
    • The bucket contains pointer/s to any row/s.
  • In a hash index, bucket store pointers to records
  • In a hash file-organisation, buckets store the records themselves
  • This doesn't work well when a database significantly grows or shrinks. We can restructure the index, but that leads to a non-uniform time expenditure.

Overflow Buckets

  • Overflow buckets exist for when a bucket gets filled up.
  • This can happen due to a lack of buckets, or a skew in distribution of records.
  • Overflow buckets are typically kept as a linked list.

Dynamic Hashing

  • In Periodic Rehashing, hashing is done periodically. Say if the number of entries is 1.5x the size of the hash table. We make a new table of larger size, and reload all entries into them
  • Linear Hashing does rehashing in an incremental manner.
  • Extendable Hashing allows sharing of a bucket by multiple hash values, allowing to increase the number of hash entries without increasing number of buckets.

Hashing vs. Ordered Indexing

  • Hashes involve periodic reorganisation, which is suboptimal in databases with lots of insertions and deletions.
  • Hashes are better at retrieving records of a specific key, ordered indices are better at pulling out ranges. Hashing is not used a lot in practice