Skip to content

Hash-join

Hash-Join

Given two tables, if one table has a hash-index already, bring that index table to the main memory if possible. Then perform Indexed nested-loop join.

The optimiser may generate hash-tables automatically. This is significantly fast if the hash-table fits completely into the memory.

Pseudo-Code

Pull hashed index \(H(I)\) into the memory. For each requested record ID \(P\), put \(P\) into the hash function to get \(H(P)\). lookup \(H(P)\) in \(H(I)\). over all records \(\{R_i\}\) in \(H(I)\), if the indices also match, select the record.

Large Table

If the table is large, partition it into smaller tables, by using a hashing function \(h\), different from the hashing code for the indices.

Generate partitioned hash indices for tables \(R\) and \(S\).

For each partition \((B_i, B_j)\) in \(R, S\), Load \((B_i, B_j)\) into memory. For each searched record \(P\) in \(R\), Lookup tuples in \(R\) with the same hash as \(P\), If the predicate \(\theta\) is satisfied, Select the tuple. Unload \((B_i, B_j)\) from memory.