Hash-join¶

Hash-Join¶

Given two tables, if one table has a hash-index already, bring that index table to the main memory if possible. Then perform Indexed nested-loop join.

The optimiser may generate hash-tables automatically. This is significantly fast if the hash-table fits completely into the memory.

Pseudo-Code¶

Pull hashed index \(H(I)\) into the memory. For each requested record ID \(P\), put \(P\) into the hash function to get \(H(P)\). lookup \(H(P)\) in \(H(I)\). over all records \(\{R_i\}\) in \(H(I)\), if the indices also match, select the record.

Large Table¶

If the table is large, partition it into smaller tables, by using a hashing function \(h\), different from the hashing code for the indices.

Generate partitioned hash indices for tables \(R\) and \(S\).

For each partition \((B_i, B_j)\) in \(R, S\), Load \((B_i, B_j)\) into memory. For each searched record \(P\) in \(R\), Lookup tuples in \(R\) with the same hash as \(P\), If the predicate \(\theta\) is satisfied, Select the tuple. Unload \((B_i, B_j)\) from memory.