R ∩B S: bag intersection
Example: R = {1, 2, 2, 2, 3}; S = {2, 2, 3, 3, 4}; R ∩B S = {2, 2, 3}
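A quick way to check the example is Python's collections.Counter, whose & operator keeps, for each element, the minimum of its two counts, which matches bag-intersection semantics. This is only a verification aid on small in-memory lists, not the one-pass, block-at-a-time algorithm of these slides.

```python
from collections import Counter

# The example bags as Python lists (order does not matter in a bag).
R = [1, 2, 2, 2, 3]
S = [2, 2, 3, 3, 4]

# Counter & Counter keeps each element with multiplicity min(count in R, count in S).
common = Counter(R) & Counter(S)

# Expand the multiset back into a sorted list of values.
result = sorted(common.elements())
print(result)  # [2, 2, 3]
```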
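Important observation: a tuple t appears in R ∩B S exactly min(n_R, n_S) times, where n_R and n_S are the numbers of copies of t in R and in S. In the example, 2 occurs 3 times in R and 2 times in S, so it appears min(3, 2) = 2 times; 3 occurs once in R and twice in S, so it appears once; 1 and 4 do not appear at all.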
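Assumption: S is the smaller of the two relations, and the search structure H built on S (one entry plus one count per distinct tuple) fits in main memory alongside the single buffer used to read data blocks.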
Algorithm:
    initialize an empty search structure H keyed on all attributes of a tuple;

    /* ============================================================
       Phase 1: Use 1 buffer and scan the SMALLER relation (S) first.
                Build a search structure H on S.
                H stores a count for each distinct tuple in S.
       ============================================================ */
    while ( S has more data blocks )
    {
        read 1 data block into buffer b;

        for ( each tuple t ∈ b )
        {
            /* H must support the test t ∈ H efficiently !!!
               Use a hash table or a balanced binary search tree. */
            if ( t ∉ H )
            {
                insert t into H;
                count(t) = 1;
            }
            else
            {
                count(t) = count(t) + 1;
            }
        }
    }

    /* Now H records how many copies of each tuple occur in S. */

    /* ============================================================
       Phase 2: Use 1 buffer and scan the other relation (R).
                Use H to output the common tuples, but output
                each tuple t at most count(t) times !!
       ============================================================ */
    while ( R has more data blocks )
    {
        read 1 data block into buffer b;

        for ( each tuple t ∈ b )
        {
            if ( t ∈ H  and  count(t) > 0 )
            {
                output( t );                // t is in BOTH R and S
                count(t) = count(t) - 1;    // use up 1 copy of t from S
            }
            else
            {
                // ignore t: it is not in S, or all its copies in S are used up
            }
        }
    }
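The two phases can be sketched in Python, assuming each relation is given as an iterable of data blocks and each block as a list of hashable tuples. The block model and the name bag_intersection are hypothetical, for illustration only. collections.Counter stands in for the hash-table search structure H; a balanced search tree would also work, as the pseudocode notes.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, List

# Hypothetical model: a "data block" is a list of hashable tuples.
Block = List[Hashable]


def bag_intersection(smaller: Iterable[Block], other: Iterable[Block]) -> Iterator[Hashable]:
    """One-pass bag intersection: yields each common tuple min(n_S, n_R) times."""
    # Phase 1: scan the smaller relation S block by block and build H,
    # a hash table mapping each distinct tuple to its number of copies in S.
    h: Counter = Counter()
    for block in smaller:        # read 1 data block into buffer b
        for t in block:
            h[t] += 1            # insert t with count 1, or increment its count

    # Phase 2: scan the other relation R block by block and probe H.
    for block in other:
        for t in block:
            if h[t] > 0:         # t is in H and unused copies of t remain
                yield t          # t is in BOTH R and S
                h[t] -= 1        # use up 1 copy of t
            # otherwise ignore t: not in S, or all its copies were already matched


# Usage with the example bags, split into arbitrary blocks.
S_blocks = [[2, 2], [3, 3], [4]]
R_blocks = [[1, 2], [2, 2], [3]]
print(list(bag_intersection(S_blocks, R_blocks)))  # [2, 2, 3]
```

Because a Counter returns 0 for a missing key without inserting it, the single test h[t] > 0 covers both "t ∉ H" and "count(t) = 0". Only the distinct tuples of S and their counts are held in memory, which is why S should be the smaller relation.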