   initialize a search structure H on all attributes of S;

   /* ===========================================================
      Phase 1: Use 1 buffer and scan the SMALLER relation (S) first.
               Build a search structure on the SMALLER relation
               to help speed up finding common elements.
      =========================================================== */
   while ( S has more data blocks )
   {
      read 1 data block in buffer b;

      for ( each tuple t ∈ b )
      {
         insert t in H;     // Build search structure
                            // (hash table or search tree)
      }
   }

   /* ===========================================================
      Phase 2: Output only those tuples in R that are also in S.
               We use the search structure H to implement
               the test  t ∈ H  efficiently.
               For H, we can use a hash table or a binary search tree.
      =========================================================== */
   while ( R has more data blocks )
   {
      read 1 data block in buffer b;

      for ( each tuple t ∈ b )
      {
         if ( t ∈ H )
         {
            output t;       // t is in both R and S
         }
      }
   }
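The two phases above can be sketched in Python. This is only an illustration under stated assumptions: a "data block" is modelled as a plain list of hashable tuples, the search structure H is a Python set (a hash table), and the function name one_pass_intersection is hypothetical.

```python
from typing import Hashable, Iterable, Iterator, List

# Assumption: a disk block is modelled as a list of hashable tuples.
Block = List[Hashable]


def one_pass_intersection(
    r_blocks: Iterable[Block], s_blocks: Iterable[Block]
) -> Iterator[Hashable]:
    """Yield the tuples of R that also occur in S (S is the smaller relation)."""
    h = set()  # search structure H, here a hash table

    # Phase 1: scan S one block at a time and insert every tuple into H.
    for block in s_blocks:
        for t in block:
            h.add(t)

    # Phase 2: scan R one block at a time and output tuples found in H.
    for block in r_blocks:
        for t in block:
            if t in h:
                yield t


if __name__ == "__main__":
    r = [[1, 2], [3, 4]]   # relation R stored in two blocks
    s = [[2, 3], [5]]      # relation S stored in two blocks
    print(list(one_pass_intersection(r, s)))
```

Only one block of R or S is held at a time; the remaining memory is taken by H, which must hold all of S.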
Buffer utilization using M buffers:
Result:
Therefore:
Result:
Total amount of work performed:
Re-distribute S:
Read (re-distribute) relation S: B(S)
Transfer tuples in S: ( (P-1)/P ) B(S)
Write tuples of relation S: B(S)
Performing the one-pass intersection:
Read R and S: B(R) + B(S)
Write tuples in R ∩ S: 0 (not counted)
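The redistribution step followed by a local one-pass intersection on each processor can be sketched as below. This is a minimal sketch, not the notes' own implementation: it assumes tuples are assigned to processors with hash(t) % P, simulates the P processors with a sequential loop, and partitions R with the same function so that matching tuples meet on the same processor. The names redistribute and parallel_intersection are hypothetical.

```python
from typing import Hashable, Iterable, List


def redistribute(tuples: Iterable[Hashable], p: int) -> List[List[Hashable]]:
    """Send each tuple to processor hash(t) % p (assumed partitioning rule)."""
    buckets: List[List[Hashable]] = [[] for _ in range(p)]
    for t in tuples:
        buckets[hash(t) % p].append(t)
    return buckets


def parallel_intersection(
    r: Iterable[Hashable], s: Iterable[Hashable], p: int
) -> List[Hashable]:
    """Simulate P processors, each running the one-pass intersection locally."""
    r_parts = redistribute(r, p)
    s_parts = redistribute(s, p)

    result: List[Hashable] = []
    for r_i, s_i in zip(r_parts, s_parts):   # one iteration per processor
        h = set(s_i)                          # local search structure H
        result.extend(t for t in r_i if t in h)
    return result


if __name__ == "__main__":
    print(sorted(parallel_intersection(range(20), range(10, 30), p=4)))
```

Because equal tuples always hash to the same processor, the union of the local results equals R ∩ S.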
Note:
(1/P) ( B(R) + 3 B(S) ) read/write disk block operations
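The cost items listed above can be added up with a small calculator. This is a sketch under assumptions: the work is split evenly over the P processors, a fraction (P-1)/P of S is shipped to other processors, and the function name parallel_intersection_cost is hypothetical.

```python
def parallel_intersection_cost(b_r: int, b_s: int, p: int) -> dict:
    """Add up the disk I/O of the parallel intersection.

    b_r, b_s: number of blocks B(R) and B(S); p: number of processors P.
    """
    if p < 1:
        raise ValueError("p must be at least 1")

    redistribute_io = 2 * b_s            # read S + write S
    transferred = (p - 1) * b_s / p      # blocks of S shipped between processors
    one_pass_io = b_r + b_s              # read R and S; output is not counted
    total_io = redistribute_io + one_pass_io   # = B(R) + 3 B(S)

    return {
        "total_io": total_io,
        "per_processor_io": total_io / p,      # (1/P)(B(R) + 3 B(S))
        "transferred": transferred,
    }


if __name__ == "__main__":
    # Example figures chosen for illustration only.
    print(parallel_intersection_cost(b_r=1000, b_s=500, p=10))
```

With B(R) = 1000, B(S) = 500 and P = 10, the total is 1000 + 3 × 500 = 2500 disk operations, or 250 per processor, and 450 blocks of S are transferred.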