Slideshow:
R ∪S S
R = {1, 2, 3};
S = {2, 3, 4};
R ∪S S = {1, 2, 3, 4}
Note: values common to R and S are output only once!
Important observation:
Assumption:
Algorithm:
initialize a search structure H on all attributes of S;

/* ===========================================================
   Phase 1: Use 1 buffer and scan the SMALLER relation first.
            Build a search structure on the SMALLER relation
            to help speed up removal of duplicates.

   Because R ∪S S contains S: we output every tuple in S
   =========================================================== */
while ( S has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      insert t in H;     // Build search structure
                         // (hash table or search tree)
      output t;          // S is part of the union.
   }
}

/* ===========================================================
   Phase 2: Output only those tuples in R that are NOT in S

   We use the search structure H to implement the test
   t ∈ H efficiently !!!
   For H, we can use a hash table or a binary search tree
   =========================================================== */
while ( R has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      if ( t ∈ H )
      {
         /* -----------------------------------
            This tuple was in S, duplicate !
            ----------------------------------- */
         discard t;      // I.e.: do not output t
      }
      else
      {
         output t;       // We do not need to insert t in H
                         // because R is a set !!!
      }
   }
}
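The two phases above can be sketched in Python. This is a minimal illustration, not a definitive implementation. It assumes that relations are in-memory lists of tuples, that a Python set stands in for the search structure H, and that a hypothetical helper read_blocks simulates reading one data block at a time. The names read_blocks, one_pass_set_union and block_size are made up for this example.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, ...]


def read_blocks(relation: List[Row], block_size: int) -> Iterator[List[Row]]:
    """Hypothetical helper: yield the relation one data block at a time."""
    for start in range(0, len(relation), block_size):
        yield relation[start:start + block_size]


def one_pass_set_union(r: List[Row], s: List[Row], block_size: int = 2) -> List[Row]:
    """Sketch of the one-pass set union; s is assumed to be the smaller relation."""
    h = set()             # search structure H (a hash table here)
    output: List[Row] = []

    # Phase 1: scan S, build H, and output every tuple of S.
    for block in read_blocks(s, block_size):
        for t in block:
            h.add(t)
            output.append(t)

    # Phase 2: scan R and output only the tuples that are not in S.
    for block in read_blocks(r, block_size):
        for t in block:
            if t not in h:
                output.append(t)   # no insert into H needed: R is a set

    return output


# The example relations from the slide, stored as 1-attribute tuples.
R = [(1,), (2,), (3,)]
S = [(2,), (3,), (4,)]
result = one_pass_set_union(R, S)
print(sorted(result))
```

The tuples of S appear first in the output because Phase 1 emits them while H is being built. Sorting is done only to make the printed result easy to compare with the example.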
Buffer utilization when there are M buffers available: