Slideshow:
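Bag difference: R −B S

A tuple appears in R −B S as many times as it appears in R, minus the number of times it appears in S (but never fewer than 0 times).

R = {1, 2, 2, 2, 3}; S = {2, 2, 3, 3, 4}; R −B S = {1, 2}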
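The definition can be checked on the example with a short Python sketch. It is only an in-memory illustration, not the one-pass algorithm; the function name `bag_difference` is an assumption made for this example, and `collections.Counter` stands in for the bags.

```python
from collections import Counter

def bag_difference(r, s):
    """Return R -B S as a sorted list (illustrative helper, not from the slides)."""
    # Counter subtraction keeps only positive counts, which matches
    # "count in R minus count in S, but never below 0".
    remaining = Counter(r) - Counter(s)
    return sorted(remaining.elements())

R = [1, 2, 2, 2, 3]
S = [2, 2, 3, 3, 4]
print(bag_difference(R, S))  # [1, 2]
```

The value 3 disappears because S holds more copies of it than R does, and 4 never appears because it is not in R at all.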
Assumption: S is the smaller of the 2 relations.

Two cases:

1. S − R
2. R − S
S = {2, 2, 3, 3, 4}; R = {1, 2, 2, 2, 3, 5};

{2, 2, 3, 3, 4} − {1, 2, 2, 2, 3, 5} = {3, 4}
^^^^^^^^^^^^^^^
Index this set (S)
initialize a search structure H on all attributes;

/* ============================================================
   Phase 1: Use 1 buffer and scan the SMALLER relation first.
            Build a search structure on the SMALLER relation.
            The search structure contains a count for the search key.
   ============================================================ */
while ( S has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      /* =====================================================
         We need a search structure H to implement the test
         t ∈ H efficiently.
         We can use a hash table or a binary search tree.
         ===================================================== */
      if ( t ∉ H )
      {
         insert t in H;
         initialize count(t) = 1 for H;
      }
      else
      {
         update count(t) = count(t) + 1 for H;
      }
   }
}

/* ===================================================
   Now we know how many copies of each element are in S
   =================================================== */

/* ============================================================
   Phase 2: output tuples in S.
            Use 1 buffer and scan the other relation.
            Use the search structure to remove common elements,
            but at most count(t) times.
   ============================================================ */
while ( R has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      /* =====================================================
         We use the search structure H to implement the test
         t ∈ H efficiently.
         ===================================================== */
      if ( t ∈ H )
      {
         if ( count(t) > 0 )
         {
            update count(t)-- in H;   // We lost 1 copy of t
         }
      }
      else
      {
         // Ignore t, it is not in the difference
      }
   }
}

/* ===================================================
   ONLY now can we output the difference
   =================================================== */
for ( every t ∈ H )
{
   output count(t) number of tuples t;
}
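The pseudocode for S − R can be sketched in Python as follows. This is a minimal illustration under stated assumptions: relations are plain lists, a `dict` plays the role of the search structure H, and the helper `blocks` (a hypothetical name) simulates reading one data block at a time with an arbitrary block size of 2.

```python
def blocks(relation, block_size=2):
    """Hypothetical helper: yield the relation one 'data block' at a time."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]

def one_pass_s_minus_r(s, r, block_size=2):
    """Bag difference S - R, with the search structure built on S."""
    h = {}  # search structure H: tuple -> count

    # Phase 1: scan S and count the copies of each tuple.
    for b in blocks(s, block_size):
        for t in b:
            h[t] = h.get(t, 0) + 1

    # Phase 2: scan R; each tuple of R cancels at most one remaining copy.
    for b in blocks(r, block_size):
        for t in b:
            if h.get(t, 0) > 0:
                h[t] -= 1  # we lost 1 copy of t
            # otherwise: ignore t, it cannot affect the difference

    # Only now can the difference be output.
    result = []
    for t, count in h.items():
        result.extend([t] * count)
    return result

print(one_pass_s_minus_r([2, 2, 3, 3, 4], [1, 2, 2, 2, 3, 5]))  # [3, 4]
```

Nothing can be output before R has been scanned completely, because a later block of R may still cancel a copy that is currently counted in H.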
R = {1, 2, 2, 2, 3, 5}; S = {2, 2, 3, 3, 4};

{1, 2, 2, 2, 3, 5} − {2, 2, 3, 3, 4} = {1, 2, 5}
                     ^^^^^^^^^^^^^^^
                     Index this set (S)
initialize a search structure H on all attributes in tuples of S;

/* ============================================================
   Phase 1: (same)
            Use 1 buffer and scan the SMALLER relation first.
            Build a search structure on the SMALLER relation.
            The search structure contains a count for the search key.
   ============================================================ */
while ( S has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      /* =====================================================
         We need a search structure H to implement the test
         t ∈ H efficiently.
         We can use a hash table or a binary search tree.
         ===================================================== */
      if ( t ∉ H )
      {
         insert t in H;
         initialize count(t) = 1 for H;
      }
      else
      {
         update count(t) = count(t) + 1 for H;
      }
   }
}

/* ===================================================
   Now we know how many copies of each element are in S
   =================================================== */

/* ============================================================
   Phase 2: output tuples in R.
            Use 1 buffer and scan the other relation.
            Use the search structure to "throttle output" of
            common elements, but at most count(t) times.
   ============================================================ */
while ( R has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      /* =====================================================
         We use the search structure H to implement the test
         t ∈ H efficiently.
         ===================================================== */
      if ( t ∈ H )
      {
         /* ==============================================
            Check if there is (still) a copy in S
            ============================================== */
         if ( count(t) > 0 )
         {
            update count(t)-- in H;   // We lost 1 copy of t
         }
         else
         {
            output t;   // Tuple did not get subtracted
         }
      }
      else
      {
         output t;      // Tuple did not get subtracted
      }
   }
}
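A Python sketch of the R − S case shows the main difference from S − R: surviving tuples can be output while R is still being scanned. The sketch assumes relations are plain lists and uses a `dict` as the search structure H; block-by-block reading is abstracted into ordinary iteration, and the function name `one_pass_r_minus_s` is chosen for this example only.

```python
def one_pass_r_minus_s(r, s):
    """Bag difference R - S, with the search structure built on S."""
    h = {}  # search structure H: tuple -> number of copies in S

    # Phase 1: scan S and count the copies of each tuple.
    for t in s:
        h[t] = h.get(t, 0) + 1

    # Phase 2: scan R; a tuple is output unless a copy in S still cancels it.
    for t in r:
        if h.get(t, 0) > 0:
            h[t] -= 1   # this copy of t is subtracted
        else:
            yield t     # tuple did not get subtracted

print(list(one_pass_r_minus_s([1, 2, 2, 2, 3, 5], [2, 2, 3, 3, 4])))  # [1, 2, 5]
```

Because the function is a generator, each surviving tuple is produced as soon as it is read, which mirrors the "Output t" statements inside the Phase 2 loop.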