initialize a search structure H on all attributes of R;

while ( R has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      /* =====================================================
         We need a search structure H to implement
         the test  t ∈ H  efficiently !!!

         We can use a hash table or some binary search tree
         ===================================================== */
      if ( t ∈ H )
      {
         discard t;           // duplicate
      }
      else
      {
         insert t in H;       // Helps find later duplicates
         move t to output;    // This is the first occurrence
      }
   }
}
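The pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the name `distinct_one_pass` and the representation of R as a list of blocks (each block a list of tuples) are assumptions made for the example, and a Python `set` stands in for the hash-table variant of H.

```python
from typing import Hashable, Iterable, Iterator, List, Set, Tuple

Row = Tuple[Hashable, ...]  # a tuple of R; all attributes take part in the comparison


def distinct_one_pass(blocks: Iterable[List[Row]]) -> Iterator[Row]:
    """Yield each distinct tuple once, in order of first occurrence."""
    seen: Set[Row] = set()          # plays the role of the search structure H
    for block in blocks:            # one data block is held in the input buffer at a time
        for t in block:
            if t in seen:
                continue            # duplicate: discard t
            seen.add(t)             # remember t so later duplicates are detected
            yield t                 # first occurrence: move t to the output


# Hypothetical relation R stored in two blocks
blocks = [
    [(1, "a"), (2, "b")],
    [(1, "a"), (3, "c"), (2, "b")],
]
result = list(distinct_one_pass(blocks))
```

A hash-based set gives expected constant-time membership tests; a balanced binary search tree would give logarithmic-time tests instead.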
Buffer utilization when there are M buffers available:
Result:
Duplicate tuples will always be stored at the same processor !!!
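A small sketch can illustrate why this holds. It ASSUMES that tuples are assigned to processors by hashing the entire tuple, which is not shown in this part of the notes; the names `processor_for` and `partition`, the CRC-32 hash, and the sample data are all hypothetical choices made for the example.

```python
import zlib
from typing import List


def processor_for(t: tuple, num_processors: int) -> int:
    """Map a tuple to a processor index by hashing all of its attributes.

    Assumption: equal tuples have equal repr() strings, which holds for
    tuples of ints and strs as used below.
    """
    return zlib.crc32(repr(t).encode("utf-8")) % num_processors


def partition(tuples: List[tuple], num_processors: int) -> List[List[tuple]]:
    """Distribute the tuples over the processors (assumed hash partitioning)."""
    parts: List[List[tuple]] = [[] for _ in range(num_processors)]
    for t in tuples:
        parts[processor_for(t, num_processors)].append(t)
    return parts


data = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
parts = partition(data, 3)

# Each processor removes duplicates locally; dict.fromkeys keeps first occurrences.
distinct = [t for part in parts for t in dict.fromkeys(part)]
```

Because equal tuples produce the same hash value, every copy of a tuple lands in the same partition, so concatenating the local results contains no duplicates.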
Therefore: