The bag intersection (∩_B) operator

Note: we need a search structure that stores (key, count) pairs, where count is the number of occurrences of the key!

The bag intersection (∩_B) operator - observation

The bag intersection (∩_B) operator - assumption

The one-pass bag intersection (∩_B) algorithm - phase 1

The one-pass bag intersection (∩_B) algorithm - phase 2

The one-pass bag intersection (∩_B) algorithm - phase 1 - example

Phase 1: read S and build a search index with (key, # occurrences)
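
Phase 1 can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the name `build_counts`, the use of `collections.Counter` as the search structure, and the sample blocks are all assumptions made for the example.

```python
from collections import Counter

def build_counts(s_blocks):
    """Phase 1: scan S block by block and count each tuple."""
    counts = Counter()                 # plays the role of the search structure H
    for block in s_blocks:             # one data block in the buffer at a time
        for t in block:
            counts[t] += 1             # a key that is not present yet starts from 0
    return counts

# Hypothetical data: S stored as two blocks of single-attribute tuples.
S_BLOCKS = [["b", "c"], ["b", "d"]]
counts = build_counts(S_BLOCKS)
print(dict(counts))
```

After the scan, each key in the structure carries the number of copies of that tuple in S.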

The one-pass bag intersection (∩_B) algorithm - phase 2 - example

Phase 2: read R and look up each tuple in the search structure; if its count > 0, output the key and decrement the count

The one-pass bag intersection (∩_B) algorithm - phase 2 - example

Phase 2: a is not found → discard a

The one-pass bag intersection (∩_B) algorithm - phase 2 - example

Phase 2: count(b) = 2 (> 0) → output b and decrement count(b)
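
The per-tuple step of phase 2 might look like the sketch below. The helper name `probe`, the plain `dict` used as the search structure, and the sample tuples are assumptions for illustration; only count(b) = 2 is taken from the example.

```python
def probe(counts, t):
    """Phase 2 step for one tuple t of R: return True if t is output."""
    if counts.get(t, 0) > 0:   # t is in H and still has an unused copy
        counts[t] -= 1         # use up one copy of t
        return True
    return False               # t is not in H, or all its copies are used up

# Hypothetical state after phase 1, with count(b) = 2.
counts = {"b": 2, "c": 1}
results = [(t, probe(counts, t)) for t in ["a", "b", "b", "b"]]
print(results)
print(counts)
```

Here `a` is discarded because it is not found, the first two copies of `b` are output, and the third `b` is discarded because count(b) has already dropped to 0.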

The one-pass bag intersection (∩_B) algorithm - phase 2 - sample result

Notice the remaining count values in the search (hash) table

IO cost and buffer requirement for ∩_B


initialize a search structure H on all attributes ;

/* ============================================================
   Phase 1: Use 1 buffer and scan the SMALLER relation first.
            Build a search structure on the SMALLER relation
            The search structure contains a count for each value in S
   ============================================================ */
while ( S has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      /* =====================================================
         We need a search structure H to implement
         the test t ∈ H efficiently !!!
         We can use hash table or some bin. search tree
         ====================================================== */
      if ( t ∉ H )
      {
         insert t in H;
         initialize count(t) = 1 for H;
      }
      else
      {
         update count(t) = count(t) + 1 for H;
      }
   }
}

/* ===================================================
   Now we know how many of each element is in S
   =================================================== */

/* ============================================================
   Phase 2: Use 1 buffer and scan the other relation.
            Use the search structure to output the common
            elements, but at most count times !!
   ============================================================ */
while ( R has more data blocks )
{
   read 1 data block in buffer b;

   for ( each tuple t ∈ b )
   {
      /* =====================================================
         We use search structure H to implement
         the test t ∈ H efficiently !!!
         We can use hash table or some bin. search tree
         ====================================================== */
      if ( t ∈ H )
      {
         if ( count(t) > 0 )
         {
            output( t );            // Because t is in BOTH R and S
            Update count(t)-- in H; // Use up 1 copy of t !
         }
      }
      else
      {
         // Ignore t, it's not in intersection !
      }
   }
}
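
For readers who want to run the algorithm, the pseudocode above can be approximated by the Python sketch below. It is an illustration under stated assumptions: relations are modeled as lists of blocks, each block is a list of tuples, a `dict` stands in for the search structure H, and the name `bag_intersection` and the sample data are hypothetical.

```python
def bag_intersection(r_blocks, s_blocks):
    """One-pass bag intersection: yield tuples common to R and S.

    Assumes S is the smaller relation, so its counts fit in memory.
    """
    # Phase 1: scan S and record how many copies of each tuple it holds.
    counts = {}
    for block in s_blocks:
        for t in block:
            counts[t] = counts.get(t, 0) + 1

    # Phase 2: scan R; output a tuple only while S still has an unused copy.
    for block in r_blocks:
        for t in block:
            if counts.get(t, 0) > 0:
                yield t              # t is in both R and S
                counts[t] -= 1       # use up one copy of t

# Hypothetical relations with one attribute per tuple.
S_BLOCKS = [[("b",), ("c",)], [("b",), ("d",)]]
R_BLOCKS = [[("a",), ("b",)], [("b",), ("b",)], [("c",)]]
result = list(bag_intersection(R_BLOCKS, S_BLOCKS))
print(result)
```

Each tuple ends up in the output as many times as the smaller of its two counts in R and S, which is the bag semantics the count field is there to enforce.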

One-pass Algorithm for ∩_B