One-pass Algorithm for ⋈ (very similar to ∪_S)

Slideshow:

The join (⋈) operator

Observation: we must build a search structure on the attributes in the join condition

The join (⋈) operator - assumption

The one-pass join (⋈) algorithm - phase 1

Phase 1: read S and build search structure on join attributes

The one-pass join (⋈) algorithm - phase 2

Phase 2: scan R and find joining tuples using the search structure

IO cost and buffer requirement for ⋈

Join ⋈

Join operator:

Example Join:

   R = R( name, dno )
   S = S( dnumber, dname )

   R = { (john, 1), (jane, 4) };
   S = { (1, Research), (4, Payroll) };

   R ⋈_dno=dnumber S = { (john, 1, 1, Research),
                         (jane, 4, 4, Payroll) }

One-pass algorithm:

Assumption:

The relation S is the smaller relation

Algorithm: (the classic hash-join algorithm)

   initialize a search structure H on JOIN attributes of S;

   /* ===========================================================
      Phase 1: Use 1 buffer and scan the SMALLER relation first.
               Build a search structure on the SMALLER relation
               to help speed up finding joining tuples.
      =========================================================== */
   while ( S has more data blocks )
   {
      read 1 data block in buffer b;

      for ( each tuple t ∈ b )
      {
         insert t in H;    // Build search structure
                           // (hash table or search tree)
      }
   }

   /* ========================================================
      Phase 2: For each tuple t in R, output t combined with
               every tuple s in S that has equal join attrs

      We use the search structure H to implement the test
      t(join attrs) ∈ H efficiently !!!

      For H, we can use a hash table or some binary search tree
      ========================================================= */
   while ( R has more data blocks )
   {
      read 1 data block in buffer b;

      for ( each tuple t ∈ b )
      {
         if ( t(join attrs) ∈ H (search structure) )
         {
            for ( each s ∈ H with s(join attrs) = t(join attrs) )
               output (t, s);    // successful join
         }
      }
   }
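The two phases above can be sketched in Python, under some loudly stated assumptions: relations are in-memory lists of tuples, a "block" is simulated by slicing the list, and the search structure H is a dict keyed on the join attribute. The names `read_blocks`, `one_pass_join`, `r_key`, `s_key` and `block_size` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def read_blocks(relation, block_size):
    """Simulate block-at-a-time reading: yield lists of at most block_size tuples."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]


def one_pass_join(R, S, r_key, s_key, block_size=2):
    """Join R and S on R[r_key] == S[s_key]; S is assumed to be the smaller relation."""
    # Phase 1: scan S and build the search structure H on the join attribute.
    H = defaultdict(list)
    for block in read_blocks(S, block_size):
        for s in block:
            H[s[s_key]].append(s)

    # Phase 2: scan R and probe H with the join attribute of each tuple t.
    output = []
    for block in read_blocks(R, block_size):
        for t in block:
            # .get() avoids inserting empty entries into H for non-matching keys
            for s in H.get(t[r_key], []):
                output.append(t + s)  # successful join: concatenate the tuples
    return output


R = [("john", 1), ("jane", 4)]
S = [(1, "Research"), (4, "Payroll")]
result = one_pass_join(R, S, r_key=1, s_key=0)
print(result)
# [('john', 1, 1, 'Research'), ('jane', 4, 4, 'Payroll')]
```

A dict of lists is used so that several S tuples with the same join value can each be paired with a matching R tuple; a search tree would serve equally well as H.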

Buffer utilization when there are M buffers available:

Phase 1: partition the M buffers as follows:
- Use 1 buffer for input from S
- Use M−1 buffers for the search structure

Phase 2: partition the M buffers as follows:
- Use 1 buffer for input from R
- We are still using M−1 buffers for the search structure in phase 2

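Cost Analysis for ⋈
- # disk I/O used: B(R) + B(S), because each relation is read exactly once (B(X) denotes the number of data blocks of relation X; writing the output is not counted)
- Memory requirement: B(S) ≤ M−1, because the search structure on S must fit in the M−1 buffers that remain after reserving 1 input buffer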
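As a worked illustration of the cost analysis, the sketch below assumes the usual accounting for a one-pass algorithm: each relation is read once, output writes are not counted, and S (with its search structure) must fit in M−1 buffers. The function name `one_pass_join_cost` and the parameters `b_r`, `b_s`, `m` (block counts of R and S, and number of buffers) are hypothetical, and the extra space taken by the search structure itself is ignored.

```python
def one_pass_join_cost(b_r, b_s, m):
    """Return the number of block reads, or None if S does not fit in M-1 buffers.

    b_r, b_s: number of data blocks of R and S (S is the build relation)
    m:        number of available memory buffers
    """
    if b_s > m - 1:
        return None          # one-pass join is not applicable
    return b_r + b_s         # each relation is scanned exactly once


print(one_pass_join_cost(b_r=1000, b_s=50, m=101))   # 1050
print(one_pass_join_cost(b_r=1000, b_s=500, m=101))  # None
```

Note that the size of R does not affect the memory requirement, only the I/O count, which is why the smaller relation is chosen as S.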
