The join (⋈) operator

(Cannot build a search structure because there is not enough memory)

The IO-intensive TPMMS-based join (⋈) algorithm - Pass 1

(Graphically explained in next slide)

Run the complete TPMMS sort algorithm on R and S (separately):

Result: R and S are sorted and stored on disk
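Pass 1 can be illustrated with a minimal in-memory sketch of TPMMS. It assumes a relation is a list of blocks (each a list of tuples) and that M is the number of memory buffers; the function name tpmms_sort and its parameters are made up for this illustration, and no real disk I/O is performed.

```python
import heapq

def tpmms_sort(blocks, M, key):
    """Sort a relation given as a list of blocks (each block is a list of tuples).

    M is the number of memory buffers (an assumed parameter of this sketch).
    """
    # TPMMS pass 1: load M blocks at a time, sort them in memory,
    # and "write" the result out as one sorted chunk.
    chunks = []
    for i in range(0, len(blocks), M):
        tuples = [t for block in blocks[i:i + M] for t in block]
        chunks.append(sorted(tuples, key=key))

    # TPMMS pass 2 needs one input buffer per chunk plus one output buffer.
    if len(chunks) > M - 1:
        raise ValueError("relation too large: need B(R) <= M(M - 1)")

    # TPMMS pass 2: merge all sorted chunks into one sorted relation.
    return list(heapq.merge(*chunks, key=key))
```

Calling tpmms_sort once for R and once for S (each with its own join attribute as the key) corresponds to "run the complete TPMMS sort algorithm on R and S (separately)".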

The IO-intensive TPMMS-based join (⋈) algorithm - Pass 2

Use this procedure to join the tuples of R (with value X) by scanning S:

(Graphically explained in next slide)

Store all tuples of R with the same join value in memory buffers and use 1 buffer to scan S:

IO cost of the IO-intensive TPMMS-based join (⋈) algorithm

Remark: this is a 3-pass algorithm!

Memory requirement of the IO-intensive TPMMS-based join (⋈) algorithm

Memory constraint 1:

Memory constraint 2:

Memory constraint summary:

R = R( name, dno )
S = S( dnumber, dname )

R = { (john, 1), (jane, 4) }
S = { (1, Research), (4, Payroll) }

R ⋈_{dno = dnumber} S = { (john, 1, 1, Research), (jane, 4, 4, Payroll) }
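The example join above can be reproduced with a few lines of Python. This is only an illustration of what the ⋈ operator computes (it compares every pair of tuples); the function name equi_join is made up for this sketch.

```python
# Relations from the example, as lists of tuples.
R = [("john", 1), ("jane", 4)]           # R(name, dno)
S = [(1, "Research"), (4, "Payroll")]    # S(dnumber, dname)

def equi_join(r_tuples, s_tuples):
    """Pair every R tuple with every S tuple where dno == dnumber."""
    return [
        (name, dno, dnumber, dname)
        for (name, dno) in r_tuples
        for (dnumber, dname) in s_tuples
        if dno == dnumber
    ]

print(equi_join(R, S))
# [('john', 1, 1, 'Research'), ('jane', 4, 4, 'Payroll')]
```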

Read R until all tuples with the 1st join value are stored in memory buffers;
Read the first block of S;

While ( R ≠ empty AND S ≠ empty )
{
   Let r = the current smallest join value ∈ R
   Let s = the current smallest join value ∈ S

   if ( r < s )
   {
      skip all tuples with join attr = r in R;
   }
   else if ( s < r )
   {
      skip all tuples with join attr = s in S;
   }
   else /* r = s = y1 */
   {
      /* ===================================================
         Join on join value r = s = y1
         =================================================== */
      read S as long as join attr = s (= y1);
      Join the tuples in S with join attr = y1 with the buffered tuples of R;

      When done: reuse buffers;
      Read R until all tuples with the next smallest join value are stored in memory buffers;
   }
}
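The pass 2 procedure can be sketched in Python as follows. This is a minimal in-memory sketch, assuming both inputs are lists of tuples already sorted on their join attribute; merge_join, r_key and s_key are names invented for the illustration. Skipping non-matching tuples one at a time has the same effect as skipping the whole group at once.

```python
def merge_join(sorted_r, sorted_s, r_key, s_key):
    """Join two lists of tuples that are already sorted on their join attribute."""
    result = []
    i = j = 0
    while i < len(sorted_r) and j < len(sorted_s):
        r, s = r_key(sorted_r[i]), s_key(sorted_s[j])
        if r < s:
            i += 1                     # no S tuple can match this R tuple
        elif s < r:
            j += 1                     # no R tuple can match this S tuple
        else:
            # Collect all R tuples with join value r (the "memory buffers").
            group_start = i
            while i < len(sorted_r) and r_key(sorted_r[i]) == r:
                i += 1
            r_group = sorted_r[group_start:i]
            # Scan S while its join value equals r and join with the whole group.
            while j < len(sorted_s) and s_key(sorted_s[j]) == r:
                result.extend(rt + sorted_s[j] for rt in r_group)
                j += 1
    return result
```

In the real algorithm, the R group must fit in M − 1 buffers, which is where the third memory constraint comes from.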

Read R:                        B(R)
Write sorted chunks:           B(R)
Read sorted chunks:            B(R)
Write (merge) sorted rel R:    B(R)

Read S:                        B(S)
Write sorted chunks:           B(S)
Read sorted chunks:            B(S)
Write (merge) sorted rel S:    B(S)
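The sorting steps listed above add up to 4 B(R) + 4 B(S) disk I/Os. The sketch below totals the cost under the assumption (not spelled out in the listing) that the join pass then reads each sorted relation exactly once, costing a further B(R) + B(S). The function name and the block counts in the example are hypothetical.

```python
def tpmms_join_io_cost(b_r, b_s):
    """Total disk I/Os, where b_r = B(R) and b_s = B(S) are block counts."""
    sort_cost = 4 * b_r + 4 * b_s   # read + write chunks, read + write merged relation
    join_cost = b_r + b_s           # assumption: one scan of each sorted relation
    return sort_cost + join_cost

print(tpmms_join_io_cost(1000, 500))   # 7500
```

Under this assumption the total is 5 (B(R) + B(S)).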

(1) B(R) ≤ M(M − 1)      // Pass 2 of TPMMS needs 1 output buffer, hence: M − 1

    ⇒ M ≥ sqrt( B(R) ) (approximately) ........... (1)

(2) B(S) ≤ M(M − 1)      // Pass 2 of TPMMS needs 1 output buffer, hence: M − 1

    ⇒ M ≥ sqrt( B(S) ) (approximately) ........... (2)

(1) In order to sort the relation R using TPMMS:
       M ≥ sqrt( B(R) ) buffers      // From: B(R) ≤ M(M − 1)

(2) In order to sort the relation S using TPMMS:
       M ≥ sqrt( B(S) ) buffers      // From: B(S) ≤ M(M − 1)

(3) Join tuples with equal join attr values:
       Size of tuples in R with common join attr. values ≤ (M − 1) blocks
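The three constraints can be checked mechanically. The following is a small sketch with made-up helper names: min_buffers_for_tpmms finds the smallest M satisfying the exact condition B ≤ M(M − 1) (assuming at least 2 buffers), and join_memory_ok tests all three constraints for a given M.

```python
import math

def min_buffers_for_tpmms(b):
    """Smallest M >= 2 with M * (M - 1) >= b, i.e. roughly sqrt(b)."""
    m = max(2, math.isqrt(b))
    while m * (m - 1) < b:
        m += 1
    return m

def join_memory_ok(b_r, b_s, largest_r_group_blocks, M):
    """Check constraints (1)-(3) for M memory buffers."""
    return (
        b_r <= M * (M - 1)                    # (1) TPMMS can sort R
        and b_s <= M * (M - 1)                # (2) TPMMS can sort S
        and largest_r_group_blocks <= M - 1   # (3) largest R group fits in memory
    )

print(min_buffers_for_tpmms(10000))   # 101
```

The value 101 (rather than sqrt(10000) = 100) shows why M ≥ sqrt(B) is only an approximation of B ≤ M(M − 1).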

(1) Nested-loop will read S once:
       # disk I/Os = B(S)

(2) Number of fragments S_i read by Nested-loop: B(S)/(M − 1)
    Nested-loop will read R B(S)/(M − 1) times:
       # disk I/Os = (B(S)/(M − 1)) × B(R)

Total cost = B(S) + (B(S) × B(R))/(M − 1)
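The nested-loop cost formula above can be evaluated with a short helper. This is a sketch with a hypothetical function name; rounding the number of fragments up to a whole number is an assumption added here, since the formula itself uses the exact fraction B(S)/(M − 1).

```python
import math

def nested_loop_io_cost(b_r, b_s, M):
    """Disk I/Os: read S once, and read R once per (M - 1)-block fragment of S."""
    fragments = math.ceil(b_s / (M - 1))   # assumption: partial fragment counts as one
    return b_s + fragments * b_r

print(nested_loop_io_cost(b_r=1000, b_s=500, M=101))   # 5500
```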

Version 2: an IO-intensive join (⋈) algorithm based on the complete TPMMS