Slideshow:
R ⋈cond S |
R = R( name, dno ) S = S( dnumber, dname ) R = { (john, 1), (jane, 4) }; S = { (1, Research), (4, Payroll) }; R ⋈dno=dnumber S = { (john, 1, 1, Research), (jane, 4, 4, Payroll) } |
|
Graphically: (this is the TPMMS algorithm)
|
The sort-join Algorithm:
Read R until all tuples with 1st join value
are stored in memory buffers;
Read the first block of S;
While ( R ≠ empty OR S ≠ empty )
{
Let r = the current smallest join value ∈ R
Let S = the current smallest join value ∈ S
if ( r < s )
{
Situation:
skip all tuples with join attr y1 in R;
}
else if ( s < r )
{
skip all tuples with join attr y1 in S;
}
else /* r = s = y1 */
{ /* ===================================================
Join on join value r = s = y1
=================================================== */
read S as long as join attr = s (= y1);
Join tuples in S with join attr = y1;
When done:
reuse buffers;
Read R until all tuples with next smallest join value
are stored in memory buffers;
}
}
|
|
|
|
|
(1) In order to sort the relation R using TPMMS: -------- M ≥ \/ B(R) buffers // From: B(R) ≤ M(M − 1) |
(1) Nested-loop will read S once: # disk I/Os = B(S) (2) Number of fragments Si read by Nested-loop: B(S)/(M-1) Nested-loop will read R: B(S)/(M-1) times # disk I/Os = (B(S)/(M-1)) × B(R) |
So:
# Disk IO (nested-loop join) = O ( B(R) × B(S) )
|
# Disk IO (TPMMS-based join) = 5 B(R) + 5 B(S)
= O( B(R) + B(S) )
|
|