M << B(S) + 1   (<< means: much smaller than)
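Therefore: S does not fit in the M − 1 buffers that the one-pass algorithm would use to hold it, so the one-pass algorithm cannot be used.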
Possible solution:
Run the one-pass cartesian product algorithm multiple times, each run using the next (M−1) blocks of the S relation.
   Let M = # available buffers;

   /* -------------------------------------------
      Outer loop: read M-1 blocks of S
      ------------------------------------------- */
   while ( S has more data blocks )
   {
      read the next M-1 data blocks of S;   // B(S) > M-1 !!!

      Rewind R;    // Set read pointer back to the start of R

      /* -------------------------------------------
         Inner loop: read through R and compute ×
         ------------------------------------------- */
      while ( R has more data blocks )
      {
         read 1 data block of R;

         for ( each tuple s ∈ M-1 blocks of S and
               each tuple t ∈ 1 block of R ) do
         {
            output (s, t);
         }
      }
   }
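The pseudocode above can be tried out with a small in-memory simulation. This is only a sketch: relations are modeled as Python lists of blocks (each block a list of tuples), "reading a block" is just a counter increment, and the name nested_loop_product is illustrative rather than taken from the notes.

```python
def nested_loop_product(s_blocks, r_blocks, m):
    """Block-based nested-loop cartesian product (simulation).

    s_blocks, r_blocks: lists of blocks; each block is a list of tuples.
                        This stands in for disk blocks (an assumption).
    m: number of available buffers; m - 1 hold S, 1 is used to scan R.
    Returns (list of (s, t) pairs, number of simulated block reads).
    """
    if m < 2:
        raise ValueError("need at least 2 buffers: 1 for S and 1 for R")

    chunk = m - 1          # blocks of S held in memory at a time
    io_count = 0
    result = []

    # Outer loop: read the next M-1 blocks of S
    for start in range(0, len(s_blocks), chunk):
        s_chunk = s_blocks[start:start + chunk]
        io_count += len(s_chunk)

        # "Rewind R": the scan restarts at R's first block on every pass
        for r_block in r_blocks:
            io_count += 1  # read 1 block of R
            for s_block in s_chunk:
                for s in s_block:
                    for t in r_block:
                        result.append((s, t))

    return result, io_count


# Small usage example: 3 blocks of S, 2 blocks of R, M = 3 buffers
S = [[1, 2], [3, 4], [5, 6]]
R = [["a", "b"], ["c", "d"]]
pairs, ios = nested_loop_product(S, R, 3)
print(len(pairs), ios)
```

With M = 3, S is processed in two groups (2 blocks, then 1 block), so R is scanned twice.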
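The algorithm reads S once: # disk I/Os = B(S)
The algorithm reads R once for every M−1 blocks of S, i.e. ⌈B(S)/(M−1)⌉ times: # disk I/Os = ⌈B(S)/(M−1)⌉ × B(R)
Total: # disk I/Os = B(S) + ⌈B(S)/(M−1)⌉ × B(R)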
The cost of the nested-loop cartesian product algorithm is not symmetric:
Cost(R × S) ≠ Cost(S × R)
(Cost = number of disk blocks read)
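Why the two orders differ can be seen by writing out the block reads for each choice of outer relation. The derivation below is a sketch that assumes M − 1 divides both B(R) and B(S), so that no rounding is needed; the outer relation is read once and the inner relation is scanned once per group of M − 1 outer blocks.

```latex
\begin{align*}
\text{Cost}(S \text{ outer}) &= B(S) + \frac{B(S)}{M-1}\,B(R) \\
\text{Cost}(R \text{ outer}) &= B(R) + \frac{B(R)}{M-1}\,B(S) \\
\text{Cost}(R \text{ outer}) - \text{Cost}(S \text{ outer}) &= B(R) - B(S)
\end{align*}
```

The product terms are identical, so the two costs differ only by the single pass over the outer relation. Under this assumption, putting the smaller relation in the outer loop is the cheaper choice.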
Example: B(R) = 10,000, B(S) = 5,000, M = 101
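Configuration 1 (S in the outer loop):
(1) Use 100 buffers to hold tuples of S
(2) Use 1 buffer to scan R
Cost: S is read once (5,000 blocks) and R is scanned 5,000/100 = 50 times (50 × 10,000 = 500,000 blocks), for 505,000 disk block reads.

Configuration 2 (R in the outer loop):
(1) Use 100 buffers to hold tuples of R
(2) Use 1 buffer to scan S
Cost: R is read once (10,000 blocks) and S is scanned 10,000/100 = 100 times (100 × 5,000 = 500,000 blocks), for 510,000 disk block reads.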
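As a rough check of the two configurations, the following Python sketch counts block reads under the assumption that the outer relation is read once and the inner relation is scanned once per group of M − 1 outer blocks. The function name nested_loop_cost and its parameter names are illustrative, not part of the original notes.

```python
from math import ceil


def nested_loop_cost(b_outer, b_inner, m):
    """Disk block reads when m - 1 buffers hold the outer relation.

    b_outer: number of blocks of the relation held in the m - 1 buffers
    b_inner: number of blocks of the relation scanned with 1 buffer
    m:       number of available buffers
    """
    passes = ceil(b_outer / (m - 1))   # scans of the inner relation
    return b_outer + passes * b_inner


B_R, B_S, M = 10_000, 5_000, 101

cost_s_outer = nested_loop_cost(B_S, B_R, M)  # 100 buffers hold S, 1 scans R
cost_r_outer = nested_loop_cost(B_R, B_S, M)  # 100 buffers hold R, 1 scans S

print(cost_s_outer)  # 505000
print(cost_r_outer)  # 510000
```

Holding the smaller relation S in the 100 buffers saves 5,000 block reads here, which is exactly B(R) − B(S).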