Organization: use 1 buffer to read R and (M−1) buffers to write the sub-relations (buckets) Ri
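The partitioning pass described above can be sketched as follows. This is a minimal illustration, not the course's reference code. It assumes R is modelled as a Python list of blocks (each block a list of tuples), uses Python's built-in `hash` as a stand-in hash function, and the names `partition`, `m` and `block_size` are hypothetical.

```python
from typing import List

Block = List[tuple]  # assumption: a block is modelled as a list of tuples


def partition(relation: List[Block], m: int, block_size: int) -> List[List[Block]]:
    """Pass 1: read R through one input buffer and hash every tuple into one
    of M-1 output buffers; a full output buffer is written out as one block
    of the corresponding bucket R_i."""
    num_buckets = m - 1
    out_buffers: List[Block] = [[] for _ in range(num_buckets)]     # the M-1 output buffers
    buckets: List[List[Block]] = [[] for _ in range(num_buckets)]   # R_1 .. R_{M-1} on "disk"
    for block in relation:              # the single input buffer holds this block
        for t in block:
            i = hash(t) % num_buckets   # equal tuples always get the same bucket index
            out_buffers[i].append(t)
            if len(out_buffers[i]) == block_size:
                buckets[i].append(out_buffers[i])  # write one full block to R_i
                out_buffers[i] = []
    for i, buf in enumerate(out_buffers):          # flush partly filled buffers
        if buf:
            buckets[i].append(buf)
    return buckets
```

The property that matters for duplicate elimination is that every copy of a tuple ends up in the same bucket, so pass 2 can work on one bucket at a time. The sketch requires `m >= 2`.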
Assumed size of each sub-relation:
Minimum buffer requirement: (derive using the fact that B(Ri) ≤ M−1)
Notice that:
Pass 1: read R: B(R) I/Os; write the M−1 buckets (total): B(R) I/Os
Pass 2: read the M−1 buckets (total): B(R) I/Os; write the non-duplicate tuples: ≤ B(R) I/Os (not counted: the result is passed to the next operator through a buffer!)
Cost of δ(R) = 3B(R)
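As a worked check of the arithmetic above, the following sketch adds up the block I/Os using exactly the slide's accounting: the buckets total B(R) blocks and the final output is not counted. The function name `distinct_cost` is hypothetical.

```python
def distinct_cost(b_r: int) -> dict:
    """Block I/Os of two-pass hash-based duplicate elimination, using the
    slide's accounting (buckets total B(R) blocks; output not counted)."""
    read_r = b_r          # pass 1: read R
    write_buckets = b_r   # pass 1: write the M-1 buckets
    read_buckets = b_r    # pass 2: read the M-1 buckets back
    # pass 2 output (<= B(R) blocks) is handed to the next operator via a buffer
    return {
        "read R": read_r,
        "write buckets": write_buckets,
        "read buckets": read_buckets,
        "total": read_r + write_buckets + read_buckets,
    }
```

For example, a relation of B(R) = 1000 blocks costs 1000 + 1000 + 1000 = 3000 block I/Os.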
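In the worst case (when all tuples are unique, so loading a bucket removes nothing), each sub-relation must fit in the M−1 buffers available in pass 2:
$B(R_i) \le M-1$ blocks
If the hash function distributes the tuples evenly:
$B(R_i) = \frac{B(R)}{M-1}$
(Otherwise, some sub-relation will be larger than $B(R)/(M-1)$!)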
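The second pass can be sketched like this, assuming buckets are lists of blocks and blocks are lists of tuples. The function name `eliminate_duplicates` and the choice to raise `ValueError` when a bucket breaks the B(Ri) ≤ M−1 condition are illustrative assumptions. Because all copies of a tuple hash to the same bucket, a per-bucket set is enough; in the worst case (all tuples unique) that set holds the whole bucket.

```python
from typing import Iterator, List

Block = List[tuple]  # assumption: a block is modelled as a list of tuples


def eliminate_duplicates(buckets: List[List[Block]], m: int) -> Iterator[tuple]:
    """Pass 2: bring each bucket R_i into memory (at most M-1 buffers), keep
    the first copy of every tuple, and hand distinct tuples to the consumer
    one at a time (standing in for the output buffer) instead of writing them."""
    for bucket in buckets:
        if len(bucket) > m - 1:   # violates B(R_i) <= M-1
            raise ValueError(f"bucket of {len(bucket)} blocks exceeds M-1 = {m - 1}")
        seen = set()              # all duplicates of a tuple are in this bucket
        for block in bucket:
            for t in block:
                if t not in seen:
                    seen.add(t)
                    yield t
```

Using a generator mirrors the "not counted" remark in the cost breakdown: distinct tuples flow straight to the caller rather than being materialized.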
Therefore, from $B(R_i) \le M-1$ blocks and $B(R_i) = \frac{B(R)}{M-1}$ we have:
$$\frac{B(R)}{M-1} \le M-1 \iff (M-1)^2 \ge B(R) \iff M-1 \ge \sqrt{B(R)} \iff M \ge \sqrt{B(R)} + 1$$
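The exact bound can be turned into a small calculation: the smallest whole number of buffers M with (M−1)² ≥ B(R). This sketch uses integer arithmetic (`math.isqrt`, Python 3.8+) to avoid floating-point rounding; the name `min_buffers` is hypothetical.

```python
import math


def min_buffers(b_r: int) -> int:
    """Smallest integer M with (M-1)^2 >= B(R), i.e. M >= sqrt(B(R)) + 1."""
    k = math.isqrt(b_r)   # floor(sqrt(B(R)))
    if k * k < b_r:
        k += 1            # round up to ceil(sqrt(B(R)))
    return k + 1
```

For instance, B(R) = 10,000 blocks gives M = 101 buffers, and B(R) = 101 blocks gives M = 12, since 10² = 100 is just short of 101.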
In practice, the requirement is approximated as $M \ge \sqrt{B(R)}$, because it is not necessary to have exactly this amount of memory: having a few buffers fewer only causes minimal thrashing (virtual memory will swap some buffers in and out of disk).
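To see why dropping the "+1" is harmless, the sketch below estimates how many blocks an average bucket overflows the M−1 pass-2 buffers by, under the even-distribution assumption B(Ri) = B(R)/(M−1). The function name `overflow_blocks` is hypothetical, and real buckets will vary around this average.

```python
def overflow_blocks(b_r: int, m: int) -> float:
    """Blocks by which an average bucket B(R)/(M-1) exceeds the M-1 buffers
    available in pass 2 (0.0 if it fits)."""
    return max(0.0, b_r / (m - 1) - (m - 1))
```

With B(R) = 10,000, the exact bound M = 101 leaves no overflow, while the approximation M = √B(R) = 100 gives buckets of about 10,000 / 99 ≈ 101.01 blocks against 99 buffers, an overflow of roughly 2 blocks per bucket for virtual memory to page.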