The join (⋈) operator
(Cannot build a search structure because there is not enough memory)
Prerequisite for using the IO-efficient TPMMS-based join algorithm
The IO-efficient 2-pass TPMMS-based join (⋈) algorithm - Pass 1
(Graphically explained in next slide)
Use the M buffers to sort relations R and S into sorted chunks of M blocks each.
IO cost for pass 1 = 2 B(R) + 2 B(S)
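As a quick worked example of the pass 1 cost formula, the sketch below plugs in block counts. The function name and the sample sizes (1000 and 500 blocks) are illustrative assumptions, not values from the slides.

```python
def pass1_io_cost(b_r: int, b_s: int) -> int:
    """IO cost of pass 1: each block of R and S is read once and written once."""
    return 2 * b_r + 2 * b_s


# Hypothetical sizes: B(R) = 1000 blocks, B(S) = 500 blocks.
print(pass1_io_cost(1000, 500))  # 3000
```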
The IO-efficient 2-pass TPMMS-based join (⋈) algorithm - Pass 2
Pass 2:
(Graphically explained in next slide)
Pass 2: you must be able to store all tuples from R and S that share the same join value in (a few) program variables
The IO-efficient 2-pass TPMMS-based join (⋈) algorithm - Example - Pass 1
Pass 1: read chunks of M blocks from relations R and S and sort each chunk.
Write the sorted chunks (each chunk is M blocks) to disk
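Pass 1 can be sketched in a few lines. This is a minimal in-memory model, assuming a block is a small list of tuples and that the join attribute is the first field; `make_sorted_chunks` and the sample data are hypothetical names and values chosen for illustration, and "disk" is simulated with plain lists.

```python
from typing import List, Tuple

Block = List[Tuple]  # assumption: a block is modelled as a list of tuples


def make_sorted_chunks(blocks: List[Block], m: int, key_index: int = 0) -> List[List[Tuple]]:
    """Read up to m blocks at a time, sort their tuples, and emit one sorted chunk."""
    chunks = []
    for start in range(0, len(blocks), m):
        # "Read" m blocks into the m buffers.
        tuples = [t for block in blocks[start:start + m] for t in block]
        # Sort in memory on the join attribute.
        tuples.sort(key=lambda t: t[key_index])
        # "Write" the sorted chunk to disk (here: append it to a list).
        chunks.append(tuples)
    return chunks


r_blocks = [[(5, "a"), (1, "b")], [(3, "c"), (2, "d")], [(4, "e"), (0, "f")]]
print(make_sorted_chunks(r_blocks, m=2))
# [[(1, 'b'), (2, 'd'), (3, 'c'), (5, 'a')], [(0, 'f'), (4, 'e')]]
```

With M = 2, the three blocks of R produce two sorted chunks; the last chunk is shorter because R does not fill it completely.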
The IO-efficient 2-pass TPMMS-based join (⋈) algorithm - Example - Pass 2
Pass 2: read 1 block from every (sorted) chunk and find the smallest tuple in each relation:
If R(smallest tuple value) = S(smallest tuple value), then join all tuples of R and S that have this value and advance R and S
If R(smallest tuple value) < S(smallest tuple value), then discard R's tuple and advance R
If R(smallest tuple value) > S(smallest tuple value), then discard S's tuple and advance S
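The three cases above can be sketched as a merge loop. This is an illustrative model, not the slides' implementation: it assumes the sorted chunks are in-memory lists with the join attribute in the first field, uses `heapq.merge` as a stand-in for "read 1 block from every chunk and find the smallest tuple", and the name `merge_join` is hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_join(r_chunks, s_chunks, key=itemgetter(0)):
    """Join two relations given as lists of sorted chunks, on the value of `key`."""
    # Lazily merge the sorted chunks and group consecutive tuples with equal keys.
    r_groups = groupby(heapq.merge(*r_chunks, key=key), key=key)
    s_groups = groupby(heapq.merge(*s_chunks, key=key), key=key)
    r_cur = next(r_groups, None)
    s_cur = next(s_groups, None)
    result = []
    while r_cur is not None and s_cur is not None:
        r_key, r_group = r_cur
        s_key, s_group = s_cur
        if r_key == s_key:
            # All tuples with this join value must fit in memory at once.
            r_tuples, s_tuples = list(r_group), list(s_group)
            result.extend((r, s) for r in r_tuples for s in s_tuples)
            r_cur = next(r_groups, None)
            s_cur = next(s_groups, None)
        elif r_key < s_key:
            r_cur = next(r_groups, None)  # discard R's tuples, advance R
        else:
            s_cur = next(s_groups, None)  # discard S's tuples, advance S
    return result


r_chunks = [[(1, "r1"), (3, "r3")], [(2, "r2"), (3, "r3b")]]
s_chunks = [[(2, "s2"), (4, "s4")], [(3, "s3")]]
print(merge_join(r_chunks, s_chunks))
# [((2, 'r2'), (2, 's2')), ((3, 'r3'), (3, 's3')), ((3, 'r3b'), (3, 's3'))]
```

The lists `r_tuples` and `s_tuples` play the role of the program variables that hold all joining tuples for one join value.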
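Cost analysis of the IO-efficient 2-pass TPMMS-based join (⋈) algorithm
Pass 1: 2 B(R) + 2 B(S) (read every block and write it back as part of a sorted chunk)
Pass 2: B(R) + B(S) (read every sorted chunk once; the cost of writing the join result is not counted)
Total IO cost = 3 B(R) + 3 B(S)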
Buffer requirement of the IO-efficient 2-pass TPMMS-based join (⋈) algorithm
Relations R and S together must have at most M chunks, because pass 2 can use at most M buffers
(We need 1 buffer to read each sorted chunk)
Since each chunk holds M blocks, this means ⌈B(R)/M⌉ + ⌈B(S)/M⌉ ≤ M, or roughly B(R) + B(S) ≤ M²
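The buffer requirement can be checked with a small calculation. The sketch below assumes chunks of M blocks, as produced in pass 1; the function name and the sample sizes are hypothetical.

```python
def fits_two_pass_join(b_r: int, b_s: int, m: int) -> bool:
    """True if R and S together produce at most m sorted chunks of m blocks each."""
    chunks_r = (b_r + m - 1) // m  # ceil(B(R) / M)
    chunks_s = (b_s + m - 1) // m  # ceil(B(S) / M)
    return chunks_r + chunks_s <= m


# Hypothetical sizes: B(R) = 1000 blocks, B(S) = 500 blocks.
print(fits_two_pass_join(1000, 500, 100))  # True: 10 + 5 = 15 chunks <= 100 buffers
print(fits_two_pass_join(1000, 500, 30))   # False: 34 + 17 = 51 chunks > 30 buffers
```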