The set difference operator (−set)
Good news: the algorithm for −set is the same as the algorithm for −bag!
The 2-pass TPMMS based set difference (−set) algorithm - Pass 1
(Graphically explained in next slide)
Use M buffers to sort relations R and S into sorted chunks of M blocks each.
IO cost for pass 1 = 2 B(R) + 2 B(S) (every block is read once and written once)
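Pass 1 can be sketched in a few lines. This is only an illustration, not the slides' implementation: a relation is modelled as a Python list of blocks, each block is a list of tuple values, and the name `make_sorted_runs` and the I/O counter are assumptions made for this example.

```python
def make_sorted_runs(blocks, M):
    """Pass 1 sketch: cut a relation into sorted chunks of at most M blocks.

    `blocks` stands in for the relation on disk (hypothetical model).
    Returns (runs, io_cost); io_cost counts 1 read and 1 write per block.
    """
    runs = []
    io_cost = 0
    for start in range(0, len(blocks), M):
        chunk = blocks[start:start + M]   # read up to M blocks into the M buffers
        io_cost += len(chunk)             # 1 read per block
        tuples = sorted(t for block in chunk for t in block)  # sort in memory
        runs.append(tuples)               # write the sorted chunk back to disk
        io_cost += len(chunk)             # 1 write per block
    return runs, io_cost


R_blocks = [[7, 3], [9, 1], [5, 2]]       # B(R) = 3 blocks
runs, cost = make_sorted_runs(R_blocks, M=2)
print(runs)   # [[1, 3, 7, 9], [2, 5]]
print(cost)   # 6, i.e. 2 * B(R)
```

Running the same function on S adds 2 B(S), which gives the pass 1 cost stated above.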
The 2-pass TPMMS based set difference (−set) algorithm - Pass 2
Pass 2:
(Graphically explained in next slide)
You only need a program variable to store 1 tuple for each relation (i.e., an entire buffer is not required for this).
The 2-pass TPMMS based set difference (−set) algorithm - Example - Pass 1
Pass 1: read chunks of M blocks from relations R and S and sort them.
Write the sorted chunks (each chunk is M blocks) to disk.
The 2-pass TPMMS based set difference (−set) algorithm - Example - Pass 2
(R and S are sets!)
Pass 2: read 1 block from every (sorted) chunk and find the smallest tuple in each relation:
If R(smallest tuple value) < S(smallest tuple value), then output R's tuple and advance R. (This value can appear only once in the output because R is a set.)
If R(smallest tuple value) = S(smallest tuple value), then discard R's tuple (subtract!) and advance both R and S.
If R(smallest tuple value) > S(smallest tuple value), then discard S's tuple and advance S.
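The three cases above can be sketched as a merge loop. This is a minimal illustration under assumptions that are not in the slides: the sorted chunks are in-memory Python lists, `heapq.merge` stands in for "find the smallest tuple across the chunks of one relation", `None` is never a tuple value, and the function name `set_difference` is hypothetical.

```python
import heapq


def set_difference(r_runs, s_runs):
    """Pass 2 sketch: compute R - S from the sorted chunks of two sets."""
    r_iter = heapq.merge(*r_runs)   # R's tuples in ascending order
    s_iter = heapq.merge(*s_runs)   # S's tuples in ascending order
    r = next(r_iter, None)          # 1 variable for R's current smallest tuple
    s = next(s_iter, None)          # 1 variable for S's current smallest tuple
    out = []
    while r is not None:
        if s is None or r < s:      # R < S (or S exhausted): output R's tuple
            out.append(r)
            r = next(r_iter, None)
        elif r == s:                # R = S: subtract, advance both
            r = next(r_iter, None)
            s = next(s_iter, None)
        else:                       # R > S: discard S's tuple, advance S
            s = next(s_iter, None)
    return out


result = set_difference([[1, 3, 7, 9], [2, 5]], [[2, 3], [8, 10]])
print(result)   # [1, 5, 7, 9]
```

Once S is exhausted, every remaining tuple of R is output, which is why the loop only tests whether R still has tuples.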
Cost analysis of the 2-pass TPMMS based set difference (−set) algorithm
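As a rough worked example of the cost arithmetic: the pass 1 cost is the 2 B(R) + 2 B(S) given earlier. The pass 2 figure below is an assumption made for this sketch, namely that every block of every sorted chunk is read exactly once and that writing the final output is not counted. The function name and the block counts are hypothetical.

```python
def tpmms_set_difference_io(b_r, b_s):
    """Return (pass 1, pass 2, total) I/O counts in blocks."""
    pass1 = 2 * b_r + 2 * b_s   # read and write every block once (from the slides)
    pass2 = b_r + b_s           # assumption: read every sorted block once
    return pass1, pass2, pass1 + pass2


io = tpmms_set_difference_io(1000, 500)   # B(R) = 1000, B(S) = 500
print(io)   # (3000, 1500, 4500)
```

Under these assumptions the total comes to 3 B(R) + 3 B(S).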
Buffer requirement of the 2-pass TPMMS based set difference (−set) algorithm
Relations R and S together must have at most M chunks, because Pass 2 can use at most M buffers (1 buffer is needed to read each sorted chunk).
With chunks of M blocks each, this means ⌈B(R)/M⌉ + ⌈B(S)/M⌉ ≤ M, i.e. roughly B(R) + B(S) ≤ M².
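The buffer requirement can be checked with a small helper. This is a sketch only: the function name `fits_in_two_passes` and the sample sizes are made up for illustration, and it assumes each chunk written in Pass 1 holds up to M blocks.

```python
import math


def fits_in_two_passes(b_r, b_s, M):
    """True if the sorted chunks of R and S together need at most M buffers."""
    chunks_r = math.ceil(b_r / M)   # chunks produced from R in Pass 1
    chunks_s = math.ceil(b_s / M)   # chunks produced from S in Pass 1
    return chunks_r + chunks_s <= M


ok = fits_in_two_passes(6000, 4000, M=100)    # 60 + 40 = 100 chunks
too_big = fits_in_two_passes(6001, 4000, M=100)  # 61 + 40 = 101 chunks
print(ok, too_big)   # True False
```

The first call sits exactly on the limit B(R) + B(S) = M²; a single extra block in R creates one more chunk than Pass 2 has buffers for.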