The duplicate elimination operator (δ)
The 2-pass TPMMS based duplicate elimination (δ) algorithm - Pass 1
The 2-pass TPMMS based duplicate elimination (δ) algorithm - Pass 2
Pass 2:
(Graphically explained in next slide)
You only need a program variable to store 1 tuple for the output (i.e., not an entire buffer).
The 2-pass TPMMS based duplicate elimination (δ) algorithm - Example - Pass 1
Input file:
Pass 1: read chunks of M blocks and sort each chunk on the entire tuple.
Write the sorted chunks to disk.
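Pass 1 as described above can be sketched in a few lines of Python. This is only an illustrative in-memory simulation: the function name `pass1_sorted_chunks`, the representation of a block as a list of tuples, and the use of a Python list in place of the disk are all assumptions made for the example.

```python
from typing import List, Tuple

Row = Tuple          # a tuple of the relation (hypothetical representation)
Block = List[Row]    # a disk block holding a few tuples


def pass1_sorted_chunks(blocks: List[Block], M: int) -> List[List[Row]]:
    """Read the input M blocks at a time and sort each chunk on the entire tuple."""
    chunks = []
    for start in range(0, len(blocks), M):
        in_memory = blocks[start:start + M]   # fill the M buffers
        # Python compares tuples field by field, i.e. on the entire tuple.
        rows = sorted(row for block in in_memory for row in block)
        chunks.append(rows)                   # stands in for "write the sorted chunk to disk"
    return chunks


blocks = [
    [(3, "c"), (1, "a")],
    [(2, "b"), (1, "a")],
    [(2, "b"), (5, "e")],
    [(4, "d"), (3, "c")],
]
chunks = pass1_sorted_chunks(blocks, M=2)
```

Duplicates that fall inside the same chunk end up adjacent after sorting. Duplicates spread over different chunks are only caught in Pass 2.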
The 2-pass TPMMS based duplicate elimination (δ) algorithm - Example - Pass 2
Pass 2: read 1 block from every (sorted) chunk and find the next smallest tuple.
If smallest tuple == last seen tuple, then discard it (and repeat).
Pass 2: read 1 block from every (sorted) chunk and find the next smallest tuple.
If smallest tuple > last seen tuple, then output it and update the last seen tuple.
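The two Pass 2 cases (discard when equal to the last seen tuple, output when larger) can be sketched as a merge loop. This is a minimal simulation under stated assumptions: each sorted chunk is a plain Python list, a cursor per chunk stands in for the one buffered block per chunk, and `heapq` is used to find the smallest tuple among the chunk heads (a linear scan over the heads would work equally well). The name `pass2_distinct` is hypothetical.

```python
import heapq
from typing import Iterator, List, Tuple

_NOTHING = object()   # marker meaning "no tuple has been output yet"


def pass2_distinct(chunks: List[List[Tuple]]) -> Iterator[Tuple]:
    """Merge sorted chunks and output every distinct tuple exactly once."""
    # Heap entries are (tuple, chunk index, position inside that chunk).
    heap = [(chunk[0], i, 0) for i, chunk in enumerate(chunks) if chunk]
    heapq.heapify(heap)
    last_seen = _NOTHING          # the single program variable for the output
    while heap:
        smallest, i, pos = heapq.heappop(heap)
        if pos + 1 < len(chunks[i]):
            # Advance the cursor of the chunk we just consumed from.
            heapq.heappush(heap, (chunks[i][pos + 1], i, pos + 1))
        if last_seen is not _NOTHING and smallest == last_seen:
            continue              # duplicate: discard it and repeat
        yield smallest            # larger than the last seen tuple: output it
        last_seen = smallest      # and update the last seen tuple


chunks = [
    [(1, "a"), (1, "a"), (2, "b"), (3, "c")],
    [(2, "b"), (3, "c"), (4, "d"), (5, "e")],
]
result = list(pass2_distinct(chunks))
```

Because the merged stream is in sorted order, every copy of a tuple arrives consecutively, so comparing against one remembered tuple is enough to detect all duplicates.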
Cost analysis of the 2-pass TPMMS based duplicate elimination (δ) algorithm
Buffer requirement of the 2-pass TPMMS based duplicate elimination (δ) algorithm
I.e.: if the size of relation R is B(R) blocks, we need at least M = Sqrt(B(R)) buffers to execute the 2-pass TPMMS based δ algorithm.
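The reasoning behind this bound: Pass 1 produces about B(R)/M sorted chunks, and Pass 2 needs one buffer per chunk, so B(R)/M ≤ M, i.e. B(R) ≤ M². A small sketch of that arithmetic follows. The function names are hypothetical, and the check assumes all M buffers are available for input, which matches the point that the output needs only a single tuple variable.

```python
import math


def min_buffers_for_two_pass(b_r: int) -> int:
    """Smallest M such that M * M >= B(R), i.e. the ceiling of Sqrt(B(R))."""
    if b_r <= 0:
        return 0
    return math.isqrt(b_r - 1) + 1   # integer ceiling of the square root


def fits_in_two_passes(b_r: int, m: int) -> bool:
    """True if a relation of B(R) blocks can be handled with M buffers in 2 passes."""
    return b_r <= m * m


# A relation of 10000 blocks needs 100 buffers; one more block needs 101.
small = min_buffers_for_two_pass(10000)
large = min_buffers_for_two_pass(10001)
```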