The grouping operator (γ)
The 2-pass TPMMS (Two-Phase Multiway Merge-Sort) based grouping (γ) algorithm - Pass 1
The 2-pass TPMMS based grouping (γ) algorithm - Pass 2
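Pass 2:
Merge the sorted chunks. Because every chunk is sorted on the grouping attributes, the tuples of the "smallest" group come before all other tuples, so you can process that entire group first (shown graphically on the next slide).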
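A minimal Python sketch of why this works, assuming each sorted chunk is an in-memory list of (group_key, value) tuples sorted on group_key only (the data is hypothetical). `heapq.merge` performs a k-way merge on the grouping attribute, so all tuples of the smallest group are produced before any tuple of the next group:

```python
import heapq

# Hypothetical sorted chunks produced by Pass 1; each tuple is (group_key, value)
# and each chunk is sorted on group_key only.
chunks = [
    [("a", 1), ("c", 5)],
    [("a", 2), ("b", 7)],
    [("b", 3), ("c", 4)],
]

# k-way merge: repeatedly take the smallest head among all chunks (compared on
# group_key), so all tuples of the smallest group come out first.
merged = list(heapq.merge(*chunks, key=lambda t: t[0]))
print([key for key, _ in merged])  # ['a', 'a', 'b', 'b', 'c', 'c']
```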
Pass 2:
You only need a program variable to store 1 output tuple (i.e., the current group and its statistic), not an entire output buffer.
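A minimal sketch of the single output variable, assuming the input is already merged and sorted on the grouping attribute and that the statistic is COUNT and SUM of a numeric value. The function name `group_sorted_stream` is hypothetical; the only per-group state it keeps is the list `current`:

```python
from typing import Iterable, Iterator, List, Optional, Tuple

def group_sorted_stream(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int, int]]:
    """Yield (group_key, COUNT, SUM) from rows already sorted on group_key."""
    current: Optional[List] = None  # the one output tuple: [key, count, sum]
    for key, value in rows:
        if current is None or key != current[0]:
            if current is not None:
                yield (current[0], current[1], current[2])  # previous group done
            current = [key, 1, value]  # start the statistic of a new group
        else:
            current[1] += 1  # same group: update the statistic in place
            current[2] += value
    if current is not None:
        yield (current[0], current[1], current[2])  # emit the last group

result = list(group_sorted_stream([("a", 1), ("a", 2), ("b", 7), ("b", 3), ("c", 5)]))
print(result)  # [('a', 2, 3), ('b', 2, 10), ('c', 1, 5)]
```

Since each group's result is emitted as soon as the next group starts, memory for the output never grows beyond one tuple.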
The 2-pass TPMMS based grouping (γ) algorithm - Example - Pass 1
Input file:
Pass 1:
Read chunks of M blocks and sort each chunk on the grouping attributes.
Write the sorted chunks to disk.
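A minimal Pass 1 sketch in Python, assuming R is an in-memory list of blocks (each block a list of (group_key, value) tuples) and that "writing to disk" is simulated by returning the sorted chunks. `pass1_sort_chunks`, `tuples_per_block`, and the sample data are hypothetical:

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (group_key, value)
Block = List[Row]

def pass1_sort_chunks(relation: List[Block], M: int, tuples_per_block: int) -> List[List[Block]]:
    """Sort each run of M blocks on the grouping attribute; return the sorted chunks."""
    sorted_chunks: List[List[Block]] = []
    for start in range(0, len(relation), M):
        # Load up to M blocks into memory and flatten them into tuples.
        rows = [row for block in relation[start:start + M] for row in block]
        # Sort on the grouping attribute only.
        rows.sort(key=lambda row: row[0])
        # Re-pack into blocks; a real system would write this chunk to disk here.
        chunk = [rows[i:i + tuples_per_block] for i in range(0, len(rows), tuples_per_block)]
        sorted_chunks.append(chunk)
    return sorted_chunks

R = [[("c", 1), ("a", 2)], [("b", 3), ("a", 4)], [("c", 5), ("b", 6)]]
chunks_on_disk = pass1_sort_chunks(R, M=2, tuples_per_block=2)
print(chunks_on_disk)
# [[[('a', 2), ('a', 4)], [('b', 3), ('c', 1)]], [[('b', 6), ('c', 5)]]]
```

With B(R) = 3 blocks and M = 2, Pass 1 produces ⌈3/2⌉ = 2 sorted chunks.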
The 2-pass TPMMS based grouping (γ) algorithm - Example - Pass 2
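Pass 2:
Read 1 block from every (sorted) chunk (refilling a chunk's buffer with its next block when the buffer is used up) and find the next smallest tuple (= group).
If the smallest tuple ≠ the last seen tuple (compared on the grouping attributes), then a new group starts: output the statistic of the previous group (if any) and initialize the statistic for this (new) group.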
Pass 2:
Read 1 block from every (sorted) chunk and find the next smallest tuple.
If the smallest tuple = the last seen tuple (compared on the grouping attributes), then update the statistic of this group.
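A fuller Pass 2 sketch in Python, assuming each chunk is a list of non-empty in-memory "blocks" sorted on the grouping attribute and that the statistic is COUNT. It keeps exactly one block buffer per chunk, uses a heap to find the next smallest tuple, and applies the two cases (new group vs. same group). The name `pass2_count` and the data layout are hypothetical; disk reads are simulated by list indexing:

```python
import heapq
from collections import deque
from typing import Deque, List, Tuple

Row = Tuple[str, int]   # (group_key, value)
Block = List[Row]       # one disk block, assumed non-empty

def pass2_count(chunks: List[List[Block]]) -> List[Tuple[str, int]]:
    """Merge sorted chunks with one block buffer per chunk; return (group_key, COUNT)."""
    buffers: List[Deque[Row]] = [deque(chunk[0]) if chunk else deque() for chunk in chunks]
    next_block = [1] * len(chunks)  # next unread block of each chunk
    heap = [(buf[0][0], i) for i, buf in enumerate(buffers) if buf]
    heapq.heapify(heap)
    out: List[Tuple[str, int]] = []
    current = None  # the single output tuple: [key, count]
    while heap:
        key, i = heapq.heappop(heap)  # next smallest tuple among all buffered heads
        buffers[i].popleft()
        if current is None or key != current[0]:
            if current is not None:
                out.append((current[0], current[1]))  # previous group is complete
            current = [key, 1]  # smallest tuple != last seen: initialize new group
        else:
            current[1] += 1  # smallest tuple == last seen: update this group
        if not buffers[i] and next_block[i] < len(chunks[i]):
            buffers[i] = deque(chunks[i][next_block[i]])  # read next block of chunk i
            next_block[i] += 1
        if buffers[i]:
            heapq.heappush(heap, (buffers[i][0][0], i))
    if current is not None:
        out.append((current[0], current[1]))
    return out

example_chunks = [
    [[("a", 1), ("a", 2)], [("c", 5)]],              # chunk 0: two blocks
    [[("a", 3), ("b", 7)], [("b", 8), ("c", 1)]],    # chunk 1: two blocks
]
print(pass2_count(example_chunks))  # [('a', 3), ('b', 2), ('c', 2)]
```

Only len(chunks) block buffers are ever resident, which is why the number of sorted chunks is bounded by the number of available buffers.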
Cost analysis of the 2-pass TPMMS based grouping (γ) algorithm
Buffer requirement of the 2-pass TPMMS based grouping (γ) algorithm
The relation R can be sorted into at most M chunks, because in Pass 2 we can use at most M buffers: we need 1 buffer to read each sorted chunk (the output needs no buffer, only a program variable holding 1 tuple).
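I.e.: each chunk holds M blocks, so R is split into ⌈B(R)/M⌉ chunks, and Pass 2 requires ⌈B(R)/M⌉ ≤ M, which holds exactly when B(R) ≤ M².
So if the size of relation R is B(R) blocks, we need M ≥ √B(R) buffers to execute the 2-pass TPMMS based γ algorithm.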
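As a quick numeric check of the inequality: Pass 2 needs ⌈B(R)/M⌉ ≤ M, which for integer M is the same as B(R) ≤ M², so the smallest workable M is ⌈√B(R)⌉. A minimal Python sketch (the helper name `min_buffers` is hypothetical; B(R) ≥ 1 is assumed):

```python
import math

def min_buffers(B: int) -> int:
    """Smallest M with ceil(B / M) <= M, i.e. B <= M * M (assumes B >= 1)."""
    M = math.isqrt(B)  # floor(sqrt(B))
    if M * M < B:
        M += 1  # round up to ceil(sqrt(B))
    return M

print(min_buffers(10_000))  # 100
print(min_buffers(10_001))  # 101
```

For example, with 10,001 blocks, 100 buffers would give ⌈10001/100⌉ = 101 chunks, one more than Pass 2 can merge, so 101 buffers are needed.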