Prelude to sort-based 2-pass algorithms: sorting large files
The sort-based 2-pass algorithms are based on the TPMMS algorithm, so we will study the TPMMS algorithm first.
Resource availability assumption
Question: how to sort an input file that is (much) larger than M blocks using only M buffers?
The Two-Pass Multi Way Sort Algorithm - pass 1
Pass 1: read a chunk of M blocks of data and sort it in memory. Write the sorted chunk back to disk.
Then read the next M blocks, sort them, and write that sorted chunk back to disk.
Repeat until the final chunk (the last M blocks, or fewer) has been read, sorted, and written back to disk.
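Pass 1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "disk" is modeled as an in-memory list of blocks, a block is a list of integer sort keys, and the names `pass1_make_sorted_chunks` and `block_size` are hypothetical, not part of the original material.

```python
from typing import List

Block = List[int]  # assumption: a block is modeled as a list of sort keys


def pass1_make_sorted_chunks(file_blocks: List[Block], M: int,
                             block_size: int) -> List[List[Block]]:
    """Read the file M blocks at a time, sort each group, and return the sorted chunks."""
    if M < 1 or block_size < 1:
        raise ValueError("M and block_size must be positive")
    chunks = []
    for start in range(0, len(file_blocks), M):
        group = file_blocks[start:start + M]        # read the (next) M blocks
        keys = sorted(k for block in group for k in block)  # sort them in memory
        # Re-pack the sorted keys into blocks: this stands in for
        # "write the sorted chunk back to disk".
        chunk = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
        chunks.append(chunk)
    return chunks
```

With 5 input blocks and M = 2, the sketch produces 3 sorted chunks; the last chunk holds only the 1 remaining block.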
The Two-Pass Multi Way Sort Algorithm - pass 2
Pass 2: read 1 block from each (sorted) chunk into its own buffer and merge the data.
Repeatedly move the smallest remaining key value among all (sorted) chunks to the output buffer.
When a chunk's buffer is exhausted, read the next block of that chunk.
Write the output buffer to disk when it becomes full.
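The merge in Pass 2 can be sketched as follows. This is a minimal illustration, not a definitive implementation: sorted chunks are modeled as lists of blocks held in memory, and the names `pass2_merge`, `buffers`, and `block_size` are assumptions made for the example.

```python
from typing import List

Block = List[int]  # assumption: a block is modeled as a list of sort keys


def pass2_merge(chunks: List[List[Block]], M: int, block_size: int) -> List[Block]:
    """Merge sorted chunks using one input buffer per chunk and one output buffer."""
    if len(chunks) > M - 1:
        raise ValueError("at most M-1 sorted chunks can be merged with M buffers")
    next_block = [0] * len(chunks)      # next block to read from each chunk
    buffers: List[Block] = [[] for _ in chunks]  # one input buffer per chunk
    pos = [0] * len(chunks)             # read position inside each input buffer
    output_buffer: Block = []
    written: List[Block] = []           # stands in for the output file on disk

    def refill(i: int) -> None:
        # Load the next block of chunk i when its buffer has been used up.
        while pos[i] >= len(buffers[i]) and next_block[i] < len(chunks[i]):
            buffers[i] = chunks[i][next_block[i]]
            next_block[i] += 1
            pos[i] = 0

    for i in range(len(chunks)):
        refill(i)                       # read 1 block for each sorted chunk
    while True:
        live = [i for i in range(len(chunks)) if pos[i] < len(buffers[i])]
        if not live:
            break                       # every chunk is exhausted
        i = min(live, key=lambda j: buffers[j][pos[j]])  # smallest key in the buffers
        output_buffer.append(buffers[i][pos[i]])
        pos[i] += 1
        refill(i)
        if len(output_buffer) == block_size:
            written.append(output_buffer)   # write the full output buffer to disk
            output_buffer = []              # and re-use it
    if output_buffer:
        written.append(output_buffer)       # flush the partially filled last block
    return written
```

The linear scan for the smallest key keeps the sketch close to the description above; a priority queue would do the same job faster when there are many chunks.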
The Two-Pass Multi Way Sort Algorithm - Example
Pass 1: read the first M blocks of data, sort them, and write the sorted chunk back to disk. Do the same for the next M blocks, and so on, through the final M blocks.
Pass 2: read 1 block from each (sorted) chunk and merge the data.
Notice the constraint imposed by the M buffers: one buffer must be reserved for output, so you can merge at most M−1 (sorted) chunks.
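The constraint puts an upper bound on the file size. Pass 1 produces chunks of M blocks each and Pass 2 can merge at most M−1 chunks, so a file of B blocks can be sorted in two passes only if ⌈B/M⌉ ≤ M−1, i.e. B ≤ M(M−1). The check can be sketched as below; the function names are hypothetical.

```python
def num_sorted_chunks(B: int, M: int) -> int:
    """Number of sorted chunks Pass 1 produces for a file of B blocks."""
    return -(-B // M)  # integer ceiling of B / M


def fits_in_two_passes(B: int, M: int) -> bool:
    """True if Pass 2 can merge all chunks with 1 output buffer and M-1 input buffers."""
    return num_sorted_chunks(B, M) <= M - 1


def max_sortable_blocks(M: int) -> int:
    """Largest file size (in blocks) that two passes can handle with M buffers."""
    return M * (M - 1)
```

For example, with M = 4 buffers a 12-block file gives 3 chunks and fits, while a 13-block file gives 4 chunks and does not.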
Find the smallest sort key value in the buffers: a.
Move a to the output buffer and find the new smallest sort key value: b.
Move b to the output buffer and find the new smallest sort key value, and so on.
When the output buffer is full, write it to disk (and re-use the output buffer).
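The whole example can be traced end to end with a compact sketch. It takes a shortcut that should be read as an assumption: the standard-library `heapq.merge` stands in for the block-by-block buffer handling, and each sorted chunk is kept in memory as a plain list instead of being read from disk 1 block at a time. The name `tpmms_trace` and the letter keys are made up for illustration.

```python
import heapq
from typing import List


def tpmms_trace(records: List[str], M: int, block_size: int) -> List[List[str]]:
    """Sort records in two passes and return the output blocks in write order."""
    chunk_records = M * block_size      # a chunk of M blocks holds this many records
    # Pass 1: sort each chunk of M blocks on its own.
    chunks = [sorted(records[i:i + chunk_records])
              for i in range(0, len(records), chunk_records)]
    if len(chunks) > M - 1:
        raise ValueError("too many chunks: at most M-1 can be merged")
    # Pass 2: merge the sorted chunks through a single output buffer.
    written: List[List[str]] = []
    output_buffer: List[str] = []
    for key in heapq.merge(*chunks):    # yields the smallest remaining key each time
        output_buffer.append(key)
        if len(output_buffer) == block_size:
            written.append(output_buffer)   # output buffer is full: write it out
            output_buffer = []              # re-use the output buffer
    if output_buffer:
        written.append(output_buffer)       # flush the partially filled last block
    return written
```

Calling `tpmms_trace(list("hdbfaecg"), M=3, block_size=2)` sorts the chunks `hdbfae` and `cg` separately, then writes the blocks `[a, b]`, `[c, d]`, `[e, f]`, `[g, h]` in that order.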