Do we need
to use multi-dimensional indexes ?
Case study on using B+-tree on multi-dimensional data
Index file used:
(1) B+-tree on the
x-coordinate of objects
Index file used:
(2) B+-tree on the
y-coordinate of objects
Process a range query using the
B+-tree indexes
How to use the B+-tree indexes to process the range query:
How to use the B+-tree indexes to process the range query:
Result:
we found all coordinate records
(x,y)
where xL ≤
x ≤
xH:
How to use the B+-tree indexes to process the range query:
Do
the same thing with the
y-index file:
find
all coordinate records
(x,y)
where yL ≤
y ≤
yH:
How to use the B+-tree indexes to process the range query:
Compute the
intersection of these
2 sets of records DB addresses (pointers):
The Database pointers in the
intersection satisfy:
xL ≤
x ≤
xH
and
yL ≤
y ≤
yH
Retrieve the records
using this set of
record pointers
to answer to the
range query
Question:
what is the
performance cost
(= # disk IO needed) ?
IO cost of range query using B+-tree indexes
Data file statistics:
IO cost of range query using B+-tree indexes
Data file statistics:
(
No disk I/O operation needed to
read
an internal node
in the
B+-tree index)
IO cost of range query using B+-tree indexes
Expected (average) # points selected
in the output:
IO cost of range query using B+-tree indexes
Processing step 1:
find the
xL-coordinate:
#Disk IOs used =
0
(because we assumed
internal nodes stored
in memory)
IO cost of range query using B+-tree indexes
Processing step 2:
traverse the
x-index file between
xL=450 and
xH=550:
#record ptrs =
((550-540) ⁄ 1000) × 1,000,000
= 100,000
(because we assumed
uniform distribution over
1 - 1000)
#Disk IOs used =
100,000 /
200 =
500 blocks
(because we assumed
200 index recs per
index blk)
IO cost of range query using B+-tree indexes
Processing step 3:
do the same for
the y-coordinate
#record ptrs =
((550-540) ⁄ 1000) × 1,000,000
= 100,000
(because we assumed
uniform distribution over
1 - 1000)
#Disk IOs used =
100,000 /
200 =
500 blocks
(because we assumed
200 index recs per
index blk)
IO cost of range query using B+-tree indexes
Processing step 4:
compute the
intersection of these
2 sets of pointers
#Disk IOs used =
0 blocks
(because all pointers were read
in memory by
previous 2 steps)
#record ptrs in intersection ~=
10,000
(the
expected # points in the
output - go 4 slides back if necessary)
IO cost of range query using B+-tree indexes
Processing step 5:
retrieve the
(10,000) records using the
record ptrs in
intersection
IO cost of range query using B+-tree indexes
Grand total:
Alternate solution: scan the
entire data file
(without using index)
Brute force search in
the data file
(without using the indexes):
Alternate solution: scan the
entire data file
(without using index)
The cost of the
brute force search in
the data file
(without using the indexes):
Conclussion of this exercise...
❮
❯