Semi-join reduction
Dangling tuple
Dangling tuple: a tuple in a relation that does not join with any tuple in the other relation
Example:
Fact: Dangling tuples can be omitted when we perform a join operation
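To make the fact above concrete, here is a minimal sketch in Python. The relations R(A, B) and S(B, C), their contents, and the helper name natural_join are all hypothetical and chosen only for illustration; the point is that dropping the dangling tuples of R leaves the join result unchanged.

```python
# Hypothetical toy relations: R(A, B) and S(B, C), joined on the shared attribute B.
R = [(1, "x"), (2, "y"), (3, "z")]
S = [("x", 10), ("x", 11), ("w", 12)]


def natural_join(r, s):
    """Join R(A, B) with S(B, C) on B using a simple nested loop."""
    return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]


# A tuple of R is dangling if its B value matches no tuple of S.
s_keys = {b for (b, _) in S}
dangling_R = [t for t in R if t[1] not in s_keys]
kept_R = [t for t in R if t[1] in s_keys]

full = natural_join(R, S)          # join using all of R
reduced = natural_join(kept_R, S)  # join using only the non-dangling tuples of R

print(dangling_R)       # [(2, 'y'), (3, 'z')]
print(full == reduced)  # True
```

In this toy data two of the three tuples in R are dangling, so two thirds of R would be shipped to the join site for nothing.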
A more efficient way to perform ⋈ in a distributed database
More efficient join execution: Transfer only the non-dangling tuples to the "join site"
Graphically:
The Semi-join ⋉ operation
Definition of the semi-join (⋉) operation:
R ⋉ S = π_R( R ⋈ π_Y(S) )
where:
Y = join attribute(s) in S
π_R = projection onto the attributes of R
Example:
Conclusion: R ⋉ S = the set of non-dangling tuples in R
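The semi-join can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names project and semijoin, the index-based way of naming attributes, and the sample relations are assumptions. Under set semantics, keeping the tuples of R whose join value appears in π_Y(S) gives the same result as projecting R ⋈ π_Y(S) back onto the attributes of R.

```python
def project(rel, idxs):
    """Projection with duplicate elimination (set semantics)."""
    return {tuple(t[i] for i in idxs) for t in rel}


def semijoin(r, s, r_idx, s_idx):
    """R ⋉ S: the tuples of R whose join value appears in π_Y(S).

    r_idx / s_idx are the positions of the join attribute(s) in R and S.
    """
    y_values = project(s, s_idx)  # π_Y(S)
    return [t for t in r if tuple(t[i] for i in r_idx) in y_values]


# Hypothetical relations R(A, B) and S(B, C); the join attribute is B.
R = [(1, "x"), (2, "y"), (3, "z")]
S = [("x", 10), ("y", 11), ("y", 12), ("w", 13)]

result = semijoin(R, S, r_idx=[1], s_idx=[0])
print(result)  # [(1, 'x'), (2, 'y')]
```

The tuple (3, 'z') is dangling and is dropped. Each surviving tuple of R appears once, even though 'y' matches two tuples of S.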
Eliminating dangling tuples using semi-join
How to eliminate dangling tuples using semi-join:
Compute π_Y(S) and transfer the result to R's site
Compute R ⋉ π_Y(S) and transfer the result to S's site
Perform subset(R) ⋈ S at S's site
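The three steps above can be simulated in a single Python process. This is a sketch under stated assumptions: the two "sites" are just two lists, the function name semijoin_reduce_and_join and the shipped log are hypothetical, and communication cost is approximated by counting the tuples that would cross the network.

```python
def semijoin_reduce_and_join(r_at_r_site, s_at_s_site):
    """Simulate the three steps for R(A, B) at one site and S(B, C) at another."""
    shipped = []  # log of (what was shipped, number of tuples shipped)

    # Step 1: compute π_Y(S) at S's site and ship it to R's site (Y = B here).
    proj_s = {b for (b, _) in s_at_s_site}
    shipped.append(("pi_Y(S) to R's site", len(proj_s)))

    # Step 2: compute R ⋉ π_Y(S) at R's site and ship it to S's site.
    reduced_r = [t for t in r_at_r_site if t[1] in proj_s]
    shipped.append(("reduced R to S's site", len(reduced_r)))

    # Step 3: join the reduced R with S at S's site.
    result = [
        (a, b, c)
        for (a, b) in reduced_r
        for (b2, c) in s_at_s_site
        if b == b2
    ]
    return result, shipped


# Hypothetical data: two of the four tuples in R are dangling.
R = [(1, "x"), (2, "y"), (3, "z"), (4, "q")]
S = [("x", 10), ("y", 11), ("y", 12)]

result, shipped = semijoin_reduce_and_join(R, S)
print(result)   # [(1, 'x', 10), (2, 'y', 11), (2, 'y', 12)]
print(shipped)  # [("pi_Y(S) to R's site", 2), ("reduced R to S's site", 2)]
```

Shipping R whole would move 4 tuples. Here 2 projected values plus 2 reduced tuples are moved, so the saving depends on how many tuples of R are dangling and on how small π_Y(S) is compared with R.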
Problem with this solution
Problem: We incur an additional communication cost to transfer π_C(S) (the projection of S on the join attribute, here C) to R's site
Graphically:
Solution: We can use Bloom filters to remove most of the dangling tuples with little additional communication overhead
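As a rough illustration of the idea, the sketch below builds a small Bloom filter over the join values of S and uses it at R's site in place of the full projection. Everything here is an assumption made for illustration: the class name BloomFilter, the parameters num_bits and num_hashes, the use of salted SHA-256 as the hash family, and the sample relations. Only the fixed-size bit array would be shipped, and a Bloom filter never gives a false negative, so no joining tuple of R is lost. It can give false positives, so a few dangling tuples may survive and are discarded by the final join.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # True for every added value; occasionally True for others (false positive).
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Hypothetical relations R(A, B) and S(B, C); the join attribute is B.
R = [(1, "x"), (2, "y"), (3, "z"), (4, "q")]
S = [("x", 10), ("y", 11)]

# At S's site: summarize the join values of S in the filter, then ship the filter.
bf = BloomFilter()
for (b, _) in S:
    bf.add(b)

# At R's site: keep only the tuples of R that might join with S.
candidates = [t for t in R if bf.might_contain(t[1])]

# At S's site: the real join removes any false positives that slipped through.
result = [(a, b, c) for (a, b) in candidates for (b2, c) in S if b == b2]
print(result)  # [(1, 'x', 10), (2, 'y', 11)]
```

The filter in this sketch is 64 bits no matter how many distinct join values S has, which is where the communication saving comes from. A larger filter lowers the false-positive rate at the price of a bigger transfer.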