Estimating the size of the result of R ⋈ S when joining on multiple attributes - problem description
Note: the independence assumption allows us to use the multiplication rule of probability.
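Probability that r ∈ R and s ∈ S will join (and produce a tuple)
When will r ⋈ s produce a tuple in the result: when r and s agree on both join attributes, i.e., r.Y1 = s.Y1 and r.Y2 = s.Y2.
Individual probabilities of each type of event (from last webpage):
P(r.Y1 = s.Y1) = 1 / max( V(R,Y1), V(S,Y1) )
P(r.Y2 = s.Y2) = 1 / max( V(R,Y2), V(S,Y2) )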
Assuming that the attribute values of Y1 and Y2 are independent, we have (multiplication rule):
P(r.Y1 = s.Y1 and r.Y2 = s.Y2) = 1 / ( max( V(R,Y1), V(S,Y1) ) × max( V(R,Y2), V(S,Y2) ) )
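The multiplication rule can be checked on a small synthetic case. The sketch below is an illustration only: the function names and the sample value counts are hypothetical, and it assumes that each relation contains every combination of its Y1 and Y2 values exactly once (uniform and independent) and that the smaller value set is contained in the larger one.

```python
from fractions import Fraction
from itertools import product


def join_probability(v_r, v_s):
    """Product of 1 / max(V(R,Yi), V(S,Yi)) over the join attributes.

    v_r, v_s: sequences holding V(R,Yi) and V(S,Yi) in the same attribute order.
    """
    p = Fraction(1)
    for distinct_r, distinct_s in zip(v_r, v_s):
        p *= Fraction(1, max(distinct_r, distinct_s))
    return p


def brute_force_probability(v_r, v_s):
    """Fraction of (r, s) pairs that agree on every join attribute.

    Assumption: R holds each combination over range(V(R,Yi)) once,
    and S holds each combination over range(V(S,Yi)) once.
    """
    r_tuples = list(product(*(range(v) for v in v_r)))
    s_tuples = list(product(*(range(v) for v in v_s)))
    matches = sum(1 for r in r_tuples for s in s_tuples if r == s)
    return Fraction(matches, len(r_tuples) * len(s_tuples))


# Hypothetical counts: V(R,Y1)=4, V(R,Y2)=10, V(S,Y1)=6, V(S,Y2)=5
print(join_probability((4, 10), (6, 5)))         # 1/60
print(brute_force_probability((4, 10), (6, 5)))  # 1/60
```

On real data the two numbers generally differ, because attribute values are rarely perfectly uniform or independent; the formula is an estimate.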
Estimating the join result set R(X,Y1,Y2) ⋈ S(Y1,Y2,Z)
An arbitrary tuple s ∈ S and an arbitrary tuple r ∈ R will produce a join tuple with probability:
1 / ( max( V(R,Y1), V(S,Y1) ) × max( V(R,Y2), V(S,Y2) ) )
Then: an arbitrary tuple s ∈ S, joined with all tuples r ∈ R, will produce:
T(R) / ( max( V(R,Y1), V(S,Y1) ) × max( V(R,Y2), V(S,Y2) ) ) tuples
Finally: joining all tuples s ∈ S with all tuples r ∈ R will produce:
T(S) × T(R) / ( max( V(R,Y1), V(S,Y1) ) × max( V(R,Y2), V(S,Y2) ) ) tuples
Estimation formula:
T( R(X,Y1,Y2) ⋈ S(Y1,Y2,Z) ) = T(R) × T(S) / ( max( V(R,Y1), V(S,Y1) ) × max( V(R,Y2), V(S,Y2) ) )
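The estimation formula extends naturally to any number of join attributes: one max( V(R,Y), V(S,Y) ) factor per attribute. A minimal Python sketch of that generalization follows; the function name, the dictionary-based interface, and the sample statistics are assumptions made for illustration and do not come from the lecture.

```python
from math import prod


def estimate_join_size(t_r, t_s, v_r, v_s):
    """Estimate T(R ⋈ S) for a natural join on the attributes listed in v_r.

    t_r, t_s: tuple counts T(R) and T(S).
    v_r, v_s: dicts mapping each join attribute Y to V(R,Y) and V(S,Y).
    """
    if v_r.keys() != v_s.keys():
        raise ValueError("both relations need statistics for the same join attributes")
    # One max(...) factor per join attribute; an empty dict gives 1 (Cartesian product).
    denominator = prod(max(v_r[y], v_s[y]) for y in v_r)
    return t_r * t_s / denominator


# Hypothetical statistics, chosen only to exercise the formula.
size = estimate_join_size(
    1_000, 2_000,
    {"Y1": 20, "Y2": 100},   # V(R,Y1), V(R,Y2)
    {"Y1": 50, "Y2": 500},   # V(S,Y1), V(S,Y2)
)
print(size)  # 80.0
```

With no join attributes the denominator is 1, so the same function returns T(R) × T(S), the size of the Cartesian product.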
Estimating the join result set R(X,Y1,Y2) ⋈ S(Y1,Y2,Z) - Example 3
Method 3:
Estimate for the # tuples in R(a,b) ⋈ U(c,d) (R and U share no attributes, so this join is a Cartesian product):
T(R) × T(U) = 5,000,000
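Estimate for the # tuples in ( R(a,b) ⋈ U(c,d) ) ⋈ S(b,c), which joins on both b (from R) and c (from U):
T(R ⋈ U) × T(S) / ( max( V(R,b), V(S,b) ) × max( V(U,c), V(S,c) ) )
= 5,000,000 × T(S) / ( max( V(R,b), V(S,b) ) × max( V(U,c), V(S,c) ) )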
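The two steps of Method 3 can be sketched in Python as follows. Only the intermediate size of 5,000,000 for R ⋈ U appears in this section; every other number below (the individual tuple counts and all V(...) values) is a placeholder assumed for illustration, and the helper name join_size is hypothetical.

```python
def join_size(t_left, t_right, distinct_pairs):
    """T(left) × T(right) divided by max(V(left,Y), V(right,Y)) for each join attribute Y.

    distinct_pairs: one (V(left,Y), V(right,Y)) pair per join attribute.
    """
    size = t_left * t_right
    for v_left, v_right in distinct_pairs:
        size /= max(v_left, v_right)
    return size


# Placeholder statistics (assumed, not taken from this section).
T_R, T_U, T_S = 1_000, 5_000, 2_000
V_R_b, V_S_b = 20, 50      # distinct b values in R and in S
V_U_c, V_S_c = 500, 100    # distinct c values in U and in S

# Step 1: R(a,b) and U(c,d) share no attributes, so the join is a Cartesian product.
t_ru = join_size(T_R, T_U, [])
print(t_ru)   # 5000000

# Step 2: join the intermediate result with S(b,c) on both b and c.
t_rus = join_size(t_ru, T_S, [(V_R_b, V_S_b), (V_U_c, V_S_c)])
print(t_rus)  # 400000.0
```

The second step reuses V(R,b) and V(U,c) for the intermediate relation, since a Cartesian product keeps every b value of R and every c value of U.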
Comment and observation
We got the same answer as before.