Estimating the size of the result of
R⋈S
when joining on
1 attribute
- problem description
Estimating the size of the result of
R⋈S
when joining on
1 attribute
- problem description
Estimating the size of the result of
R⋈S
when joining on
1 attribute
- problem description
We randomly select
a tuple
r ∈ relation R and
a tuple
s ∈ relation S.
I.e.: what's the
probability that
r and
s will
produce a tuple ?
Probability that
r ∈ R and s ∈ S will join (and produce a tuple)
The computation is
handled differently in
2 different cases:
- V(S, Y) ≤ V(R, Y)
(i.e.: attribute Y in
R takes on
more different values)
- V(R, Y) ≤ V(S, Y)
(i.e.: attribute Y in
S takes on
more different values)
|
Probability that
r ∈ R and s ∈ S will join (and produce a tuple) -
V(S, Y) ≤ V(R, Y)
Probability that
r ∈ R and s ∈ S will join (and produce a tuple) -
V(S, Y) ≤ V(R, Y)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
V(S, Y) ≤ V(R, Y)
An arbitrary tuple
s ∈ S
and
an arbitrary tuple
r ∈ R
will produce a
join tuple
with probability 1/V(R,Y):
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
V(S, Y) ≤ V(R, Y)
Then:
an arbitrary tuple
s ∈ S joining
with
all tuple
r ∈ R
will produce:
1/V(R,Y)
× T(R) tuples
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
V(S, Y) ≤ V(R, Y)
Finally:
joining all tuples
s ∈ S
with
all tuple
r ∈ R
will produce:
T(S) ×
1/V(R,Y)
× T(R) tuples
Estimation formula:
T( R(X,Y) ⋈
S(Y,Z) )
= T(R) × T(S) / V(R,Y)
Probability that
r ∈ R and s ∈ S will join (and produce a tuple) -
V(R, Y) ≤ V(S, Y)
Probability that
r ∈ R and s ∈ S will join (and produce a tuple) -
V(R, Y) ≤ V(S, Y)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
V(R, Y) ≤ V(S, Y)
An arbitrary tuple
r ∈ R
and
an arbitrary tuple
s ∈ S
will produce a
join tuple
with probability 1/V(S,Y):
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
V(R, Y) ≤ V(S, Y)
Then:
an arbitrary tuple
r ∈ R joining
with
all tuple
s ∈ S
will produce:
1/V(S,Y)
× T(S) tuples
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
V(R, Y) ≤ V(S, Y)
Finally:
joining all tuples
r ∈ R
with
all tuple
s ∈ S
will produce:
T(R) ×
1/V(S,Y)
× T(S) tuples
Estimation formula:
T( R(X,Y) ⋈
S(Y,Z) )
= T(R) × T(S) / V(S,Y)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
combined result
Summary of the
finding:
- V(S, Y) ≤ V(R, Y)
(i.e.: attribute Y in
R takes on
more different values)
T(R) × T(S)
T( R ⋈ S ) = -----------
V(R,Y)
|
- V(R, Y) ≤ V(S, Y)
(i.e.: attribute Y in
S takes on
more different values)
T(R) × T(S)
T( R ⋈ S ) = -----------
V(S,Y)
|
|
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 1
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 1
Method 1:
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 1
Method 1:
Estimate for
the # tuples in
R(a,b) ⋈ S(b,c):
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 1
Method 1:
Estimate for
the # tuples in
R(a,b) ⋈ S(b,c) =
40,000
Estimate for
the # tuples in
(
R(a,b) ⋈ S(b,c) )
⋈ U(c,d):
(same formula)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 1
Method 1:
Estimate for
the # tuples in
R(a,b) ⋈ S(b,c) =
40,000
Estimate for
the # tuples in
(
R(a,b) ⋈ S(b,c) )
⋈ U(c,d):
(same formula)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 1
Method 1:
Estimate for
the # tuples in
R(a,b) ⋈ S(b,c) =
40,000
Estimate for
the # tuples in
(
R(a,b) ⋈ S(b,c) )
⋈ U(c,d):
(same formula)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 2 (same join, different order)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 2
Method 2:
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 2
Method 2:
Estimate for
the # tuples in
S(b,c) ⋈ U(c,d):
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 2
Method 2:
Estimate for
the # tuples in
S(b,c) ⋈ U(c,d) =
20,000
Estimate for
the # tuples in
R(a,b) ⋈
(
S(b,c) ⋈ U(c,d) )
:
(same formula)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 2
Method 2:
Estimate for
the # tuples in
S(b,c) ⋈ U(c,d) =
20,000
Estimate for
the # tuples in
R(a,b) ⋈
(
S(b,c) ⋈ U(c,d) )
:
(same formula)
Estimating the join result set
R(X,Y)
⋈
S(Y,Z) -
Example 2
Method 2:
Estimate for
the # tuples in
S(b,c) ⋈ U(c,d) =
20,000
Estimate for
the # tuples in
R(a,b) ⋈
(
S(b,c) ⋈ U(c,d) )
:
(same formula)
Postscript...
❮
❯