Intro to estimating the result set size of a join

Problem description
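- Given: two relations R(X,Y) and S(Y,Z) that share the join attribute Y
  Question: how many tuples are in the result set R(X,Y) ⋈ S(Y,Z)?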

The range of the size of R(X,Y) ⋈ S(Y,Z)

Possible outcomes of R(X,Y) ⋈ S(Y,Z) :

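If the Y attribute values in R(X,Y) and S(Y,Z) are disjoint: no tuples join, so T( R ⋈ S ) = 0 (T(...) denotes the number of tuples in a relation)
If the Y attribute is a key in S: each tuple in R joins with at most one tuple in S, so T( R ⋈ S ) ≤ T(R) (with equality when every Y value in R also appears in S)
If every tuple in R and S has the same Y attribute value: every tuple in R joins with every tuple in S, so T( R ⋈ S ) = T(R)×T(S)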

Range of T( R ⋈ S ):

0 ≤ T( R ⋈ S ) ≤ T(R)×T(S)
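The three outcomes above can be checked on small data. The following is a minimal Python sketch, assuming relations are stored as lists of tuples; the helper `natural_join_size` and the sample relations are illustrative and not part of the original notes.

```python
def natural_join_size(r, s):
    """Count the tuples in R(X,Y) ⋈ S(Y,Z).

    r holds (x, y) pairs and s holds (y, z) pairs (an assumed encoding).
    """
    return sum(1 for (_x, y_r) in r for (y_s, _z) in s if y_r == y_s)


r = [(1, "a"), (2, "b"), (3, "c")]

# Outcome 1: the Y values of R and S are disjoint, so nothing joins.
s_disjoint = [("d", 10), ("e", 20)]
size_disjoint = natural_join_size(r, s_disjoint)

# Outcome 2: Y is a key in S and every Y value of R appears in S,
# so each tuple of R joins with exactly one tuple of S.
s_key = [("a", 10), ("b", 20), ("c", 30), ("d", 40)]
size_key = natural_join_size(r, s_key)

# Outcome 3: every tuple in R and S has the same Y value,
# so every pair of tuples joins.
r_same = [(1, "a"), (2, "a"), (3, "a")]
s_same = [("a", 10), ("a", 20)]
size_same = natural_join_size(r_same, s_same)

print(size_disjoint, size_key, size_same)
```

With these sample relations the sizes are 0, T(R) and T(R)×T(S), which are the two ends of the range plus one point in between.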

Simplifying Assumptions...

Fact:

Assumptions that help us estimate the size of R(X,Y) ⋈ S(Y,Z):

The containment of value sets assumption:

An attribute Y in a relation R(...,Y) always takes on a prefix of a fixed list of values:

y₁ y₂ y₃ y₄ ....

Example:

Relations:

    R( .... , Y )    S( .... , Y )    U( .... , Y )

Then:

    Attr values of Y in R can be one of:   y₁ y₂ ..... y_R
    Attr values of Y in S can be one of:   y₁ y₂ ........ y_S
    Attr values of Y in U can be one of:   y₁ y₂ ... y_U

(The containment of value sets assumption will help us estimate T(R⋈S), the size of R⋈S)
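One consequence of the prefix rule is that the Y value sets of any two relations are nested: the relation with fewer distinct Y values uses a subset of the other's values. A minimal Python sketch of this, assuming a hypothetical fixed list `FIXED_VALUES` and hypothetical value counts for R, S and U:

```python
# Hypothetical fixed list y1, y2, y3, ... from which Y always draws a prefix.
FIXED_VALUES = ["y1", "y2", "y3", "y4", "y5", "y6"]


def y_values(count):
    """Y value set of a relation with `count` distinct Y values.

    Under the containment assumption this is the first `count`
    entries of the fixed list.
    """
    return set(FIXED_VALUES[:count])


# Assumed numbers of distinct Y values in R, S and U.
y_in_r = y_values(2)
y_in_s = y_values(5)
y_in_u = y_values(3)

# Prefixes of the same list are nested, so the smaller set is
# always contained in the larger one.
r_in_u = y_in_r <= y_in_u
u_in_s = y_in_u <= y_in_s
r_in_s = y_in_r <= y_in_s

print(r_in_u, u_in_s, r_in_s)
```

Real data need not behave this way; the sketch only shows what the assumption asserts.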

The preservation of value sets assumption:

The join operation R(X,Y) ⋈ S(Y,Z) will preserve all the possible values of the non-joining attributes

In other words:

The attribute values taken on by X in R(X,Y) ⋈ S(Y,Z) are the same as those taken on by X in R(X,Y)
The attribute values taken on by Z in R(X,Y) ⋈ S(Y,Z) are the same as those taken on by Z in S(Y,Z)

(The preservation of value sets assumption will help us estimate T(R⋈S⋈U), the size of R⋈S⋈U)
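The preservation assumption can be checked on sample data by comparing the X and Z value sets before and after the join. The Python sketch below assumes relations stored as lists of tuples; `natural_join` and the sample relations are illustrative. It also includes a case where the assumption fails, because a tuple of R finds no join partner in S, as a reminder that this is a simplification rather than a guarantee.

```python
def natural_join(r, s):
    """Compute R(X,Y) ⋈ S(Y,Z) as a list of (x, y, z) tuples."""
    return [(x, y, z) for (x, y) in r for (y_s, z) in s if y == y_s]


r = [(1, "a"), (2, "b"), (3, "b")]
s = [("a", 10), ("b", 20), ("b", 30)]
joined = natural_join(r, s)

# X values in the join versus X values in R.
x_preserved = {x for (x, _y, _z) in joined} == {x for (x, _y) in r}
# Z values in the join versus Z values in S.
z_preserved = {z for (_x, _y, z) in joined} == {z for (_y, z) in s}

# A tuple of R whose Y value does not occur in S drops out of the join,
# so its X value is lost and the assumption does not hold here.
r_dangling = r + [(4, "c")]
joined_dangling = natural_join(r_dangling, s)
x_lost = {x for (x, _y, _z) in joined_dangling} != {x for (x, _y) in r_dangling}

print(x_preserved, z_preserved, x_lost)
```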