Example: selecting the implementation algorithms for a query plan - B(R⋈S) large

Problem description:

Find the best algorithms to execute ⋈₁ and ⋈₂ with M = 101 buffers

Simplified problem:

Solution method: brute-force search for the minimum-cost algorithms that can operate with M = 101 buffers

Step 1: check if we can use a 1-pass algorithm for ⋈₁:

Step 2: check if we can use a 2-pass (hashing based) algorithm for ⋈₁:
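
The feasibility checks in Steps 1 and 2 can be sketched as follows. This is a minimal sketch, assuming B(R) = 5000 and B(S) = 10000 blocks (the sizes used in the cost analysis) and the usual textbook conditions: a 1-pass join needs the smaller input to fit in M − 1 buffers, and a 2-pass hash join needs each of the M − 1 buckets of the smaller input to fit in M − 1 buffers. The function names are illustrative only.

```python
import math

# Assumed input sizes in blocks (taken from the cost analysis) and buffer count.
B_R, B_S, M = 5000, 10000, 101


def one_pass_ok(b_build: int, m: int) -> bool:
    """The build relation must fit in memory, leaving 1 buffer to scan the other input."""
    return b_build <= m - 1


def two_pass_hash_ok(b_build: int, m: int) -> bool:
    """Pass 1 hashes into m - 1 buckets; each bucket must then pass the 1-pass test."""
    bucket_size = math.ceil(b_build / (m - 1))  # assumes a uniform hash distribution
    return one_pass_ok(bucket_size, m)


smaller = min(B_R, B_S)
print(one_pass_ok(smaller, M))       # False: 5000 > 100
print(two_pass_hash_ok(smaller, M))  # True: 5000 / 100 = 50 <= 100
```

Under these assumptions the 1-pass algorithm is ruled out for ⋈₁ and the 2-pass hashing-based algorithm is the cheapest one that fits.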

Next, we find the best algorithm for ⋈₂, taking into account the buffers that ⋈₁ is using.

Determining the buffer utilization by the ⋈₁ execution:

(During pass 1, ⋈₂ is inactive and does not use any buffers.)

Next: we run pass 2 of the 2-pass (hashing-based) algorithm (= a 1-pass algorithm on each R_i and S_i)

Note: pass 2 is run for every pair of chunks R_i and S_i, and it outputs tuples to ⋈₂, which therefore becomes active.

Cost (so far): 3 B(R) + 3 B(S) = 45000 disk I/Os for ⋈₁ (keep this as the running cost)

Next: determine the most suitable join algorithm for ⋈₂

Prelude to considering the implementation algorithm for ⋈₂:

⋈₁ is actively using 51 buffers (50 to index R_i + 1 to scan S_i) to produce tuples for ⋈₂.

That is, we must find the best algorithm for ⋈₂ using only M = 101 − 51 = 50 buffers.

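Step 3: check if we can use a 1-pass algorithm for ⋈₂:

Neither input of ⋈₂ fits in the 50 available buffers: B(R⋈₁S) > 5000 blocks and B(T) = 10000 blocks.

Therefore: we cannot use the 1-pass join algorithm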
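
The buffer accounting and the Step 3 check can be sketched as follows. This is a minimal sketch, assuming B(R_i) = 50 blocks and B(T) = 10000 blocks (from the cost analysis); the value 5001 is only a stand-in for "B(R⋈₁S) is more than 5000", and the names are illustrative.

```python
M_TOTAL = 101                  # total number of buffers
B_RI = 50                      # blocks per chunk R_i (5000 / 100)

used_by_join1 = B_RI + 1       # 50 buffers to index R_i + 1 buffer to scan S_i
M2 = M_TOTAL - used_by_join1   # buffers left over for the second join

B_T = 10000                    # blocks in T
B_RS_MIN = 5001                # stand-in: any value greater than 5000


def one_pass_ok(b_build: int, m: int) -> bool:
    """The build relation must fit in m - 1 buffers (1 buffer scans the other input)."""
    return b_build <= m - 1


print(M2)                                   # 50
print(one_pass_ok(min(B_RS_MIN, B_T), M2))  # False
```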

Step 4: check if we can use a 2-pass (hashing based) algorithm for ⋈₂:

Note: B(R⋈₁S) > 5000. Therefore B(R⋈₁S)/50 > 100, i.e., each chunk of the first input relation of ⋈₂ is larger than 100 blocks.

Therefore: we cannot use the 2-pass algorithm to execute ⋈₂ (the chunks of T, 10000/50 = 200 blocks each, do not fit in 50 buffers either)
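
The Step 4 argument can be checked numerically. This sketch follows the text's approximation of dividing by 50 (strictly, 50 buffers give 49 output buckets, which only makes the chunks larger). It again uses 5001 as a stand-in for B(R⋈₁S) > 5000, and takes B(T) = 10000 from the cost analysis.

```python
M2 = 50          # buffers available to the second join
B_T = 10000      # blocks in T
B_RS_MIN = 5001  # stand-in lower bound for B(R join S), i.e. "> 5000"

chunk_rs = B_RS_MIN / M2  # chunk size of the first input after one hashing pass
chunk_t = B_T / M2        # chunk size of T after one hashing pass

# A 1-pass join on a pair of chunks needs the smaller chunk to fit in M2 - 1 buffers.
two_pass_ok = min(chunk_rs, chunk_t) <= M2 - 1

print(chunk_rs > 100, chunk_t, two_pass_ok)  # True 200.0 False
```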

Step 5: check if we can use a 3-pass (hashing based) algorithm for ⋈₂:

Note: each "sub-sub-relation" chunk is about B(R⋈₁S)/50² blocks

Next, we hash the 2nd input relation T in the same manner...

Notice that:
   size of each sub-sub-relation of R⋈₁S > 5000/2500 = 2 blocks
   size of each sub-sub-relation of T = 10000/2500 = 4 blocks

So pass 3 can use the T_ij (the sub-sub-relations of T, 4 blocks each) as the build relation when B(R⋈₁S) > 10000; otherwise the sub-sub-relations of R⋈₁S are the smaller ones.

Therefore: we can always use the 3-pass algorithm for ⋈₂ when B(R⋈₁S) > 5000
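
The Step 5 reasoning can be sketched as a small function that, for a given B(R⋈₁S), computes the sub-sub-relation sizes and picks the smaller side as the build relation for pass 3. This is a sketch under the text's approximation of 50² = 2500 pieces and a uniform hash distribution; `three_pass_plan` and the sample sizes are hypothetical.

```python
M2 = 50      # buffers available to the second join
B_T = 10000  # blocks in T


def three_pass_plan(b_rs: int, m: int = M2, b_t: int = B_T):
    """Return (feasible, build_side) for pass 3 after two hashing passes."""
    pieces = m * m           # the text's approximation: 50^2 = 2500 pieces
    piece_rs = b_rs / pieces  # size of each sub-sub-relation of R join S
    piece_t = b_t / pieces    # 10000 / 2500 = 4 blocks
    build_side = "T" if piece_t <= piece_rs else "R join S"
    # Pass 3 works when the smaller piece fits in m - 1 buffers.
    feasible = min(piece_rs, piece_t) <= m - 1
    return feasible, build_side


# Hypothetical sizes for B(R join S), all greater than 5000.
for b_rs in (5001, 10000, 20000, 10**6):
    print(b_rs, three_pass_plan(b_rs))
```

Because the pieces of T never exceed 4 blocks, the smaller side always fits in the 49 buffers left after reserving one for scanning, however large B(R⋈₁S) is.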

Cost analysis:

Pass 1 of the 2-pass hash join algorithm:

   (1) Use 101 buffers to hash R into 100 sub-relations R_i
   (2) Use 101 buffers to hash S into 100 sub-relations S_i

Assuming a uniform distribution:

   B(R_i) = 5000/100 = 50 blocks (smaller)
   B(S_i) = 10000/100 = 100 blocks

Pass 2 of the 2-pass hash join algorithm: perform a 1-pass join algorithm on each pair R_i and S_i.

Because B(R_i) = 50, we use only 50 buffers to index R_i + 1 buffer to scan S_i (51 buffers in total).

   Hash R:                         B(R) [read] + B(R) [write]
   Hash S:                         B(S) [read] + B(S) [write]
   1-pass join on each R_i ⋈ S_i:  B(R) [read] + B(S) [read]
   -----------------------------------------------------------------
   Total # disk I/Os: 3 B(R) + 3 B(S) = 3 × 5000 + 3 × 10000 = 45000 blocks
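
The tally for ⋈₁ can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name `two_pass_hash_join_cost` is illustrative, and the cost counts block reads and writes only.

```python
def two_pass_hash_join_cost(b_r: int, b_s: int) -> int:
    """Disk I/Os of a 2-pass hash join whose output is not written to disk."""
    hash_r = 2 * b_r  # read R + write the chunks R_i
    hash_s = 2 * b_s  # read S + write the chunks S_i
    join = b_r + b_s  # pass 2 reads every chunk exactly once
    return hash_r + hash_s + join


print(two_pass_hash_join_cost(5000, 10000))  # 45000
```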

   Hash R ⋈ S (1st pass):          0 [pipe read] + B(R⋈S) [write]
   Hash R ⋈ S (2nd pass):          B(R⋈S) [read] + B(R⋈S) [write]
   Hash T (1st pass):              10000 [read] + 10000 [write]
   Hash T (2nd pass):              10000 [read] + 10000 [write]
   Pass 3 hash join (R ⋈ S) ⋈ T:   B(R⋈S) [read (R ⋈ S)] + 10000 [read T]
   ---------------------------------------------------------------------
   Total: 4 × B(R⋈S) + 50000
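
The cost of ⋈₂ as a function of B(R⋈S) can be sketched as follows, together with the running total for the whole plan obtained by adding the 45000 I/Os of ⋈₁. The function names are illustrative, and the value B(R⋈S) = 6000 is a hypothetical example, not a figure from the problem.

```python
B_T = 10000         # blocks in T
COST_JOIN1 = 45000  # 3 B(R) + 3 B(S), the running cost after the first join


def join2_cost(b_rs: int, b_t: int = B_T) -> int:
    """Disk I/Os of the 3-pass hash join (R join S) join T."""
    hash_rs_1 = 0 + b_rs     # input is piped from the first join: no read, one write
    hash_rs_2 = b_rs + b_rs  # read + write
    hash_t = 2 * (b_t + b_t)  # two hashing passes over T, each a read + a write
    pass3 = b_rs + b_t       # read both sides once
    return hash_rs_1 + hash_rs_2 + hash_t + pass3


def plan_cost(b_rs: int) -> int:
    """Running total: first join plus second join."""
    return COST_JOIN1 + join2_cost(b_rs)


print(join2_cost(6000))  # 74000 = 4 * 6000 + 50000
print(plan_cost(6000))   # 119000
```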
