(where SqError(a,b) = the minimum sum of squared errors for the input values xa ... xb, which is also the minimum error of a single-bucket histogram on these values).
Input data: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 19

OPT[k][i]:

  i:            1      2      3      4      5      6      7      8      9     10     11     12     13
  k
  1             0    0.5    2.0    5.0   10.0   17.5   28.0   42.0   60.0   82.5  110.0  143.0  182.0  ...
  2             0    0.0    0.5    1.0    2.5    4.0    7.0   10.0   15.0   20.0   27.5   35.0   45.5  ...
  3             0      0    0.0    0.5    1.0    1.5    3.0    4.5    6.0    9.0   12.0   15.0   20.0  ...
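As a hedged illustration, the table above can be reproduced with a direct dynamic program. This is a minimal Python sketch, not the author's code: the names `sq_error` and `exact_opt` are hypothetical, and the recurrence OPT[k][i] = min over j of OPT[k-1][j] + SqError(j+1, i) is assumed from the worked steps later in these notes.

```python
def sq_error(xs, a, b):
    """Squared error of one bucket covering x_a..x_b (1-based, inclusive)."""
    seg = xs[a - 1:b]
    mean = sum(seg) / len(seg)
    return sum((x - mean) ** 2 for x in seg)


def exact_opt(xs, max_k):
    """opt[k][i] = minimum error of a k-bucket histogram on x_1..x_i."""
    n = len(xs)
    opt = [[0.0] * (n + 1) for _ in range(max_k + 1)]
    for i in range(1, n + 1):
        opt[1][i] = sq_error(xs, 1, i)
    for k in range(2, max_k + 1):
        # With i <= k points every point gets its own bucket, so the error stays 0.
        for i in range(k + 1, n + 1):
            opt[k][i] = min(opt[k - 1][j] + sq_error(xs, j + 1, i)
                            for j in range(k - 1, i))
    return opt


data = list(range(1, 17)) + [19]
opt = exact_opt(data, 3)
for k in (1, 2, 3):
    print(k, [round(v, 1) for v in opt[k][1:14]])
```

This version recomputes each bucket error from scratch, which is slow but easy to check against the table.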
Example:

Input data: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 19

OPT[k][i]:

  i:            1      2      3      4      5      6      7      8      9     10     11     12     13
  k
  1             0    0.5    2.0    5.0   10.0   17.5   28.0   42.0   60.0   82.5  110.0  143.0  182.0  ...
  2             0    0.0    0.5    1.0    2.5    4.0    7.0   10.0   15.0   20.0   27.5   35.0   45.5  ...
  2'            0      0      0      0    2.5    2.5    7.0    7.0    7.0   20.0   20.0   20.0   20.0
  3             0      0    0.0    0.5    1.0    1.5    3.0    4.5    6.0    9.0   12.0   15.0   20.0  ...
Input data: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 19

OPT[k][i]:

  i:            1      2      3      4      5      6      7      8      9     10     11     12     13
  k
  1             0    0.5    2.0    5.0   10.0   17.5   28.0   42.0   60.0   82.5  110.0  143.0  182.0  ...
  2             0    0.0    0.5    1.0    2.5    4.0    7.0   10.0   15.0   20.0   27.5   35.0   45.5  ...
  2'            0      0      0      0    2.5    2.5    7.0    7.0    7.0   20.0   20.0   20.0   20.0
  Err:          0      0    0.5    1.0      0    1.5      0    3.0    8.0      0    7.5   15.0   25.5
  3             0      0    0.0    0.5    1.0    1.5    3.0    4.5    6.0    9.0   12.0   15.0   20.0  ...
The error can be reduced by using 5 buckets to approximate OPT[2][i]:

Input data: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 19

OPT[k][i]:

  i:            1      2      3      4      5      6      7      8      9     10     11     12     13
  k
  1             0    0.5    2.0    5.0   10.0   17.5   28.0   42.0   60.0   82.5  110.0  143.0  182.0  ...
  2             0    0.0    0.5    1.0    2.5    4.0    7.0   10.0   15.0   20.0   27.5   35.0   45.5  ...
  2'            0      0      0      0    2.5    2.5    7.0    7.0    7.0   20.0   20.0   35.0   35.0
  Err:          0      0    0.5    1.0      0    1.5      0    3.0    8.0      0    7.5    0.0   10.5
  3             0      0    0.0    0.5    1.0    1.5    3.0    4.5    6.0    9.0   12.0   15.0   20.0  ...
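The effect of the extra bucket can be checked numerically. The following Python sketch is only an illustration: `step_approx` and `errors` are hypothetical helper names, and the bucket start positions are read off the 2' rows of the tables above.

```python
# Row OPT[2][i] for i = 1..13, copied from the table.
opt2 = [0, 0.0, 0.5, 1.0, 2.5, 4.0, 7.0, 10.0, 15.0, 20.0, 27.5, 35.0, 45.5]


def step_approx(row, starts):
    """Approximate row by a step function: each bucket begins at a 1-based
    index in `starts` and repeats the row value found at its left boundary."""
    approx = []
    for i in range(1, len(row) + 1):
        left = max(s for s in starts if s <= i)
        approx.append(row[left - 1])
    return approx


def errors(row, approx):
    """Pointwise difference between the exact row and its approximation."""
    return [r - a for r, a in zip(row, approx)]


four = step_approx(opt2, [1, 5, 7, 10])        # 4 buckets
five = step_approx(opt2, [1, 5, 7, 10, 12])    # 5 buckets
print(max(errors(opt2, four)), max(errors(opt2, five)))
# 25.5 10.5
```

Adding the bucket that starts at i = 12 lowers the largest pointwise error from 25.5 to 10.5.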
Notations: OPT[k][i] and Sol[k][i]:

Input data: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 19

OPT[k][i]:

  i:                  1      2      3      4      5      6      7      8      9     10     11     12     13
  k
  1                   0    0.5    2.0    5.0   10.0   17.5   28.0   42.0   60.0   82.5  110.0  143.0  182.0  ...
  --> OPT[2][i]       0    0.0    0.5    1.0    2.5    4.0    7.0   10.0   15.0   20.0   27.5   35.0   45.5  ...
  --> Sol[2][i]       0      0      0      0    2.5    2.5    7.0    7.0    7.0   20.0   20.0   35.0   35.0
  Err:                0      0    0.5    1.0      0    1.5      0    3.0    8.0      0    7.5    0.0   10.5
  3                   0      0    0.0    0.5    1.0    1.5    3.0    4.5    6.0    9.0   12.0   15.0   20.0  ...
How to ensure accuracy:
Histogram with p buckets used to approximate OPT[p][..]:

                                 x  OPT[p][b_j^p]
                                 |
         x  OPT[p][a_j^p]        |
         |                       |
         |                       |
         |                       |
     ----+-----------------------+-------
       a_j^p <---------------> b_j^p
In other words:
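$$
\begin{aligned}
& \frac{\mathrm{OPT}[p][b_j^p] - \mathrm{OPT}[p][a_j^p]}{\mathrm{OPT}[p][a_j^p]} < \delta \\
\iff\; & \mathrm{OPT}[p][b_j^p] - \mathrm{OPT}[p][a_j^p] < \delta \times \mathrm{OPT}[p][a_j^p] \\
\iff\; & \mathrm{OPT}[p][b_j^p] < (1 + \delta)\,\mathrm{OPT}[p][a_j^p]
\end{aligned}
$$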
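A bucketing that respects this bound can be built greedily from left to right. The sketch below is a hedged Python illustration, not the notes' own code: `bucketize` is a hypothetical name, δ = 1 is an assumed value, and the test is the non-strict form used by the pseudocode at the end of these notes (a new bucket starts only when the value exceeds (1 + δ) times the stored one).

```python
def bucketize(row, delta):
    """Split a nondecreasing row into (a, b, sol) buckets so that every value
    in a bucket is at most (1 + delta) times the value at its left boundary."""
    buckets = [[1, 1, row[0]]]
    for i in range(2, len(row) + 1):
        value = row[i - 1]
        if value > (1 + delta) * buckets[-1][2]:
            buckets.append([i, i, value])   # bound violated: open a new bucket
        else:
            buckets[-1][1] = i              # bound holds: extend current bucket
    return [tuple(b) for b in buckets]


# Row OPT[2][i] for the input 1 2 ... 16 19, i = 1..13.
opt2 = [0.0, 0.0, 0.5, 1.0, 2.5, 4.0, 7.0, 10.0, 15.0, 20.0, 27.5, 35.0, 45.5]
print(bucketize(opt2, delta=1.0))
# [(1, 2, 0.0), (3, 4, 0.5), (5, 6, 2.5), (7, 8, 7.0), (9, 11, 15.0), (12, 13, 35.0)]
```

A smaller δ gives more buckets and a tighter approximation; a larger δ gives fewer buckets.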
Requirement:
Input data: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 19

OPT[k][i]:

  i:                  1      2      3      4      5      6      7      8      9     10     11     12     13
  k
  1                   0    0.5    2.0    5.0   10.0   17.5   28.0   42.0   60.0   82.5  110.0  143.0  182.0  ...
  --> OPT[2][i]       0    0.0    0.5    1.0    2.5    4.0    7.0   10.0   15.0   20.0   27.5   35.0   45.5  ...
  --> Sol[2][i]       0    0.0    0.5    0.5    2.5    2.5    7.0    7.0   15.0   15.0   27.5   27.5   27.5  ...
                  +----------+  +----------+  +----------+  +----------+  +----------+  +-----------------+
Problem: construct an approximate V-optimal histogram with B = 3 buckets

Histogram with 1 bucket:

  Values:      1..1 | 1..2 | 1..3 | 1..4 | 1..5 | 1..6 | 1..7 | 1..8  |
  -----------------+------+------+------+------+------+------+-------+
  Min Error:   0.0  | 2.0  | 2.0  | 8.75 | 10.0 | 13.3 | 63.7 | 161.5 |

Input: 4 2 3 6 5 6 12 16
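The one-bucket errors can be recomputed with running sums. This Python sketch is an illustration only; it assumes the prefix-sum arrays `P` and `PP` that the pseudocode at the end of these notes introduces, with index 0 holding an empty sum.

```python
from itertools import accumulate

xs = [4, 2, 3, 6, 5, 6, 12, 16]
P = [0] + list(accumulate(xs))                   # P[i]  = x_1 + ... + x_i
PP = [0] + list(accumulate(x * x for x in xs))   # PP[i] = x_1^2 + ... + x_i^2


def sq_error(a, b):
    """Squared error of a single bucket on x_a..x_b (1-based, inclusive)."""
    s1 = P[b] - P[a - 1]
    s2 = PP[b] - PP[a - 1]
    return s2 - s1 * s1 / (b - a + 1)


one_bucket = [sq_error(1, i) for i in range(1, len(xs) + 1)]
print([round(e, 2) for e in one_bucket])
# [0.0, 2.0, 2.0, 8.75, 10.0, 13.33, 63.71, 161.5]
```

Each call costs constant time, so the whole row is computed in one pass over the input.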
    [ 4 2 ] [ 3 ]   ===>  MinError[1][2] + 0
    [ 4 ] [ 2 3 ]   ===>  MinError[1][1] + (2 - 2.5)^2 + (3 - 2.5)^2
                          |            |
                          +------------+
                     1 bucket optimal histogram

  Using the result from the 1 bucket optimal histogram:

    [ 4 2 ] [ 3 ]   ===>  2.0 + 0   = 2.0
    [ 4 ] [ 2 3 ]   ===>  0.0 + 0.5 = 0.5   <---- Min

Result: 0.5 > 2 * 0 (here 1 + δ = 2) --> start a new bucket with value = 0.5
Input: 4 2 3 6 5 6 12 16

    [ 4 2 3 ] [ 6 ]   ===>  MinError[1][3] + 0
    [ 4 2 ] [ 3 6 ]   ===>  MinError[1][2] + (3 - 4.5)^2 + (6 - 4.5)^2
    [ 4 ] [ 2 3 6 ]   ===>  MinError[1][1] + (2 - 3.66)^2 + (3 - 3.66)^2 + (6 - 3.66)^2
                            |            |
                            +------------+
                       1 bucket optimal histogram

  Using the result from the 1 bucket optimal histogram:

    [ 4 2 3 ] [ 6 ]   ===>  2.0 + 0     = 2.0     <--- Min
    [ 4 2 ] [ 3 6 ]   ===>  2.0 + 4.5   = 6.5
    [ 4 ] [ 2 3 6 ]   ===>  0.0 + 8.666 = 8.666

Result: 2.0 > 2 * 0.5 --> start a new bucket with value = 2.0
Input: 4 2 3 6 5 6 12 16

    [ 4 2 3 6 ] [ 5 ]   ===>  MinError[1][4] + 0
    [ 4 2 3 ] [ 6 5 ]   ===>  MinError[1][3] + (6 - 5.5)^2 + (5 - 5.5)^2
    [ 4 2 ] [ 3 6 5 ]   ===>  MinError[1][2] + (3 - 4.66)^2 + (6 - 4.66)^2 + (5 - 4.66)^2
    [ 4 ] [ 2 3 6 5 ]   ===>  MinError[1][1] + (2 - 4)^2 + (3 - 4)^2 + (6 - 4)^2 + (5 - 4)^2
                              |            |
                              +------------+
                         1 bucket optimal histogram

  Using the result from the 1 bucket optimal histogram:

    [ 4 2 3 6 ] [ 5 ]   ===>  8.75 + 0    = 8.75
    [ 4 2 3 ] [ 6 5 ]   ===>  2.0 + 0.5   = 2.5     <--- Min
    [ 4 2 ] [ 3 6 5 ]   ===>  2.0 + 4.666 = 6.666
    [ 4 ] [ 2 3 6 5 ]   ===>  0.0 + 10    = 10

Result: 2.5 < 2 * 2.0 --> extend the current bucket and use 2.0 to approximate
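The three steps above follow one loop, which can be run over the whole input. This is a hedged Python sketch: `best_two_bucket_split` is a hypothetical name, and δ = 1 is assumed because the steps compare against a factor of 2.

```python
xs = [4, 2, 3, 6, 5, 6, 12, 16]


def sq_error(a, b):
    """Squared error of one bucket on x_a..x_b (1-based, inclusive)."""
    seg = xs[a - 1:b]
    mean = sum(seg) / len(seg)
    return sum((x - mean) ** 2 for x in seg)


def best_two_bucket_split(i):
    """Exact minimum over j of MinError[1][j] + SqError(j+1, i)."""
    return min(sq_error(1, j) + sq_error(j + 1, i) for j in range(1, i))


delta = 1.0          # assumed: gives the factor 1 + delta = 2 used above
stored = 0.0         # value kept for the current bucket
approx_row = [0.0]   # entry for i = 1
for i in range(2, len(xs) + 1):
    err = best_two_bucket_split(i)
    if err > (1 + delta) * stored:
        stored = err                 # start a new bucket at i
    approx_row.append(stored)        # otherwise the old value is reused

print([round(v, 1) for v in approx_row])
# [0.0, 0.0, 0.5, 2.0, 2.0, 2.0, 13.3, 13.3]
```

Only the positions where `stored` changes need to be remembered, which is what keeps the approximate row short.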
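Important difference between Guha's Algorithm and Jagadish's V-opt. Histogram:

  Guha's algorithm takes the (k-1)-bucket error from the stored approximate histogram (written { ... } below) instead of from the exact values OPT[k-1][..].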
Input: 4 2 3 6 5 6 12 16

    { 4 2 3 } [ 6 ]   ===>  MinError[2][3] + 0
    { 4 2 } [ 3 6 ]   ===>  MinError[2][2] + (3 - 4.5)^2 + (6 - 4.5)^2
    { 4 } [ 2 3 6 ]   ===>  MinError[2][1] + (2 - 3.66)^2 + (3 - 3.66)^2 + (6 - 3.66)^2
                            |            |
                            +------------+
                       2 bucket optimal histogram

  Using the result from the 2 bucket optimal histogram:

    { 4 2 3 } [ 6 ]   ===>  0.5 + 0     = 0.5     <---- Min
    { 4 2 } [ 3 6 ]   ===>  0.0 + 4.5   = 4.5
    { 4 } [ 2 3 6 ]   ===>  0.0 + 8.666 = 8.666

Result: 0.5 > (1 + δ) * 0.0 ==> Start a new bucket !
Input: 4 2 3 6 5 6 12 16

    { 4 2 3 6 } [ 5 ]   ===>  MinError[2][4] + 0
    { 4 2 3 } [ 6 5 ]   ===>  MinError[2][3] + (6 - 5.5)^2 + (5 - 5.5)^2
    { 4 2 } [ 3 6 5 ]   ===>  MinError[2][2] + (3 - 4.66)^2 + (6 - 4.66)^2 + (5 - 4.66)^2
    { 4 } [ 2 3 6 5 ]   ===>  MinError[2][1] + (2 - 4)^2 + (3 - 4)^2 + (6 - 4)^2 + (5 - 4)^2
                              |            |
                              +------------+
                         2 bucket optimal histogram

  Using the result from the 2 bucket optimal histogram:

    { 4 2 3 6 } [ 5 ]   ===>  2.0 + 0     = 2.0
    { 4 2 3 } [ 6 5 ]   ===>  0.5 + 0.5   = 1.0     <--- Min
    { 4 2 } [ 3 6 5 ]   ===>  0.0 + 4.666 = 4.666
    { 4 } [ 2 3 6 5 ]   ===>  0.0 + 10.0  = 10.0

Result: 1.0 <= (1 + δ) * 0.5 - extend the current bucket
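The two 3-bucket steps can be replayed in a few lines. The Python sketch below is illustrative: `approx2` holds the approximate 2-bucket values used in the steps above, and `candidates` is a hypothetical helper name.

```python
xs = [4, 2, 3, 6, 5, 6, 12, 16]

# Approximate 2-bucket errors MinError[2][j] for j = 1..4, as used above.
approx2 = {1: 0.0, 2: 0.0, 3: 0.5, 4: 2.0}


def sq_error(a, b):
    """Squared error of one bucket on x_a..x_b (1-based, inclusive)."""
    seg = xs[a - 1:b]
    mean = sum(seg) / len(seg)
    return sum((x - mean) ** 2 for x in seg)


def candidates(i):
    """(j, MinError[2][j] + SqError(j+1, i)) for every split point j < i."""
    return [(j, approx2[j] + sq_error(j + 1, i)) for j in range(1, i)]


best = {}
for i in (4, 5):
    j, err = min(candidates(i), key=lambda c: c[1])
    best[i] = (j, err)
    print(f"i={i}: best split after j={j}, error {err:.3f}")
```

For both i = 4 and i = 5 the best split is after j = 3, giving errors 0.5 and 1.0. With δ = 1 (an assumption), 1.0 does not exceed (1 + δ) * 0.5, so the bucket that starts at i = 4 is extended to i = 5.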
    +-----------------------+
    | a (left boundary)     |
    +-----------------------+
    | b (right boundary)    |
    +-----------------------+
    | Sol (histogram value) |
    +-----------------------+
Approximate V-optimal Histogram with 2 buckets:

  Values:      1..1 | 1..2 | 1..3 | 1..4 | 1..5 | 1..6 | 1..7 | 1..8 |
  -----------------+------+------+------+------+------+------+------+
  Appr Error:  0.0  | 0.0  | 0.5  | 2.0  | 2.0  | 2.0  | 13.3 | 13.3 |
is represented as follows:
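As a hedged illustration of this representation, the row of approximate errors can be collapsed into (a, b, Sol) records. The Python sketch below uses a hypothetical `Bucket` class and assumes that equal neighbouring values belong to the same bucket, which holds for this row; the algorithm itself tracks bucket boundaries directly.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    a: int       # left boundary
    b: int       # right boundary
    sol: float   # value stored for the whole interval a..b


def to_records(row):
    """Run-length encode a step-function row into (a, b, Sol) records."""
    records = []
    for i, value in enumerate(row, start=1):
        if records and records[-1].sol == value:
            records[-1].b = i                     # same value: extend the bucket
        else:
            records.append(Bucket(i, i, value))   # value changed: new bucket
    return records


appr_error = [0.0, 0.0, 0.5, 2.0, 2.0, 2.0, 13.3, 13.3]
for rec in to_records(appr_error):
    print(rec)
# Bucket(a=1, b=2, sol=0.0)
# Bucket(a=3, b=3, sol=0.5)
# Bucket(a=4, b=6, sol=2.0)
# Bucket(a=7, b=8, sol=13.3)
```

Eight table entries shrink to four records here.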
/* ------------------------------------------------
   Help function to compute Error in a bucket
   ------------------------------------------------ */
SqError(int a, int b)
{
   s2 = PP[b] - PP[a-1];      // sum of xj^2 for j = a..b
   s1 = P[b]  - P[a-1];       // sum of xj   for j = a..b

   return (s2 - s1*s1/(b-a+1));
}

/* ----------------------------------------------
   Prepare arrays to compute error efficiently
   ---------------------------------------------- */
P[0] = 0;
PP[0] = 0;

for (i = 1; i <= N; i++)
{
   P[i]  = P[i-1]  + xi;
   PP[i] = PP[i-1] + xi*xi;
}

// Note: We don't approximate OPT[1][..]
//       OPT[1][..] is computed exactly using SqError(a,b),
//       so Lookup(Q[1], j) stands for SqError(1,j)

/* ------------------------------------------------------------------
   Now we compute the approximate V-opt. histogram with B buckets

   Output: Q[k] = approximate best error of histogram using
                  k buckets on data points (1..i)
   ------------------------------------------------------------------ */

// The dynamic algorithm uses these variables:
//
//    k = # buckets
//    i = current item - items processed are: (1..i)

// Initialization

/* -----------------------------------------------------
   Set up (B-1) linked lists Q[k] (k = 2..B) to
   store the approximate solution histograms

   Each Q[k] is a linked list of records of the form:

      Q[k].a   = left point of bucket
      Q[k].b   = right point of bucket
      Q[k].Sol = approximate solution at point Q[k].a
   ----------------------------------------------------- */
for (k = 2; k <= B; k++)
{
   Q[k] -> (a = 1, b = 1, Sol = 0);      // List element
}

for (k = 2; k <= B; k++)
{
   // Find approximate histogram using k buckets

   currApprox = Q[k] -> Sol;             // initially equal to 0

   for (i = 2; i <= N; i++)              // i = 1 is covered by the first list element
   {
      minError = INFINITE;               // Start value

      // Try every possible size for the last bucket

      for (j = 1; j <= i-1; j++)         // Last bucket is [j+1..i]
      {
         if ( Lookup(Q[k-1], j) + SqError(j+1,i) < minError )
         {
            minError = Lookup(Q[k-1], j) + SqError(j+1,i);   // Better division found
         }
      }

      if ( minError > (1 + δ) * currApprox )
      {
         Add new bucket (a = i, b = i, Sol = minError) to Q[k]
         currApprox = minError;          // value stored in the new bucket
      }
      else
      {
         Set b = i in the last bucket of Q[k]      // extend the current bucket
      }
   }
}
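For readers who want to run the procedure, here is a minimal Python sketch of the pseudocode above. It is an illustration under stated assumptions, not a reference implementation: `approx_vopt` and `lookup` are hypothetical names, each list is a plain Python list of [a, b, Sol] records, row 1 is evaluated exactly through `sq_error`, and δ = 1 is chosen to match the worked example. Like the pseudocode, it tries every split point j, so it does not yet use the compressed lists to skip work.

```python
from itertools import accumulate


def approx_vopt(xs, num_buckets, delta):
    """Return Q, where Q[k] (k = 2..num_buckets) is a list of [a, b, sol]
    records approximating the k-bucket error row for the input xs."""
    n = len(xs)
    P = [0] + list(accumulate(xs))                   # prefix sums
    PP = [0] + list(accumulate(x * x for x in xs))   # prefix sums of squares

    def sq_error(a, b):
        s1 = P[b] - P[a - 1]
        s2 = PP[b] - PP[a - 1]
        return s2 - s1 * s1 / (b - a + 1)

    def lookup(k, j):
        if k == 1:                       # row 1 is exact, never approximated
            return sq_error(1, j)
        for a, b, sol in Q[k]:           # linear scan over the stored buckets
            if a <= j <= b:
                return sol
        raise IndexError(j)

    Q = {}
    for k in range(2, num_buckets + 1):
        Q[k] = [[1, 1, 0.0]]
        for i in range(2, n + 1):
            min_error = min(lookup(k - 1, j) + sq_error(j + 1, i)
                            for j in range(1, i))
            if min_error > (1 + delta) * Q[k][-1][2]:
                Q[k].append([i, i, min_error])   # start a new bucket
            else:
                Q[k][-1][1] = i                  # extend the current bucket
    return Q


Q = approx_vopt([4, 2, 3, 6, 5, 6, 12, 16], num_buckets=3, delta=1.0)
print([(a, b, round(s, 1)) for a, b, s in Q[2]])
# [(1, 2, 0.0), (3, 3, 0.5), (4, 6, 2.0), (7, 8, 13.3)]
```

The printed records for k = 2 agree with the approximate 2-bucket table shown earlier.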