This step requires computer science...
Their paper: click here
This problem is solved by a brute-force search
(a smart brute-force search, i.e. dynamic programming :-))
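To make "brute force" concrete, here is a hedged sketch of the plain, non-smart search (the class and method names are hypothetical, not from the notes). It tries every set of B-1 cut positions among the N-1 gaps between values, i.e. C(N-1, B-1) divisions, and scores each one from scratch. It is only practical for tiny inputs. The smart version avoids this by remembering the best histogram for every prefix and bucket count.

```java
// Hypothetical names (ExhaustiveHistogram, bucketError, bruteForce), for illustration only.
class ExhaustiveHistogram {
    // Squared error of one bucket x[a..b] (0-based, inclusive): sum of squared deviations from its mean.
    static double bucketError(double[] x, int a, int b) {
        double mean = 0;
        for (int t = a; t <= b; t++) mean += x[t];
        mean /= (b - a + 1);
        double err = 0;
        for (int t = a; t <= b; t++) err += (x[t] - mean) * (x[t] - mean);
        return err;
    }

    // Tries every choice of B-1 cut positions among the n-1 gaps; bit g set = cut after x[g].
    // Only feasible for small n (n <= 31 because of the int bit mask).
    static double bruteForce(double[] x, int B) {
        int n = x.length;
        double best = Double.POSITIVE_INFINITY;
        for (int mask = 0; mask < (1 << (n - 1)); mask++) {
            if (Integer.bitCount(mask) != B - 1) continue;   // need exactly B buckets
            double err = 0;
            int start = 0;                                    // first item of the current bucket
            for (int g = 0; g < n - 1; g++) {
                if ((mask & (1 << g)) != 0) {                 // bucket x[start..g] ends here
                    err += bucketError(x, start, g);
                    start = g + 1;
                }
            }
            err += bucketError(x, start, n - 1);              // last bucket
            best = Math.min(best, err);
        }
        return best;
    }

    public static void main(String[] args) {
        double[] x = {4, 2, 3, 6, 5, 6, 12, 16};
        System.out.println(bruteForce(x, 3));   // about 10.67: buckets [4 2 3] [6 5 6] [12 16]
    }
}
```

The number of divisions grows combinatorially with N. Many of them share the same leading buckets, and that overlap is exactly what the dynamic programming version exploits.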
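(Figure not reproduced.)

Meaning of the figure: suppose the last bucket of the optimal k-bucket histogram for [a..b] is [x..b]. Then the other k-1 buckets must form an optimal (k-1)-bucket histogram for [a..x-1]. If they did not, replacing them with a better (k-1)-bucket histogram would give a k-bucket histogram for [a..b] with a smaller total error.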
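Therefore:

Optimal Histogram for [a..b] using k buckets = Optimal Histogram of [a..x-1] using k-1 buckets + last bucket [x..b]

We do not know x in advance, so we try every possible start x of the last bucket and keep the best division.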
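Answer:

Write E_k[a..b] for the error of the optimal histogram for [a..b] using k buckets, and SqError(x, b) for the squared error of the last bucket [x..b]. Then:

$$E_k[a..b] = \min_{x = a+1, \dots, b} \left\{ E_{k-1}[a..x-1] + \mathrm{SqError}(x, b) \right\}$$

Here x starts at a+1 so that the first k-1 buckets cover at least one value.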
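The recurrence translates almost word for word into a memoized recursive function. The following is a hedged, top-down sketch with several assumptions: the class name `RecursiveVOpt` is hypothetical, indices are 0-based, and `a` is fixed at the first value. Each solved subproblem `opt(k, b)` is stored, so it is computed only once. That bookkeeping is what turns the brute-force search into a "smart" one.

```java
import java.util.Arrays;

// Hypothetical class for illustration; 0-based indices, a fixed at the first item.
class RecursiveVOpt {
    private final double[] data;     // data[0..n-1]; item t of the text is data[t-1]
    private final double[][] memo;   // memo[k][b] = optimal error of data[0..b] with k buckets (NaN = unknown)

    RecursiveVOpt(double[] data, int B) {
        this.data = data;
        this.memo = new double[B + 1][data.length];
        for (double[] row : memo) Arrays.fill(row, Double.NaN);
    }

    // Error of one bucket data[a..b]: sum of squared deviations from its mean.
    private double sqError(int a, int b) {
        double mean = 0;
        for (int t = a; t <= b; t++) mean += data[t];
        mean /= (b - a + 1);
        double err = 0;
        for (int t = a; t <= b; t++) err += (data[t] - mean) * (data[t] - mean);
        return err;
    }

    // Optimal error for data[0..b] using k buckets:
    // min over x of opt(k-1, x-1) + sqError(x, b), where the last bucket is data[x..b].
    double opt(int k, int b) {
        if (b + 1 < k) return Double.POSITIVE_INFINITY;     // fewer items than buckets
        if (k == 1) return sqError(0, b);                    // one bucket covers everything
        if (!Double.isNaN(memo[k][b])) return memo[k][b];     // already solved
        double best = Double.POSITIVE_INFINITY;
        for (int x = 1; x <= b; x++)                          // first part data[0..x-1] is non-empty
            best = Math.min(best, opt(k - 1, x - 1) + sqError(x, b));
        memo[k][b] = best;
        return best;
    }

    public static void main(String[] args) {
        double[] data = {4, 2, 3, 6, 5, 6, 12, 16};
        System.out.println(new RecursiveVOpt(data, 3).opt(3, data.length - 1));
    }
}
```

Computing bucket errors by a direct loop keeps the sketch short. A table-driven version would instead use prefix sums to make each bucket error O(1).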
Problem: construct a V-optimal histogram with B = 3 buckets

Histogram with 1 bucket:

| Values    | 1..1 | 1..2 | 1..3 | 1..4 | 1..5 | 1..6 | 1..7 | 1..8  |
|-----------|------|------|------|------|------|------|------|-------|
| Min Error | 0.0  | 2.0  | 2.0  | 8.75 | 10.0 | 13.3 | 63.7 | 161.5 |
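Each entry in this row is the squared error of a single bucket holding values 1..i, that is, the sum of squared deviations from the bucket's mean. The small sketch below (class and method names are hypothetical) recomputes the row for the input 4 2 3 6 5 6 12 16. It rounds to two decimals, so it prints 13.33 and 63.71 where the table shows 13.3 and 63.7.

```java
// Hypothetical names (OneBucketTable, oneBucketError), for illustration only.
class OneBucketTable {
    // Error of one bucket holding data[0..i-1] (values 1..i): sum of squared deviations from the mean.
    static double oneBucketError(double[] data, int i) {
        double mean = 0;
        for (int t = 0; t < i; t++) mean += data[t];
        mean /= i;
        double err = 0;
        for (int t = 0; t < i; t++) err += (data[t] - mean) * (data[t] - mean);
        return err;
    }

    public static void main(String[] args) {
        double[] data = {4, 2, 3, 6, 5, 6, 12, 16};
        for (int i = 1; i <= data.length; i++) {
            double e = oneBucketError(data, i);
            System.out.println("1.." + i + ": " + Math.round(e * 100) / 100.0);   // rounded to 2 decimals
        }
    }
}
```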
Input: 4 2 3 6 5 6 12 16

Histogram with 2 buckets, values 1..3:

    [ 4 2 ] [ 3 ]   ===> MinError[1][2] + 0                          = 2.0 + 0   = 2.0
    [ 4 ] [ 2 3 ]   ===> MinError[1][1] + (2 - 2.5)² + (3 - 2.5)²    = 0.0 + 0.5 = 0.5   <---- Min

The MinError[1][...] values come from the 1-bucket optimal histogram.

Result: MinError[2][3] = 0.5, using buckets [ 4 ] [ 2 3 ].
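One step of the table-filling can be checked mechanically. The hedged sketch below makes three assumptions: the names `DpStep` and `candidates` are hypothetical, indices are 1-based to match MinError[k][i], and `data[0]` is an unused placeholder. Given the previous row MinError[k-1][...], it evaluates every candidate split for values 1..i. For k = 2 and i = 3 it reproduces the two candidates above: 0.5 for [ 4 ] [ 2 3 ] and 2.0 for [ 4 2 ] [ 3 ].

```java
// Hypothetical helper names (DpStep, candidates); 1-based indices to match MinError[k][i].
class DpStep {
    // Error of one bucket holding data[a..b] (1-based, inclusive).
    static double sqError(double[] data, int a, int b) {
        double mean = 0;
        for (int t = a; t <= b; t++) mean += data[t];
        mean /= (b - a + 1);
        double err = 0;
        for (int t = a; t <= b; t++) err += (data[t] - mean) * (data[t] - mean);
        return err;
    }

    // cand[j] = MinError[k-1][j] + error of the last bucket [j+1..i], for j = 1..i-1.
    // prevRow[j] must hold MinError[k-1][j]; cand[0] is unused.
    static double[] candidates(double[] data, double[] prevRow, int i) {
        double[] cand = new double[i];
        for (int j = 1; j <= i - 1; j++)
            cand[j] = prevRow[j] + sqError(data, j + 1, i);
        return cand;
    }

    public static void main(String[] args) {
        double[] data = {0, 4, 2, 3, 6, 5, 6, 12, 16};   // data[0] unused
        double[] minError1 = {0, 0.0, 2.0, 2.0, 8.75};   // MinError[1][1..4] from the 1-bucket table
        double[] cand = candidates(data, minError1, 3);
        for (int j = 1; j <= 2; j++)
            System.out.println("[1.." + j + "] [" + (j + 1) + "..3]: " + cand[j]);
    }
}
```

The smallest entry of `cand` is MinError[k][i]. The next step for values 1..4 is the same call with `i = 4`.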
Input: 4 2 3 6 5 6 12 16

Histogram with 2 buckets, values 1..4:

    [ 4 2 3 ] [ 6 ]   ===> MinError[1][3] + 0                                         = 2.0 + 0    = 2.0    <---- Min
    [ 4 2 ] [ 3 6 ]   ===> MinError[1][2] + (3 - 4.5)² + (6 - 4.5)²                   = 2.0 + 4.5  = 6.5
    [ 4 ] [ 2 3 6 ]   ===> MinError[1][1] + (2 - 3.67)² + (3 - 3.67)² + (6 - 3.67)²   = 0.0 + 8.67 = 8.67

The MinError[1][...] values come from the 1-bucket optimal histogram.

Result: MinError[2][4] = 2.0, using buckets [ 4 2 3 ] [ 6 ].
Input: 4 2 3 6 5 6 12 16

Histogram with 3 buckets, values 1..4. Here { ... } is the part covered by the optimal 2-bucket histogram and [ ... ] is the last bucket:

    { 4 2 3 } [ 6 ]   ===> MinError[2][3] + 0                                         = 0.5 + 0    = 0.5    <---- Min
    { 4 2 } [ 3 6 ]   ===> MinError[2][2] + (3 - 4.5)² + (6 - 4.5)²                   = 0.0 + 4.5  = 4.5
    { 4 } [ 2 3 6 ]   ===> MinError[2][1] + (2 - 3.67)² + (3 - 3.67)² + (6 - 3.67)²   = 0.0 + 8.67 = 8.67

The MinError[2][...] values come from the 2-bucket optimal histogram.

Result: MinError[3][4] = 0.5, using buckets [ 4 ] [ 2 3 ] [ 6 ].
Input: 4 2 3 6 5 6 12 16

Histogram with 3 buckets, values 1..5:

    { 4 2 3 6 } [ 5 ]   ===> MinError[2][4] + 0                                           = 2.0 + 0    = 2.0
    { 4 2 3 } [ 6 5 ]   ===> MinError[2][3] + (6 - 5.5)² + (5 - 5.5)²                     = 0.5 + 0.5  = 1.0    <---- Min
    { 4 2 } [ 3 6 5 ]   ===> MinError[2][2] + (3 - 4.67)² + (6 - 4.67)² + (5 - 4.67)²     = 0.0 + 4.67 = 4.67
    { 4 } [ 2 3 6 5 ]   ===> MinError[2][1] + (2 - 4)² + (3 - 4)² + (6 - 4)² + (5 - 4)²   = 0.0 + 10.0 = 10.0

The MinError[2][...] values come from the 2-bucket optimal histogram.

Result: MinError[3][5] = 1.0, using buckets [ 4 ] [ 2 3 ] [ 6 5 ].
The dynamic programming algorithm. BestError[k][i] is the MinError[k][i] of the worked example.

```java
class VOptHistogram {
    static double[] P;    // P[i]  = x1 + ... + xi        (P[0] = 0)
    static double[] PP;   // PP[i] = x1^2 + ... + xi^2    (PP[0] = 0)

    /* ------------------------------------------------
       Help function to compute Error in a bucket [a..b]
       ------------------------------------------------ */
    static double SqError(int a, int b) {
        double s2 = PP[b] - PP[a - 1];   // sum of squares of x_a..x_b
        double s1 = P[b] - P[a - 1];     // sum of x_a..x_b
        return s2 - s1 * s1 / (b - a + 1);
    }

    // x[1..N] holds the data (x[0] is unused); B = number of buckets
    static double[][] computeBestError(double[] x, int B) {
        int N = x.length - 1;

        /* ----------------------------------------------
           Prepare arrays to compute error efficiently
           ---------------------------------------------- */
        P = new double[N + 1];
        PP = new double[N + 1];
        for (int i = 1; i <= N; i++) {
            P[i] = P[i - 1] + x[i];
            PP[i] = PP[i - 1] + x[i] * x[i];
        }

        double[][] BestError = new double[B + 1][N + 1];

        /* ---------------------------------------------
           Compute the best error for 1 bucket histogram
           --------------------------------------------- */
        for (int i = 1; i <= N; i++) {
            // Single bucket: use error formula...
            BestError[1][i] = SqError(1, i);
        }

        /* ---------------------------------------------------------
           Now we compute the V-opt. histogram with B buckets
           Output: BestError[k][i] = best error of histogram using
                   k buckets on data points (1..i)
           --------------------------------------------------------- */
        // The dynamic algorithm uses these variables:
        //
        //    k = # buckets (row 1 is done above, so start at 2)
        //    i = current item - items processed are: (1..i)
        //    BestError[k][i] = min. error in histogram of k buckets for x1..xi
        for (int k = 2; k <= B; k++) {          // Find optimal histogram using k buckets
            for (int i = 1; i <= N; i++) {      // Multiple buckets: search
                BestError[k][i] = Double.POSITIVE_INFINITY;   // Start value
                // Try every possible size for the last bucket
                for (int j = 1; j <= i - 1; j++) {            // Last bucket is [j+1..i]
                    double err = BestError[k - 1][j] + SqError(j + 1, i);
                    if (err < BestError[k][i]) {
                        BestError[k][i] = err;                // Better division found
                    }
                }
            }
        }
        return BestError;
    }
}
```
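The `SqError` helper relies on a standard identity: the squared error of a bucket depends only on its count, its sum and its sum of squares. With the prefix arrays P and PP, every bucket error therefore costs O(1). For a bucket [a..b] with n = b - a + 1 values, s_1 = Σ x_t, s_2 = Σ x_t² and mean \bar{x} = s_1 / n:

$$
\sum_{t=a}^{b} (x_t - \bar{x})^2
= \sum_{t=a}^{b} x_t^2 - 2\bar{x}\sum_{t=a}^{b} x_t + n\bar{x}^2
= s_2 - \frac{2 s_1^2}{n} + \frac{s_1^2}{n}
= s_2 - \frac{s_1^2}{n},
\qquad s_1 = P[b] - P[a-1], \quad s_2 = PP[b] - PP[a-1]
$$

For example, the bucket [2..3] = { 2 3 } has s_1 = 5, s_2 = 13 and n = 2, giving 13 - 25/2 = 0.5. This matches (2 - 2.5)² + (3 - 2.5)² from the worked example. The prefix differences start at index a-1 so that the first value x_a is included in the bucket.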
To also recover the bucket boundaries, record in index[k][i] where the last bucket of each best histogram starts (the lines marked with ****), then walk back from index[B][N]:

```java
class VOptHistogramBoundaries {
    static double[] P;    // P[i]  = x1 + ... + xi        (P[0] = 0)
    static double[] PP;   // PP[i] = x1^2 + ... + xi^2    (PP[0] = 0)

    /* ------------------------------------------------
       Help function to compute Error in a bucket [a..b]
       ------------------------------------------------ */
    static double SqError(int a, int b) {
        double s2 = PP[b] - PP[a - 1];
        double s1 = P[b] - P[a - 1];
        return s2 - s1 * s1 / (b - a + 1);
    }

    // x[1..N] holds the data (x[0] is unused); requires N >= B
    static void printBuckets(double[] x, int B) {
        int N = x.length - 1;

        /* Prepare arrays to compute error efficiently */
        P = new double[N + 1];
        PP = new double[N + 1];
        for (int i = 1; i <= N; i++) {
            P[i] = P[i - 1] + x[i];
            PP[i] = PP[i - 1] + x[i] * x[i];
        }

        double[][] BestError = new double[B + 1][N + 1];
        // index[k][i] = first item of the last bucket in the best k-bucket histogram of (1..i)
        int[][] index = new int[B + 1][N + 1];

        /* Compute the best error for 1 bucket histogram */
        for (int i = 1; i <= N; i++) {
            BestError[1][i] = SqError(1, i);
            index[1][i] = 1;                        // First index *************
        }

        /* Now we compute the V-opt. histogram with B buckets */
        for (int k = 2; k <= B; k++) {
            for (int i = 1; i <= N; i++) {
                BestError[k][i] = Double.POSITIVE_INFINITY;   // Start value
                for (int j = 1; j <= i - 1; j++) {            // Last bucket is [j+1..i]
                    double err = BestError[k - 1][j] + SqError(j + 1, i);
                    if (err < BestError[k][i]) {
                        BestError[k][i] = err;                // Better division found
                        index[k][i] = j + 1;                  // ********************
                    }
                }
            }
        }

        /* ---------------------------------------------
           Print bucket boundaries (last bucket first)
           --------------------------------------------- */
        int i = B;   // number of buckets still to print
        int j = N;   // end point of the current bucket
        while (i >= 2) {
            int end_point = j;
            j = index[i][j];    // start of the last bucket of the best i-bucket histogram of (1..j)
            System.out.println("[" + j + " .. " + end_point + "]");
            j--;
            i--;
        }
        System.out.println("[" + 1 + " .. " + j + "]");
    }

    public static void main(String[] args) {
        double[] x = {0, 4, 2, 3, 6, 5, 6, 12, 16};   // x[0] unused
        printBuckets(x, 3);   // prints [7 .. 8], [4 .. 6], [1 .. 3]
    }
}
```