Example:
The join operation will produce no output tuples!
    Input 1:     Input 2:     Output:
    ---------    ----------   ------------
    (a1,b0)      (a2,c0)
    (a2,b1)      (a1,c1)      (a1,b0,a1,c1), (a1,b0,a1,c2), ..., (a1,b0,a1,ck),
    (a2,b2)      (a1,c2)      (a2,b1,a2,c0), (a2,b2,a2,c0), ..., (a2,bk,a2,c0)
    ...          ...
    (a2,bk)      (a1,ck)
There is only a single tuple (a1,b0) in R1
Likewise, there is only a single tuple (a2,c0) in R2
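The effect of missing those singleton tuples can be checked with a small sketch. It rebuilds the two inputs from the example, with a hypothetical size `k = 5`, and joins them on the first attribute. The helper `join` and the hand-picked samples `S1` and `S2` are illustrative assumptions, not part of the original notes.

```python
def join(r1, r2):
    """Equi-join on the first attribute of each tuple (simple nested loop)."""
    return [(a1, b, a2, c) for (a1, b) in r1 for (a2, c) in r2 if a1 == a2]

k = 5  # hypothetical size parameter for the example inputs

# Input 1: one tuple with key a1, then k tuples with key a2
R1 = [("a1", "b0")] + [("a2", f"b{i}") for i in range(1, k + 1)]
# Input 2: one tuple with key a2, then k tuples with key a1
R2 = [("a2", "c0")] + [("a1", f"c{i}") for i in range(1, k + 1)]

full = join(R1, R2)

# Samples that happen to miss the two singleton tuples
S1 = [t for t in R1 if t != ("a1", "b0")]
S2 = [t for t in R2 if t != ("a2", "c0")]
sampled = join(S1, S2)

print(len(full))     # number of tuples in the full join
print(len(sampled))  # number of tuples in the join of the two samples
```

Every output tuple of the full join depends on (a1,b0) or (a2,c0), so samples that miss both leave nothing to join.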
Answer:
Example:
    w(i) = relative weight of item i     // Must be given a priori

    N = 0;                               // N = number of tuples selected

    while ( not EOF )
    {
       t = read_next_tuple();            // Get next tuple from input

       // Select tuple
       if ( random() < w(t) )
       {
          Sample[N] = t;
          N++;
       }
    }
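The selection loop above can be sketched in Python as follows. This is a minimal illustration and assumes that each weight already lies in [0, 1], so that it can be compared directly with a uniform random number. The names `weighted_sample`, `weights`, and `stream` are hypothetical.

```python
import random

def weighted_sample(tuples, w, rng=random):
    """Keep each tuple t independently with probability w(t)."""
    sample = []
    for t in tuples:                 # one pass over the input
        if rng.random() < w(t):      # select t with probability w(t)
            sample.append(t)
    return sample

# Hypothetical weights; they must lie in [0, 1] to act as probabilities
weights = {"never": 0.0, "sometimes": 0.5, "always": 1.0}
stream = ["never", "sometimes", "always"] * 1000

picked = weighted_sample(stream, lambda t: weights[t], random.Random(42))
print(picked.count("never"), picked.count("sometimes"), picked.count("always"))
```

Since `random()` returns a value in [0, 1), a weight of 0 is never selected and a weight of 1 is always selected.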
    f = 1;            // f*w(t) = selection probability
    S = empty;        // S = Concise Sample

    while ( not EOF )
    {
       t = next input value;

       if ( random() < f*w(t) )
       {
          if ( t ∈ S )
             increase count of the t value in S
          else
             add (t,1) to S
       }

       /* -------------------------------------------
          Deletion step:
          Adjust sample when it gets too large...
          ------------------------------------------- */
       if ( size(S) > MaxSize )
       {
          f' = β × f;           // New selection probability (β < 1)

          for ( each sample t ∈ S ) do
          {
             n = t.count;       // Fix the number of trials before changing t.count

             for ( i = 1; i <= n; i++ )
             {
                if ( random() < 1-β )     // Delete each occurrence with probability 1-β
                   t.count--;
             }

             if ( t.count == 0 )
                delete t from S;
          }

          f = f';
       }
    }
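A runnable sketch of the same idea in Python is shown below, under a few stated assumptions. The notes do not define `size(S)`, so the sketch counts distinct values. It also repeats the deletion step in a `while` loop until the sample fits, where the pseudocode uses a single `if`. The function name `concise_sample` and its parameters are hypothetical.

```python
import random

def concise_sample(stream, w, max_size, beta=0.5, rng=random):
    """Weighted concise sample as a dict mapping value -> count."""
    f = 1.0                       # f * w(t) = current selection probability
    S = {}
    for t in stream:
        if rng.random() < f * w(t):
            S[t] = S.get(t, 0) + 1

        # Deletion step. Assumption: size(S) = number of distinct values,
        # and the step repeats until the sample fits again.
        while len(S) > max_size:
            for value in list(S):
                # Keep each occurrence with probability beta
                kept = sum(1 for _ in range(S[value]) if rng.random() < beta)
                if kept == 0:
                    del S[value]
                else:
                    S[value] = kept
            f *= beta             # f' = beta * f
    return S, f

rng = random.Random(1)
stream = [rng.randrange(100) for _ in range(5000)]   # hypothetical input
S, f = concise_sample(stream, lambda t: 1.0, max_size=20, rng=rng)
print(len(S), f)
```

Keeping an occurrence with probability β is the same as deleting it with probability 1-β, so the thinning matches the pseudocode.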
The sample S is clearly a weighted random sample.
It is not obvious that S', the sample that remains after the deletion step, will be a weighted random sample.
So, just like in the uniform case, we must show that S' will also be a weighted random sample.
    Input stream:  ... x1 ... x1 ... x1 ... x1 .... x1 ...
                       |      |      |      |       |      Selected with
                       |      |      |      |       |      probab = f*w(x1)
                       v      v      v      v       v
    S:                        x1            x1
Each occurrence of the value x1 in the input stream is included in S with probability f*w(x1).
So:
Graphically:
    Input stream:  ... x1 ... x1 ... x1 ... x1 .... x1 ...
                       |      |      |      |       |      Selected with
                       |      |      |      |       |      probab = f*w(x1)
                       v      v      v      v       v
    S:                        x1            x1
                              |      Selected with
                              |      probab = β
                              v
    S':                       x1
    Input stream:  ... x1 ... x1 ... x1 ... x1 .... x1 ...
                       |      |      |      |       |
                       |      |      |      |       |
                       v      v      v      v       v
                              |      Selected with
                              |      probab = β * f * w(x1)
                              v
    S':                       x1
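The resulting sample S' is therefore also a weighted random sample: each occurrence of x1 is included with probability β*f*w(x1) = f'*w(x1).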
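The two-stage argument can also be checked empirically. The sketch below selects an occurrence into S with probability f*w(x1), keeps it in S' with probability β, and compares the observed survival rate with β*f*w(x1). The function `two_stage_rate` and the values chosen for `f`, `w_x1`, and `beta` are hypothetical.

```python
import random

def two_stage_rate(p_select, beta, trials, rng):
    """Fraction of occurrences that are selected into S and then kept in S'."""
    survivors = 0
    for _ in range(trials):
        in_S = rng.random() < p_select               # stage 1: prob f*w(x1)
        in_S_prime = in_S and rng.random() < beta    # stage 2: prob beta
        survivors += in_S_prime
    return survivors / trials

f, w_x1, beta = 1.0, 0.6, 0.5    # hypothetical values
rate = two_stage_rate(f * w_x1, beta, 100_000, random.Random(0))
print(rate, beta * f * w_x1)     # observed rate next to the predicted one
```

The observed rate should lie close to the predicted value, up to sampling noise.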