Each $a_i$ is a value from the domain $N = \{1, 2, \ldots, n\}$.
(In other words, A is a sequence of integers, and each integer is taken from the set N.)
$m_i = |\{\, j : a_j = i \,\}|$
In English: $m_i$ is the number of occurrences of the value i in the sequence A.
Input stream: 1 2 3 1 2 4 5 1 ...
Then: m1 = 3, m2 = 2, m3 = 1, m4 = 1, m5 = 1
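The counts in this example can be tallied in a single pass over the input. A minimal sketch, assuming the stream is small enough to hold in a Python list; the function name `frequencies` is hypothetical and not part of the notes:

```python
from collections import Counter

def frequencies(stream):
    """Return a mapping i -> m_i, the number of times the value i occurs."""
    # Hypothetical helper: Counter tallies each value in one pass.
    return Counter(stream)

stream = [1, 2, 3, 1, 2, 4, 5, 1]
m = frequencies(stream)
print(dict(m))  # {1: 3, 2: 2, 3: 1, 4: 1, 5: 1}
```

Note that this exact tally needs one counter per distinct value, which is the memory cost a streaming estimate tries to avoid.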
Input stream: 1 2 3 1 2 4 5 1 ...
With: m1 = 3, m2 = 2, m3 = 1, m4 = 1, m5 = 1
Then: Cardinality = $3^0 + 2^0 + 1^0 + 1^0 + 1^0 = 5$ (the number of distinct values in the input stream so far)
(Since $m_i$ is the number of occurrences of the value i, adding all the $m_i$ gives the total number of items in the sequence.)
Input stream: 1 2 3 1 2 4 5 1 ...
With: m1 = 3, m2 = 2, m3 = 1, m4 = 1, m5 = 1
Then: Sequence length = 3 + 2 + 1 + 1 + 1 = 8 (the number of elements in the input stream so far)
(Since $m_i$ is the number of occurrences of the value i, a self-join pairs each of these $m_i$ items with every one of them, resulting in $m_i^2$ tuples.
Summing these values over all i gives the total number of tuples in the self-join.)
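The three quantities above (cardinality, sequence length, self-join size) can be checked against the sample stream. A minimal sketch, assuming the k-th moment is the sum of the k-th powers of the counts, which is what the three examples compute for k = 0, 1 and 2; the names `frequency_moment` and `self_join_size` are hypothetical:

```python
from collections import Counter

def frequency_moment(stream, k):
    """Exact k-th moment: sum of m_i ** k over the distinct values i."""
    return sum(count ** k for count in Counter(stream).values())

def self_join_size(stream):
    """Brute-force count of ordered position pairs holding equal values."""
    return sum(1 for x in stream for y in stream if x == y)

stream = [1, 2, 3, 1, 2, 4, 5, 1]
print(frequency_moment(stream, 0))  # 5  (cardinality)
print(frequency_moment(stream, 1))  # 8  (sequence length)
print(frequency_moment(stream, 2))  # 16 (self-join size)
print(self_join_size(stream))       # 16
```

The brute-force pair count agrees with the k = 2 moment: 9 + 4 + 1 + 1 + 1 = 16.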
In general: $F_k = \sum_{i=1}^{n} m_i^k$ for any value of k
Theorem 2.1 (page 2)
Note:
$\frac{|Y - F_k|}{F_k} > \lambda$
The number of variables used in the computation will be:
|
(The accuracy increases if more memory is used and more information is computed.)
(The values $Y_i$ are not stored but computed from the $X_{ij}$, so they do not contribute to the memory requirement.)
It is just like the π computation method, where we counted the number of points inside the quarter circle.
It is more complicated than just "throwing a dart at a unit square", but it is nevertheless comparable to "throwing a dart".
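For reference, the dart-throwing π computation mentioned here can be sketched as follows. This is an illustrative version, not the code used earlier in the course; the function name `estimate_pi` and its `seed` parameter are assumptions:

```python
import random

def estimate_pi(num_darts, seed=0):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    inside = 0
    for _ in range(num_darts):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point lands inside the quarter circle
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / num_darts

print(estimate_pi(100_000))
```

As with the moment estimate, each sample is cheap and individually uninformative; the accuracy comes from combining many of them.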
Variable usage:
  L_ij = matching value
  m    = number of elements in the stream that have been processed
  r_ij = number of values in the input that match L_ij
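A single variable of this kind can be sketched as below when m is known in advance. The sketch assumes the standard estimator from the Alon, Matias and Szegedy paper, X = m (r^k - (r-1)^k), where r counts the occurrences of the matching value from the sampled position onward, and it assumes k ≥ 1. The function names are hypothetical:

```python
import random

def estimate_at(stream, p, k):
    """Value of X when position p (0-based) is the sampled position."""
    m = len(stream)
    L = stream[p]                              # matching value
    r = sum(1 for a in stream[p:] if a == L)   # matches from p onward
    # Assumed AMS estimator: X = m * (r^k - (r-1)^k)
    return m * (r ** k - (r - 1) ** k)

def single_estimate(stream, k, rng):
    """Draw one X by sampling a position uniformly (m must be known)."""
    return estimate_at(stream, rng.randrange(len(stream)), k)

stream = [1, 2, 3, 1, 2, 4, 5, 1]
# Averaging X over every possible position gives its expected value.
avg = sum(estimate_at(stream, p, 2) for p in range(len(stream))) / len(stream)
print(avg)  # 16.0
print(single_estimate(stream, 2, random.Random(0)))
```

The average over all positions equals the exact second moment of the sample stream (16), which illustrates why a single X is an unbiased but noisy estimate.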
Example:
Example:
Modified algorithm that does not use a priori knowledge of m:
Variable usage:
  L = matching value
  m = number of elements in the stream that have been processed
  r = number of matching elements seen in the input (up to now)
So the algorithm must use an array r[1..s1 × s2]
Pseudocode:
Variable usage:
  L_ij = matching value
  m    = number of elements in the stream that have been processed
  r_ij = number of matching elements seen in the input (up to now)
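The variables above can be put together in a one-pass sketch that does not need m in advance. Several parts are assumptions rather than the notes' own pseudocode: each of the s1 × s2 variables keeps its matching value by reservoir sampling (the m-th element replaces it with probability 1/m), X = m (r^k - (r-1)^k) is the standard Alon, Matias and Szegedy estimator, each Y_i is the mean of s1 values of X, and the final answer is the median of the s2 values Y_i. The function name `ams_estimate` and the `seed` parameter are hypothetical, and k ≥ 1 is assumed:

```python
import random
from statistics import mean, median

def ams_estimate(stream, k, s1, s2, seed=0):
    """One-pass estimate of the k-th moment using s1 * s2 variables."""
    rng = random.Random(seed)
    size = s1 * s2
    L = [None] * size  # L[t]: matching value of variable t
    r = [0] * size     # r[t]: matches seen since L[t] was chosen
    m = 0              # number of elements processed so far
    for a in stream:
        m += 1
        for t in range(size):
            if rng.randrange(m) == 0:  # probability 1/m: resample
                L[t] = a
                r[t] = 1
            elif a == L[t]:
                r[t] += 1
    if m == 0:
        return 0.0
    # Assumed AMS estimator for each variable.
    X = [m * (r[t] ** k - (r[t] - 1) ** k) for t in range(size)]
    # Y_i is computed on the fly from s1 values of X, never stored per element.
    Y = [mean(X[i * s1:(i + 1) * s1]) for i in range(s2)]
    return median(Y)

stream = [1, 2, 3, 1, 2, 4, 5, 1] * 100
print(ams_estimate(stream, k=2, s1=50, s2=5))
```

Only the arrays L and r (and the counter m) persist across stream elements, matching the s1 × s2 memory requirement. When every element is distinct, every r is 1 and the estimate equals m exactly; in general the output varies with the seed and tightens as s1 grows.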