SELECT * FROM R WHERE R.A = value |
Other queries may require an infinite amount of buffer space to compute the exact answer...
Example:
SELECT * FROM R, S WHERE R.A = S.A |
To find the exact answer to the above query, all the tuples from R and S must be present and accounted for.
But in a stream database where data is constantly generated, we need to wait an indefinite amount of time to obtain all tuples...
So finding the exact answer to join queries is usually not the goal of a stream database system
|
Phone book FName LName Phone Project FName LName ======================== -------> ============== John Smith 123-5678 John Smith Jane Smith 123-5678 Jane Smith John Doe 234-2039 John Doe Jack Rabbit 876-9876 Jack Rabbit |
|
Obviously, the duplicate removing projection requires searching and it has more overhead
Phone book FName LName Phone Project LName Phone ======================== -------> ============== John Smith 123-5678 Smith 123-5678 Jane Smith 123-5678 Smith 123-5678 John Doe 234-2039 Doe 234-2039 Jack Rabbit 876-9876 Rabbit 234-2039 |
Phone book FName LName Phone Project LName Phone ======================== -------> ============== John Smith 123-5678 Smith 123-5678 Jane Smith 123-5678 John Doe 234-2039 Doe 234-2039 Jack Rabbit 876-9876 Rabbit 234-2039 |
![]() |
With:
|
|
|
The outcome may be different... we need to evaluate the query for BOTH types of projection operators.
  π A ( σ A > 10 (S) )   |
|
|
  π A ( σ A > 10 ∧ A < 20 (S) )   |
|
|
π A ( σ A=D ( S x T ) ) |
It is re-stated here for completeness
            π A ( σ A=D ∧ A > 10 ∧ D < 20 ( S   x   T ) )             |
|
Consider the following summary information:
|
|
For a tuple with s.A is between 10 and 20 do:
Example:
![]() |
How an arriving S tuple
is processed:
NOTE: After processing the tuples, ACnt[13] and ACnt[19] will be increased by 1 (not shown in figure) |
If t.D is between 10 and 20 then:
Example:
![]() |
The explanation is similar
to the processing of an S-tuple...
NOTE: DCnt[17] and DCnt[18] will be increased by 1 (not shown in figure) |
|
If s.A is between 10 and 20 then:
Example:
![]() |
How an arriving S tuple
is processed:
NOTE: ACnt[13] and ACnt[19] will be increased by 1 (not shown in figure) |
If t.D is between 10 and 20 then:
Example: (it's similar to the one above)
![]() |
How a T tuple is processed.
NOTE: DCnt[17] and DCnt[18] will be increased by 1 (not shown in figure) |
            π A ( σ B < D ∧ A > 10 ∧ A < 20 ( S   x   T ) )             |
Important characteristic of this query:
(Assuming that S.A is integer, there are 9 different values for S.A: 11, 12, .., 19)
(This fact is very important in this example).
What information is essential to process the condition B < D ∧ A > 10 ∧ A < 20 ?
![]() |
(The attribute value A=12 will be outputted if the query condition is satisfied)
Reason:
|
So the tuple (12,8,99) is not necessary when we retain the tuple (12,6,99)
Important:
|
What information is essential to process arriving S tuple ?
![]() |
Reason:
|
So we only need to remember the largest value in the D attributes
|
Example of information content:
the data structure will contain:
![]() |
|
Processing an incoming S tuple:
if ( s.A > 10 && s.A < 20 ) { if ( s.B < DMAX ) if s.A has not been outputted then Output s.A set output[s.A] = true; if ( s.B < BMIN[s.A] ) BMIN[s.A] = s.B; } |
Graphically:
![]() |
When (12,4,99) arrives:
|
Processing an incoming T tuple
for i = 11, 12, ..., 19 do if ( BMIN[i] < t.D ) if i has not been outputted then Output i set output[i] = true; if ( t.D > DMAX ) DMAX = t.D; |
Graphically:
![]() |
When t(5,99) arrives:
|
|
So it is the opposite to a simple project query !
            π A ( σ B < D ∧ A > 10 ∧ A < 20 ( S   x   T ) )             |
cannot be processed with exact accuracy with limited amount of memory.
Example:
![]() |
|
Since there is an infinite number of integers, this is not feasible.
|
|
Filter1 AND Filter2: +---------+ +---------+ S -----> | Filter1 | -------> | Filter2 | -------> output +---------+ +---------+ |
Filter1 OR Filter2: +---------+ +--> | Filter1 | ---+ | +---------+ | | | --->+ + -----> output | | | +---------+ | +--> | Filter2 | ---+ +---------+ |
|
Each attribute has a value: age=60, color="red", salary=40,000, etc.
|
|
|
|
|
E is totally ordered by P if:
|
(Because 5 < B follows from A < B ∧ 5 < A)
|
|
|
|
|
|
|
Remember just a moment ago, we showed that the predicate P = {A < B, 5 < A} has a closure:
And P+ imposes a total ordering on {5, A, B}.
Hence, the above query is a Locally Totally Ordered query.
|
|
Theorem 4.1
|
            π A ( σ A=D ∧ A > 10 ∧ D < 20 ( S   x   T ) )             |
|
|
In fact:
|
We can't order the other elements...
So there is a stream (namely S(A,B,C)) where we can't impose an ordering.
And therefore:
is not a Locally Totally Ordered query
Here is how it can be done:
A=D ∧ A > 10 ∧ D < 20 = (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 < E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 = E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 > E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C = B ∧ 20 < E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C = B ∧ 20 = E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C = B ∧ 20 > E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C > B ∧ 20 < E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C > B ∧ 20 = E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C > B ∧ 20 > E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B = 10 ∧ C < B ∧ 20 < E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B = 10 ∧ C < B ∧ 20 = E) ∨ (A=D ∧ A > 10 ∧ D < 20 ∧ B = 10 ∧ C < B ∧ 20 > E) ... and so on (complete the serie) |
|
The predicate "A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 < E" imposes the following ordering on the elements {A, B, C, 10, 20}:
|
The predicate "A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 < E" imposes the following ordering on the elements {D, E, 10, 20}:
|
Therefore, the query
|
is a Locally Totally Ordered query.
|
The predicate "A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 = E" imposes the following ordering on the elements {A, B, C, 10}:
|
(The constant 20 is not involved in any relation with an attribute in stream S - so we do not need to include it)
The predicate "A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 = E" imposes the following ordering on the elements {D, E, 10, 20}:
|
Therefore, the query
|
is also a Locally Totally Ordered query.
In other words, each derived Locally Totally Ordered query will retrieve a portion of the tuples that the original query.
π A ( σ A=D ∧ A > 10 ∧ D < 20
( S   x   T ) ) =
           
π A ( σ A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 < E ( S   ×   T ) ) ∪ π A ( σ A=D ∧ A > 10 ∧ D < 20 ∧ B < 10 ∧ C < B ∧ 20 = E ( S   ×   T ) ) ∪ ... (and so on, a lot more Totally Ordered queries) |
That's the gist of Theorem 4.1...
|
|
|
|
|
Example: streams S(A,B,C) and T(D,E)
|
|
|
|
![]() |
The attributes in the streams S and T are:
            π A ( σ A=D ∧ A > 10 ∧ D < 20 &and B > 20 &and C < 10 &and E < 10 ( S   x   T ) )             |
|
You can now see which attributes are bounded and unbounded:
|
|
Conclusion:
|
![]() |
The attributes in the streams S and T are:
π+A ( σ B < D &and A > 10 &and A < 20 &and B > 20 &and C < 10 &and D > 20 &and E < 10 (S × T) ) |
Question:
|
You can now see which attributes are bounded and unbounded:
|
|
Conclusion:
|
Example tuple arrival:
![]() |
Situation:
![]() |
Analysis:
|
|
![]() |
πA ( σ B < D &and A > 10 &and A < 20 &and B > 20 &and C < 10 &and D > 20 &and E < 10 (S × T) ) |
|
You can now see which attributes are bounded and unbounded:
|
|
Conclusion:
|
|
Question:
|
Answer (in the paper:)
|
|
πA ( σ B < D &and C < E ∧ A > 10 &and A < 20 &and B > 20 &and C < 10 &and D > 20 &and E < 10 (S × T) ) |
(This query differs from the previous one by only the extra clause C < E )
|
You can now see which attributes are bounded and unbounded:
|
|
Conclusion:
|
![]() |
|
|
Fact:
Therefore:
|
This requires an infinite amount of memory...