
NiagaraCQ: A Scalable Continuous Query System for Internet Databases

Jianjun Chen et al.
Computer Sciences Dept.
University of Wisconsin-Madison

SIGMOD 2000
Presented by
Mukund Agrawal
Continuous Queries
A triple (Q, A, Stop)
Scope also includes future data
Example
Inform me when there is a new publication related to
multi-query optimization

A broad classification
- Change based
- Timer based
NiagaraCQ
A CQ system for the Internet
Continuous Queries on XML data sets
Scalable CQ processing
Incremental group optimization
Handles both change based and timer based queries in a uniform way
Outline
General strategy of incremental group optimization
Query split with materialized intermediate files
Incremental grouping of selection and join operators
System architecture
Experimental results
NiagaraCQ command language
Creating a CQ
Create CQ_name
XML-QL query
Do action
{ START start_time} { EVERY time_interval}
{ EXPIRE expiration_time}

Delete CQ_name
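
The fields of the Create command map naturally onto a small record. The sketch below is only an illustration: the class name `CQCommand`, its field names, and the `render` helper are hypothetical and not part of NiagaraCQ, and the rule that an EVERY clause marks a timer based query is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CQCommand:
    """Hypothetical in-memory form of a Create command (all names are assumptions)."""
    name: str
    query: str                      # XML-QL query text
    action: str                     # action to run when the query produces results
    start: Optional[str] = None     # START start_time (optional)
    every: Optional[str] = None     # EVERY time_interval (optional)
    expire: Optional[str] = None    # EXPIRE expiration_time (optional)

    def is_timer_based(self) -> bool:
        # Assumption: an EVERY clause means timer based, otherwise change based.
        return self.every is not None

    def render(self) -> str:
        """Render the command in the layout shown on the slide."""
        lines = [f"Create {self.name}", self.query, f"Do {self.action}"]
        optional = [("START", self.start), ("EVERY", self.every), ("EXPIRE", self.expire)]
        clauses = [f"{{{kw} {val}}}" for kw, val in optional if val is not None]
        if clauses:
            lines.append(" ".join(clauses))
        return "\n".join(lines)
```

Unset optional clauses are simply left out of the rendered text.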
Incremental group optimization
General Strategy

Why can’t we regroup all queries when a new query is added?
Use of expression signatures for grouping
- Same syntax structure
- Different constant values
Expression Signature
Query examples
Where <Quotes><Quote><Symbol>INTC</></></>
element_as $g in “http://www.stock.com/quotes.xml”
construct $g

Where <Quotes><Quote><Symbol>MSFT</></></>
element_as $g in “http://www.stock.com/quotes.xml”
construct $g

Expression signature (shared by both queries)
Quotes.Quote.Symbol = constant in “quotes.xml”
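
One way to picture a signature is as the predicate with its constant replaced by a placeholder. The sketch below assumes a simple `path op constant` selection on one source file; the `Selection` type and `signature` function are illustrative names, not NiagaraCQ's own representation.

```python
from typing import NamedTuple


class Selection(NamedTuple):
    """A selection predicate of the form: path <op> constant, on one source file."""
    path: str       # e.g. "Quotes.Quote.Symbol"
    op: str         # e.g. "="
    constant: str   # e.g. "INTC"
    source: str     # e.g. "quotes.xml"


def signature(pred: Selection) -> tuple:
    """Drop the constant value; keep the syntax structure and the source."""
    return (pred.path, pred.op, "constant", pred.source)


intc = Selection("Quotes.Quote.Symbol", "=", "INTC", "quotes.xml")
msft = Selection("Quotes.Quote.Symbol", "=", "MSFT", "quotes.xml")

# The two example queries differ only in their constants, so they share a signature.
same_group = signature(intc) == signature(msft)
```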
Query plans (bottom-up)
File Scan (quotes.xml) -> Select (Symbol = “INTC”) -> Trigger Action I
File Scan (quotes.xml) -> Select (Symbol = “MSFT”) -> Trigger Action J
Group
Group Signature
- Common signature of all queries in the group
Group constant table

Constant_value    Dest_buffer
INTC              Dest. I
MSFT              Dest. J
The group plan
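
A group plan can be pictured as one shared scan whose tuples are matched against the group constant table and then routed to the matching destination buffer. This is a minimal sketch, assuming tuples are dicts and destination buffers are in-memory lists; the function name and the sample prices are made up for illustration.

```python
from collections import defaultdict

# Group constant table: constant value -> destination buffer name.
constant_table = {"INTC": "Dest. I", "MSFT": "Dest. J"}


def run_group_plan(quotes, constant_table):
    """Scan the source once and route matching tuples to their destinations."""
    buffers = defaultdict(list)
    for quote in quotes:                              # shared file scan
        dest = constant_table.get(quote["Symbol"])    # look up the constant table
        if dest is not None:
            buffers[dest].append(quote)               # split: route to the buffer
    return dict(buffers)


quotes = [
    {"Symbol": "INTC", "Price": 30},
    {"Symbol": "IBM", "Price": 120},
    {"Symbol": "MSFT", "Price": 60},
]
result = run_group_plan(quotes, constant_table)
```

The source is scanned once no matter how many queries belong to the group.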
Incremental Grouping Algorithm
When a new query is submitted:
If the expression signature of the new query matches that of an existing group
- Break the query plan into two parts
- Remove the lower part
- Add the upper part onto the group plan
Else create a new group
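
The steps above can be sketched as a dictionary lookup keyed by signature. This is a simplified illustration under stated assumptions: the `Group` class, `add_query`, and the tuple form of the signature are hypothetical, and "adding the upper part" is reduced to recording the query's constant and destination.

```python
class Group:
    def __init__(self, sig):
        self.signature = sig
        self.constant_table = {}   # constant value -> list of destination buffers

    def add(self, constant, dest):
        self.constant_table.setdefault(constant, []).append(dest)


def add_query(groups, sig, constant, dest):
    """Add one query to `groups` (dict: signature -> Group) without regrouping."""
    group = groups.get(sig)
    if group is None:              # no matching signature: create a new group
        group = groups[sig] = Group(sig)
    # The lower part of the query (scan + select) is covered by the group plan,
    # so only the constant and the upper part's destination are recorded.
    group.add(constant, dest)
    return group


groups = {}
sig = ("Quotes.Quote.Symbol", "=", "constant", "quotes.xml")
add_query(groups, sig, "INTC", "Dest. I")
add_query(groups, sig, "MSFT", "Dest. J")
```

Existing groups are never rebuilt; each new query costs one lookup and one table entry.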
Query split with materialized intermediate files
Why not use a pipeline scheme?
- Split operator may block simple queries
- Gives a single complicated execution plan
- A large portion of the query plan may not need to be executed at each invocation
- Does not work for grouping timer based queries

Using intermediate files
- Cut the query plan into 2 parts at the split operator
- Add a file scan operator to the upper part to read the intermediate file
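
The cut can be illustrated with two independent functions that communicate only through a file. This is a sketch under assumptions: JSON lines stand in for the intermediate file format, and the function names, file name, and sample data are hypothetical.

```python
import json
import os
import tempfile


def run_lower_part(quotes, symbol, path):
    """Lower part: select matching tuples and materialize them to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        for quote in quotes:
            if quote["Symbol"] == symbol:
                f.write(json.dumps(quote) + "\n")


def run_upper_part(path):
    """Upper part: begins with a file scan over the intermediate file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


quotes = [{"Symbol": "INTC", "Price": 30}, {"Symbol": "MSFT", "Price": 60}]
with tempfile.TemporaryDirectory() as tmp:
    intermediate = os.path.join(tmp, "dest_i.jsonl")
    run_lower_part(quotes, "INTC", intermediate)
    upper_result = run_upper_part(intermediate)
```

Because the upper part sees the intermediate file as just another source file, it can be scheduled separately from the lower part.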
The query split scheme
Trade-offs
Other advantages of materialized intermediate files
- Only the necessary queries are executed
- Uniform handling of intermediate files and original data source files

Disadvantages
- Split operator becomes a blocking operator
- Extra disk I/Os
Incremental grouping of selection predicates
Multiple selection predicates in a query
- CNF for predicates on the same data source
Incremental grouping
- Choose the most selective conjunct
Evaluation of other predicates
- Upper levels of the continuous query
Example query
Where <Quotes><Quote><Symbol>”INTC”</>
<Current_Price>$p</></> element_as $g </>
in “quotes.xml”, $p < 100
Construct $g
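
For the example query, choosing the grouping conjunct could look like the sketch below. The selectivity estimates are assumed inputs with made-up values, and `split_conjuncts` is an illustrative name; how NiagaraCQ actually estimates selectivity is not stated on the slides.

```python
def split_conjuncts(conjuncts, selectivity):
    """Pick the most selective conjunct for grouping; return (grouped, residual).

    `selectivity` maps each conjunct to the estimated fraction of tuples that
    pass it (smaller means more selective). The estimates are assumed inputs.
    """
    grouped = min(conjuncts, key=lambda c: selectivity[c])
    residual = [c for c in conjuncts if c != grouped]
    return grouped, residual


conjuncts = ['Symbol = "INTC"', "Current_Price < 100"]
estimates = {'Symbol = "INTC"': 0.001, "Current_Price < 100": 0.6}  # made-up numbers
grouped, residual = split_conjuncts(conjuncts, estimates)
```

Here the Symbol predicate would join a selection group, while the price predicate would be evaluated in the upper part of the query.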
Range-query groups
Problem
- Intermediate files may contain duplicate tuples

Solution: Virtual intermediate files
- Virtual intermediate file stores value ranges
- One real intermediate file has a clustered index
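
The idea can be sketched with a sorted list standing in for the clustered index: overlapping ranges share one stored copy of each tuple, and each virtual file is resolved by a range lookup. The data values, the inclusive range bounds, and the `scan_virtual` name are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# One real intermediate file, kept sorted on the range attribute
# (a sorted list stands in for a clustered index).
real_file = sorted([(-0.08, "A"), (-0.01, "B"), (0.03, "C"), (0.07, "D"), (0.12, "E")])
keys = [k for k, _ in real_file]

# Each virtual intermediate file stores only a value range [low, high].
virtual_files = {"Dest. I": (0.05, float("inf")), "Dest. J": (-0.02, 0.08)}


def scan_virtual(name):
    """Read a virtual file through an index range lookup on the real file."""
    low, high = virtual_files[name]
    return real_file[bisect_left(keys, low):bisect_right(keys, high)]
```

In this sample, tuple "D" falls in both ranges but is stored only once in the real file.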
Incremental grouping of join operators
A join query
Quotes.Quote.Change_Ratio constant in “quotes.xml”
Where <Quotes><Quote><Symbol>$s</></>
element_as $g </> in “quotes.xml”,
<Companies><Company><Symbol>$s</></>
element_as $t</> in “companies.xml”
construct $g, $t
Queries that contain both join and selection
Example query:
Where <Quotes><Quote><Symbol>$s</>
<Industry>”Computer Service”</></>
element_as $g </> in “quotes.xml”,
<Companies><Company><Symbol>$s</></>
element_as $t</> in “companies.xml”
construct $g, $t

Where to place the selection operator?
- Below the join
- Above the join
Grouping timer-based queries
Challenge
- Sharing common computation

Event List
- Stores time events sorted in time order
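
A time-ordered event list can be sketched with a binary heap. This is a simplification under assumptions: the `EventList` class, its method names, and the integer timestamps are hypothetical, and each entry holds a single query rather than a set of query IDs per time event.

```python
import heapq


class EventList:
    """Time events kept in time order; each entry names a query to fire."""

    def __init__(self):
        self._heap = []        # entries are (time, query_name, interval)

    def schedule(self, time, query_name, interval):
        heapq.heappush(self._heap, (time, query_name, interval))

    def pop_due(self, now):
        """Return queries due at or before `now` and reschedule each once."""
        due, again = [], []
        while self._heap and self._heap[0][0] <= now:
            time, name, interval = heapq.heappop(self._heap)
            due.append(name)
            again.append((time + interval, name, interval))
        for event in again:    # push back after the loop so nothing fires twice
            heapq.heappush(self._heap, event)
        return due
```

Queries that come due at the same time are returned together, which gives the processor a chance to evaluate their shared group plan once.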
Incremental evaluation
Invoke queries only on changed data

For each file, NiagaraCQ keeps a delta file
Incremental evaluation of join operators requires complete data files
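
The difference between selections and joins can be seen in a small sketch. It assumes insert-only changes and uses the standard delta-join rule; the function names and the sample tuples are illustrative and not taken from NiagaraCQ.

```python
def incremental_select(delta, predicate):
    """A selection needs only the changed tuples."""
    return [t for t in delta if predicate(t)]


def join(left, right, key):
    """Plain nested-loop equi-join on `key`."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def incremental_join(left_old, delta_left, right_old, delta_right, key):
    """New join results for insert-only deltas.

    Each delta must be joined against the other side's complete data,
    which is why the full files are still needed.
    """
    return (join(delta_left, right_old + delta_right, key)
            + join(left_old, delta_right, key))


left_old = [{"Symbol": "INTC"}]
delta_left = [{"Symbol": "MSFT"}]
right_old = [{"Symbol": "MSFT", "Name": "Microsoft"}]
delta_right = [{"Symbol": "INTC", "Name": "Intel"}]
new_results = incremental_join(left_old, delta_left, right_old, delta_right, "Symbol")
```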
Memory Caching
Thousands of continuous queries can’t
fit in memory
What should we cache?
- Grouped query plans
- What about non-grouped queries?
- Favor small delta files
- Front part of the event list
System Architecture
CQ processing
Experimental Results
Example query:
Where <Quotes><Quote><Symbol>”INTC”</></>
element_as $g </> in “quotes.xml”, construct $g
Thank You
