
Overview of Query Evaluation

The DBMS describes the data that it manages, including tables and indexes. This descriptive data, or metadata, stored in special tables called the system catalogs, is used to find the best way to evaluate a query. SQL queries are translated into an extended form of relational algebra, and query evaluation plans are represented as trees of relational operators, along with labels that identify the algorithm to use at each node. Relational operators serve as building blocks for evaluating queries, and the implementation of these operators is carefully optimized for good performance.

Queries are composed of several operators, and the algorithms for individual operators can be combined in many ways to evaluate a query. The process of finding a good evaluation plan is called query optimization.

The system catalog

We can store a table using one of several alternative file structures, and we can create one or more indexes, each stored as a file, on every table. In a relational DBMS, every file contains either the tuples in a table or the entries in an index. The collection of files corresponding to user tables and indexes represents the data in the database. A relational DBMS maintains information about every table and index that it contains. This descriptive information is itself stored in a collection of special tables called the catalog tables. The catalog tables are also called the data dictionary, the system catalog, or simply the catalog.

Information in the catalog

At a minimum, the catalogs contain information about individual tables, indexes, and views.

For each table:
- Its table name, the file name, and the file structure of the file in which it is stored.
- The attribute name and type of each of its attributes.
- The index name of each index on the table.
- The integrity constraints on the table.

For each index:
- The index name and the structure of the index.
- The search key attributes.

For each view:
- Its view name and definition.

In addition, statistics about tables and indexes are stored in the system catalogs and updated periodically.


The following information is commonly stored:
- Cardinality: the number of tuples NTuples(R) for each table R.
- Size: the number of pages NPages(R) for each table R.
- Index Cardinality: the number of distinct key values NKeys(I) for each index I.
- Index Size: the number of pages INPages(I) for each index I.
- Index Height: the number of nonleaf levels IHeight(I) for each tree index I.
- Index Range: the minimum present key value ILow(I) and the maximum present key value IHigh(I) for each index I.

The database contains the following two tables:

Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: date, rname: string)
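As a concrete sketch, the catalog statistics above can be pictured as a small in-memory table. This is a hypothetical stand-in, not DBMS code; the page and tuple counts follow the running example used later in this chapter (Reserves with 1000 pages and 100 tuples per page, Sailors with 500 pages; the Sailors tuple count of 40,000 is an assumed figure):

```python
# Hypothetical in-memory stand-in for catalog statistics; a real DBMS keeps
# these in catalog tables and refreshes them periodically.
catalog = {
    "Sailors":  {"NTuples": 40_000,  "NPages": 500},    # 40,000 is assumed
    "Reserves": {"NTuples": 100_000, "NPages": 1_000},  # 100 tuples per page
}

def ntuples(table):
    """NTuples(R): number of tuples in table R."""
    return catalog[table]["NTuples"]

def npages(table):
    """NPages(R): number of pages occupied by table R."""
    return catalog[table]["NPages"]

print(ntuples("Reserves"), npages("Reserves"))  # 100000 1000
```

The optimizer consults exactly this kind of information when it estimates the cost of candidate plans.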

Introduction to operator evaluation

Several alternative algorithms are available for implementing each relational operator, and for most operators no algorithm is universally superior. Several factors influence which algorithm performs best, including the sizes of the tables involved, existing indexes and sort orders, the size of the available buffer pool, and the buffer replacement policy.

Algorithms for evaluating relational operators use some simple ideas extensively:
- Indexing: Can use WHERE conditions to retrieve a small set of tuples (selections, joins).
- Iteration: Sometimes it is faster to scan all tuples, even if there is an index. (And sometimes we can scan the data entries in an index instead of the table itself.)
- Partitioning: By using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.

Access Paths
An access path is a method of retrieving tuples from a table: either a file scan, or an index that matches a selection condition in the query.
- A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key. E.g., a tree index on <a, b, c> matches the selections a=5 AND b=3 and a=5 AND b>6, but not b=3.
- A hash index matches (a conjunction of) terms that include a term attribute = value for every attribute in the search key of the index. E.g., a hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.

A note on complex selections, e.g., (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3: selection conditions are first converted to conjunctive normal form (CNF): (day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3).
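The matching rules above can be sketched as simple predicate checks over the search key. This is a simplified illustration, not DBMS code; terms are represented as (attribute, operator) pairs from a conjunction, ignoring the compared values:

```python
def tree_index_matches(search_key, terms):
    """A tree index matches a conjunction whose term attributes form a
    prefix of the search key (equality or range terms both allowed)."""
    attrs = {a for a, _ in terms}
    return attrs == set(search_key[:len(attrs)])

def hash_index_matches(search_key, terms):
    """A hash index matches only if every search-key attribute has an
    equality term (attribute = value) in the conjunction."""
    eq_attrs = {a for a, op in terms if op == "="}
    return set(search_key) <= eq_attrs

# Examples from the text, for an index on <a, b, c>:
print(tree_index_matches(["a", "b", "c"], [("a", "="), ("b", ">")]))  # True
print(tree_index_matches(["a", "b", "c"], [("b", "=")]))              # False
print(hash_index_matches(["a", "b", "c"],
                         [("a", "="), ("b", "="), ("c", "=")]))       # True
print(hash_index_matches(["a", "b", "c"],
                         [("a", ">"), ("b", "="), ("c", "=")]))       # False
```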

Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index:
- Most selective access path: an index or file scan that we estimate will require the fewest page I/Os.
- Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched.
- Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used; then bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day<8/9/94 must then be checked.

Algorithms for relational operations:

1. Selection: The selection operation is a simple retrieval of tuples from a table, and its implementation is essentially covered in our discussion of access paths. Given a selection of the form σ R.attr op value (R), if there is no index on R.attr, we have to scan R. If one or more indexes on R match the selection, we can use an index to retrieve matching tuples and apply any remaining selection conditions to further restrict the result set.

Using an Index for Selections

Cost depends on the number of qualifying tuples and on clustering: the cost of finding qualifying data entries (typically small) plus the cost of retrieving the records (could be large without clustering). Consider:

SELECT * FROM Reserves R WHERE R.rname < 'C%';

Assuming a uniform distribution of names, about 10% of the tuples qualify (100 pages, 10,000 tuples). With a clustered index, the cost is little more than 100 I/Os; if unclustered, up to 10,000 I/Os!

As a rule of thumb, it is probably cheaper to simply scan the entire table (instead of using an unclustered index) if over 5% of the tuples are to be retrieved.
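A rough cost sketch of this rule, under the simplifying assumption that the cost of reading the index's data entries is negligible: a clustered index reads about selectivity * NPages pages, while an unclustered index may need about one I/O per qualifying tuple.

```python
def selection_cost(npages, ntuples, selectivity, clustered):
    """Approximate I/O cost of an index-based selection (ignoring the
    typically small cost of finding the qualifying data entries)."""
    if clustered:
        return selectivity * npages   # qualifying tuples are co-located
    return selectivity * ntuples      # roughly one page I/O per tuple

# The rname < 'C%' example: 1000 pages, 100,000 tuples, 10% qualify.
print(selection_cost(1000, 100_000, 0.10, clustered=True))   # 100.0
print(selection_cost(1000, 100_000, 0.10, clustered=False))  # 10000.0
# Compare with a full scan of 1000 pages: the unclustered index is already
# far worse at 10% selectivity, consistent with the ~5% rule of thumb.
```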

2. Projection: The projection operation requires us to drop certain fields of the input, which is easy to do. The expensive part is removing duplicates. SQL systems don't remove duplicates unless the keyword DISTINCT is specified in a query.

SELECT DISTINCT R.sid, R.bid FROM Reserves R;

- Sorting approach: sort on <sid, bid> and remove duplicates. (Can optimize this by dropping unwanted fields while sorting.)
- Hashing approach: hash on <sid, bid> to create partitions. Load partitions into memory one at a time, build an in-memory hash structure, and eliminate duplicates.
- If there is an index with both R.sid and R.bid in the search key, it may be cheaper to sort the data entries!
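The hashing approach can be sketched in a few lines. This is a toy model: partitions are plain Python lists rather than disk pages, the per-partition "in-memory hash structure" is a set, and each partition is assumed to fit in memory.

```python
def distinct_hash(tuples, num_partitions=4):
    """Hash-based duplicate elimination sketch: partition on a hash of
    the projected fields, then deduplicate each partition independently
    (duplicates always land in the same partition)."""
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        partitions[hash(t) % num_partitions].append(t)
    result = []
    for part in partitions:
        seen = set()              # in-memory hash structure per partition
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result

rows = [(1, 100), (2, 100), (1, 100), (2, 200)]   # (sid, bid) pairs
print(sorted(distinct_hash(rows)))  # [(1, 100), (2, 100), (2, 200)]
```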

3. Join: Joins are expensive operations and very common, so database systems typically support several algorithms to carry them out. Consider the join of Reserves and Sailors, with the join condition Reserves.sid=Sailors.sid. Suppose one of the tables, say Sailors, has an index on the sid column. We can scan Reserves and, for each tuple, use the index to probe Sailors for matching tuples. This approach is called index nested loops join.
Ex: Consider the cost of scanning Reserves and using the index to retrieve the matching Sailors tuple for each Reserves tuple. The cost of scanning Reserves is 1000 I/Os. There are 100*1000 = 100,000 tuples in Reserves. For each of these tuples, retrieving the index page containing the rid of the matching Sailors tuple costs 1.2 I/Os (on average); in addition, we have to retrieve the Sailors page containing the qualifying tuple. Therefore we have 100,000*(1+1.2) I/Os to retrieve matching Sailors tuples. The total cost is 221,000 I/Os.
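The arithmetic in this example can be checked directly:

```python
# Index nested loops cost from the example above.
pages_reserves  = 1_000
tuples_reserves = 100 * pages_reserves   # 100 tuples per page -> 100,000
cost_per_probe  = 1 + 1.2                # Sailors page fetch + avg index I/Os

total = pages_reserves + tuples_reserves * cost_per_probe
print(round(total))  # 221000
```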

If we do not have an index that matches the join condition on either table, we cannot use index nested loops. In this case, we can sort both tables on the join column, and then scan them to find matches. This is called sort-merge join.

Ex: We can sort Reserves and Sailors in two passes. Reading and writing Reserves in each pass, the sorting cost is 2*2*1000 = 4000 I/Os. Similarly, we can sort Sailors at a cost of 2*2*500 = 2000 I/Os. In addition, the second phase of the sort-merge join algorithm requires an additional scan of both tables. Thus the total cost is 4000+2000+1000+500 = 7500 I/Os.
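The merging phase of sort-merge join can be sketched as a lockstep scan of the two sorted inputs. This is an in-memory toy, not the paged algorithm: a real implementation streams pages from disk and the tuple values below are made up for illustration.

```python
def sort_merge_join(r, s, key_r, key_s):
    """Sort both inputs on the join column, then scan them in lockstep,
    pairing each group of equal keys on one side with the matching group
    on the other side."""
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    i = j = 0
    out = []
    while i < len(r) and j < len(s):
        if key_r(r[i]) < key_s(s[j]):
            i += 1
        elif key_r(r[i]) > key_s(s[j]):
            j += 1
        else:
            k = key_r(r[i])
            i2 = i
            while i2 < len(r) and key_r(r[i2]) == k:
                i2 += 1
            j2 = j
            while j2 < len(s) and key_s(s[j2]) == k:
                j2 += 1
            # cross the two groups of equal keys
            for rt in r[i:i2]:
                for st in s[j:j2]:
                    out.append(rt + st)
            i, j = i2, j2
    return out

reserves = [(22, 101), (58, 103), (22, 102)]                 # (sid, bid)
sailors  = [(22, "dustin"), (31, "lubber"), (58, "rusty")]   # (sid, sname)
print(sort_merge_join(reserves, sailors, lambda t: t[0], lambda t: t[0]))
```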

Introduction to query optimization


Query optimization is one of the most important tasks of a relational DBMS. A more detailed view of the query optimization and execution layer in the DBMS architecture is shown in Fig(1). Queries are parsed and then presented to the query optimizer, which is responsible for identifying an efficient execution plan. The optimizer generates alternative plans and chooses the plan with the least estimated cost.

Query --> [ Query Parser ] --parsed query--> [ Query Optimizer ]
                                             (Plan Generator + Plan Cost Estimator,
                                              consulting the Catalog Manager)
                         --evaluation plan--> [ Query Plan Evaluator ]

Fig(1). Query Parsing, Optimization, and Execution

The space of plans considered by a typical relational query optimizer can be understood in terms of the relational algebra expression for the query. Optimizing such a relational algebra expression involves two basic steps:
1. Enumerating alternative plans for evaluating the expression. Typically, an optimizer considers a subset of all possible plans because the number of possible plans is very large.
2. Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost.

Query Evaluation Plans: A query evaluation plan consists of an extended relational algebra tree, with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator. Ex:

SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;

The above query can be expressed in relational algebra as follows:

π sname (σ bid=100 ∧ rating>5 (Reserves ⋈ sid=sid Sailors))


This expression is shown in the form of a tree in Fig(2):

              π sname
                 |
     σ bid=100 ∧ rating>5
                 |
             ⋈ sid=sid
             /       \
       Reserves     Sailors

Fig(2). Query Expressed as a Relational Algebra Tree

The algebra expression partially specifies how to evaluate the query: we first compute the natural join of Reserves and Sailors, then perform the selections, and finally project the sname field. To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra operations involved. For example, we can use a page-oriented simple nested loops join with Reserves as the outer table and apply selections and projections to each tuple in the result of the join as it is produced; the result of the join before the selections and projections is never stored in its entirety. This query evaluation plan is shown in Fig(3).


              π sname               (on-the-fly)
                 |
     σ bid=100 ∧ rating>5           (on-the-fly)
                 |
             ⋈ sid=sid              (simple nested loops)
             /       \
       Reserves     Sailors
      (file scan)  (file scan)

Fig(3). Query Evaluation Plan for Sample Query

Alternative Plans: Motivating Example

SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;

Cost: 500 + 500*1000 I/Os. This is by no means the worst plan! Still, it misses several opportunities: selections could have been 'pushed' earlier, no use is made of any available indexes, etc. The goal of optimization is to find more efficient plans that compute the same answer. The RA tree is as shown in Fig(2) and the plan is as shown in Fig(3).

Alternative Plan 1 (no indexes), as shown in Fig(4). The main difference: push the selections down.

With 5 buffer pages, the cost of the plan is:
- Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
- Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
- Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250).
- Total: 3560 page I/Os.

If we used block nested loops join instead, the join cost would be 10+4*250, for a total cost of 2770. If we also 'push' projections, T1 has only sid and T2 has only sid and sname: then T1 fits in 3 pages, the cost of BNL drops to under 250 pages, and the total is under 2000.
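The cost arithmetic for this plan can be checked directly:

```python
# Costs for the "push selections" plan with 5 buffer pages.
scan_reserves, temp_t1 = 1000, 10    # bid=100 keeps ~1/100 of 1000 pages
scan_sailors,  temp_t2 = 500, 250    # rating>5 keeps ~1/2 of 500 pages
sort_t1 = 2 * 2 * temp_t1            # 2 passes, read + write per pass
sort_t2 = 2 * 3 * temp_t2            # 3 passes
merge   = temp_t1 + temp_t2          # final merging scan of both temps

total = (scan_reserves + temp_t1 + scan_sailors + temp_t2
         + sort_t1 + sort_t2 + merge)
print(total)  # 3560

# Variant: block nested loops join instead of sort-merge.
bnl_join  = temp_t1 + 4 * temp_t2
bnl_total = scan_reserves + temp_t1 + scan_sailors + temp_t2 + bnl_join
print(bnl_total)  # 2770
```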



              π sname
                 |
             ⋈ sid=sid              (sort-merge join)
             /        \
      σ bid=100      σ rating>5
     (scan; write   (scan; write
      to temp T1)    to temp T2)
          |               |
      Reserves         Sailors
     (file scan)     (file scan)

Fig(4). A second query evaluation plan

Alternative Plan 2 (with indexes), as shown in Fig(5). With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages. We use index nested loops with pipelining (the outer is not materialized).
- Projecting out unnecessary fields from the outer doesn't help.
- The join column sid is a key for Sailors: there is at most one matching tuple, so an unclustered index on sid is OK.
- The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
- Cost: selection of Reserves tuples (10 I/Os); for each, we must get the matching Sailors tuple (1000*1.2); total 1210 I/Os.



              π sname               (on-the-fly)
                 |
            σ rating>5              (on-the-fly)
                 |
             ⋈ sid=sid              (index nested loops, with pipelining)
             /        \
      σ bid=100      Sailors
     (use hash       (hash index on sid)
      index; do
      not write
      result to
      temp)
          |
      Reserves
     (hash index on bid)

Fig(5). A query evaluation plan using indexes

Highlights of System R Optimizer:


- Impact: the most widely used approach currently; works well for < 10 joins.
- Cost estimation: approximate art at best. Statistics, maintained in the system catalogs, are used to estimate the cost of operations and result sizes. A combination of CPU and I/O costs is considered.
- Plan space: too large, so it must be pruned. Only the space of left-deep plans is considered; left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation. Cartesian products are avoided.

Cost Estimation:

For each plan considered, we must estimate its cost:
- Estimate the cost of each operation in the plan tree. This depends on input cardinalities; we've already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.).
- Estimate the size of the result for each operation in the tree, using information about the input relations. For selections and joins, assume independence of predicates.
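A minimal sketch of result-size estimation under the independence assumption: each predicate contributes a reduction factor, and the factors are multiplied together. For an equality predicate attr = value, 1/NKeys(I) is a common estimate when an index exists; the 1/10 used below is just an assumed default for a range predicate without statistics.

```python
def estimate_result_size(ntuples, reduction_factors):
    """Estimated cardinality = input cardinality times the reduction
    factor of each predicate, assuming the predicates are independent."""
    size = ntuples
    for rf in reduction_factors:
        size *= rf
    return size

# 100,000 tuples; an equality predicate on a column with NKeys = 100,
# and a range predicate with an assumed default reduction factor of 1/10.
print(estimate_result_size(100_000, [1 / 100, 1 / 10]))
```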


There are several alternative evaluation algorithms for each relational operator. A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree. We must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries). There are two parts to optimizing a query:
- Consider a set of alternative plans. The search space must be pruned; typically, left-deep plans only.
- Estimate the cost of each plan that is considered. This requires estimating the size of the result and the cost for each plan node. Key issues: statistics, indexes, operator implementations.

Why Sort?
A classic problem in computer science! Sorting arises in many contexts:
- Data requested in sorted order, e.g., find students in increasing gpa order.
- Sorting is the first step in bulk loading a B+ tree index.
- Sorting is useful for eliminating duplicate copies in a collection of records (why?).
- The sort-merge join algorithm involves sorting.
Problem: sort 1GB of data with 1MB of RAM. Why not virtual memory?

When does a DBMS sort data?

Sorting a collection of records on some search key is a very useful operation. The key can be a single attribute or an ordered list of attributes. Sorting is required in a variety of situations, including the following important ones:
- Users may want answers in some order; for example, by increasing age.
- Sorting records is the first step in bulk loading a tree index.
- Sorting is useful for eliminating duplicate copies in a collection of records.
- A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
Although main memory sizes are growing rapidly, the ubiquity of database systems has led to increasingly larger datasets as well. When the data to be sorted is too large to fit into available main memory, we need an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses.


We begin by presenting a simple algorithm to illustrate the idea behind external sorting. This algorithm utilizes only three pages of main memory, and it is presented only for pedagogical purposes. When sorting a file, several sorted subfiles are typically generated in intermediate steps; here we refer to each subfile as a run. Even if the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them using a minimal amount of main memory at any given time. In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out. Quicksort or any other in-memory sorting technique can be used to sort the records on a page. In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long. This algorithm is shown in Fig(1).

Fig(1). Two-Way Merge Sort

proc 2-way_extsort (file)
  // Given a file on disk, sorts it using three buffer pages.
  // Produce runs that are one page long: Pass 0
  Read each page into memory, sort it, write it out.
  // Merge pairs of runs to produce longer runs until only
  // one run (containing all records of the input file) is left:
  While the number of runs at the end of the previous pass is > 1:
    // Pass i = 1, 2, ...
    While there are runs to be merged from the previous pass:
      Choose the next two runs (from the previous pass).
      Read each run into an input buffer, one page at a time.
      Merge the runs and write to the output buffer;
      force the output buffer to disk one page at a time.
endproc
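The pseudocode can be mirrored in a few lines of Python. This is a toy model, not paged disk I/O: it assumes a page capacity of two records (as in the example that follows) and uses an in-memory sort in place of a streaming two-way merge.

```python
def two_way_extsort(pages):
    """Two-way external merge sort sketch. `pages` is a list of pages,
    each a list of records; page capacity is fixed at two records."""
    # Pass 0: sort each page individually, producing one-page runs.
    runs = [[sorted(p)] for p in pages]
    # Passes 1, 2, ...: merge pairs of runs into runs twice as long.
    while len(runs) > 1:
        merged = []
        for a, b in zip(runs[::2], runs[1::2]):
            # stand-in for a streaming merge of the two sorted runs
            flat = sorted(x for page in a + b for x in page)
            merged.append([flat[i:i + 2] for i in range(0, len(flat), 2)])
        if len(runs) % 2:            # odd run out: carried to the next pass
            merged.append(runs[-1])
        runs = merged
    return [x for page in runs[0] for x in page]

# The seven-page example file used in the text (two records per page).
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
print(two_way_extsort(pages))  # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
```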


If the number of pages in the input file is 2^k, for some k, then:
- Pass 0 produces 2^k sorted runs of one page each,
- Pass 1 produces 2^(k-1) sorted runs of two pages each,
- Pass 2 produces 2^(k-2) sorted runs of four pages each,
and so on, until Pass k produces one sorted run of 2^k pages.

In each pass, we read every page in the file, process it, and write it out. Therefore we have two disk I/Os per page, per pass. The number of passes is ⌈log2 N⌉+1, where N is the number of pages in the file. The overall cost is 2N(⌈log2 N⌉+1) I/Os.
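The cost formula can be evaluated directly:

```python
import math

def two_way_sort_cost(n_pages):
    """Total I/O cost of two-way external merge sort: two I/Os per page
    per pass, with ceil(log2 N) + 1 passes."""
    passes = math.ceil(math.log2(n_pages)) + 1
    return 2 * n_pages * passes

print(two_way_sort_cost(7))  # 56
print(two_way_sort_cost(8))  # 64
```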


The algorithm is illustrated in Fig(2) on an example input file containing seven pages.

Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2

Pass 0 (1-page runs):  [3,4] [2,6] [4,9] [7,8] [5,6] [1,3] [2]

Pass 1 (2-page runs):  [2,3|4,6] [4,7|8,9] [1,3|5,6] [2]

Pass 2 (4-page runs):  [2,3|4,4|6,7|8,9] [1,2|3,5|6]

Pass 3 (8-page run):   [1,2|2,3|3,4|4,5|6,6|7,8|9]

Fig(2). Two-Way Merge Sort of a Seven-Page File
The sort takes four passes, and in each pass we read and write seven pages, for a total of 56 I/Os. This result agrees with the preceding analysis because 2*7*(⌈log2 7⌉+1) = 56. The dark pages in the figure illustrate what would happen with a file of eight pages: the number of passes remains four (⌈log2 8⌉+1 = 4), but we read and write an additional page in each pass, for a total of 64 I/Os. The algorithm requires just three buffer pages in main memory, as Fig(3) illustrates. This observation raises an important point: even if we have more buffer space available, this simple algorithm does not utilize it effectively.
Fig(3). Two-Way Merge Sort with Three Buffer Pages

  Disk --> | Input 1 |
           | Input 2 | --> | Output | --> Disk
             Main memory buffers


Suppose that B buffer pages are available in memory and that we need to sort large file with N pages. The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes. There are two important modifications to the two-way merge sort algorithm:

1. In Pass 0, read in B pages at a time and sort them internally to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in Fig(4), using the input from Fig(2) and a buffer pool with four pages.
2. In passes i = 1, 2, ..., use B-1 buffer pages for input and the remaining page for output; hence, you do a (B-1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in Fig(5).
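The generalized algorithm can be sketched with Python's `heapq.merge` standing in for the (B-1)-way merge. This is a toy model assuming B >= 3; runs are plain in-memory lists of records rather than on-disk files of pages.

```python
import heapq

def external_merge_sort(pages, B):
    """Generalized external merge sort sketch with B buffer pages:
    Pass 0 sorts B pages at a time; later passes do (B-1)-way merges.
    Assumes B >= 3."""
    # Pass 0: ceil(N/B) sorted runs of up to B pages each.
    runs = [sorted(x for p in pages[i:i + B] for x in p)
            for i in range(0, len(pages), B)]
    # Merging passes: B-1 input buffers, 1 output buffer.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + B - 1]))
                for i in range(0, len(runs), B - 1)]
    return runs[0]

# The seven-page file from Fig(2), with B = 4: Pass 0 yields the two
# runs shown in Fig(4), and a single 3-way merge pass finishes the sort.
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
print(external_merge_sort(pages, B=4))  # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
```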


Fig(4). External Merge Sort with B Buffer Pages: Pass 0 (B = 4)

Input file:       3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2

1st output run (pages 1-4 sorted together):  2,3 | 4,4 | 6,7 | 8,9
2nd output run (pages 5-7 sorted together):  1,2 | 3,5 | 6

(Buffer pool with B = 4 pages)


Fig(5). External Merge Sort with B Buffer Pages: Pass i > 0

  Disk --> | Input 1   |
           | Input 2   |
           |   ...     | --> | Output | --> Disk
           | Input B-1 |
             B main memory buffers

The first refinement reduces the number of runs produced by Pass 0 to ⌈N/B⌉, and the second refinement allows each subsequent pass to merge B-1 runs at a time; together, they substantially reduce the number of passes and hence the total I/O cost.


External sorting is important; a DBMS may dedicate part of the buffer pool to sorting! External merge sort minimizes disk I/O cost:
- Pass 0 produces sorted runs of size B (the number of buffer pages).
- Later passes merge runs. The number of runs merged at a time depends on B and on the block size: a larger block size means less I/O cost per page, but a smaller number of runs merged at a time.
- In practice, the number of merging passes is rarely more than 2 or 3.
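The number of passes implied by these rules, 1 + ⌈log_(B-1) ⌈N/B⌉⌉, can be computed directly (assuming B >= 3; the 1,000,000-page / 250-buffer figures below are hypothetical, chosen only to show how few passes large sorts need):

```python
import math

def num_passes(n_pages, B):
    """Passes for external merge sort with B buffer pages (B >= 3):
    1 for Pass 0, plus ceil(log base B-1 of ceil(N/B)) merging passes."""
    runs = math.ceil(n_pages / B)
    if runs <= 1:
        return 1
    return 1 + math.ceil(math.log(runs, B - 1))

print(num_passes(7, 3))          # 3: Pass 0 makes 3 runs, then two 2-way merges
print(num_passes(1_000_000, 250))  # 3 (hypothetical N and B)
```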