
Algorithms and Data Structures

Databases 2010

Michael I. Schwartzbach Computer Science, University of Aarhus

Implementing Queries
Implementation of queries requires:
  data structures
  algorithms

Only a few basic operations to consider:
  sorting
  selection (σ)
  projection (π)
  join (⋈)
  remove duplicates


Data Storage
Relations are stored as bits.

Functionality requirements:
  sequentially process rows
  search for rows that meet some condition
  insert and delete rows

Performance objectives:
  achieve a high packing density (little wasted space)
  achieve fast response time


File Organization Overview


File
  A database is stored as a collection of files.
  Storage (usually a disk):
    random access (requires a disk arm move)
    non-volatile

Records
  A file is a set of records, generally of the same type.
  A record is a sequence of fields.
  Record size:
    fixed: same number of fields, each of the same size
    variable: the number of fields may differ, or a field may vary in size

Disk Concepts - Files

Modern file systems:
  files are allocated in blocks
  blocks are not necessarily contiguous
  [Figure: disk blocks holding parts of report.doc and book.htm, together with an unused block]

Contiguous block I/O is faster (avoids a disk seek):
  8 ms to seek and read a random block
  0.4 ms to read the next contiguous block

Block chains can be reorganized to make them contiguous

Disk Concepts - Blocks


Input/output is done in entire blocks
Block size is O/S dependent (e.g., 4-8 KB)
Block size is usually much bigger than record size:
  many records in each block

Fill percentage: the percentage of filled space per block
  100% filled: compact
  50% filled: room to grow

Storing a Table
A table is typically stored on disk
Several rows fit into one disk block
Disks are slow:
  accessing a random block takes 8 ms
  accessing the next consecutive block takes 0.4 ms
  RAM access time is 8-10 ns (L1 and L2 caches are even faster)
  one disk access = 1,000,000 RAM accesses!
  this justifies the "count only I/Os" model of complexity

Part of the table may be in RAM (buffer pool)
The table can be stored in sorted order

Heap File
Records are stored in blocks, in no particular order
Example: assume the records are names
[Figure: blocks holding the names in arbitrary order: Leonardo, Racquel, Amanda, Omar, Tom, Mufasa, Arnold, Jamie, Gerard]

Assume n records per table and R records per block:
  Search for 1 record:       n/(2R) accesses   O(n) - slow
  Insertion:                 2 accesses        O(1) - fast
  Deletion (k records):      n/R accesses      O(n) - slow
  Modification (k records):  n/R accesses      O(n) - slow

Sequential File
Suitable for applications that need sequential access.
The records in the file are ordered by a search key.
Example: each record is a name.
[Figure: blocks holding the names in sorted order: Amanda, Arnold, Gerard, Jamie, Leonardo, Mufasa, Omar, Racquel, Tom]

Sequential File
Search for a key value: O(log2(n)) cost
Deletion: 1 disk I/O (after search)
Insertion: locate where the record is to be inserted
  if there is free space, insert
  if there is no free space, insert the record in an overflow block, or shift records to the previous or next block

Reorganize:
  restore block contiguity and fill percentage
  remove empty blocks
  O(n) cost
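The search step can be illustrated with a minimal Python sketch. It assumes the file is modelled as a list of non-empty blocks, each a sorted list of keys; the function name `find_block`, the read counter, and the choice of three names per block are illustrative assumptions, not taken from the slides.

```python
def find_block(blocks, key):
    """Binary search over sorted blocks.

    Returns (block index or None, number of simulated block reads).
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]      # one simulated disk access
        reads += 1
        if key < block[0]:
            hi = mid - 1         # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1         # key can only be in a later block
        else:
            # key falls inside this block's range: check its contents
            return (mid if key in block else None), reads
    return None, reads


# Assumed layout: three names per block (hypothetical).
blocks = [["Amanda", "Arnold", "Gerard"],
          ["Jamie", "Leonardo", "Mufasa"],
          ["Omar", "Racquel", "Tom"]]
```

Each probe is a random block access, which is why binary search on disk is not automatically faster than a sequential scan of a small file.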

Sorting a Table
In RAM, many sorting algorithms are available:
  typical time complexity is O(n log2(n))

Those can also be performed on disks:
  but they often perform poorly
  only I/O accesses need to be counted

Specialized versions of algorithms are needed

An Example Scenario
A 1 GB table with:
  10,000,000 rows
  each row is 100 bytes

A disk with 4K blocks:
  each holding 40 rows
  250,000 blocks for the entire table

50 MB RAM:
  1/20 of the table

Recursive Merge Sort


The merge sort algorithm:
  split the list into two sublists
  recursively sort the sublists
  merge the sorted sublists to get the sorted list
  time complexity is O(n log2(n))

Each row is read and written log2(10^7) ≈ 23 times
Time consumed:
  23 × 2 × 10,000,000 × 8 ms ≈ 43 days
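The estimate can be checked with a few lines of Python. The constant names are illustrative; the numbers come from the example scenario, and the sketch assumes, as the slide does, that every read and every write of a row costs one random disk access.

```python
import math

ROWS = 10_000_000          # rows in the example table
RANDOM_ACCESS_MS = 8       # cost of one random block access

# Number of merge levels, rounded down as on the slide.
passes = math.floor(math.log2(ROWS))

# Each row is read once and written once per level.
total_ms = passes * 2 * ROWS * RANDOM_ACCESS_MS
days = total_ms / 1000 / 60 / 60 / 24
```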

Two-Phase Multiway Merge Sort


Load RAM with 12,500 blocks = 500,000 rows:
  sort those (for free) using any RAM sorting algorithm

Do this 20 times to obtain 20 sorted sublists:
  store the 20 sorted sublists on 20 disks

Merge the sublists using a RAM buffer for each:
  only consecutive reads of blocks
  only consecutive writes of blocks

Each row is read and written 2 times
Time consumed:
  2 × 2 × 20 × 12,500 × 0.4 ms ≈ 6.7 minutes
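A minimal Python sketch of the two phases follows. It is only an in-memory illustration: plain lists stand in for the sorted sublists on disk, and the function name and the `ram_capacity` parameter are assumptions made for this example.

```python
import heapq


def two_phase_merge_sort(rows, ram_capacity):
    """Sort `rows` while sorting at most `ram_capacity` rows at a time."""
    # Phase 1: cut the input into RAM-sized chunks and sort each in memory.
    runs = [sorted(rows[i:i + ram_capacity])
            for i in range(0, len(rows), ram_capacity)]
    # Phase 2: merge all sorted runs in a single pass. heapq.merge consumes
    # each run front to back, much like one sequential buffer per sublist.
    return list(heapq.merge(*runs))
```

With the scenario's numbers, `ram_capacity` would be 500,000 rows and phase 1 would produce 20 runs.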

Lessons Learned
Naive algorithms won't work
The reality of storage must be considered:
  use entire block contents
  read blocks consecutively
  buffer information in RAM

Selection
SELECT * FROM R WHERE condition;

Full table scan:
  read all rows in the table
  report those that satisfy the condition

Fine if many rows will actually be selected:
  rule of thumb is 5-10%

Range Query
SELECT * FROM Meetings WHERE date >= '2008-08-25' AND date < '2008-12-24';

Optimization if Meetings is sorted on date:
  find the first row with the start date
  report all rows until the end date (consecutive blocks)
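The optimization can be sketched in Python with the standard `bisect` module. The sketch assumes the table is reduced to a sorted in-memory list of ISO date strings, which compare correctly as text; the sample data and the function name are hypothetical.

```python
from bisect import bisect_left


def range_query(sorted_dates, start, end):
    """Return all dates d with start <= d < end from a sorted list."""
    first = bisect_left(sorted_dates, start)   # first row with date >= start
    last = bisect_left(sorted_dates, end)      # first row with date >= end
    return sorted_dates[first:last]            # one consecutive slice


# Hypothetical contents of the date column, already sorted.
meetings = ["2008-07-01", "2008-08-25", "2008-09-10",
            "2008-12-23", "2008-12-24"]
```

The result is a single contiguous slice, which corresponds to reading consecutive blocks on disk.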

Point Query
SELECT * FROM People WHERE userid = 'amoeller';

We know that userid is a key
Optimization if People is sorted on userid:
  full table scan can stop sooner

Binary search is not necessarily better:
  random disk access vs. sequential access

So, what can help us?

Indexes
A table can be equipped with an index:
  a data structure that helps you find rows quickly
  rows are identified by a subset of the attributes

A table may have several indexes:
  whereas it can only be sorted on one criterion

Pros and cons of indexes:
  make (certain) queries faster
  make all modifications slower

Indexes in SQL
CREATE INDEX DateIndex ON Meetings(date);
CREATE INDEX ExamIndex ON Exams(vip,date,time);

An index on several attributes also gives an index for any prefix of those attributes
Think of this as a virtual sorting of the table
Each primary key has an index by default

Using Indexes
CREATE INDEX Idx ON R(a1,a2,...,an);

Some queries are now easy:
  a range query or point query on a1
  a point query on a1 combined with a range query on a2
  a range query on a1, a2, and a3

Others are not really easier:
  a range query on a17

In case of large modifications of the table:
  DROP INDEX Idx;
  rebuild the index afterwards
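Why a prefix of the index attributes helps, while a later attribute on its own does not, can be sketched in Python. The index is modelled here as a sorted list of (a1, a2) tuples; the data and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index entries on (a1, a2), kept in sorted order.
index = sorted([(1, 5), (1, 9), (2, 3), (2, 7), (2, 8), (3, 1)])


def point_a1_range_a2(index, a1, lo, hi):
    """Entries with the given a1 and lo <= a2 <= hi: one contiguous slice."""
    first = bisect_left(index, (a1, lo))
    last = bisect_right(index, (a1, hi))
    return index[first:last]


def range_a2_only(index, lo, hi):
    """A range on a2 alone gets no help from the ordering: scan everything."""
    return [entry for entry in index if lo <= entry[1] <= hi]
```

The first query touches only the matching slice, whereas the second must inspect every entry, just like the range query on a17 in the slide.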

Indexed File
Suitable for applications that require random access
Usually combined with a sequential file
A single-level index is an auxiliary file of entries <search-key, pointer to record>, ordered on the search key
The index is separate from the data file:
  usually smaller
  10-20% rule of thumb, take with a grain of salt!

Can have multiple indexes on the same relation

Searching a Single-Level Index


Sequential search:
  faster than a linear search of the main file, since the index is smaller than the main file
  worst-case search cost is still O(n)

Binary search:
  [Figure: key space]
  search cost is O(log2(n)) time (n = size of the index)
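A minimal Python sketch of binary search on a single-level index follows. The data file, the use of list positions as record pointers, and the function name `lookup` are assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical data file stored as a heap file (no particular order).
data_file = ["Leonardo", "Racquel", "Amanda", "Omar", "Tom"]

# Index entries <search-key, pointer>, ordered on the search key.
# The "pointer" is simply a position in data_file.
index = sorted((name, pos) for pos, name in enumerate(data_file))
keys = [key for key, _ in index]


def lookup(key):
    """Binary search the index, then return the pointer into the data file."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None
```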

B-Trees
A data structure for indexes on tables:
  a variation of search trees
  trades some extra space to gain better performance

Supports the necessary operations:
  insert a new row
  delete an existing row
  search for a row given the index attributes

Perfect for disk storage:
  high fanout
  very robust to data changes, data volumes, etc.
  used by ALL RDBMSes

B-Tree Example
[Figure: a B-tree with root key 100; internal nodes with keys 30 and 120 150 180; leaves 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200]

Each node is stored in one disk block
Each row is pointed to by a leaf node

B-Tree Internal Node

[Figure: an internal node with keys 57, 81, 95 and four pointers: to keys k < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, and to keys k ≥ 95]

B-Tree Leaf Node

[Figure: a leaf node with keys 57, 81, 95; one pointer per key to the record with that key, plus a pointer to the next leaf node]

B-Tree Invariants
Assume each node (block) holds at most k keys:
  typically k is several hundreds

Each node must hold at least (k+1)/2 pointers:
  except for the root, which may have as few as 2 pointers

All leaves must be at the same level
This ensures that the tree remains balanced:
  its height with n rows is at most 1 + log_{k/2}(n)
  in practice the height is 3 or 4 (1-2 top levels in RAM)
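The height bound can be evaluated directly. In this small Python sketch the function name is illustrative, and k = 200 is an assumed value chosen only to match "several hundreds".

```python
import math


def max_height(n_rows, k):
    """Upper bound 1 + log_{k/2}(n) on the tree height, as stated above."""
    return 1 + math.log(n_rows, k / 2)
```

For the 10,000,000-row table of the earlier scenario and k = 200, the bound evaluates to about 4.5, consistent with heights of 3 or 4 in practice.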

B-Tree Point Query


[Figure: the example B-tree, showing the search path for key 101]

Search path for key 101
Time proportional to the height of the tree

B-Tree Range Query


[Figure: the example B-tree, showing the subtree for keys between 101 and 166]

Subtree for keys between 101 and 166
Time proportional to height + size of range
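Point and range queries can be sketched in Python on a hand-built copy of the example tree. The class and function names are illustrative. The sketch assumes leaf nodes are chained by next-leaf pointers and treats the range as inclusive at both ends. Insertion and rebalancing are left out.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None               # pointer to the next leaf node


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children       # len(children) == len(keys) + 1


# The example tree, built by hand.
leaves = [Leaf(ks) for ks in ([3, 5, 11], [30, 35], [100, 101, 110],
                              [120, 130], [150, 156, 179], [180, 200])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([100], [Internal([30], leaves[0:2]),
                        Internal([120, 150, 180], leaves[2:6])])


def find_leaf(node, key):
    """Walk from the root to the leaf that may hold `key`.

    Returns (leaf, number of nodes visited).
    """
    visited = 1
    while isinstance(node, Internal):
        # Child i covers keys[i-1] <= key < keys[i].
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    return node, visited


def range_query(root, lo, hi):
    """All keys k with lo <= k <= hi: descend once, then follow leaf pointers."""
    leaf, _ = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out
```

The point query visits one node per level. The range query adds one leaf visit per block of matching keys, in line with "height + size of range".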

B-Tree Insertion (1/4)


Inserting 33 (simple case)
[Figure: the example B-tree before and after the insertion; the leaf 30 35 becomes 30 33 35]

B-Tree Insertion (2/4)


Inserting 7 (split leaf)
[Figure: the tree before and after the insertion; the full leaf 3 5 11 is split into 3 5 and 7 11, and the keys of the parent node change from 30 to 7 30]
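The leaf-level step of an insertion can be sketched in Python. The capacity of 3 keys per leaf is an assumption read off the figures, and the function name and the split point (the middle of the overfull leaf) are illustrative choices.

```python
from bisect import insort


def insert_into_leaf(keys, new_key, capacity):
    """Insert into a leaf holding at most `capacity` keys.

    Returns (left, right, separator); right and separator are None
    when the leaf has room and no split is needed.
    """
    keys = list(keys)
    insort(keys, new_key)              # keep the leaf sorted
    if len(keys) <= capacity:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The first key of the new right leaf is copied up to the parent.
    return left, right, right[0]
```

Inserting 33 into the leaf 30 35 needs no split, while inserting 7 into the full leaf 3 5 11 yields the leaves 3 5 and 7 11 with separator 7 for the parent. If the parent is full as well, the same idea is applied one level up.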

B-Tree Insertion (3/4)


Inserting 160 (split internal node)
[Figure: the tree before and after the insertion; the leaf 150 156 179 is split into 150 156 and 160 179, the full internal node 120 150 180 is split into 120 150 and 180, and the keys of the root change from 100 to 100 160]

B-Tree Insertion (4/4)

Insert 45 (split the root)
[Figure: a tree with root 10 20 30 and leaves 1 2 3 | 10 12 | 20 25 | 30 32 40, before and after the insertion; the leaf 30 32 40 is split into 30 32 and 40 45, the full root is split into internal nodes 10 20 and 40, and a new root with key 30 is created]
The height increases

B-Tree Deletion
Balanced deletion is also possible:
  there are similar case-based algorithms

Generally, deleted rows are left as tombstones:
  the overhead of deletion is too large

Most tables tend to grow with time:
  the tombstones quickly get reused

Otherwise, periodically rebuild the index:
  or perform an online reorganization of the index

Cluster Index
Generally, indexed rows are scattered in the table
A clustered index has consecutive rows:
  equivalent to sorting the table

Clustering Index
At most one index can be the clustering index:
  but other indexes may happen to be clustered too
  attributes may be correlated

CREATE INDEX ExamIndex ON Exams CLUSTER(vip,date,time);

A cluster index on a primary key is a bad idea:
  range queries are not often meaningful
  keys should not carry information themselves

Index Queries
If the query only uses index attributes:
  virtually constant time evaluation

SELECT date FROM Exams WHERE vip = 'amoeller';
SELECT vip, COUNT(date) AS Dates FROM Exams GROUP BY vip;

Boolean Index Selection


SELECT * FROM R WHERE x=42 AND y>87;

We have one index for x and another for y:
  use an index scan to find row pointers for x=42
  use an index scan to find row pointers for y>87
  compute the intersection of those pointer sets

Similarly, OR corresponds to the union of the pointer sets
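The pointer-set idea can be sketched in Python. The table contents and the dictionary-based indexes are hypothetical stand-ins. A real index on y would be an ordered structure, so the y > 87 scan would be a range scan rather than a pass over all entries.

```python
# Hypothetical table R: row pointer -> (x, y).
R = {0: (42, 90), 1: (42, 10), 2: (7, 95), 3: (42, 88), 4: (1, 2)}

# Two hypothetical single-attribute indexes: value -> set of row pointers.
x_index, y_index = {}, {}
for ptr, (x, y) in R.items():
    x_index.setdefault(x, set()).add(ptr)
    y_index.setdefault(y, set()).add(ptr)

# Index scan for x = 42.
x_hits = x_index.get(42, set())
# Index scan for y > 87.
y_hits = {ptr for y, ptrs in y_index.items() if y > 87 for ptr in ptrs}

and_rows = sorted(x_hits & y_hits)   # AND: intersection of the pointer sets
or_rows = sorted(x_hits | y_hits)    # OR: union of the pointer sets
```

Only the rows in the final pointer set need to be fetched from the table.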

Projection and Duplicates


Projection on a superkey:
  no duplicates
  full table scan

Removing duplicates:
  any index structure on the remaining attributes helps
  otherwise, use a variation of multiway merge sort

Join
Many different join algorithms, in particular:
  nested loop join: often good for small joins
  merge scan join: often good for large joins

Which to use depends on many factors:
  sizes of the input tables
  expected size of the result table
  existence of indexes
  degree of clustering

The query plan is selected based on cost estimates

Join Query Structure


SELECT a1, a2, ..., an
FROM R, S
WHERE localpred(R) AND localpred(S) AND joinpred(R,S);

localpred(R) is local to R
localpred(S) is local to S
joinpred(R,S) uses both R and S attributes

Nested Loop Join


scan the outer table R
for each row satisfying localpred(R):
  search the inner table S
  select rows satisfying localpred(S) and joinpred(R,S)
  if row(s) exist:
    concatenate the rows from R and S
  else:
    discard the row for an inner join
    pad with NULLs for a left outer join
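The pseudocode above can be sketched in Python as follows. Tables are modelled as lists of tuples, predicates as plain functions, and `None` stands in for SQL NULL. The function and parameter names are illustrative, and no blocking or indexing is attempted.

```python
def nested_loop_join(R, S, local_r, local_s, join_pred, left_outer=False):
    """Join two lists of tuples with a plain nested loop."""
    result = []
    s_width = len(S[0]) if S else 0
    for r in R:                                    # scan the outer table
        if not local_r(r):
            continue
        matches = [s for s in S                    # search the inner table
                   if local_s(s) and join_pred(r, s)]
        if matches:
            result.extend(r + s for s in matches)  # concatenate the rows
        elif left_outer:
            result.append(r + (None,) * s_width)   # pad with NULLs
        # otherwise (inner join): the row is discarded
    return result
```

A possible call is `nested_loop_join(R, S, lambda r: True, lambda s: True, lambda r, s: r[0] == s[0])` for an equi-join on the first column.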

Nested Loop Join in Practice


Read R and S consecutively using blocks
Store as much of R in RAM as possible:
  concatenate S rows with many R rows at once

If possible, use indexes for the local predicates
Assume k rows in a block, m rows in RAM
Time complexity is: O(|R|/k + |R||S|/(k^2 m))

Merge Scan Join


sort R and S on the attributes in joinpred(R,S)
merge the two tables:
  look at the rows with the smallest value
  if it satisfies all predicates with the other row:
    combine all rows with these values
  else:
    read the next row in that table
  if one table runs out of rows:
    discard the row for an inner join
    pad with NULLs for a full outer join
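A minimal Python sketch of the merge step follows, restricted to an inner equi-join. Local predicates and outer-join padding are omitted, and the function name and the key-extractor parameters are assumptions made for illustration.

```python
def merge_scan_join(R, S, key_r, key_s):
    """Inner equi-join of two lists of tuples by sorting and merging."""
    R = sorted(R, key=key_r)
    S = sorted(S, key=key_s)
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        kr, ks = key_r(R[i]), key_s(S[j])
        if kr < ks:
            i += 1                     # read the next row in R
        elif kr > ks:
            j += 1                     # read the next row in S
        else:
            # Find the group of S rows sharing this join value ...
            j_end = j
            while j_end < len(S) and key_s(S[j_end]) == kr:
                j_end += 1
            # ... and combine it with every R row sharing the value.
            while i < len(R) and key_r(R[i]) == kr:
                result.extend(R[i] + s for s in S[j:j_end])
                i += 1
            j = j_end
    return result
```

After sorting, each table is read front to back exactly once, which is what makes the method attractive for large inputs.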

Merge Scan Join in Practice


Read R and S consecutively using blocks
Store as much of R and S in RAM as possible:
  when combining rows in the two tables

If possible, use indexes for local predicates
Assume k rows in a block, m rows in RAM
Time complexity is: O(|R|/k + |S|/k + |R ⋈ S|/k + |R ⋈ S|/m)