BBIT/SEM Advanced Databases

Clustering
Oracle Server Concepts Manual
Database Systems Concepts, Silberschatz/Korth, Sec. 10.7
Fundamentals of Database Systems, Elmasri/Navathe, Sec. 5.10

Stephen Mc Kearney, 2001.

Overview

Definition, Intra-file Clustering, Inter-file Clustering: What types of clustering exist?
Clustering in Pages: How does clustering work?
Unclustered Relations, Clustered Relations: Compare clustered and unclustered?
Advantages, Disadvantages: Advantages & Disadvantages?
Applications: When is it used?
Clustering Index: How is it implemented?
Clustering in Oracle: How is it implemented in Oracle?
Criteria for Clustering: How do you decide to cluster data?
Comparison: How does clustering compare to B+-Trees?

Definition

Clustering means that records related to each other are stored physically beside each other.

Clustering is a method of storing data on a disc. A cluster stores tuples from one or more relations physically close to related tuples in the database. The purpose of clustering is to speed up certain types of queries: tuples that are physically close to each other can be retrieved more quickly than tuples that are scattered across the disc.

Because clustering affects how the data is actually stored on the disc, the decision to use clustering in the database is part of the physical database design process. Clustering does not affect the applications that access the relations which have been clustered. Clustered and unclustered relations appear the same to users of the system.

Intra-file Clustering
Data items in a single file are stored together.

Supplier 1
Supplier 2
Supplier 3
...
Supplier n

Suppliers are stored in the order they are most often retrieved.

In intra-file clustering, records in a single file are stored close to related records in the same file. For example, if suppliers are normally ordered by their supplier number, then each supplier would be stored next to the supplier with the next highest supplier number.

Inter-file Clustering
Data items in two or more files are stored together.
Supplier 1
    Shipment A
    Shipment B
Supplier 2
    Shipment C
    Shipment D
    Shipment E
Supplier 3
    Shipment F
    Shipment G

Shipments from one file are stored beside suppliers in another file.

In inter-file clustering records from one file are stored close to records from another file. For example, a shipment from a shipments file would be stored close to the supplier of the shipment.
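The layout on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `build_cluster` and the tuple format are assumptions, and the assignment of shipments to suppliers is read off the slide's figure.

```python
from collections import defaultdict

def build_cluster(suppliers, shipments):
    """Interleave two 'files' so that each supplier is followed by its shipments."""
    by_supplier = defaultdict(list)
    for shipment_id, supplier_id in shipments:
        by_supplier[supplier_id].append(shipment_id)

    layout = []
    for supplier_id in suppliers:
        layout.append(("supplier", supplier_id))
        layout.extend(("shipment", s) for s in by_supplier[supplier_id])
    return layout

# Two separate files: suppliers, and shipments tagged with their supplier.
suppliers = [1, 2, 3]
shipments = [("A", 1), ("B", 1), ("C", 2), ("D", 2), ("E", 2), ("F", 3), ("G", 3)]

layout = build_cluster(suppliers, shipments)
for kind, ident in layout:
    print(kind, ident)
```

The resulting sequence is the physical order in which the records would be written, so a supplier and its shipments end up side by side.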

Clustering Data in Pages


Disc (figure)

Pages stored close together are quicker to retrieve. The disc must rotate less to read each page.

Pages stored far apart are slower to retrieve. The disc must rotate further to read each page.

Data that is stored close together will be quicker to retrieve.

Clustering affects the physical position of data on the disc. There are three cases:

Same page. When two data items are stored on the same page on the disc, they can be read with one page read operation. Because the computer reads one page at a time, data items stored on the same page will be read at the same time.

Adjacent pages. When two data items are stored on pages that are close to each other on the disc, they can be read with two page read operations. Because the pages occur one after another there is no disc head movement between reads (no seek time).

Separate locations. When two data items are stored in separate locations on the disc, they can be read with two page read operations and a seek operation. Because the pages occur at separate locations on the disc the disc head must move to a new position on the disc to read the second page.
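The three cases can be written down as a small cost function. This is a simplified sketch, not a model of any real disc: it assumes pages are numbered in physical order and that only the same page or a directly neighbouring page can be reached without a seek. The name `read_cost` is hypothetical.

```python
def read_cost(page_a, page_b):
    """Return (page_reads, seeks) needed to fetch two data items.

    Assumption: page numbers reflect physical order on the disc, and any
    jump larger than one page requires the disc head to move (a seek).
    """
    if page_a == page_b:
        return (1, 0)   # same page: one read fetches both items
    if abs(page_a - page_b) == 1:
        return (2, 0)   # neighbouring pages: two reads, no seek
    return (2, 1)       # separate locations: two reads plus a seek

print(read_cost(5, 5))
print(read_cost(5, 6))
print(read_cost(5, 90))
```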

Unclustered Relations

Figure adapted from the Oracle7 Server Concepts Manual.

Unclustered relations are stored in their own pages on the disc. That is, each page will contain tuples from one relation only. The pages may be positioned anywhere on the disc. Therefore, to join two relations, at least two pages must be read from the disc: one page for each relation. In the figure, for example, the emp relation (table) is stored at one location on the disc and the dept relation (table) is stored at another location.

Clustered Relations

Figure adapted from the Oracle7 Server Concepts Manual.

Clustered relations are stored using a cluster key. Each relation belonging to the cluster has an attribute corresponding to the cluster key. Each block will store tuples with a particular cluster key value. In the figure, for example, the cluster key is deptno and all the departments and employees with deptno=10 are stored together. This type of cluster will improve the performance of queries that join the emp and the dept relations. Note that the cluster key value is only stored once for each distinct value. For example, the value deptno=10 is only stored once and all tuples with deptno=10 are stored together.
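A minimal sketch of the idea, assuming one block per cluster key value: the key is the dictionary key (stored once), and the dept and emp tuples that share it sit in the same block. The sample rows, the column choices and the function names are hypothetical; real blocks have a fixed size and may overflow.

```python
def build_cluster_blocks(dept_rows, emp_rows):
    """Group dept and emp tuples by deptno. The key is stored once per block."""
    blocks = {}
    for deptno, deptname in dept_rows:
        block = blocks.setdefault(deptno, {"dept": [], "emp": []})
        block["dept"].append(deptname)          # deptno itself is not repeated
    for empno, name, deptno in emp_rows:
        block = blocks.setdefault(deptno, {"dept": [], "emp": []})
        block["emp"].append((empno, name))      # deptno itself is not repeated
    return blocks

def join_for(blocks, deptno):
    """Join emp and dept for one key value by reading a single block."""
    block = blocks[deptno]
    return [(name, deptname)
            for deptname in block["dept"]
            for _, name in block["emp"]]

# Hypothetical sample data.
dept_rows = [(10, "Sales"), (20, "Admin")]
emp_rows = [(1, "Smith", 10), (2, "Jones", 20), (3, "King", 10)]

blocks = build_cluster_blocks(dept_rows, emp_rows)
print(join_for(blocks, 10))
```

Because both relations for deptno=10 live in one block, the join for that department needs no second lookup in a separate file.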

Advantages
Speeds up some queries
Uses less space

Supplier 1
    Shipment A
    Shipment B
Supplier 2
    Shipment C
    Shipment D
    Shipment E
Supplier 3
    Shipment F
    Shipment G

Shipments A and B are for supplier 1. A query for all shipments of supplier 1 will be quick because all the shipments for supplier 1 follow immediately after supplier 1.

Clustering will speed up some database queries. For example, a cluster consisting of suppliers and shipments will speed up queries that request all the shipments for a particular supplier. The cluster improves the supplier/shipment query because the data for each shipment is stored on the same page as the corresponding supplier. Hence, when the supplier record is read the set of shipments is also read. The cluster key value that is used to cluster relations is only stored once in each page. This may save disc space.
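The speed-up can be illustrated by counting the pages that hold the wanted records in each layout. This is a rough sketch under stated assumptions: a hypothetical page size of three records, a made-up arrival order for the unclustered shipment file, and no index. The helper names are not from the source.

```python
PAGE_SIZE = 3  # records per page (hypothetical)

def paginate(records, page_size=PAGE_SIZE):
    """Split a sequence of records into fixed-size pages."""
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]

def pages_touched(pages, wanted):
    """Count the pages that hold at least one wanted record."""
    return sum(1 for page in pages if any(r in wanted for r in page))

# Clustered: each supplier is followed by its shipments.
clustered = paginate(["S1", "A", "B", "S2", "C", "D", "E", "S3", "F", "G"])

# Unclustered: one file per relation, shipments in (assumed) arrival order.
supplier_file = paginate(["S1", "S2", "S3"])
shipment_file = paginate(["A", "C", "F", "D", "B", "E", "G"])

wanted = {"S1", "A", "B"}  # supplier 1 and its shipments
clustered_reads = pages_touched(clustered, wanted)
unclustered_reads = (pages_touched(supplier_file, wanted)
                     + pages_touched(shipment_file, wanted))

print(clustered_reads, unclustered_reads)
```

With these assumptions the clustered layout answers the query from one page, while the unclustered layout needs one supplier page and two shipment pages.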

Disadvantages
Slows down some queries
Slows down writes

Supplier 1
    Shipment A
    Shipment B
Supplier 2
    Shipment C
    Shipment D
    Shipment E
Supplier 3
    Shipment F
    Shipment G

To read all the shipment records the supplier records must also be read. A query for all shipments will be slow because the shipments are not stored together on the disc.

Clustering will slow down certain types of queries. For example, the cluster on suppliers and shipments will slow down queries that ask for all shipments. The cluster slows down the all shipments query because the shipments are stored with each supplier. To read all the shipments the DBMS must also read the supplier data. Inserting new records into a cluster may also be slow. For example, adding a new shipment for supplier 1 will involve making space after shipment B.
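The insertion cost can be sketched with a tightly packed sequential layout. This is a worst-case illustration and an assumption about the storage model: a real DBMS usually leaves free space in each page or chains overflow pages rather than shifting every later record. The function name `insert_shipment` is hypothetical.

```python
def insert_shipment(layout, supplier, shipment):
    """Insert a shipment after the last shipment of its supplier.

    Returns the number of existing records that had to move to make space.
    """
    pos = layout.index(("supplier", supplier)) + 1
    # Skip over the shipments already stored for this supplier.
    while pos < len(layout) and layout[pos][0] == "shipment":
        pos += 1
    layout.insert(pos, ("shipment", shipment))
    return len(layout) - pos - 1  # records now sitting after the new one

layout = [
    ("supplier", 1), ("shipment", "A"), ("shipment", "B"),
    ("supplier", 2), ("shipment", "C"), ("shipment", "D"), ("shipment", "E"),
    ("supplier", 3), ("shipment", "F"), ("shipment", "G"),
]

moved = insert_shipment(layout, 1, "H")
print(moved)
```

Adding shipment H for supplier 1 places it directly after shipment B, and every record from supplier 2 onwards has to move.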

Applications 1 - Hierarchies
ER Diagram
Customer, Order, Order Line

ER Instance (figure)
Customer 1 with Order 1, Order 2 and Order 3, each order with its order lines; Customer 2.

Cluster
Customer 1
    Order 1
        Order Line 1
        Order Line 2
    Order 2
        Order Line 1
        Order Line 2
    Order 3

A hierarchy of customer to orders to order lines.

Clustering is used when the data has a hierarchical structure. For instance, in the example above, the cluster would be used when the most common queries will retrieve all the orders and order lines for a customer. A cluster to store the above structure would cluster all the order lines with their corresponding orders and then the orders and order lines would be stored with their corresponding customer.
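One way to picture the storage order for a hierarchy is a depth-first (pre-order) walk: each parent is written first, followed immediately by all of its descendants. The sketch below assumes a simple `(name, children)` tuple representation, which is not from the source, and reproduces the cluster sequence shown on the slide.

```python
def cluster_order(node):
    """Pre-order walk: a parent is followed by all of its descendants."""
    name, children = node
    order = [name]
    for child in children:
        order.extend(cluster_order(child))
    return order

# Customer 1 with three orders, as in the slide's cluster column.
customer = ("Customer 1", [
    ("Order 1", [("Order Line 1", []), ("Order Line 2", [])]),
    ("Order 2", [("Order Line 1", []), ("Order Line 2", [])]),
    ("Order 3", []),
])

for record in cluster_order(customer):
    print(record)
```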

Applications 2 - Lists

List of Products: Product 1, Product 2, Product 3

Cluster: Product 1, Product 2, Product 3

A cluster may be used when queries retrieve lists of data items. In the example above, the cluster of products will improve queries that request all the products.

Applications 3 - SQL Joins


Equi-joins

    SELECT name, address, deptname
    FROM emp, dept
    WHERE emp.deptno = dept.deptno;

The emp and dept relations may be clustered on the deptno attribute.

A cluster may be used to cluster relations that are frequently joined together. In the above example, the relations emp and dept may be clustered on the deptno attribute. The value of each deptno will be stored once together with all the corresponding emp and dept tuples.

Clustering Index
Index on Deptno    Page    Records
10                 P1      Dept, Employee, Employee, Employee (all records with deptno=10)
20                 P2      Dept, Employee, Employee, Employee (all records with deptno=20)
30                 P3      Dept, Employee, Employee, Employee (all records with deptno=30)

The DBMS uses a clustering index when it implements a cluster. The clustering index is used to index the cluster key. This allows the DBMS to efficiently access the data in the cluster. The cluster index contains an entry for each cluster key value. The index may be a B+-Tree.
Ref: Elmasri, Sec. 6.1.2
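The lookup path can be sketched as follows. A sorted list with binary search stands in for the B+-Tree here, which is a simplification; the class name `ClusterIndex` is hypothetical, and the page labels P1 to P3 come from the slide.

```python
import bisect

class ClusterIndex:
    """One entry per cluster key value, each pointing at the page for that key."""

    def __init__(self, entries):
        pairs = sorted(entries)
        self.keys = [key for key, _ in pairs]
        self.pages = [page for _, page in pairs]

    def lookup(self, key):
        """Return the page holding all records for this key, or None."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.pages[i]
        return None

index = ClusterIndex([(10, "P1"), (20, "P2"), (30, "P3")])
print(index.lookup(20))
print(index.lookup(25))
```

A query on deptno=20 consults the index once and then reads page P2, which already holds the dept tuple and all of its employees.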

Clustering in Oracle
Create a cluster

    CREATE CLUSTER emp_dept (deptno NUMBER(3));

Create a cluster index

    CREATE INDEX emp_dept_index ON CLUSTER emp_dept;

Create tables

    CREATE TABLE dept (
        deptno NUMBER(3) PRIMARY KEY,
        ...
    ) CLUSTER emp_dept (deptno);

    CREATE TABLE emp (
        empno  NUMBER(5),
        deptno NUMBER(3) REFERENCES dept,
        ...
    ) CLUSTER emp_dept (deptno);

There are three steps required to create a cluster in Oracle:

1. Create the cluster. The space for the cluster is allocated on the disc.
2. Create the cluster index. Oracle requires a cluster index to be able to access the cluster. Therefore, the cluster index must exist before data can be added to the cluster.
3. Create the tables. When the tables are created a parameter is added to the CREATE TABLE command indicating the cluster to which the table will belong.

Once the cluster has been created the normal data manipulation commands (INSERT, DELETE, UPDATE, SELECT) may be used. Therefore, using a cluster to improve the performance of a database does not affect the application programs that access the data.

Criteria for Clustering


Query Requirements: Joins, Lists, Hierarchies

Space Requirements: Clustering may save space

Update Requirements: Clustering may slow updates

Deciding to cluster a set of relations depends on three factors:

Query requirements. Clustering improves joins between relations because it stores related tuples together in the same page. When the most common queries involve joining two relations, a cluster may improve performance.

Space requirements. Because each cluster key value is only stored once, storing relations in a cluster can use less storage space than storing the same relations separately. If storage space is restricted, clustering the data may save space.

Update requirements. Clusters are difficult to update because space must be left to allow for additional clustered tuples. If space is not available, it may be necessary to move tuples between pages.

Comparison with Other Techniques


B+-Tree                              Cluster
Fast access to individual tuples     Fast access across relations
Does not affect the order of data    Changes the order of the data
Can be ignored if not useful         Must be searched to access data
Easy to create and delete            Difficult to create and delete

A B+-Tree is designed to provide fast access to individual tuples in a relation. A cluster is designed to improve the performance of queries that join two or more relations together.

A B+-Tree does not affect the order of the actual data. Although the index may be ordered, the actual data remains unordered. A cluster orders the actual data.

A B+-Tree does not have to be used to answer a query. It is possible to access the data directly if using the B+-Tree is too inefficient. As a cluster affects the physical ordering of the data, the cluster must be accessed to retrieve the data. Hence, a cluster will slow down certain queries.

A B+-Tree index is easy to create and delete because it is separate from the data. A cluster is difficult to create or change because it must be created before the data is added to the database. Deleting a cluster will destroy the data.

Partitioned Table
    CREATE TABLE sales (
        acct_no        NUMBER(5),
        acct_name      CHAR(30),
        amount_of_sale NUMBER(6),
        week_no        INTEGER
    )
    PARTITION BY RANGE (week_no) (
        PARTITION sales1  VALUES LESS THAN (4)  TABLESPACE ts0,
        PARTITION sales2  VALUES LESS THAN (8)  TABLESPACE ts1,
        ...
        PARTITION sales13 VALUES LESS THAN (52) TABLESPACE ts12
    );

Oracle Concepts Manual

Partitioned Index 1

Oracle Concepts Manual

Partitioned Index 2

Oracle Concepts Manual

Partitioned Index 3

Oracle Concepts Manual

Equipartitioned Tables

Oracle Concepts Manual

Better availability and reliability

Disc Striping

Oracle Concepts Manual
