
CPS 216: Advanced Database Systems

Shivnath Babu

Outline for Today


What this class is about: Data management
What we will cover in this class
Logistics

What does a Database System mean to you?


(Hint: What are they used for? Give examples)

Data Management
Figure: a User/Application issues queries to the DataBase Management System (DBMS), which manages the Data.

Example: At a Company
Query 1: Is there an employee named Nemo?
Query 2: What is Nemo's salary?
Query 3: How many departments are there in the company?
Query 4: What is the name of Nemo's department?
Query 5: How many employees are there in the Accounts department?

Employee
ID   Name   Salary   DeptID
10   Nemo   120K     12
20   Dory   79K      156
40   Gill   76K      89
52   Ray    85K      34

Department
ID    Name
12    IT
34    Accounts
89    HR
156   Marketing
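The five queries above can be sketched in SQL. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in DBMS; the column types, the salaries stored as whole dollars, and the helper name `scalar` are assumptions made for this example, not part of the slides.

```python
import sqlite3

# In-memory database holding the two tables from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee (
    ID INTEGER PRIMARY KEY,
    Name TEXT,
    Salary INTEGER,          -- assumed: 120K stored as 120000
    DeptID INTEGER REFERENCES Department(ID)
);
INSERT INTO Department VALUES (12,'IT'),(34,'Accounts'),(89,'HR'),(156,'Marketing');
INSERT INTO Employee VALUES
    (10,'Nemo',120000,12),(20,'Dory',79000,156),
    (40,'Gill',76000,89),(52,'Ray',85000,34);
""")

def scalar(sql, *params):
    """Hypothetical helper: run a query and return its single value."""
    return conn.execute(sql, params).fetchone()[0]

# Query 1: Is there an employee named Nemo?
has_nemo = scalar(
    "SELECT EXISTS (SELECT 1 FROM Employee WHERE Name = ?)", "Nemo") == 1

# Query 2: What is Nemo's salary?
nemo_salary = scalar("SELECT Salary FROM Employee WHERE Name = ?", "Nemo")

# Query 3: How many departments are there in the company?
dept_count = scalar("SELECT COUNT(*) FROM Department")

# Query 4: What is the name of Nemo's department?
nemo_dept = scalar(
    "SELECT d.Name FROM Employee e JOIN Department d ON e.DeptID = d.ID "
    "WHERE e.Name = ?", "Nemo")

# Query 5: How many employees are there in the Accounts department?
accounts_headcount = scalar(
    "SELECT COUNT(*) FROM Employee e JOIN Department d ON e.DeptID = d.ID "
    "WHERE d.Name = ?", "Accounts")

print(has_nemo, nemo_salary, dept_count, nemo_dept, accounts_headcount)
```

Each query states what is wanted, not how to find it; choosing how to scan and join the tables is left to the DBMS.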

DataBase Management System (DBMS)


High-level Query Q
Answer

DBMS
Translates Q into best execution plan for current conditions, runs plan

Data

Example: Store that Sells Cars


Query: Owners of Honda Accords who are <= 23 years old

Plan operators:
Filter (Make = Honda and Model = Accord)
Join (Cars.OwnerID = Owners.ID)
Filter (Age <= 23)

Result
Make    Model    OwnerID   ID    Name   Age
Honda   Accord   12        12    Nemo   22
Honda   Accord   156       156   Dory   21

Cars
Make     Model    OwnerID
Honda    Accord   12
Toyota   Camry    34
Mini     Cooper   89
Honda    Accord   156

Owners
ID    Name   Age
12    Nemo   22
34    Ray    42
89    Gill   36
156   Dory   21
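The plan above can be sketched as a small pipeline of operators. This is an illustration only: the function names `filter_rows` and `hash_join` are hypothetical, the hash join is one possible join method (the slide does not name one), and the operator order is an assumption. A second, equivalent ordering is included to show that the DBMS has a choice of plans for the same query.

```python
cars = [
    {"Make": "Honda", "Model": "Accord", "OwnerID": 12},
    {"Make": "Toyota", "Model": "Camry", "OwnerID": 34},
    {"Make": "Mini", "Model": "Cooper", "OwnerID": 89},
    {"Make": "Honda", "Model": "Accord", "OwnerID": 156},
]
owners = [
    {"ID": 12, "Name": "Nemo", "Age": 22},
    {"ID": 34, "Name": "Ray", "Age": 42},
    {"ID": 89, "Name": "Gill", "Age": 36},
    {"ID": 156, "Name": "Dory", "Age": 21},
]

def filter_rows(rows, predicate):
    """Filter operator: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def hash_join(left, right, left_key, right_key):
    """Join operator: build a hash index on the right input, probe with the left."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

def is_accord(row):
    return row["Make"] == "Honda" and row["Model"] == "Accord"

def is_young(row):
    return row["Age"] <= 23

# Plan A: filter Cars first, then join, then filter on Age.
plan_a = filter_rows(
    hash_join(filter_rows(cars, is_accord), owners, "OwnerID", "ID"),
    is_young)

# Plan B: join everything first, then apply both filters.
plan_b = filter_rows(
    filter_rows(hash_join(cars, owners, "OwnerID", "ID"), is_accord),
    is_young)

names_a = [row["Name"] for row in plan_a]
names_b = [row["Name"] for row in plan_b]
print(names_a, names_b)
```

Both plans return the same owners, but Plan A joins two cars instead of four. On large tables, that kind of difference is what the optimizer is looking for.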

DataBase Management System (DBMS)


High-level Query Q
Answer

DBMS
Translates Q into best execution plan for current conditions, runs plan
Keeps data safe and correct despite failures, concurrent updates, online processing, etc.

Data

DBMS is multi-user
Example
Get account balance from database;
If balance > amount of withdrawal then
    balance = balance - amount of withdrawal;
    dispense cash;
    store new balance into database;

Homer at ATM1 withdraws $100
Marge at ATM2 withdraws $50
Initial balance = $400, final balance = ?
Should be $250 no matter who goes first

Final balance = $250


Homer withdraws $100:                      Marge withdraws $50:
read balance; $400
if balance > amount then
    balance = balance - amount; $300
write balance; $300
                                           read balance; $300
                                           if balance > amount then
                                               balance = balance - amount; $250
                                           write balance; $250

Final balance = $300


Homer withdraws $100:                      Marge withdraws $50:
read balance; $400
                                           read balance; $400
                                           if balance > amount then
                                               balance = balance - amount; $350
                                           write balance; $350
if balance > amount then
    balance = balance - amount; $300
write balance; $300

Final balance = $350


Homer withdraws $100:                      Marge withdraws $50:
read balance; $400
                                           read balance; $400
if balance > amount then
    balance = balance - amount; $300
write balance; $300
                                           if balance > amount then
                                               balance = balance - amount; $350
                                           write balance; $350
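The three interleavings above can be replayed deterministically. This is a sketch, not how a DBMS executes transactions: the function `run_schedule` and the step names "read", "update", and "write" are hypothetical, chosen to mirror the three lines of each withdrawal.

```python
AMOUNTS = {"Homer": 100, "Marge": 50}

def run_schedule(schedule, initial_balance=400):
    """Replay a list of (person, step) pairs against one shared balance."""
    database = {"balance": initial_balance}
    local = {}  # each ATM's private copy of the balance
    for person, step in schedule:
        if step == "read":
            local[person] = database["balance"]
        elif step == "update":
            if local[person] > AMOUNTS[person]:
                local[person] -= AMOUNTS[person]
        elif step == "write":
            database["balance"] = local[person]
    return database["balance"]

# Homer runs to completion, then Marge: the correct outcome.
serial = [
    ("Homer", "read"), ("Homer", "update"), ("Homer", "write"),
    ("Marge", "read"), ("Marge", "update"), ("Marge", "write"),
]
# Both read $400; Marge writes first and Homer overwrites her update.
homer_writes_last = [
    ("Homer", "read"), ("Marge", "read"),
    ("Marge", "update"), ("Marge", "write"),
    ("Homer", "update"), ("Homer", "write"),
]
# Both read $400; Homer writes first and Marge overwrites his update.
marge_writes_last = [
    ("Homer", "read"), ("Marge", "read"),
    ("Homer", "update"), ("Homer", "write"),
    ("Marge", "update"), ("Marge", "write"),
]

final_serial = run_schedule(serial)
final_homer_last = run_schedule(homer_writes_last)
final_marge_last = run_schedule(marge_writes_last)
print(final_serial, final_homer_last, final_marge_last)
```

In the two bad schedules both ATMs dispense cash, yet one withdrawal disappears from the stored balance because the later write is based on a stale read.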

Concurrency control in DBMS


Similar to concurrent programming problems
    But data is not all in main-memory
Appears similar to file system concurrent access?
    Approach taken by MySQL initially; now MySQL offers better alternatives
    But want to control at much finer granularity
    Or else one withdrawal would lock up all accounts!
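Finer-granularity control can be sketched with one lock per account instead of one lock for the whole data set. This is a simplified in-memory illustration using Python threads; the class `Bank` and the account names are hypothetical, and a real DBMS lock manager does far more (lock modes, deadlock handling, data on disk).

```python
import threading

class Bank:
    """Toy account store with one lock per account (hypothetical example)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        # One lock per account: withdrawals on different accounts never block
        # each other, while withdrawals on the same account are serialized.
        self.locks = {account: threading.Lock() for account in balances}

    def withdraw(self, account, amount):
        with self.locks[account]:
            balance = self.balances[account]       # read balance
            if balance > amount:
                self.balances[account] = balance - amount  # write balance
                return True
            return False

bank = Bank({"simpsons": 400, "flanders": 1000})

threads = [
    threading.Thread(target=bank.withdraw, args=("simpsons", 100)),  # Homer
    threading.Thread(target=bank.withdraw, args=("simpsons", 50)),   # Marge
    threading.Thread(target=bank.withdraw, args=("flanders", 100)),  # unrelated account
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(bank.balances)
```

With a single global lock the third withdrawal would wait behind the first two even though it touches a different account. That is the "lock up all accounts" problem.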

Recovery in DBMS
Example: balance transfer
    decrement the balance of account X by $100;
    increment the balance of account Y by $100;

Scenario 1: Power goes out after the first instruction
Scenario 2: DBMS buffers and updates data in memory (for efficiency); before they are written back to disk, power goes out
Log updates; undo/redo during recovery


Summary of modern DBMS features


Persistent storage of data
Logical data model; declarative queries and updates -> physical data independence
Multi-user concurrent access
Safety from system failures
Performance, performance, performance
    Massive amounts of data (terabytes ~ petabytes)
    High throughput (thousands ~ millions transactions per minute)
    High availability (99.999% uptime)

Modern DBMS Architecture


Applications
    SQL
DBMS
    Parser
        Logical query plan
    Query Optimizer
        Physical query plan
    Query Executor
        Access method API calls
    Storage Manager
        Storage system API calls
        File system API calls
OS
Disk(s)
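One way to see the path from SQL to a physical plan is to ask a DBMS for the plan it chose. This is a small sketch using SQLite through Python's sqlite3 module as a stand-in; the table, the index name `idx_employee_name`, and the query are assumptions for illustration, and the wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)")
# Hypothetical index, giving the optimizer an alternative to a full table scan.
conn.execute("CREATE INDEX idx_employee_name ON Employee(Name)")

# EXPLAIN QUERY PLAN asks SQLite to describe the plan instead of running the query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Salary FROM Employee WHERE Name = ?",
    ("Nemo",),
).fetchall()

# The last column of each row is a human-readable description of one plan step.
plan_details = [row[-1] for row in plan_rows]
for detail in plan_details:
    print(detail)
```

The SQL text says nothing about indexes; whether the index is used is the optimizer's decision, which is the physical data independence mentioned earlier.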

Course Outline
40% of the class is about core DBMS concepts
Query execution, query optimization, transactions, recovery, etc.
Textbook material

60% of the class is on what is happening today in data management


New developments on textbook material
Data streams
Web search (Google, Yahoo!)
Data integration (structured data + unstructured data)
Data mining
Unsolved challenges

Using a Traditional DBMS


Figure: the User/Application sends queries and receives results; a Loader loads data into stored tables (Table R, Table S).

New Approach for Data Streams


User/Application Register Continuous Query (Standing Query)

Result

Input streams

Stream Query Processor

Example Continuous (Standing) Queries


Web
    Amazon's best sellers over last hour
Network Intrusion Detection
    Track HTTP packets with destination address matching a prefix in given table and content matching *\.ida
Finance
    Monitor NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes
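The Finance query above can be sketched as a standing query over a stream of price ticks. This is one possible reading, not the slide's definition: "moved down more than 2%" is taken to mean a drop relative to the oldest price still inside the 20-minute window, timestamps are in seconds, and the class `DropMonitor` and the ticker symbols are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 20 * 60  # "the last 20 minutes"

class DropMonitor:
    """Standing query: registered once, evaluated on every arriving tick."""

    def __init__(self):
        # symbol -> deque of (timestamp, price) ticks inside the window
        self.windows = defaultdict(deque)

    def on_tick(self, symbol, timestamp, price):
        window = self.windows[symbol]
        window.append((timestamp, price))
        # Expire ticks that have fallen out of the sliding window.
        while window[0][0] < timestamp - WINDOW_SECONDS:
            window.popleft()
        if not (20 <= price <= 200):
            return None  # outside the price range the query asks for
        reference_price = window[0][1]  # assumed: oldest price in the window
        drop = (reference_price - price) / reference_price
        if drop > 0.02:
            return (symbol, timestamp, round(drop * 100, 2))
        return None

monitor = DropMonitor()
ticks = [
    ("ACME", 0, 100.0),
    ("PENNY", 0, 10.0),
    ("PENNY", 60, 9.0),     # big drop, but price is below $20
    ("ACME", 600, 99.0),    # down 1%: no alert
    ("ACME", 900, 97.5),    # down 2.5% within 20 minutes: alert
    ("ACME", 2400, 97.0),   # older ticks have expired: no alert
]
alerts = [a for a in (monitor.on_tick(*tick) for tick in ticks) if a is not None]
print(alerts)
```

Unlike a traditional query, the data is never loaded into a table first; each tick is processed as it arrives and only the recent window is kept in memory.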

New Challenges in DBMSs


High-level Query Q

Answer

DBMS

TeraBytes PetaBytes

Data

<CD>
  <TITLE>Empire B.</TITLE>
  <ARTIST>Bob Dylan</ARTIST>
  <COUNTRY>USA</COUNTRY>
  <COMPANY>Columbia </COMPANY>
  <PRICE>10.90</PRICE>
</CD>

Course Logistics
Reference: Database Systems: The Complete Book, by H. Garcia-Molina, J. D. Ullman, and J. Widom
Web site: http://www.cs.duke.edu/courses/fall07/cps216

Grading:
Project 30%
Homework Assignments 20%
Midterm 20%
Final 30%

Summary: Data Management is Important


Core aspect of most sciences and engineering today
Core need in industry
Cool mix of theory and systems
Chances are you will find something interesting even if your primary interest is elsewhere
