
Database Introduction Chapter 1

Acknowledgements
These slides were written by Richard T. Snodgrass (University of Arizona), Curtis Dyreson (Utah State University) and Christian S. Jensen (Aalborg University). Kristian Torp (Aalborg University) converted the slides from Island Presents to PowerPoint. Sabah Currim added some slides.

CS 5800

Introduction


Prevalence of Databases
Behind every successful website, there is a powerful database. Examples:
- UPS / FedEx tracking
- Amazon's website
- Wal-Mart's inventory system
- Dell's ordering system
- Google's search engine

Data Management Example


Scenario
You are a Netflix competitor. Customers rent DVD copies of movies, and there are several copies of each movie.

Needs
- Which DVDs has a customer rented?
- Are any DVDs overdue?
- When will a DVD become available?


Solution: A File-based System


Edit rented.txt file
Customer: Jane Doe, Rented: Babe, Due: Jan. 19, 2000

Complication: Queries
Does not address needs:
- Query: Which movies has Joe Jenkins rented?
  - Execute (not quite right): Search for Joe Jenkins.
  - Execute: Search for ^\s*Customer:\s*Joe\s+Jenkins\s*,\s+Rented:
- Query: Are any DVDs overdue?
  - Execute: ???
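A minimal C sketch of the anchored search, assuming rented.txt has already been read into an array of lines and that the system provides POSIX <regex.h>. POSIX extended regular expressions do not guarantee the \s shorthand, so the pattern spells it [[:space:]]; leading whitespace is optional (zero or more) so that a record beginning directly with "Customer:" still matches. The names count_matching and JOE_PATTERN are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* The anchored search rewritten for POSIX extended regexes:
   [[:space:]] stands in for \s, which POSIX does not guarantee. */
static const char *const JOE_PATTERN =
    "^[[:space:]]*Customer:[[:space:]]*Joe[[:space:]]+Jenkins[[:space:]]*,[[:space:]]+Rented:";

/* Count how many lines of rented.txt (already in memory) match `pattern`.
   Returns -1 if the pattern does not compile. */
int count_matching(const char *const lines[], size_t n, const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        /* regexec returns 0 when the line matches */
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            hits++;
    }
    regfree(&re);
    return hits;
}
```

A plain search for Joe Jenkins also hits any line where the name appears in another field, such as a movie title, which is why the plain search is "not quite right". The overdue query has no such pattern at all: answering it requires a program that parses the due dates and compares them with today's date.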

Advantages
- Text editors are easy to use.
- Simple to insert a record.
- Simple to delete a record.

Requirements
- Robust, sophisticated query language.
- Clear separation between data organization (schema) and data.
DBMS Concepts: Schema, DML, SQL

Complication: Integrity
Lacks data integrity and consistency.
- Clerk misspells a value or field:
  Customer: Jane Doek, Rented: Eraserhead, Deu: Jan. 19, 2010
- Inputs an improper value, or the same value differently:
  Customer: Jane Doe, Rented: The Eraserhead, Due: Feb. 29, 2010
- Forgets, adds, or reorders fields:
  Terms: weekly special Due: Jan. 19, 2010, Rented: Eraserhead

Requirements
Enforce constraints to permit only valid information to be input.
DBMS Concepts: Integrity constraints, Types

Complication: Update
- Add/delete/update fields in every record.
  - Record the store location:
    Customer: Jane Doe, Rented: Babe, Due: Jan. 19, 2011, Store: Paradise
  - Change customer to first name and surname:
    First: Jane, Surname: Doe, Rented: Eraserhead, Due: Jan. 19, 2011
- Add/delete/update new collections of information.
  - A customer.txt file to record customer information:
    Customer: Jane Doe, Phone: 557-3344

Requirements
Ability to manipulate the way data is organized.
DBMS Concepts: DDL
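A minimal C sketch of what integrity constraints and types buy, assuming each line of rented.txt has already been parsed into a hypothetical struct rental. A DBMS would check declared types and constraints on every insert; here the hypothetical function rental_is_valid plays that role and rejects, for instance, the Feb. 29, 2010 due date (2010 is not a leap year). It does not catch "The Eraserhead" vs. "Eraserhead"; that needs every rental to refer to one shared list of movies, which is the foreign-key idea.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed form of one line of rented.txt. */
struct rental {
    const char *customer;
    const char *movie;
    int due_year, due_month, due_day;
};

static bool is_leap_year(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Type check: does (y, m, d) name a real calendar date? */
static bool is_valid_date(int y, int m, int d)
{
    static const int days_in_month[12] =
        {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (m < 1 || m > 12 || d < 1)
        return false;
    int max_day = days_in_month[m - 1];
    if (m == 2 && is_leap_year(y))
        max_day = 29;
    return d <= max_day;
}

/* Integrity constraint: every rental names a customer and a movie
   and carries a valid due date; anything else is rejected. */
bool rental_is_valid(const struct rental *r)
{
    return r->customer != NULL && r->customer[0] != '\0'
        && r->movie != NULL && r->movie[0] != '\0'
        && is_valid_date(r->due_year, r->due_month, r->due_day);
}
```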


Complication: Multiple Users
Two clerks edit the rented.txt file at the same time:
1) Ben starts to edit rented.txt and reads it into memory.
2) Sarah starts to edit rented.txt.
3) Ben adds a record.
4) Ben saves rented.txt to disk.
5) Sarah saves rented.txt to disk.
Ben's added record disappears!

Complication: Crashes
Crash during update may lead to inconsistent state.
Ben makes 250 of 500 edits to change Jane Doe to her preferred name Jan Doe. Before he saves it, Windows crashes!

Requirements (crashes)
Must update on an all-or-none basis, implemented by commit, or by rollback if necessary.
DBMS Concepts: Transactions, Commit, Rollback, Recovery
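A minimal shadow-copy sketch of the all-or-none requirement, applied to Ben's 500 edits. All names (struct table, rename_all_or_none, crash_after) are hypothetical, and the "crash" is only simulated by returning early. Edits go to a private copy that is published in a single assignment on commit; on failure the copy is simply dropped, which acts as the rollback. A real DBMS gets the same guarantee across genuine crashes with an on-disk log and recovery, which an in-memory copy cannot provide.

```c
#include <stdbool.h>
#include <string.h>

#define N_RECORDS 500
#define NAME_LEN 16

/* Hypothetical table: the customer name stored in each rental record. */
struct table {
    char customer[N_RECORDS][NAME_LEN];
};

/* Rename old_name to new_name in every record as ONE all-or-none unit.
   crash_after simulates a crash after that many edits (-1 = no crash). */
bool rename_all_or_none(struct table *t, const char *old_name,
                        const char *new_name, int crash_after)
{
    static struct table shadow;   /* static keeps 8 KB off the stack; not thread-safe */
    shadow = *t;                  /* begin: private working copy */

    int edits = 0;
    for (int i = 0; i < N_RECORDS; i++) {
        if (strcmp(shadow.customer[i], old_name) != 0)
            continue;
        if (edits == crash_after)
            return false;         /* "crash": rollback = discard the shadow copy */
        strncpy(shadow.customer[i], new_name, NAME_LEN - 1);
        shadow.customer[i][NAME_LEN - 1] = '\0';
        edits++;
    }
    *t = shadow;                  /* commit: all edits become visible at once */
    return true;
}
```

If all 500 records name Jane Doe, a call with crash_after set to 250 returns false and every record still reads Jane Doe; a call with -1 commits and all 500 read Jan Doe. No half-renamed state is ever visible.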

Requirements (multiple users)
Must support multiple readers and writers. Updates to data must (appear to) occur in serial order.
DBMS Concepts: Serializability, Concurrency control
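A minimal C sketch that replays Ben and Sarah's five steps and then adds one possible safeguard. struct file stands in for rented.txt on disk and struct buffer for a clerk's editor; both, and all function names, are hypothetical. save_blind behaves like a text editor and reproduces the lost update. save_checked swaps in a simple optimistic version check: a save is refused if someone else saved after this clerk opened the file. This is only one concurrency-control technique; DBMSs commonly use locking or multiversion schemes instead.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_RECORDS 8
#define RECORD_LEN 64

/* Hypothetical in-memory stand-in for rented.txt on disk. */
struct file {
    char lines[MAX_RECORDS][RECORD_LEN];
    int count;
    int version;              /* incremented on every save */
};

/* One clerk's editor buffer: a private copy taken when the file was opened. */
struct buffer {
    struct file copy;
    int read_version;
};

void open_for_edit(const struct file *disk, struct buffer *buf)
{
    buf->copy = *disk;
    buf->read_version = disk->version;
}

void add_record(struct buffer *buf, const char *line)
{
    struct file *f = &buf->copy;
    if (f->count == MAX_RECORDS)
        return;               /* buffer full; ignored in this sketch */
    strncpy(f->lines[f->count], line, RECORD_LEN - 1);
    f->lines[f->count][RECORD_LEN - 1] = '\0';
    f->count++;
}

/* What a text editor does: overwrite the disk copy unconditionally. */
void save_blind(struct file *disk, const struct buffer *buf)
{
    int next_version = disk->version + 1;
    *disk = buf->copy;
    disk->version = next_version;
}

/* Optimistic check: refuse to save if someone else saved after we opened. */
bool save_checked(struct file *disk, const struct buffer *buf)
{
    if (disk->version != buf->read_version)
        return false;         /* caller must reopen, redo the edit, and retry */
    save_blind(disk, buf);
    return true;
}
```

Replaying the steps with save_blind leaves Sarah's stale copy on disk, and Ben's record is gone. With save_checked, Ben's save succeeds and Sarah's returns false, so she must reopen the file, redo her edit on top of Ben's, and save again; the two updates then take effect one after the other.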

Complication: Data Physically Separate
Wants: Advise Austin Powers fans about the new Austin Powers movie.
Method: customer.txt contains the addresses of customers. It must be merged with rented.txt to create a mailing list.
Problems:
- Text editors are incapable of such a merge (you must write a program).
- There are several customers named Joe Jenkins.
- There is no information on some customers!?
Requirements:
- Uniquely identify each customer.
- Make sure we have information on customers who rent DVDs.
DBMS Concepts: Joins, Keys, Foreign keys, Referential integrity

Complication: Security
- Customers want to know how many times a movie has been rented. Provide access to rented.txt, but not to the customer field: how do I do that in an editor?
- Underage clerks should not see the history of R-rated rentals. Keep two lists of rentals?
Requirements: Ability to control who has access to what information.
DBMS Concepts: Security, Views
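A minimal C sketch of the mailing-list merge, assuming each customer gets a numeric customer_id that is unique in customer.txt (a key) and is stored with every rental in rented.txt (a foreign key). The struct layouts and the function mailing_list are hypothetical. A nested-loop join matches rentals to customers by id, so two customers named Joe Jenkins stay distinct, and a rental whose id matches no customer is counted as a referential-integrity violation instead of silently producing no address.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts: customer_id is a key of customer.txt and a
   foreign key in rented.txt. */
struct customer   { int customer_id; const char *name; const char *address; };
struct rental_row { int customer_id; const char *movie; };

/* Nested-loop join of rented.txt with customer.txt on customer_id,
   restricted to rentals of `movie`. Writes matching addresses to out
   (at most max, not de-duplicated) and counts rentals whose customer_id
   has no customer row in *dangling. Returns the number of addresses. */
size_t mailing_list(const struct rental_row *r, size_t nr,
                    const struct customer *c, size_t nc,
                    const char *movie,
                    const char **out, size_t max, size_t *dangling)
{
    size_t n = 0;
    *dangling = 0;
    for (size_t i = 0; i < nr; i++) {
        if (strcmp(r[i].movie, movie) != 0)
            continue;
        size_t j = 0;
        while (j < nc && c[j].customer_id != r[i].customer_id)
            j++;
        if (j == nc) {            /* no information on this customer */
            (*dangling)++;
            continue;
        }
        if (n < max)
            out[n++] = c[j].address;
    }
    return n;
}
```

A DBMS picks a join algorithm (nested loops, hashing, or sorting) on its own; the nested loop is used here only because it is the easiest to read.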



Complication: Efficiency
All video store owners in the US West get together.
- The rented.txt file gets huge (gigabytes of data).
- Slow to edit.
- Slow to query for customer information.

Requirements
- New data structures to improve query performance.
- System automatically modifies queries to improve speed.
- Ability of the system to scale to handle huge datasets.
DBMS Concepts: Indexes, Query optimization, Database tuning

Complication: New Needs
All video store owners in the USA get together.
- What pairs of movies are often rented together? Calculate the probability of movie combinations.
- Do we need more copies of the Austin Powers movie anywhere?
- Plot the rental history of Austin Powers by store area.

Requirements
- Collect and analyze summary data.
- Use the computer to mine for interesting trends.
- Support access to data by sophisticated programs.
DBMS Concepts: Data warehouses, Data mining, Database API
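For the Efficiency complication, a minimal C sketch of what an index does, assuming customer records carry a unique customer_id. A sorted array searched with the C standard library's qsort and bsearch stands in for the on-disk B+-trees or hash indexes a DBMS would normally use; struct index_entry, build_index and lookup are hypothetical names. A lookup then takes O(log n) comparisons instead of a scan of the whole file.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical index entry: a customer_id and the position of that
   customer's row in a large customer file. Keys are assumed unique. */
struct index_entry {
    int customer_id;
    size_t row;
};

static int cmp_entry(const void *a, const void *b)
{
    const struct index_entry *x = a, *y = b;
    /* yields -1, 0 or 1 without risking integer overflow */
    return (x->customer_id > y->customer_id) - (x->customer_id < y->customer_id);
}

/* Build the index once: sort by key, O(n log n). */
void build_index(struct index_entry *idx, size_t n)
{
    qsort(idx, n, sizeof idx[0], cmp_entry);
}

/* Probe: binary search, O(log n) comparisons instead of a full scan.
   Returns the row position, or -1 if the customer is absent. */
long lookup(const struct index_entry *idx, size_t n, int customer_id)
{
    struct index_entry key = { customer_id, 0 };
    const struct index_entry *hit =
        bsearch(&key, idx, n, sizeof idx[0], cmp_entry);
    return hit != NULL ? (long)hit->row : -1;
}
```

Keeping the index sorted on every insert or delete has a cost, so whether an index pays off depends on the mix of queries and updates.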

Outline
- Database System Overview
- File-based Approach vs. Database Approach
- Time Line for Database Technologies
- Architecture of Database Systems

Basic Definitions (continued)
- Database (DB): a collection of related data.
- Database Management System (DBMS): a software package to facilitate the process of
  - Defining: specifying types and organization (schema)
  - Constructing: loading the data
  - Manipulating: querying the data
- One DBMS, many DBs, many applications.
- Database system: a database and a DBMS.

What do we build in this course: a database, a DBMS, or a database system?


A Database System
[Figure: users and application programs access databases in persistent storage through the DBMS]

RDBMS Price Comparison
Basic DB server (no management tools; in other words, the cheapest option). Prices in US$.

Number of CPUs   Oracle 9i Enterprise Ed.   SQL Server Enterprise Ed.
1                $40,000                    $19,999
2                $80,000                    $39,998
4                $160,000                   $79,996
8                $320,000                   $159,992
16               $640,000                   $319,984
32               $1,280,000                 $639,968

Source: Microsoft, but based on Oracle-provided data.


Basic Definitions
- Miniworld: some part of the real world about which information is stored. Also called the Universe of Discourse (UoD).
- Data: known facts about the miniworld that are recorded and have an implicit meaning.
- Information: data processed to be useful in decision making.
- Metadata: data that describes the properties or characteristics of other data, e.g., the header of a table.


Example
(From Modern Database Management)

File-based Systems - Review
Identify data, information and metadata in the video store example.
[Figure: the rented tape file (customer name, movie name, copy number, due date), the inventory master file (movie name, upc, copy number) and the customer file (customer name, phone, address, ...), with labels Information, Data and Metadata]


Maximal File-based System
[Figure: the tape rental check-in, new tape ordering and customer mailing programs, each with its own copies of the rented tape file, inventory master file and customer file]
In how many places do a customer name, movie name and copy number appear in the system?

Minimal File-based System
[Figure: the tape rental check-in, new tape ordering and customer mailing programs sharing one rented tape file (customer name, movie name, copy number, due date), inventory master file (movie name, upc, copy number) and customer file (customer name, phone, address, ...) in persistent storage]
In how many different places do a customer name, movie name and copy number appear in the system?

Complications for File-based Approach
What were the seven complications?
- Queries
- Integrity
- Update
- Multiple users
- Data physically separate
- Security
- Evolving needs

Limitations of File-based Systems
Each program must implement:
- Security
- Concurrency control
- Support for schema reorganization

Observation: Many applications need these services.

Solution: Build and sell a software system that provides these services, i.e., a Database Management System.


Equivalent Database System
[Figure: the tape rental check-in, new tape ordering and customer mailing programs access the inventory, rented and customer data through a single DBMS, in persistent storage]
In how many different places do a customer name, movie name and copy number appear in the system?

Characteristics of Database Approach
- Self-describing nature: the system catalog stores the schema.
- Data abstraction:
  - In O-O design, operation (interface) vs. method (implementation)
  - In database design, data model vs. implementation
- Multiple views: a (virtual) view is the result of a query; the same data, rearranged differently.
- Sharing of data by multiple readers/writers: OLTP, concurrent transactions; needs concurrency control.

Pictorial Representation
[Figure: Users/Programs → Application Programs/Queries → DATABASE SYSTEM (DBMS SOFTWARE: Software to Process Queries/Programs, Software to Access Stored Data) → Stored Database Definition (Meta-Data) and Stored Database]

Functions of a DBMS
- Provides persistent, shared storage
  - Objects live beyond program execution
  - Shared by multiple applications
- Multiple copies of data lead to inconsistency and duplicated effort
- DBMS reduces redundancy in
  - common backup and recovery
  - integrity constraints
  - development and maintenance
- Provides multiple interfaces: query language, embedded query language, APIs, GUIs
- Protects against software/hardware failure and security breaches

When not to use a DBMS
Main costs of using a DBMS:
- High initial investment
- May need additional hardware
- Overhead
- Training

When a DBMS may be unnecessary:
- The application is simple, well-defined, and not expected to change.
- There are stringent real-time requirements.
- (Write) access to data by multiple users is not needed.

Classes of DB Users: Workers on the Scene
Persons whose job involves daily use of a large database:
- Database administrators (DBAs): responsible for managing the database system.
- Database designers
- End users: the people who use the database for querying, updating and generating reports.
  - Interactive users: use full DBMS capabilities directly via a DML.
  - Parametric (or naive) end users: use pre-programmed canned transactions to interact continuously with the database, for example bank tellers or reservation clerks.
- Application programmers: design and implement canned transactions for parametric users.


DBA - Duties
- Chooses:
  - information content of the database
  - storage structure and access strategy
  - performance-enhancing data structures
- Acts as liaison with users
- Security czar: defines authorization checks and validation procedures
- Responsible for:
  - backups and recovery
  - monitoring performance
  - updates (to the schema)

Database People
- DBMS designers and implementers: people who design and develop the DBMS software.
- Tool developers: design and implement tools that facilitate the use of DBMS software. Tools include design tools, performance tools, special interfaces, etc.
- Operators and maintenance personnel: run and maintain the hardware and software environment for the database system.


Interfaces
- Menu- vs. form-based GUI
- Canned interfaces for parametric users
- DBA
- Application
- Natural language
- Web search engines
- Shell

Input (Using a Form)
[Figure: sample data-entry form]


Output (Using a Report)
[Figure: sample report "Loans for Ages 20-30" listing SID, NAME and AMOUNT for students Allen, Cabeen, Jones and Watson, with a TOTAL per student]

Data Models
- A data definition language (DDL) describes database schemas:
  - data relationships
  - data semantics
  - integrity constraints
- A data manipulation language (DML) is used for querying and updating database instances.
- A data model is a data definition language along with a data manipulation language.
- Categories of data models: conceptual, representational, physical

Database schemas vs. instances: similar to types and variables in programming languages.
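A minimal C sketch of the types-and-variables analogy. The struct customer definition plays the role of the schema, and the contents of a struct customer_table at a given moment are an instance. All names and field sizes are illustrative assumptions; the phone value echoes the customer.txt example.

```c
#include <stdio.h>

/* Schema, like a type: fixes the fields of every customer record.
   Field names and sizes are illustrative assumptions. */
struct customer {
    int  customer_id;
    char name[32];
    char phone[16];
};

#define MAX_CUSTOMERS 4

/* The table's contents at any moment are one instance (a snapshot). */
struct customer_table {
    struct customer rows[MAX_CUSTOMERS];
    int count;
};

/* A data update changes the instance; the schema (the struct) is untouched.
   Returns 0 on success, -1 if the table is full. */
int insert_customer(struct customer_table *t, int id,
                    const char *name, const char *phone)
{
    if (t->count == MAX_CUSTOMERS)
        return -1;
    struct customer *c = &t->rows[t->count++];
    c->customer_id = id;
    snprintf(c->name, sizeof c->name, "%s", name);
    snprintf(c->phone, sizeof c->phone, "%s", phone);
    return 0;
}
```

Calling insert_customer is a data update: the instance changes, the schema does not. Schema evolution, such as splitting the name into first name and surname, would instead mean editing the struct and converting every stored row.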



Schema vs. Instance
- Schema: description of how data is organized and constrained.
- Instance: the data in a database (conforms to a schema).
- Snapshot: database state at a particular point in time.
  - Initially empty
  - Database is populated or loaded
  - DBMS ensures every state is a valid state
- Schema evolution vs. data update

Outline
- Database System Overview
  - What is a Database System?
  - Components of a Database System
- File-based Approach vs. Database Approach
- Time Line for Database Technologies
- Architecture of Database Systems


Time Line for Database Technologies (before 1950)
- Papyrus, parchment
- Clay tablets: 6000 years ago!
- Paper
- 1890: punch cards

Evolution of Database Technologies
(From Modern Database Management)


Types of data models
- Record-based conceptual models
  - (1960s) hierarchical model (e.g., IBM IMS)
  - (1970s) network model (e.g., CA-IDMS)
  - (1980s - current) relational model (e.g., Oracle, Microsoft Access, Microsoft SQL Server, IBM DB2)
- Object-based conceptual models
  - (late 1980s - current) entity-relationship model
  - (late 1980s - current) object-oriented model
  - object-relational
- Text-based conceptual models
  - (late 1990s - current) XML

ANSI Three-Schema Architecture
Supports the DBMS characteristics of:
- program-data independence
- support of multiple views of the data

Defines DBMS schemas at three levels:
- Physical
  - how data is stored on disk
  - data storage structures
  - access paths to the data
- Logical
  - how we think the data is organized
  - conceptual structure
  - integrity constraints
- External (view)
  - what a user sees of the data
  - view is often limited by security

The Three-Level Architecture
[Figure: View Level, Logical Level, Physical Level]

Schema and Mappings
- External level (views): (First Name, Last Name, Salary) and (StaffID, Name, Birthdate)
- Conceptual level: staff_no, fname, lname, dob, branch_no, salary
- Physical level:

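```c
/* Physical level: storage layout of a STAFF record. */
struct date { int day, month, year; };   /* assumed; not defined on the slide */

struct STAFF {
    int staff_no;
    int branch_no;
    char fname[15];
    char lname[15];
    struct date dob;
    float salary;
    struct STAFF *next;   /* next record in a linked list */
};

/* Access paths (slide notation, not C): index staff_no; index branch_no; */
```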

Data Independence
Each level is independent in the sense that a completely different organization can be used. Physical data independence - Physical level can change without having to change the logical level. Logical data independence - Logical level can change without having to change the external level.
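A minimal C sketch of the external/conceptual mapping behind logical data independence, using the fields from the Schema and Mappings slide. The conceptual record struct staff, the external view struct salary_view (First Name, Last Name, Salary) and the function to_salary_view are hypothetical encodings; dob is kept as text and field sizes are chosen only for this example. Code written against the view depends only on the mapping, so if the conceptual schema changes (for example, a column is added), only to_salary_view has to be revisited.

```c
/* Conceptual level (logical schema) from the slide; dob kept as text
   and field sizes chosen only for this sketch. */
struct staff {
    int   staff_no;
    char  fname[15];
    char  lname[15];
    char  dob[11];        /* e.g. "1980-01-02" */
    int   branch_no;
    float salary;
};

/* External view from the slide: First Name, Last Name, Salary. */
struct salary_view {
    const char *first_name;
    const char *last_name;
    float salary;
};

/* External/conceptual mapping: the only code that knows both levels. */
struct salary_view to_salary_view(const struct staff *s)
{
    struct salary_view v = { s->fname, s->lname, s->salary };
    return v;
}
```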

Special Kinds of Databases
- Temporal databases: special handling for time; few commercial temporal databases
- Spatial databases: maps, cadastral applications; many commercial products (GIS)
- Text databases: special text search capabilities; library collections
- Statistical databases: census data; OLAP, data warehousing
- Federated, heterogeneous, distributed databases


