
RDBMS Concepts

BY KINGSHUK SRIVASTAVA

Agenda
Introduction
Database Management Systems
Normalization
Codd's Rules
SQL

Introduction

Why Study Databases?

Shift from computation to information
    always true for corporate computing
    the Web made this point for personal computing
    more and more true for scientific computing
Need for DBMS has exploded in recent years
    Corporate: retail swipe/clickstreams, customer relationship management, supply chain management, data warehouses, etc.
    Scientific: digital libraries, Human Genome Project, NASA Mission to Planet Earth, physical sensors, grid physics network
DBMS encompasses much of CS in a practical discipline
    OS, languages, theory, AI, multimedia, logic
    yet the traditional focus is on real-world apps

What's the intellectual content?

Representing information
    data modeling
Languages and systems for querying data
    complex queries with real semantics* over massive data sets
Concurrency control for data manipulation
    controlling concurrent access
    ensuring transactional semantics
Reliable data storage
    maintain data semantics even if you pull the plug

* semantics: the meaning or relationship of meanings of a sign or set of signs

Data? Information?

Initial Data Storage Methods


Data is represented in one or more flat files
A flat file is simply the electronic equivalent of a cardboard file
Every business group has its own set of files

Disadvantages of Flat File Systems


No centralized control
Data redundancy
Data inconsistency
Data cannot be shared
Standards cannot be enforced
Security issues
Integrity cannot be maintained
Data dependence

Database Management Systems



A system whose overall purpose is to record and maintain information
A database is a repository for stored data and programs to manipulate it

Advantages of DBMS
Centralized control
Reduced data redundancy
Data consistency
Data can be shared
Standards can be enforced
Security can be enforced
Integrity can be maintained
Data independence

Data Models
A data model is a collection of concepts for describing data
A schema is a description of a particular collection of data, using the given data model
The relational model is the most widely used model today

Levels of Abstraction

Many views, a single conceptual schema and a single physical schema
    Views describe how users see the data
    Conceptual schema defines the logical structure
    Physical schema defines the physical files and indexes

Example University Database


Conceptual Schema
    Students(sid: string, name: string, login: string, age: integer, gpa: real)
    Courses(cid: string, cname: string, credits: integer)
Physical Schema
    Relations stored as unordered files
    Index on first column of Students
External Schema (View)
    Course_info(cid: string, enrollment: integer)
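The three levels can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the deck's own code: the Enrolled table is hypothetical (the slide does not say where enrollment counts come from), the sample rows are made up, and the index that SQLite creates automatically for the primary key stands in for "index on first column of Students".

```python
import sqlite3

def build_university_db():
    """Create the conceptual schema plus the Course_info external view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Students (
            sid TEXT PRIMARY KEY,  -- primary key gets an automatic index
            name TEXT, login TEXT, age INTEGER, gpa REAL
        );
        CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
        -- Assumption: Enrolled is not on the slide; it is added here only
        -- so that the view has something to count.
        CREATE TABLE Enrolled (
            sid TEXT REFERENCES Students(sid),
            cid TEXT REFERENCES Courses(cid),
            PRIMARY KEY (sid, cid)
        );
        CREATE VIEW Course_info AS
            SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
            FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
            GROUP BY c.cid;
    """)
    return conn

conn = build_university_db()
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [("s1", "Ann", "ann@cs", 19, 3.4), ("s2", "Bob", "bob@cs", 20, 3.1)])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("c1", "Databases", 4), ("c2", "Logic", 3)])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?)", [("s1", "c1"), ("s2", "c1")])
conn.commit()

# Users of the external schema query the view, not the base tables.
rows = conn.execute("SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(rows)  # [('c1', 2), ('c2', 0)]
```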

Data Independence
Applications are insulated from how data is structured and stored
Logical data independence: protection from changes in the logical structure of data
Physical data independence: protection from changes in the physical structure of data

Structure of a DBMS

ACID Test
Atomicity
Consistency
Isolation
Durability

Transactions: ACID Properties


Key concept is a transaction (Xact): a sequence of database actions (reads/writes)
DBMS ensures atomicity (all-or-nothing property) even if the system crashes in the middle of a Xact
Each transaction, executed completely, must take the DB from one consistent state to another, or must not run at all
DBMS ensures that concurrent transactions appear to run in isolation
DBMS ensures durability of committed Xacts even if the system crashes

Note: simple integrity constraints can be specified on the data, and the DBMS enforces these. Beyond this, the DBMS does not understand the semantics of the data. Ensuring that a single transaction (run alone) preserves consistency is largely the user's responsibility!
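A minimal sketch of atomicity using Python's sqlite3 module. The accounts table, the transfer helper and the amounts are hypothetical, chosen only to show that a transaction failing halfway leaves no partial update behind. The CHECK constraint is an example of a simple integrity constraint that the DBMS enforces by itself.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            # This debit violates the CHECK constraint when src lacks funds,
            # and the credit above must then be undone as well.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts ("
             "id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

ok = transfer(conn, "A", "B", 30)       # succeeds: A=70, B=80
failed = transfer(conn, "A", "B", 500)  # rejected and rolled back as a whole
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)  # True False {'A': 70, 'B': 80}
```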

Types of DBMS

Hierarchical
Network
Relational

Hierarchical DBMS (Contd.)


Cannot handle many-to-many relations
Cannot reflect all real-life situations
Anomalies in insert, delete and update operations

Network DBMS
Data is represented by records and pointers
Addresses many-to-many relations
Insert, delete and update operations possible
Complex in design

Relational DBMS
Based on the mathematical principles of relations (set theory and relational algebra)
Data is represented in terms of rows and columns of a table
Addresses all types of relations
Easy to design
No insert/delete/update anomalies, provided the relations are properly normalized

Relational Terminology
Tuple (Row)
Attribute (Column)
Relation (Table)
Integrity Constraints
    Primary Key
    Alternate Key
    Foreign Key

Normalization

Normalization: the process of removing data redundancy by decomposing relations in a database.
Denormalization: carefully introduced redundancy to improve query performance.

Normalization through decomposition


The decomposition approach starts with one relation, which is decomposed into a larger number of smaller relations to remove insert, delete and update anomalies.
1NF, 2NF, 3NF and BCNF can be achieved by this approach.

Unnormalized Form
A relation is said to be in Unnormalized Form (0NF) if the values of any of its attributes are non-atomic. In other words, more than one value is associated with a single instance of the attribute.

Unnormalized Relation

S#    PQ
      P#    QTY
S1    P1    300
      P2    200
      P3    400
      P4    200
S2    P1    300
      P2    400
S3    P2    200

First Normal Form


A relation is said to be in First Normal Form (1NF) if the values of each attribute of the relation are atomic. In other words, only one value is associated with each attribute, and the value is not a set or a list of values.
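Moving from the unnormalized form to 1NF amounts to flattening each repeating group into one row per atomic value. The sketch below assumes the supplier/part data of the unnormalized example, with parts grouped as S1: P1 to P4, S2: P1 and P2, S3: P2; the function name is illustrative.

```python
# Unnormalized relation: each supplier carries a list of (part, quantity) pairs,
# so the PQ attribute is non-atomic. The grouping is an assumption.
unnormalized = {
    "S1": [("P1", 300), ("P2", 200), ("P3", 400), ("P4", 200)],
    "S2": [("P1", 300), ("P2", 400)],
    "S3": [("P2", 200)],
}

def to_first_normal_form(relation):
    """Emit one (S#, P#, QTY) row per part so that every value is atomic."""
    return [(supplier, part, qty)
            for supplier, parts in relation.items()
            for part, qty in parts]

first_nf = to_first_normal_form(unnormalized)
for row in first_nf:
    print(row)
```

The result has seven rows, and (S#, P#) together identify each row.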

Functional Dependency
Given a relation R, attribute Y of R is functionally dependent on attribute X if and only if each X-value in R has associated with it precisely one Y-value in R (at any one time)
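The definition can be checked mechanically against one relation instance. This is a rough sketch with a hypothetical helper name and made-up shipment rows. A functional dependency is a statement about every legal instance of R, so sample data can refute a dependency but never prove it.

```python
def fd_holds(rows, x, y):
    """Return True if each X-value in rows is paired with exactly one Y-value."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault records the first Y-value seen for this X-value
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical shipment rows: supplier, part, quantity
shipments = [
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 400},
]

pair_determines_qty = fd_holds(shipments, ["S#", "P#"], ["QTY"])
supplier_determines_qty = fd_holds(shipments, ["S#"], ["QTY"])
print(pair_determines_qty, supplier_determines_qty)  # True False
```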

Full Functional Dependency


Attribute Y is fully functionally dependent on attribute X if it is functionally dependent on X and not functionally dependent on any proper subset of X.

Second Normal Form


A relation R is in Second Normal Form (2NF) if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.

Third Normal Form


A relation R is in Third Normal Form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

Boyce Codd Normal Form


A relation R is in Boyce/Codd Normal Form (BCNF) if and only if every determinant is a candidate key. An attribute, possibly composite, on which some other attribute is fully functionally dependent is a determinant.

Student    Subject    Teacher
Smith      Maths      Prof. White
Smith      Physics    Prof. Green
Jones      Maths      Prof. White
Jones      Physics    Prof. Brown

For each subject, a student is taught by only one teacher.
Each teacher teaches only one subject.
Each subject is taught by several teachers.
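The Student/Subject/Teacher example can be checked against the BCNF definition with a short Python sketch. The helper names are illustrative, and the checks run on the four sample rows only, so they show what the rules imply for this instance and do not prove the dependencies in general. Teacher determines Subject, yet Teacher is not a candidate key, so the relation violates BCNF; splitting it into (Student, Teacher) and (Teacher, Subject) is one decomposition that removes the violation.

```python
COLUMNS = ("student", "subject", "teacher")
ROWS = [
    ("Smith", "Maths", "Prof. White"),
    ("Smith", "Physics", "Prof. Green"),
    ("Jones", "Maths", "Prof. White"),
    ("Jones", "Physics", "Prof. Brown"),
]

def project(rows, columns, attrs):
    """Project rows onto attrs, removing duplicates."""
    idx = [columns.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def fd_holds(rows, columns, lhs, rhs):
    """lhs -> rhs holds iff each lhs-value appears with exactly one rhs-value."""
    return len(project(rows, columns, lhs + rhs)) == len(project(rows, columns, lhs))

def is_superkey(rows, columns, attrs):
    """attrs is a superkey of this instance iff its values are unique per row."""
    return len(project(rows, columns, attrs)) == len(set(rows))

teacher_is_determinant = fd_holds(ROWS, COLUMNS, ("teacher",), ("subject",))
teacher_is_key = is_superkey(ROWS, COLUMNS, ("teacher",))
violates_bcnf = teacher_is_determinant and not teacher_is_key

# Decompose on the offending dependency teacher -> subject.
student_teacher = project(ROWS, COLUMNS, ("student", "teacher"))
teacher_subject = project(ROWS, COLUMNS, ("teacher", "subject"))

# Joining the two projections on teacher gives back the original rows.
rejoined = {(s, subj, t)
            for (s, t) in student_teacher
            for (t2, subj) in teacher_subject if t == t2}
lossless = rejoined == set(ROWS)
print(violates_bcnf, lossless)  # True True
```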

Student    Subject    Position
Smith      Maths      1
Smith      Physics    2
Jones      Maths      2
Jones      Physics    1

No two students can get the same position in the same subject.

Codd's Rules

Proposed in 1985 to test DBMSs for conformance to Codd's relational model
Hardly any commercial product follows all of them
Oracle = 8½ out of 12

Rule Zero

For a system to qualify as an RDBMS, it must be able to manage its databases entirely through its relational capabilities.
The other 12 rules derive from this rule.

Rule 1: Information Rule

All information (including metadata) is to be represented as data stored in cells of tables.

The rows and columns have to be strictly unordered.

Rule 2: Guaranteed Access


Each unique piece of data (atomic value) should be accessible by: Table Name + Primary Key (Row) + Attribute (Column)
Violation: ability to access data directly via pointers

Rule 3: Systematic treatment of NULL

NULLs may mean: missing data, not applicable, no value
Should be handled consistently: not as zero or blank
Primary keys must not be NULL
Expressions on NULL should give NULL
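These points can be observed with Python's sqlite3 module, where SQL NULL comes back as Python's None. This is a sketch of one engine's behaviour, not a conformance test. The table t is hypothetical, and NOT NULL is spelled out on the key because SQLite, unlike most engines, otherwise accepts NULL in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Expressions on NULL give NULL; zero and blank are values, not NULL.
null_plus_one, null_eq_null, zero_is_null, blank_is_null = conn.execute(
    "SELECT NULL + 1, NULL = NULL, 0 IS NULL, '' IS NULL"
).fetchone()

# A primary key column declared NOT NULL rejects NULL.
conn.execute("CREATE TABLE t (id TEXT PRIMARY KEY NOT NULL, note TEXT)")
try:
    conn.execute("INSERT INTO t VALUES (NULL, 'no key')")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

print(null_plus_one, null_eq_null, zero_is_null, blank_is_null, null_key_rejected)
# None None 0 0 True
```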

Rule 4: Active On-Line Catalogue

The database dictionary (catalogue) holds the description of the database
The catalogue is governed by the same rules as the rest of the database
The same query language is used on the catalogue as on the application database
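As an illustration, SQLite keeps its catalogue in a table named sqlite_master that is read with an ordinary SELECT. Other systems expose their catalogues under different names, and the Courses table below is only sample content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER)")

# The catalogue is queried with the same language as application data.
catalogue = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print(catalogue)  # [('table', 'Courses')]
```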

Rule 5: Powerful language

One well-defined language should provide all manner of access to data
Example: SQL
If the file supporting a table can be accessed in any way other than through a SQL interface, that is a violation

Rule 6: View Updating Rule

All views that are theoretically updatable should be updatable
View = "virtual table", derived from base tables on demand
Example: if a view is formed as a join of 3 tables, changes to the view should be reflected in the base tables
Not updatable: a view that lacks a NOT NULL attribute of its base table
Problems arise with computed fields in a view, e.g. Total Income = White income + Black income

Rule 7: Relational level operations

There must be insert, update and delete operations at the level of relations
Set operations like union, intersection and minus should be supported

Rule 8: Physical Data Independence

The physical storage of data should not matter to the system
If, say, a file supporting a table is renamed or moved from one disk to another, it should not affect the applications

Rule 9: Logical Data Independence

If the logical structure (table structures) of the database changes, the user's view of the data should not change
This is implemented through views: if a table is split into two tables, a new view should give the same result as the join of the two tables
Difficult rule to satisfy
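The table-split case can be sketched with Python's sqlite3 module. The table names Student_ids and Student_stats and the sample rows are hypothetical. The application query is the same string before and after the split, and the view named Students makes it return the same rows.

```python
import sqlite3

# The application only ever issues this query.
APP_QUERY = "SELECT name, gpa FROM Students ORDER BY sid"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                           age INTEGER, gpa REAL);
    INSERT INTO Students VALUES ('s1', 'Ann', 'ann@cs', 19, 3.4);
    INSERT INTO Students VALUES ('s2', 'Bob', 'bob@cs', 20, 3.1);
""")
before = conn.execute(APP_QUERY).fetchall()

# Logical restructuring: split the table in two, then expose a view with
# the old name that joins the pieces back together.
conn.executescript("""
    CREATE TABLE Student_ids AS SELECT sid, name, login FROM Students;
    CREATE TABLE Student_stats AS SELECT sid, age, gpa FROM Students;
    DROP TABLE Students;
    CREATE VIEW Students AS
        SELECT i.sid AS sid, i.name AS name, i.login AS login,
               s.age AS age, s.gpa AS gpa
        FROM Student_ids i JOIN Student_stats s ON s.sid = i.sid;
""")
after = conn.execute(APP_QUERY).fetchall()
print(before == after)  # True
```

Reads are unaffected here. Writes through the view are a separate matter (see Rule 6), which is part of why this rule is hard to satisfy.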

Rule 10: Integrity Independence

The database should be able to enforce its own integrity rather than relying on other programs
Integrity rules = a filter that admits only correct data; they should be stored in the data dictionary
Key and check constraints, triggers, etc. should be stored in the data dictionary
This also makes the RDBMS independent of the front end

Rule 11: Distribution Independence

A database should work properly regardless of its distribution across a network
This lays the foundation of distributed databases
Similar to Rule 8, except that Rule 8 applies to distribution on a local disk

Rule 12: Non-subversion Rule

If low-level access to a system is allowed, it should not be able to subvert or bypass integrity rules to change data
This may be achieved by some sort of locking or encryption
Some low-level access tools provided by vendors violate this rule for extra speed

Structured Query Language



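DDL: Data Definition Language
DML: Data Manipulation Language
DCL: Data Control Language

DDL
Create
Alter
Drop
Truncate

DML
Insert
Update
Delete
Select

DCL
Commit
Rollback
Savepoint
Set Transaction
Note: these four are more commonly classed as transaction control (TCL) statements; DCL in the strict sense covers Grant and Revoke.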
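Commit, Rollback and Savepoint can be tried out with Python's sqlite3 module. In this sketch the log table is hypothetical, and the connection is opened with isolation_level=None so that the module issues no implicit BEGIN and every transaction statement is written out explicitly.

```python
import sqlite3

# isolation_level=None: transactions are controlled only by the SQL below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('kept')")
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO log VALUES ('undone')")
conn.execute("ROLLBACK TO SAVEPOINT sp1")  # undoes only the work after sp1
conn.execute("COMMIT")                     # makes 'kept' permanent

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('discarded')")
conn.execute("ROLLBACK")                   # undoes the whole transaction

messages = [row[0] for row in conn.execute("SELECT msg FROM log")]
print(messages)  # ['kept']
```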

Integrity Constraints
Primary Key (PK)
Foreign Key (FK)
Unique Key (UK)
Not Null
Check
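A small sketch of all five constraint types using Python's sqlite3 module. The dept and emp tables and the rejected helper are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; most other systems enforce them by default.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript("""
        CREATE TABLE dept (
            dept_id INTEGER PRIMARY KEY,               -- primary key
            dname   TEXT NOT NULL UNIQUE               -- not null + unique key
        );
        CREATE TABLE emp (
            emp_id  INTEGER PRIMARY KEY,
            ename   TEXT NOT NULL,
            salary  INTEGER CHECK (salary > 0),        -- check
            dept_id INTEGER REFERENCES dept(dept_id)   -- foreign key
        );
        INSERT INTO dept VALUES (10, 'Sales');
        INSERT INTO emp VALUES (1, 'Asha', 5000, 10);
    """)
    return conn

def rejected(conn, sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn = make_db()
results = {
    "primary key": rejected(conn, "INSERT INTO emp VALUES (1, 'Ravi', 4000, 10)"),
    "foreign key": rejected(conn, "INSERT INTO emp VALUES (2, 'Ravi', 4000, 99)"),
    "unique key":  rejected(conn, "INSERT INTO dept VALUES (20, 'Sales')"),
    "not null":    rejected(conn, "INSERT INTO emp VALUES (3, NULL, 4000, 10)"),
    "check":       rejected(conn, "INSERT INTO emp VALUES (4, 'Ravi', -1, 10)"),
}
valid_row_accepted = not rejected(conn, "INSERT INTO emp VALUES (5, 'Ravi', 4000, 10)")
print(results, valid_row_accepted)
```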

Data Types
Character
Varchar2
Number
Date
BLOB
BFILE

Arithmetic Operators
+
-
*
/
Mod
ABS

Comparison and Logical Operators
AND OR IN NOT IN < > <= >= <> BETWEEN

Set Operators
UNION
UNION ALL
INTERSECT
MINUS
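The four operators can be compared side by side with Python's sqlite3 module. The tables a and b are hypothetical, and SQLite (like standard SQL) spells the difference operator EXCEPT where Oracle uses MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(operator):
    """Combine the two tables with the given set operator, sorted by x."""
    sql = f"SELECT x FROM a {operator} SELECT x FROM b ORDER BY x"
    return [row[0] for row in conn.execute(sql)]

union = run("UNION")          # duplicates removed
union_all = run("UNION ALL")  # duplicates kept
intersect = run("INTERSECT")  # rows present in both
minus = run("EXCEPT")         # rows in a but not in b (MINUS in Oracle)
print(union, union_all, intersect, minus)
# [1, 2, 3, 4] [1, 2, 2, 3, 3, 4] [2, 3] [1]
```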

Thank You
