
Database Concepts

Edward McNeil

Database Concepts
PART I:
Models
Terminology
Entities, Relationships, Attributes

PART II:
Entity Integrity
Referential Integrity
Normalization Forms
Validation Rules
Setting up a Database

Database Models
There are several models for databases:
Tabular ("flat file"): data in a single table, e.g. a spreadsheet.
Hierarchical: e.g. company departments.
Relational.

Each row is a record (patient, child etc.).
Each column is an attribute (age, sex etc.).

Terminology
Record: a collection of fields describing a larger unit.
Table: a collection of records.
Database: a collection of table(s). Also includes forms for entering data, rules for checking data, queries for selecting subsets of data and reports for showing the data.

Entity: something that exists, any distinguishable object.
Relationship: a connection between two entities.

One-One Relationship: for each entity in one table there is at most one associated entity in the other table. For example, in a study of twins, for each twin there can be only one matching twin. Mainly used to separate data into more than one table, especially for large data sets.

One-Many Relationship: one entity in table A is associated with zero or more entities in table B, but each entity in table B is associated with at most one entity in table A. For example, a mother may have many children but a child has only one mother.

Many-Many Relationship: there are no restrictions on how many entities in either table are associated with a single entity in the other. An example would be students taking classes: each student takes many classes and each class has many students. A third (linking) table is needed to store the pairings.
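The students-and-classes case can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (students, classes, enrolments), the column names and the sample rows are assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE classes  (class_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- The third (linking) table: one row per student-class pairing.
CREATE TABLE enrolments (
    student_id INTEGER NOT NULL REFERENCES students (student_id),
    class_id   INTEGER NOT NULL REFERENCES classes (class_id),
    PRIMARY KEY (student_id, class_id)
);
-- Hypothetical sample data.
INSERT INTO students   VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO classes    VALUES (10, 'Statistics'), (20, 'Epidemiology');
INSERT INTO enrolments VALUES (1, 10), (1, 20), (2, 10);
""")

def classes_for(student_name):
    """Titles of all classes taken by one student, in alphabetical order."""
    rows = conn.execute(
        """SELECT c.title
           FROM students s
           JOIN enrolments e ON e.student_id = s.student_id
           JOIN classes c    ON c.class_id = e.class_id
           WHERE s.name = ?
           ORDER BY c.title""",
        (student_name,),
    ).fetchall()
    return [title for (title,) in rows]

def students_in(class_title):
    """Names of all students taking one class, in alphabetical order."""
    rows = conn.execute(
        """SELECT s.name
           FROM classes c
           JOIN enrolments e ON e.class_id = c.class_id
           JOIN students s   ON s.student_id = e.student_id
           WHERE c.title = ?
           ORDER BY s.name""",
        (class_title,),
    ).fetchall()
    return [name for (name,) in rows]

print(classes_for("Ann"))
print(students_in("Statistics"))
```

Each many-many relationship becomes two one-many relationships that meet in the linking table, so neither the students table nor the classes table needs a repeating column.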

Index: used to speed up the performance of queries. Records do not have to be searched sequentially.
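A rough way to see the effect of an index is to ask the database how it plans to run a query before and after the index exists. The sketch below uses SQLite through Python; the patients table, its columns and the index name idx_patients_age are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER, sex TEXT)"
)
# Hypothetical data: 1000 patients with ages between 20 and 79.
conn.executemany(
    "INSERT INTO patients (age, sex) VALUES (?, ?)",
    [(20 + i % 60, "F" if i % 2 else "M") for i in range(1000)],
)

def query_plan(sql):
    """Return SQLite's own description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text

query = "SELECT patient_id FROM patients WHERE age = 35"

plan_before = query_plan(query)  # no index yet: every record is scanned
conn.execute("CREATE INDEX idx_patients_age ON patients (age)")
plan_after = query_plan(query)   # the index is used to find matching records

print(plan_before)
print(plan_after)
```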

Primary Key: used to uniquely identify one particular entity.
Can be one or more fields. (One is better).
Must be unique. (Computer generated may be better).
Value should not change over time.
Should be meaningless (independent of the record).
Must always have a value (non null).

Entity Integrity Rule


All tables should have a primary key, which cannot be null. The Primary Key uniquely identifies each record from all others in a table.
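As a minimal sketch of the rule, assuming SQLite and a hypothetical mothers table, a computer-generated key can be declared so that every record receives a unique, non-null identifier and a duplicate key is refused:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mothers ("
    " mother_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # generated by the database
    " name TEXT NOT NULL)"
)

def add_mother(name, mother_id=None):
    """Insert a mother; return her key, or None if the key is rejected."""
    try:
        with conn:  # commit on success, roll back on error
            cur = conn.execute(
                "INSERT INTO mothers (mother_id, name) VALUES (?, ?)",
                (mother_id, name),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None

first = add_mother("Mother A")                        # key generated automatically
second = add_mother("Mother B")                       # next key
duplicate = add_mother("Mother C", mother_id=first)   # reuses an existing key
total = conn.execute("SELECT COUNT(*) FROM mothers").fetchone()[0]
print(first, second, duplicate, total)
```

The generated key carries no meaning of its own, which fits the advice that a primary key should be independent of the record and should not change over time.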

Database Concepts
Term            Database         Example
Entity          Table, file      Students
Tuple           Row, record      Record 23
Attribute       Column, field    Age, sex
Relationship    Join, link       One-many

Relational Database
A relational database is a set of related tables, each concerning a specific topic.
Advantages:
Reduction of data redundancy.
Eliminates inconsistencies (errors).
High data integrity and quality.
Data can be descriptive.
Allows implementation of security.

Database Designs

This is an example of a bad design.

Setting up a Relational Database


Mother Table: Mother ID*, Mother Name, Date of Birth, Address, Nationality, Blood Group
Child Table: Mother ID+, Child ID, Child Name, Date of Birth, Sex, Blood Group
* = primary key
+ = foreign key


Referential Integrity
Referential integrity concerns foreign keys. A foreign key is a field in one table that corresponds to a primary key field in another table and holds the same information. The two fields must be of the same type.
Referential integrity means that the foreign key cannot have a value that is not present in the corresponding table.
Referential integrity prevents entering inconsistent data.
Breaking the referential integrity link creates orphans.


Referential Integrity
Diagram: a database in which Table A is connected to Table B by an RI link.
You cannot enter a record in Table B without a corresponding matching record in Table A.
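The rule can be tried out with a small sketch in Python and SQLite, with mothers playing the role of Table A and children the role of Table B. The table and column names are assumptions for illustration, and SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE mothers (mother_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE children (
    mother_id INTEGER NOT NULL REFERENCES mothers (mother_id),  -- foreign key
    child_id  INTEGER NOT NULL,
    name      TEXT,
    PRIMARY KEY (mother_id, child_id)
);
INSERT INTO mothers VALUES (1, 'Mother A');
""")

def add_child(mother_id, child_id, name):
    """Try to insert a child; return False if it would be an orphan."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO children VALUES (?, ?, ?)",
                (mother_id, child_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False

linked = add_child(1, 1, "Child of mother 1")  # mother 1 exists
orphan = add_child(2, 1, "Child of mother 2")  # there is no mother 2
stored = conn.execute("SELECT COUNT(*) FROM children").fetchone()[0]
print(linked, orphan, stored)
```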


Cascades
Referential integrity prevents you from adding a record to Table B that cannot be linked to Table A.

Cascading delete
Whenever you delete a record from Table A, any records in Table B that are linked to the deleted record will also be deleted.

Cascading update
Whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it will also be modified accordingly.


Cascade Delete
Mothers Table
MotherID    Birth        Blood
1           12/2/1960    A
2           30/1/1945    B
3           31/3/1973    AB

Example: DELETE FROM Mothers WHERE MotherID = 1;

Child Table
MotherID    ChildID    Birth        Blood
1           1          1/11/2002    A
1           2          1/11/2002    A
2           1          12/7/2002    B
2           2          21/9/2003    A

The Delete action will delete the mother record and all matching records in the child table.
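The same delete can be reproduced in SQLite from Python as a sketch. The data are those of the two tables above; naming the child table Children and declaring the link with ON DELETE CASCADE are assumptions about how the design would be written in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to apply cascades
conn.executescript("""
CREATE TABLE Mothers (MotherID INTEGER PRIMARY KEY, Birth TEXT, Blood TEXT);
CREATE TABLE Children (
    MotherID INTEGER NOT NULL
        REFERENCES Mothers (MotherID) ON DELETE CASCADE,
    ChildID  INTEGER NOT NULL,
    Birth    TEXT,
    Blood    TEXT,
    PRIMARY KEY (MotherID, ChildID)
);
INSERT INTO Mothers VALUES
    (1, '12/2/1960', 'A'), (2, '30/1/1945', 'B'), (3, '31/3/1973', 'AB');
INSERT INTO Children VALUES
    (1, 1, '1/11/2002', 'A'), (1, 2, '1/11/2002', 'A'),
    (2, 1, '12/7/2002', 'B'), (2, 2, '21/9/2003', 'A');
""")

# Deleting the mother also removes her two children.
conn.execute("DELETE FROM Mothers WHERE MotherID = 1")

mothers_left = [m for (m,) in conn.execute(
    "SELECT MotherID FROM Mothers ORDER BY MotherID")]
children_left = conn.execute(
    "SELECT MotherID, ChildID FROM Children ORDER BY MotherID, ChildID"
).fetchall()
print(mothers_left, children_left)
```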


Cascade Update
Mothers Table
MotherID     Birth        Blood
9 (was 1)    12/2/1960    A
2            30/1/1945    B
3            31/3/1973    AB

Example: UPDATE Mothers SET MotherID = 9 WHERE MotherID = 1;

Child Table
MotherID     ChildID    Birth        Blood
9 (was 1)    1          1/11/2002    A
9 (was 1)    2          1/11/2002    A
2            1          12/7/2002    B
2            2          21/9/2003    A

The Update action will update the mother record and all matching records in the child table.
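A sketch of the same update in SQLite from Python follows. The key values are those of the example; the table name Children, the reduced column set and the ON UPDATE CASCADE clause are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to apply cascades
conn.executescript("""
CREATE TABLE Mothers (MotherID INTEGER PRIMARY KEY, Blood TEXT);
CREATE TABLE Children (
    MotherID INTEGER NOT NULL
        REFERENCES Mothers (MotherID) ON UPDATE CASCADE,
    ChildID  INTEGER NOT NULL,
    Blood    TEXT,
    PRIMARY KEY (MotherID, ChildID)
);
INSERT INTO Mothers  VALUES (1, 'A'), (2, 'B'), (3, 'AB');
INSERT INTO Children VALUES (1, 1, 'A'), (1, 2, 'A'), (2, 1, 'B'), (2, 2, 'A');
""")

# Changing the mother's key also changes the key stored with her children.
conn.execute("UPDATE Mothers SET MotherID = 9 WHERE MotherID = 1")

mother_ids = [m for (m,) in conn.execute(
    "SELECT MotherID FROM Mothers ORDER BY MotherID")]
child_keys = conn.execute(
    "SELECT MotherID, ChildID FROM Children ORDER BY MotherID, ChildID"
).fetchall()
print(mother_ids, child_keys)
```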


Normalization
Normalization is the method of structuring your database so that redundancy is minimized and data quality is maximized, i.e. of organizing data as efficiently as possible.

The above design is un-normalized.


Normalization Goals
1. To contain all the data and only the data necessary for the purposes that the database is to serve.
2. To have as little redundancy as possible.
3. To accommodate multiple values for types of data that require them.
4. To permit efficient updates of the data in the database.
5. To avoid the possibility of losing data unknowingly.


Normalization Rules
1. Eliminate repeating groups in individual tables.
2. Create a separate table for each set of related data.
3. Identify each set of related data with a primary key.
4. Eliminate redundant data.
5. Eliminate columns not dependent on the primary key.

In the Mother-Child example, each mother record should contain information related to the mother only. Child data would be stored in a separate table and the two tables should be linked using a one-many join.

Puppies and Tricks


Task: Set up a normalized database dealing with puppies, kennels, and tricks performed by the puppies.
Unnormalized Data Items for Puppies
Puppy ID
Puppy Name
Puppy Age
Kennel Code
Kennel Name
Kennel Location
Trick ID 1...n
Trick Name 1...n
Skill Level 1...n
Trick Where Learnt 1...n


Puppies and Tricks


Puppy: Fifi, Blackie, Lassie, Spot, Rover
Tricks: roll over, play dead, sit up and beg, fetch stick

Question: Can Fifi roll over?


Can Fifi roll over?


SELECT puppies.puppyID, puppies.name
FROM puppies
WHERE puppies.trick1 = 'roll over'
   OR puppies.trick2 = 'roll over'
   OR puppies.trick3 = 'roll over';


Problem:
Some puppies know more tricks than others and some puppies don't know any.
What if we want a list of puppies that can play dead?
What if Rover learnt a new trick?
Querying the data is untidy and awkward.

Solution:
Put the tricks data into a separate table.


Modifications
1. Eliminate repeating groups.
2. Make a separate table for each set of related attributes.
3. Give each table a primary key.

Puppies table: puppy ID, puppy name, puppy age, kennel code, kennel name, kennel location
Tricks table: puppy ID, trick ID, trick name, skill level, where learnt


Anomalies
The trick name (e.g., roll over) appears redundantly for every puppy that knows it. Just trick ID would do.

Also, suppose you want to reclassify a trick, i.e., to give it a different skill level. The change has to be made for every puppy that knows the trick. If you miss some of the changes, you will have several puppies with the same trick with different skill levels. This is an update anomaly.

Or suppose the last puppy knowing a particular trick gets run over by a car. Its records will be removed from the database, and if this puppy is the only puppy to know the trick then this trick will not be stored anywhere! This trick will have been lost forever. This is a delete anomaly.


Redundancies
Rule 4. Eliminate redundant data.

Puppies table: puppy ID, puppy name, puppy age, kennel code, kennel name, kennel location
Puppy tricks table: puppy ID, trick ID, where learnt
Tricks table: trick ID, trick name, skill level

Now every trick is stored separately. Deleting a puppy will not delete any tricks (only the fact that it knew the trick).
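The three-table design can be checked against both anomalies with a short SQLite sketch in Python. The kennel columns are left out for brevity, and the sample rows (which puppy knows which trick, the skill levels and where each trick was learnt) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE puppies (puppy_id INTEGER PRIMARY KEY, puppy_name TEXT NOT NULL);
CREATE TABLE tricks (
    trick_id    INTEGER PRIMARY KEY,
    trick_name  TEXT NOT NULL,
    skill_level INTEGER
);
CREATE TABLE puppy_tricks (
    puppy_id     INTEGER NOT NULL
        REFERENCES puppies (puppy_id) ON DELETE CASCADE,
    trick_id     INTEGER NOT NULL REFERENCES tricks (trick_id),
    where_learnt TEXT,
    PRIMARY KEY (puppy_id, trick_id)
);
-- Hypothetical sample data.
INSERT INTO puppies VALUES (1, 'Fifi'), (2, 'Rover');
INSERT INTO tricks  VALUES (1, 'roll over', 1), (2, 'fetch stick', 2);
INSERT INTO puppy_tricks VALUES (1, 1, 'home'), (2, 1, 'kennel'), (2, 2, 'kennel');
""")

# Reclassifying a trick touches one row, however many puppies know it.
conn.execute("UPDATE tricks SET skill_level = 3 WHERE trick_name = 'roll over'")

# Rover is the only puppy that knows 'fetch stick'; remove him.
conn.execute("DELETE FROM puppies WHERE puppy_name = 'Rover'")

trick_names = [n for (n,) in conn.execute(
    "SELECT trick_name FROM tricks ORDER BY trick_id")]
links = conn.execute(
    "SELECT puppy_id, trick_id FROM puppy_tricks ORDER BY puppy_id, trick_id"
).fetchall()
print(trick_names, links)
```

After the delete, the trick is still listed in the tricks table; only the rows linking Rover to his tricks have gone.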


Can Fifi roll over?

Answer: Yes.
SELECT puppies.puppyID, puppies.name
FROM puppies
INNER JOIN [puppy tricks] ON puppies.puppyID = [puppy tricks].puppyID
WHERE [puppy tricks].trickID = 1;
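A runnable variant of this query, sketched with SQLite from Python, looks the trick up by name and restricts the result to one puppy. The linking table is named puppy_tricks here instead of [puppy tricks], the tricks table is joined in as well, and the sample rows are hypothetical, apart from the fact stated above that Fifi can roll over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE puppies (puppyID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tricks  (trickID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE puppy_tricks (
    puppyID INTEGER NOT NULL REFERENCES puppies (puppyID),
    trickID INTEGER NOT NULL REFERENCES tricks (trickID),
    PRIMARY KEY (puppyID, trickID)
);
-- Hypothetical sample data.
INSERT INTO puppies VALUES (1, 'Fifi'), (2, 'Rover');
INSERT INTO tricks  VALUES (1, 'roll over'), (2, 'play dead');
INSERT INTO puppy_tricks VALUES (1, 1), (2, 2);
""")

def knows_trick(puppy_name, trick_name):
    """True if the named puppy is linked to the named trick."""
    (count,) = conn.execute(
        """SELECT COUNT(*)
           FROM puppies
           INNER JOIN puppy_tricks ON puppies.puppyID = puppy_tricks.puppyID
           INNER JOIN tricks       ON tricks.trickID = puppy_tricks.trickID
           WHERE puppies.name = ? AND tricks.name = ?""",
        (puppy_name, trick_name),
    ).fetchone()
    return count > 0

print(knows_trick("Fifi", "roll over"))
```

The query has the same shape whatever the trick and however many tricks a puppy knows, which is the tidiness the earlier trick1/trick2/trick3 design lacked.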


Kennels
Rule 5. Eliminate columns not dependent on key.
If attributes do not contribute to a description of the key, remove them to a separate table. In the puppies table the key is Puppy ID, and the kennel name and kennel location describe only a kennel, not a puppy. They should thus be moved into a separate table. Since they describe a kennel, Kennel code would become the key of the new Kennels table.
E.g. suppose there are no puppies from the Daisy Hill Puppy Farm currently stored in the database. In the current database design, this kennel would not be recorded anywhere.
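With a separate Kennels table, a kennel can be stored even when it has no puppies. The sketch below uses SQLite from Python; the kennel codes, the second kennel and the single puppy row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE kennels (
    kennel_code     TEXT PRIMARY KEY NOT NULL,
    kennel_name     TEXT NOT NULL,
    kennel_location TEXT
);
CREATE TABLE puppies (
    puppy_id    INTEGER PRIMARY KEY,
    puppy_name  TEXT NOT NULL,
    kennel_code TEXT NOT NULL REFERENCES kennels (kennel_code)
);
-- Hypothetical codes and rows; Daisy Hill has no puppies on record.
INSERT INTO kennels VALUES ('DH', 'Daisy Hill Puppy Farm', NULL),
                           ('EX', 'Example Kennel', NULL);
INSERT INTO puppies VALUES (1, 'Fifi', 'EX');
""")

# A LEFT JOIN keeps kennels that have no matching puppies.
puppies_per_kennel = dict(conn.execute(
    """SELECT k.kennel_name, COUNT(p.puppy_id)
       FROM kennels k
       LEFT JOIN puppies p ON p.kennel_code = k.kennel_code
       GROUP BY k.kennel_code"""
))
print(puppies_per_kennel)
```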


Final Database Design


Review
One-Many: A kennel can hold many puppies.
Many-Many: A puppy can know many tricks and a trick can be known by many puppies.


Normal Forms
Normalization is the process of simplifying the design of a database so that it achieves the optimum efficiency. Normalization theory gives us the concept of normal forms to assist us in achieving that optimum efficiency. Normal forms are a linear progression of rules that you apply to your database, with each higher normal form achieving a better, more efficient design.


Normal Forms
1st Normal form - all column values must be atomic, and no repeating groups.
2nd Normal form - every non-key column is fully dependent on the primary key.
3rd Normal form - all non-key columns are mutually independent.
BCNF - candidate keys.
4th Normal form - multi-valued dependencies (pairwise only).
5th Normal form - multi-valued dependencies.


1st Normal Form


For every field in a given table, there exists only one value, not an array or list of values. The table below would violate 1st Normal Form, since the values in the Items column are not atomic.


1st Normal Form


We could improve this table by replacing the single Items column with 6 columns, e.g. Quant1, Item1, Quant2, Item2, Quant3, Item3...
...but we still have repeating groups!


1st Normal Form


The table below is now in 1st Normal Form since we now have no repeating groups. Constructing a query is now very simple.
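The step from repeating groups to 1st Normal Form amounts to turning each Quant/Item pair into a row of its own. Here is a small Python sketch of that conversion; the column names Quant1, Item1 and so on come from the slide above, while the OrderId values, the items and the function name to_first_normal_form are made up for illustration.

```python
def to_first_normal_form(wide_rows, n_groups=3):
    """Turn Quant1/Item1 ... QuantN/ItemN columns into one row per item."""
    long_rows = []
    for row in wide_rows:
        for i in range(1, n_groups + 1):
            item = row.get(f"Item{i}")
            quantity = row.get(f"Quant{i}")
            if item is not None:  # skip unused repeating-group slots
                long_rows.append((row["OrderId"], item, quantity))
    return long_rows

# Hypothetical orders stored with repeating groups.
wide_orders = [
    {"OrderId": 1, "Quant1": 5, "Item1": "hammer", "Quant2": 3, "Item2": "nails"},
    {"OrderId": 2, "Quant1": 1, "Item1": "saw"},
]

order_items = to_first_normal_form(wide_orders)

def total_quantity(item):
    """With one row per item, a total is a single pass over one column."""
    return sum(q for (_, name, q) in order_items if name == item)

print(order_items)
print(total_quantity("hammer"))
```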


2nd Normal Form


Tables should only store data relating to one entity and that entity should be described by its primary key. The table below is in 1st Normal Form, but violates 2nd Normal Form, since knowing OrderId also implies knowing CustomerID and OrderDate.


3rd Normal Form


Exclude fields that are dependent on other fields, including calculated fields. The tables below are in 2nd Normal Form, but table tblOrderDetail violates 3rd Normal Form, since ProductId and ProductDescription are mutually dependent.


3rd Normal Form


The tables below are in 3rd Normal Form, since they are in 2nd Normal Form and there are no dependent columns.


Higher Normal Forms


Every higher normal form includes all the requirements of the lower forms. Thus, if your design is in 3NF, by definition it is also in 1NF and 2NF.
If you've normalized your database to 3NF, you've likely also achieved Boyce-Codd Normal Form (and maybe even 4NF or 5NF).
"The principles of database design are nothing more than formalized common sense." - C.J. Date.
Database design is more art than science.


Why Normalize?
Minimizes disk space: speeds up queries, no need to restructure database.
Improves data quality: reduces anomalies and inconsistencies.
Makes statistical analysis easier: more time to write the report.


PART II Setting up a Relational Database



1. Define the objective
2. Research the current topic
3. Design the table structures
4. Construct relationships
5. Create form(s) and report(s)
6. Implement validation rules and constraints
7. Enter data


1. Define the Objective


What data needs to be stored in the database? Will this data be able to answer your research questions or address your problems?

Stating a clear objective at the start will ensure that all the following steps are done to advance that objective.


2. Research the Current Topic


Have other people done similar research? Are there other databases already in existence? How was the data collected and how was it presented? Will the data be entered manually or imported?


3. Design the Table Structures


Identify fields and put them into distinct groups (tables).
Develop the specifications for every field in every table.
  Numeric: any value that can be used in mathematical computations.
  Boolean: yes/no, true/false, on/off ...
  Date/Time: any variation of this can be stored.
  Text: fixed length or variable strings (use sparingly).


3. Design the Table Structures


Use descriptive names that reflect the subject.
Avoid abbreviations and acronyms.
Don't use punctuation or spaces.
Table names should be plural, fields singular.
The name of a linking table should combine the names of the two tables it links.


4. Construct Relationships
Make sure each table has a unique key (single or composite).
Take advantage of the computer's automatic number field.
Enforce Referential Integrity whenever possible (with or without cascading).


5. Create the Forms


Tables are ok for entering single lists.
Forms are better, especially for larger, more complex tables.
Forms enable:
  Validation rules across multiple tables.
  A friendly user interface.
  The use of powerful programming languages to perform complex tasks.
Try to make data entry forms resemble paper forms as much as possible.


6. Introduce Validation Rules


Rules and constraints typically lead to cleaner data entry and thus high quality information.
Don't overly limit the range of data to be entered. Outliers can occur; perhaps give a warning message instead of rejecting the value.
Combo boxes can limit choices for categorical data; however, data entry time is increased due to use of the mouse.
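One way to combine hard rules with soft warnings is sketched below using SQLite CHECK constraints from Python. The children table, its columns and the plausible birth-weight range are all assumptions chosen for illustration: impossible values are rejected by the database, while unusual but possible values are saved with a warning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE children (
    child_id       INTEGER PRIMARY KEY,
    sex            TEXT    NOT NULL CHECK (sex IN ('M', 'F')),
    birth_weight_g INTEGER NOT NULL CHECK (birth_weight_g > 0)
)""")

PLAUSIBLE_WEIGHT_G = (1500, 5000)  # assumed soft limits, not hard rules

def enter_child(sex, birth_weight_g):
    """Return (saved, message); outliers are saved but flagged."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO children (sex, birth_weight_g) VALUES (?, ?)",
                (sex, birth_weight_g),
            )
    except sqlite3.IntegrityError as exc:
        return False, f"rejected: {exc}"
    low, high = PLAUSIBLE_WEIGHT_G
    if not low <= birth_weight_g <= high:
        return True, f"warning: {birth_weight_g} g is outside {low}-{high} g"
    return True, None

print(enter_child("F", 3200))  # ordinary value
print(enter_child("F", 900))   # outlier: kept, with a warning
print(enter_child("X", 3000))  # breaks the rule on sex: not saved
```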


7. Enter the Data


Common Database Mistakes


Spreadsheet design
Too much data
Not enough data
Compound fields (Name + Surname)
Missing keys
Bad keys
Missing relations
Unnecessary relations
Incorrect relations
Duplicate field names


Summary
Research your topic before designing a database.
Normalize your database wherever possible: records are free, new fields are expensive.

Design data entry forms that enable quick data entry (limit use of mouse). Ensure the data you enter is valid (double entry).
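Double entry means keying the same records twice and comparing the two passes. A minimal Python sketch of the comparison step follows; the record layout (a dictionary keyed by record ID), the function name compare_entries and the sample values are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Return (record_id, field, first_value, second_value) for each mismatch."""
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches

# Hypothetical data: the second pass transposed the digits of one age.
first_pass = {1: {"age": 34, "sex": "F"}, 2: {"age": 27, "sex": "M"}}
second_pass = {1: {"age": 34, "sex": "F"}, 2: {"age": 72, "sex": "M"}}

discrepancies = compare_entries(first_pass, second_pass)
print(discrepancies)
```

Each reported mismatch is then checked against the paper form to decide which of the two values is correct.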
