Edward McNeil
Database Concepts
PART I:
- Models
- Terminology
- Entities, Relationships, Attributes
PART II:
- Entity Integrity
- Referential Integrity
- Normalization Forms
- Validation Rules
- Setting up a Database
Database Models
There are several models for databases:
- Tabular ("flat file"): data in a single table, e.g. a spreadsheet
- Hierarchical: e.g. company departments
- Relational
In a table, each row is a record (a patient, a child, etc.) and each column is an attribute (age, sex, etc.).
Terminology
Record - a collection of fields describing a larger unit.
Table - a collection of records.
Database - a collection of tables. It also includes forms for entering data, rules for checking data, queries for selecting subsets of data, and reports for presenting the data.
Entity - something that exists; any distinguishable object.
Relationship - a connection between two entities.
One-One Relationship - For each entity in one table there is at most one associated entity in the other table. For example, in a study of twins, each twin has only one matching twin. This is mainly used to split data across more than one table, especially for large data sets.

One-Many Relationship - One entity in table A is associated with zero or more entities in table B, but each entity in table B is associated with at most one entity in table A. For example, a mother may have many children, but a child has only one mother.

Many-Many Relationship - There are no restrictions on how many entities in either table are associated with a single entity in the other. An example is students taking classes: each student takes many classes, and each class has many students. A third (linking) table is needed to represent this relationship.
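A minimal sketch of the many-many case, using Python's built-in sqlite3 module and an in-memory database. The table names (students, classes, enrolments), their columns and the sample rows are hypothetical; the point is that the third table holds one row per student-class pair, so neither side limits how many partners the other can have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE classes  (class_id INTEGER PRIMARY KEY, title TEXT);
    -- The third (linking) table: one row per student-class pair.
    CREATE TABLE enrolments (
        student_id INTEGER REFERENCES students(student_id),
        class_id   INTEGER REFERENCES classes(class_id),
        PRIMARY KEY (student_id, class_id));
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO classes  VALUES (10, 'Maths'), (20, 'Biology');
    INSERT INTO enrolments VALUES (1, 10), (1, 20), (2, 10);
""")

def classes_of(student_id: int) -> list[str]:
    """Titles of all classes one student takes."""
    rows = conn.execute(
        "SELECT c.title FROM classes c JOIN enrolments e ON e.class_id = c.class_id "
        "WHERE e.student_id = ? ORDER BY c.title", (student_id,))
    return [title for (title,) in rows]

def students_in(class_id: int) -> list[str]:
    """Names of all students taking one class."""
    rows = conn.execute(
        "SELECT s.name FROM students s JOIN enrolments e ON e.student_id = s.student_id "
        "WHERE e.class_id = ? ORDER BY s.name", (class_id,))
    return [name for (name,) in rows]

print(classes_of(1))    # ['Biology', 'Maths']
print(students_in(10))  # ['Ann', 'Ben']
```

Neither the students table nor the classes table carries a column for the other; adding an enrolment is just one more row in the linking table.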
Database Concepts
Term          | Database       | Example
Entity        | Table, file    | Students
Tuple         | Row, record    | Record 23
Attribute     | Column, field  | Age, sex
Relationship  | Join, link     | One-many
Relational Database
A relational database is a set of related tables, each concerning a specific topic. Advantages:
- Reduced data redundancy
- Fewer inconsistencies (errors)
- High data integrity and quality
- Data can be descriptive
- Security can be implemented
Database Designs
Child Table: Mother ID, Child ID, Child Name, Date of Birth, Sex, Blood Group
Mother ID is a foreign key (linking to the Mothers table); Mother ID + Child ID together form the primary key.
Referential Integrity
Referential integrity concerns foreign keys. A foreign key is a field in one table that corresponds to the primary key field of another table and holds the same information; the two fields must be of the same data type. Referential integrity means that the foreign key cannot hold a value that is not present in the corresponding table. This prevents inconsistent data from being entered. Breaking the referential integrity link creates orphans: records whose foreign key no longer matches any record in the other table.
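Referential integrity can be sketched with Python's sqlite3 module as follows. This assumes SQLite, which only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection; the table and column names are illustrative, and `try_insert_child` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE mothers (mother_id INTEGER PRIMARY KEY, blood TEXT)")
conn.execute("""CREATE TABLE children (
    mother_id INTEGER NOT NULL REFERENCES mothers(mother_id),  -- the foreign key
    child_id  INTEGER NOT NULL,
    blood     TEXT,
    PRIMARY KEY (mother_id, child_id))""")
conn.execute("INSERT INTO mothers VALUES (1, 'A')")

def try_insert_child(mother_id: int, child_id: int, blood: str) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO children VALUES (?, ?, ?)", (mother_id, child_id, blood))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert_child(1, 1, "A"))  # True: mother 1 exists
print(try_insert_child(7, 1, "B"))  # False: no mother 7, so the child would be an orphan
```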
Referential Integrity
[Diagram: in the database, Table A is joined to Table B by a referential integrity (RI) link.]
You cannot enter a record in Table B without a corresponding matching record in Table A.
Cascades
Referential integrity prevents you from adding a record to Table B that cannot be linked to Table A.
Cascading delete
Whenever you delete a record from Table A, any records in Table B that are linked to the deleted record will also be deleted.
Cascading update
Whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it will also be modified accordingly.
Cascade Delete
Mothers Table
MotherID | Birth     | Blood
1        | 12/2/1960 | A
2        | 30/1/1945 | B
3        | 31/3/1973 | AB
Child Table
MotherID | ChildID | Birth     | Blood
1        | 1       | 1/11/2002 | A
1        | 2       | 1/11/2002 | A
2        | 1       | 12/7/2002 | B
2        | 2       | 21/9/2003 | A
Deleting a record from the Mothers table also deletes all matching records in the Child table.
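The cascade delete can be reproduced with Python's sqlite3 module using the Mothers and Child data from the tables above. The text does not say which mother is deleted, so MotherID 1 is chosen here purely for illustration; `ON DELETE CASCADE` is how SQLite (following standard SQL) requests this behaviour, and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys (and cascades) without this
conn.execute("CREATE TABLE mothers (mother_id INTEGER PRIMARY KEY, birth TEXT, blood TEXT)")
conn.execute("""CREATE TABLE children (
    mother_id INTEGER NOT NULL REFERENCES mothers(mother_id) ON DELETE CASCADE,
    child_id  INTEGER NOT NULL,
    birth     TEXT,
    blood     TEXT,
    PRIMARY KEY (mother_id, child_id))""")
conn.executemany("INSERT INTO mothers VALUES (?, ?, ?)", [
    (1, "12/2/1960", "A"), (2, "30/1/1945", "B"), (3, "31/3/1973", "AB")])
conn.executemany("INSERT INTO children VALUES (?, ?, ?, ?)", [
    (1, 1, "1/11/2002", "A"), (1, 2, "1/11/2002", "A"),
    (2, 1, "12/7/2002", "B"), (2, 2, "21/9/2003", "A")])

def child_keys() -> list[tuple[int, int]]:
    """(MotherID, ChildID) pairs currently in the Child table."""
    return conn.execute("SELECT mother_id, child_id FROM children ORDER BY 1, 2").fetchall()

# Deleting one mother record also deletes her linked child records.
conn.execute("DELETE FROM mothers WHERE mother_id = 1")
print(child_keys())  # [(2, 1), (2, 2)]
```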
Cascade Update
Mothers Table
MotherID  | Birth     | Blood
9 (was 1) | 12/2/1960 | A
2         | 30/1/1945 | B
3         | 31/3/1973 | AB
Child Table
MotherID  | ChildID | Birth     | Blood
9 (was 1) | 1       | 1/11/2002 | A
9 (was 1) | 2       | 1/11/2002 | A
2         | 1       | 12/7/2002 | B
2         | 2       | 21/9/2003 | A
Changing MotherID 1 to 9 in the Mothers table also changes the MotherID of all matching records in the Child table.
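A comparable sketch of the cascade update, again assuming SQLite via Python's sqlite3 module. `ON UPDATE CASCADE` propagates the change of MotherID 1 to 9 into the Child table; the column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to apply ON UPDATE CASCADE
conn.execute("CREATE TABLE mothers (mother_id INTEGER PRIMARY KEY, birth TEXT, blood TEXT)")
conn.execute("""CREATE TABLE children (
    mother_id INTEGER NOT NULL REFERENCES mothers(mother_id) ON UPDATE CASCADE,
    child_id  INTEGER NOT NULL,
    birth     TEXT,
    blood     TEXT,
    PRIMARY KEY (mother_id, child_id))""")
conn.executemany("INSERT INTO mothers VALUES (?, ?, ?)", [
    (1, "12/2/1960", "A"), (2, "30/1/1945", "B"), (3, "31/3/1973", "AB")])
conn.executemany("INSERT INTO children VALUES (?, ?, ?, ?)", [
    (1, 1, "1/11/2002", "A"), (1, 2, "1/11/2002", "A"),
    (2, 1, "12/7/2002", "B"), (2, 2, "21/9/2003", "A")])

# Renumber mother 1 as mother 9; her child records follow automatically.
conn.execute("UPDATE mothers SET mother_id = 9 WHERE mother_id = 1")

mother_ids = [m for (m,) in conn.execute("SELECT mother_id FROM mothers ORDER BY mother_id")]
child_mother_ids = [m for (m,) in conn.execute("SELECT mother_id FROM children ORDER BY mother_id")]
print(mother_ids)        # [2, 3, 9]
print(child_mother_ids)  # [2, 2, 9, 9]
```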
Normalization
Normalization is the method of structuring a database so that redundancy is minimized and data quality is maximized; in other words, organizing the data as efficiently as possible.
Normalization Goals
1. To contain all the data, and only the data, necessary for the purposes the database is to serve.
2. To have as little redundancy as possible.
3. To accommodate multiple values for types of data that require them.
4. To permit efficient updates of the data in the database.
5. To avoid the possibility of losing data unknowingly.
Normalization Rules
1. Eliminate repeating groups in individual tables.
2. Create a separate table for each set of related data.
3. Identify each set of related data with a primary key.
4. Eliminate redundant data.
5. Eliminate columns not dependent on the primary key.
In the Mother-Child example, each mother record should contain information related to the mother only. Child data would be stored in a separate table, and the two tables should be linked using a one-many join.
[Figure: an unnormalized puppies table in which a single entry lists several tricks, e.g. "play dead, fetch stick".]
Problem:
Some puppies know more tricks than others, and some puppies don't know any. What if we want a list of puppies that can play dead? What if Rover learnt a new trick? Querying the data is untidy and awkward.
Solution:
Put the tricks data into a separate table.
Modifications
1. Eliminate repeating groups.
2. Make a separate table for each set of related attributes.
3. Give each table a primary key.
Puppies table: puppy ID, puppy name, puppy age, kennel code, kennel name, kennel location
Tricks table: puppy ID, trick ID, trick name, skill level, where learnt
Anomalies
The trick name (e.g. roll over) appears redundantly for every puppy that knows it; the trick ID alone would do.

Also, suppose you want to reclassify a trick, i.e. give it a different skill level. The change has to be made for every puppy that knows the trick. If you miss some of the changes, several puppies will have the same trick with different skill levels. This is an update anomaly.

Or suppose the last puppy that knows a particular trick gets run over by a car. Its records will be removed from the database, and since no other puppy knows the trick, the trick will no longer be stored anywhere: it will have been lost forever. This is a delete anomaly.
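Both anomalies can be demonstrated on the unnormalized Tricks table with Python's sqlite3 module. The rows are hypothetical: the puppy IDs, trick IDs, skill levels and "where learnt" values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized design: trick name and skill level are repeated on every row.
conn.execute("""CREATE TABLE tricks (
    puppy_id INTEGER, trick_id INTEGER, trick_name TEXT,
    skill_level INTEGER, where_learnt TEXT,
    PRIMARY KEY (puppy_id, trick_id))""")
conn.executemany("INSERT INTO tricks VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "roll over", 2, "home"),
    (2, 1, "roll over", 2, "school"),
    (3, 2, "play dead", 3, "home"),
])

def skill_levels(trick_id: int) -> list[int]:
    """Distinct skill levels recorded for one trick (there should be exactly one)."""
    rows = conn.execute("SELECT DISTINCT skill_level FROM tricks "
                        "WHERE trick_id = ? ORDER BY skill_level", (trick_id,))
    return [level for (level,) in rows]

def known_tricks() -> list[str]:
    """Every trick name still stored anywhere in the table."""
    return [name for (name,) in conn.execute(
        "SELECT DISTINCT trick_name FROM tricks ORDER BY trick_name")]

# Update anomaly: "roll over" is reclassified, but only one puppy's row is changed.
conn.execute("UPDATE tricks SET skill_level = 3 WHERE puppy_id = 1 AND trick_id = 1")
print(skill_levels(1))  # [2, 3]: the same trick now has two skill levels

# Delete anomaly: puppy 3 is the only one that knows "play dead".
conn.execute("DELETE FROM tricks WHERE puppy_id = 3")
print(known_tricks())   # ['roll over']: "play dead" is gone
```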
Redundancies
Rule 4. Eliminate redundant data.
Puppies table: puppy ID, puppy name, puppy age, kennel code, kennel name, kennel location
Puppy tricks table: puppy ID, trick ID, where learnt
Tricks table: trick ID, trick name, skill level
Now every trick is stored separately. Deleting a puppy will not delete any tricks (only the fact that it knew the trick).
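With the three-table design, the same operations no longer lose or contradict data. This sketch makes the same assumptions as before (SQLite via Python's sqlite3 module, invented sample rows; only the name Rover comes from the text). `ON DELETE CASCADE` on the linking table removes only the fact that a deleted puppy knew a trick, never the trick itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE puppies (puppy_id INTEGER PRIMARY KEY, puppy_name TEXT);
    CREATE TABLE tricks  (trick_id INTEGER PRIMARY KEY, trick_name TEXT, skill_level INTEGER);
    -- Linking table: one row per (puppy, trick) fact.
    CREATE TABLE puppy_tricks (
        puppy_id INTEGER REFERENCES puppies(puppy_id) ON DELETE CASCADE,
        trick_id INTEGER REFERENCES tricks(trick_id),
        where_learnt TEXT,
        PRIMARY KEY (puppy_id, trick_id));
    INSERT INTO puppies VALUES (1, 'Rover'), (2, 'Rex'), (3, 'Spot');
    INSERT INTO tricks  VALUES (1, 'roll over', 2), (2, 'play dead', 3);
    INSERT INTO puppy_tricks VALUES (1, 1, 'home'), (2, 1, 'school'), (3, 2, 'home');
""")

# Reclassifying a trick is now a single-row update.
conn.execute("UPDATE tricks SET skill_level = 4 WHERE trick_id = 1")

# Removing the only puppy that knows "play dead" keeps the trick itself.
conn.execute("DELETE FROM puppies WHERE puppy_id = 3")

tricks = [name for (name,) in conn.execute("SELECT trick_name FROM tricks ORDER BY trick_id")]
print(tricks)  # ['roll over', 'play dead']
```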
Answer: Yes.
SELECT puppies.puppyID, puppies.name
FROM puppies
INNER JOIN [puppy tricks] ON puppies.puppyID = [puppy tricks].puppyID
WHERE [puppy tricks].trickID = 1;
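The query can be run essentially unchanged against SQLite, which accepts Access-style [square bracket] identifiers for compatibility. The sample rows below are hypothetical, and `ORDER BY` has been added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE puppies (puppyID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE [puppy tricks] (puppyID INTEGER, trickID INTEGER,
                                 PRIMARY KEY (puppyID, trickID));
    INSERT INTO puppies VALUES (1, 'Rover'), (2, 'Rex'), (3, 'Spot');
    INSERT INTO [puppy tricks] VALUES (1, 1), (2, 2), (3, 1);
""")

# The query from the slide; only the final ORDER BY is an addition.
QUERY = """
SELECT puppies.puppyID, puppies.name
FROM puppies
INNER JOIN [puppy tricks] ON puppies.puppyID = [puppy tricks].puppyID
WHERE [puppy tricks].trickID = 1
ORDER BY puppies.puppyID
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'Rover'), (3, 'Spot')]
```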
Kennels
Rule 5. Eliminate columns not dependent on key.
If attributes do not contribute to a description of the key, move them to a separate table. In the Puppies table the key is puppy ID, but kennel name and kennel location describe only a kennel, not a puppy, so they should be moved into a separate table. Since they describe a kennel, kennel code becomes the key of the new Kennels table. For example, suppose no puppies from the Daisy Hill Puppy Farm are currently stored in the database. In the current design, this kennel could not be recorded anywhere.
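Once kennel details live in their own table, a kennel can be recorded before any of its puppies. A sketch assuming SQLite via Python's sqlite3 module; the kennel codes 'HT' and 'DH', the Hilltop kennel and the puppy's age are hypothetical, and locations are left NULL because the text does not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Kennel name and location now depend only on the kennel's own key.
    CREATE TABLE kennels (kennel_code TEXT PRIMARY KEY,
                          kennel_name TEXT, kennel_location TEXT);
    CREATE TABLE puppies (puppy_id INTEGER PRIMARY KEY, puppy_name TEXT,
                          puppy_age INTEGER,
                          kennel_code TEXT REFERENCES kennels(kennel_code));
    INSERT INTO kennels VALUES ('HT', 'Hilltop Kennels', NULL),
                               ('DH', 'Daisy Hill Puppy Farm', NULL);
    INSERT INTO puppies VALUES (1, 'Rover', 2, 'HT');
""")

def kennels_without_puppies() -> list[str]:
    """Kennels that are stored even though no puppy currently references them."""
    rows = conn.execute("""
        SELECT k.kennel_name FROM kennels k
        LEFT JOIN puppies p ON p.kennel_code = k.kennel_code
        WHERE p.puppy_id IS NULL
        ORDER BY k.kennel_name""")
    return [name for (name,) in rows]

print(kennels_without_puppies())  # ['Daisy Hill Puppy Farm']
```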
Review
One-Many: a kennel can hold many puppies.
Many-Many: a puppy can know many tricks, and a trick can be known by many puppies.
Normal Forms
Normalization is the process of simplifying the design of a database so that it achieves the optimum efficiency. Normalization theory gives us the concept of normal forms to assist us in achieving that optimum efficiency. Normal forms are a linear progression of rules that you apply to your database, with each higher normal form achieving a better, more efficient design.
Normal Forms
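1st Normal form - all column values are atomic, and there are no repeating groups.
2nd Normal form - every non-key column is fully dependent on the whole primary key (not on just part of a composite key).
3rd Normal form - no non-key column depends on another non-key column (no transitive dependencies).
BCNF (Boyce-Codd) - every determinant is a candidate key.
4th Normal form - no non-trivial multi-valued dependencies other than on a candidate key (pairwise only).
5th Normal form - no join dependencies other than those implied by candidate keys (generalizes 4th Normal form beyond pairs).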
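Second and third normal form are defined in terms of functional dependencies (column X determines column Y). The hypothetical helper `holds` below checks whether a dependency is satisfied by a set of sample rows; note that sample data can only refute a dependency, never prove it for all future data. The rows are invented and mirror the Puppies table before the kennel columns were moved out.

```python
from typing import Iterable, Mapping, Sequence

def holds(rows: Iterable[Mapping[str, object]], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if, in these rows, equal lhs values always come with equal rhs values (lhs -> rhs)."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns the stored one.
        if seen.setdefault(key, value) != value:
            return False
    return True

puppies = [
    {"puppy_id": 1, "puppy_name": "Rover", "kennel_code": "HT", "kennel_name": "Hilltop Kennels"},
    {"puppy_id": 2, "puppy_name": "Rex", "kennel_code": "HT", "kennel_name": "Hilltop Kennels"},
    {"puppy_id": 3, "puppy_name": "Spot", "kennel_code": "DH", "kennel_name": "Daisy Hill Puppy Farm"},
]

# kennel_code (a non-key column) determines kennel_name: a transitive dependency,
# so this table is not in 3rd normal form.
print(holds(puppies, ["kennel_code"], ["kennel_name"]))  # True
print(holds(puppies, ["kennel_name"], ["puppy_name"]))   # False
```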
Why Normalize?
- Minimizes disk space
- Can speed up queries
- No need to restructure the database later
Stating a clear objective at the start helps ensure that all the following steps advance that objective.
4. Construct Relationships
- Make sure each table has a unique key (single or composite).
- Take advantage of the computer's automatic number field.
- Enforce referential integrity whenever possible (with or without cascading).
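A sketch of this step in SQLite via Python's sqlite3 module: an `INTEGER PRIMARY KEY` column is filled in automatically when no value is supplied, playing the role of an automatic number field, while a composite key identifies each child. The table names, the mothers' names and `add_child` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
# INTEGER PRIMARY KEY: SQLite assigns the next number when none is given.
conn.execute("CREATE TABLE mothers (mother_id INTEGER PRIMARY KEY, name TEXT)")
# Composite key: a child is identified by (mother_id, child_id) together.
conn.execute("""CREATE TABLE children (
    mother_id INTEGER NOT NULL REFERENCES mothers(mother_id) ON DELETE CASCADE,
    child_id  INTEGER NOT NULL,
    name      TEXT,
    PRIMARY KEY (mother_id, child_id))""")

first = conn.execute("INSERT INTO mothers (name) VALUES ('Mary')").lastrowid
second = conn.execute("INSERT INTO mothers (name) VALUES ('Jane')").lastrowid
conn.execute("INSERT INTO children VALUES (?, 1, 'Tom')", (first,))

def add_child(mother_id: int, child_id: int, name: str) -> bool:
    """Insert a child; False if the key is duplicated or the mother does not exist."""
    try:
        conn.execute("INSERT INTO children VALUES (?, ?, ?)", (mother_id, child_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

print(first, second)  # 1 2
```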
Summary
- Research your topic before designing a database.
- Normalize your database wherever possible.
- Records are free; new fields are expensive.
- Design data entry forms that enable quick data entry (limit use of the mouse).
- Ensure the data you enter is valid (e.g. by double entry).
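Double entry means keying in the same forms twice, ideally by different people, and comparing the two versions. A minimal sketch of the comparison step in Python; the record layout (a dict of field name to text, identified by an `id` field) and the helper name `double_entry_mismatches` are assumptions.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

def double_entry_mismatches(first: List[Record], second: List[Record],
                            key: str) -> List[Tuple[str, str, str, str]]:
    """Compare two independent entries of the same forms.

    Returns (record key, field, first value, second value) for each disagreement.
    A record present in only one pass is reported with field '<record>'.
    """
    a = {r[key]: r for r in first}
    b = {r[key]: r for r in second}
    problems = []
    for k in sorted(a.keys() | b.keys()):
        if k not in a or k not in b:
            problems.append((k, "<record>",
                             "present" if k in a else "absent",
                             "present" if k in b else "absent"))
            continue
        for field in sorted(a[k].keys() | b[k].keys()):
            va, vb = a[k].get(field, ""), b[k].get(field, "")
            if va != vb:
                problems.append((k, field, va, vb))
    return problems

first_pass = [{"id": "1", "age": "34", "sex": "F"}, {"id": "2", "age": "29", "sex": "M"}]
second_pass = [{"id": "1", "age": "43", "sex": "F"}, {"id": "2", "age": "29", "sex": "M"}]
print(double_entry_mismatches(first_pass, second_pass, "id"))  # [('1', 'age', '34', '43')]
```

Each reported mismatch is then checked against the paper form, which catches typing slips such as the transposed digits above.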