Vous êtes sur la page 1sur 78

File and Database Design

1. Data modeling with Entity


Relationship Diagrams
2. File Access and Organization

CSC 2202 Systems Analysis and Design 1


File and Database Design -
Objectives
Define entity, file, record, and attribute and
discuss the various types of keys
Explain the concept of entity relationships
diagram
Define cardinality and its notation.
Explain normalization
Describe various types of files, including
master, transaction, table, work, history, and
security.

CSC 2202 Systems Analysis and Design 2


File and Database Design -
Objectives
Evaluate methods of file organization,
including sequential, direct, and indexed
Calculate file sizes and storage
requirements
Discuss file and database control
measures

CSC 2202 Systems Analysis and Design 3


Introduction
Data terminology and concepts
Relationships among data objects
Entity-relationship diagrams (ERDs)
Normalization
File organization and access methods
File types
File media, sizing, and controls
CSC 2202 Systems Analysis and Design 4
Data Modeling with Entity
Relationship Diagrams

CSC 2202 Systems Analysis and Design 5


Key Definitions
Data model
 A formal way of representing the data that are
used and created by a business system
 Shows the people, places and things about which
data is captured and the relationships among
them.
Logical data model
 shows the organization of data without indicating
how it is stored, created, or manipulated.

CSC 2202 Systems Analysis and Design 6


Key Definitions
Physical data model
 shows how the data will actually be
stored in databases or files.
Normalization is the process
analysts use to validate data models.
Data models should balance with
process models

CSC 2202 Systems Analysis and Design 7


What Is an ERD?
A picture showing the information created,
stored, and used by a business system.
Entities generally represent similar kinds of
information
Lines drawn between entities show
relationships among the data
High level business rules are also shown

CSC 2202 Systems Analysis and Design 8


Using the ERD to Show
Business Rules
Business rules are constraints that are
followed when the system is in
operation.
ERD symbols can show when one
instance of an entity must exist for an
instance of another to exist
A doctor must exist before appointments
the doctor can be made
CSC 2202 Systems Analysis and Design 9
Using the ERD to Show
Business Rules
ERD symbols can show when one instance of
an entity can be related to only one or many
instances of another entity
 One doctor can have many patients; each patient
may have only one primary doctor
ERD symbols show when the existence of an
entity instance is optional for a related entity
instance
 A patient may or may not have insurance
coverage

CSC 2202 Systems Analysis and Design 10


An ERD Example

CSC 2202 Systems Analysis and Design


Entity
A person, place, event, or thing about which
data is collected
Must be multiple occurrences to be an entity
 Example: If a firm has only one warehouse, the
warehouse is not an entity. However, if the firm
has several warehouses, the warehouse could be
an entity if the firm wants to store data about each
warehouse instance.

CSC 2202 Systems Analysis and Design 12


Entities and Instances

CSC 2202 Systems Analysis and Design 13


Attributes
Information captured about an entity
Only those used by the organization
should be included in the model
Attribute names are nouns
Sometimes entity name is added at the
beginning of the attribute name for
clarity

CSC 2202 Systems Analysis and Design 14


Identifiers
One or more attributes can serve as the entity
identifier, uniquely identifying each entity
instance
Concatenated identifier consists of several
attributes
An identifier may be ‘artificial,’ such as
creating an ID number
Identifiers may not be developed until the
Design Phase

CSC 2202 Systems Analysis and Design 15


Identifier Types

CSC 2202 Systems Analysis and Design 16


Relationships
Associations between entities
The first entity in the relationship is the
parent entity; the second entity in the
relationship is the child entity
Relationships should have active verb
names
Relationships go in both directions

CSC 2202 Systems Analysis and Design 17


Cardinality
Cardinality
 refersto the number of times instances in
one entity can be related to instances in
another entity
 One instance in an entity refers to one and only one
instance in the related entity (1:1)
 One instance in an entity refers to one or more instances

in the related entity (1:N)


 One or more instances in an entity refer to one or more

instances in the related entity (M:N)

CSC 2202 Systems Analysis and Design 18


Modality
Modality
Refers to whether or not an instance
of a child entity can exist without a
related instance in the parent entity
 Not Null means that an instance in the related
entity must exist for an instance in another
entity to be valid
 Null means that no instance in the related entity
is necessary for an instance in another entity to
be valid
CSC 2202 Systems Analysis and Design 19
The Data Dictionary and
Metadata
Metadata is information stored about
components of the data model
Metadata is stored in the data dictionary so it
can be shared by developers and users
throughout the SDLC
A complete, shareable data dictionary helps
improve the quality of the system under
development

CSC 2202 Systems Analysis and Design 20


Metadata Examples
An Entity An Attribute A Relationship
can be can be can be
described by: described by: described by:

Name Name Verb phrase


Definition Description Parent entity
Special Notes Alias Child entity
User Contact Sample Values Definition
Analyst Contact Acceptable Values Cardinality
Format Modality
Type Notes
Special Notes

CSC 2202 Systems Analysis and Design 21


CREATING AN ENTITY-
RELATIONSHIP DIAGRAM

CSC 2202 Systems Analysis and Design 22


ERD Basics
Drawing the ERD is an iterative process
of trial and revision
ERDs can become quite complex

CSC 2202 Systems Analysis and Design 23


Steps in Building ERDs
Identify the entities
Add appropriate attributes for each
entity
Draw the relationships that connect
associated entities

CSC 2202 Systems Analysis and Design 24


Identify the Entities
Identify major categories of information
 If available, check the process models for data
stores, external entities, and data flows
 Check the major inputs and outputs from the use
cases
Verify that there is more than one instance of
the entity that occurs in the system

CSC 2202 Systems Analysis and Design 25


Add Appropriate Attributes
Identify attributes of the entity that are
relevant to the system under development
 Check the process model repository entries for
details on data flows and data stores
 Check the data requirements of the requirements
definition
 Interview knowledgeable users
 Perform document analysis on existing forms and
reports
Select the entity’s identifier

CSC 2202 Systems Analysis and Design 26


Draw the Relationships
Start with an entity and identify all entities
with which it shares relationships
Describe the relationship with the appropriate
verb phrase
Determine the cardinality and modality by
discussing the business rules with
knowledgeable users

CSC 2202 Systems Analysis and Design 27


ERD Building Tips
Data stores of the DFD should
correspond to entities
Only include entities with more than one
instance of information
Don’t include entities associated with
implementation of the system (they will
be added later)

CSC 2202 Systems Analysis and Design 28


Data Terminology and
Concepts
Data design includes
 Entities

 Fields

 Records

 Files

 Keys

CSC 2202 Systems Analysis and Design 29


Data Terminology and
Concepts
Definitions
 Entity:
a person, place, thing, or event for
which data is collected and maintained
 Field(attribute): a single characteristic or fact
about an entity
 Record: a collection of fields that describes
one instance of an entity
 File:
a set of records that contains data about
a specific entity
CSC 2202 Systems Analysis and Design 30
Data Terminology and
Concepts
Definitions
 Key field: a field used to locate, retrieve, or
identify a specific record
 Primary key: a key that uniquely identifies
each record

CSC 2202 Systems Analysis and Design 31


Data Terminology and
Concepts
Key fields
 Primary keys
A field or combination of fields that uniquely and
minimally identifies each member of an entity
 A primary key composed of more than one field is
called a multivalued key
 Candidate keys
 Any field that could serve as primary key
 Any field that is not a primary key or candidate
key is called a nonkey field

CSC 2202 Systems Analysis and Design 32


Data Terminology and
Concepts
Key fields
 Foreign keys
A field in one file that matches a primary key
value in another file
 Example: the advisor number is a foreign key in
the STUDENT file that matches a primary key
value in the ADVISOR file
 A foreign key need not be unique
 A combination of two or more foreign keys can
form a unique primary key value
 Referential integrity ensures that a foreign key
value cannot be entered unless it matches a
primary key value in another file

CSC 2202 Systems Analysis and Design 33


Data Terminology and
Concepts
Key fields
 Secondary keys
A field or combination of fields that can be used to
access or retrieve records
 Secondary keys do not need to be unique

CSC 2202 Systems Analysis and Design 34


The goal of data Modeling
To determine the required data
contents of the main computer files in
the new system
OR (more likely) to determine
the contents of data base tables
i.e. to eliminate redundancy
Data model forms the basis of
the physical storage

CSC 2202 Systems Analysis and Design 35


Entity-Relationship Diagrams
Entity-relationship diagrams (ERDs)
 An ERD is a graphical model that shows
relationships among system entities
 Each entity is a rectangle, labeled with a
noun
 Each relationship is a diamond, labeled with
a verb

CSC 2202 Systems Analysis and Design 36


Entity-Relationship Diagrams
Entity-relationship diagrams (ERDs)
 Types of relationships
 One-to-one(1:1)
 One-to-many (1:M)

 Many-to-many (M:N)

A full ERD shows all system relationships

CSC 2202 Systems Analysis and Design 37


The need for a data model
Relationships and data structures in the
system may be very complex
 Need to examine them independently of the
processing.
The entity-relationship data model was first
proposed by P. Chen (1976) - it has become
widely used in database design.
The resulting E-R diagram provides a
graphical notation for Modeling the data
CSC 2202 Systems Analysis and Design 38
Entity-relationship diagram
components
There are two main components in
an E-R diagram:
 entities(object types)
 relationships.

Entity is an object about which you


want to store data
Entities are represented by a
rectangle

CSC 2202 Systems Analysis and Design 39


Examples of ER Diagrams

CSC 2202 Systems Analysis and Design 40


Examples of ER Diagrams
(cont)

CSC 2202 Systems Analysis and Design 41


Data Relationships and ERDs
One-to-one (1:1) relationship
 Existswhen exactly one of the second entity
occurs for each instance of the first entity
 Examples
 One office manager heads one office
 One vehicle ID number is assigned to one vehicle

 One driver drives one delivery truck

 One faculty member is chairperson of one


department

CSC 2202 Systems Analysis and Design 42


SC
Fig 8.5
(1:1)

CSC 2202 Systems Analysis and Design 43


Data Relationships and ERDs
One-to-many (1:M) relationship
 Exists
when one occurrence of the first entity
can be related to many occurrences of the
second entity, but each occurrence of the
second entity can be associated with only
one occurrence of the first entity- Examples:
 One individual owns many automobiles
 One customer places many orders

 One department employs many employees

 One faculty advisor advises many students

CSC 2202 Systems Analysis and Design 44


SC
Fig 8.6
(1:M)

CSC 2202 Systems Analysis and Design 45


Data Relationships and ERDs
Many-to-many (M:N) relationship
 Exists when one instance of the first entity
can be related to many instances of the
second entity, and one instance of the
second entity can be related to many
instances of the first
 Examples
A student enrolls in one or more classes, and
each class has one or more students registered
 A passenger buys tickets for one or more flights,
and each flight has one or more passengers
 An order lists one or more products, and each
product is listed on one or more orders

CSC 2202 Systems Analysis and Design 46


(M:N)

CSC 2202 Systems Analysis and Design 47


Data Relationships and ERDs
A full ERD shows all system relationships
 Examples
A sales rep serves one or more customers, but
each customer has only one sales rep
 A customer places one or more orders, but each
order has only one customer
 An order lists one or more products, and each
product can be listed in one or more orders
 A warehouse stores one or more products, and
each product can be stored in one or more
warehouses

CSC 2202 Systems Analysis and Design 48


SC
Fig 8.8
Example

CSC 2202 Systems Analysis and Design 49


Data Relationships and ERDs
Cardinality
 Describes how instances of one entity relate
to another
 Mandatory vs. optional relationships
 Crow’s foot notation is one method of
showing cardinality
 Most CASE products support the drawing of
ERDs

CSC 2202 Systems Analysis and Design 50


SC
Fig 8.9
Crow’s
foot
notation

CSC 2202 Systems Analysis and Design 51


ERD from CASE tool

CSC 2202 Systems Analysis and Design 52


Example E-R diagrams
Assignment management system example
Alternative method for showing relationships
... replace single and double headed arrows,
or 0, 1, “crows foot” with the labels:
 (1) for one
 (M) or (N) for many

CSC 2202 Systems Analysis and Design 53


Example E-R diagram

CSC 2202 Systems Analysis and Design 54


Data Relationships and ER
Diagrams
Creating an ERD
1. Identify the entities
2. Determine all significant events or
activities for two or more entities
3. Analyze the nature of the interaction
4. Draw the ERD

CSC 2202 Systems Analysis and Design 55


Connection between DFD and
E-R
Consider the following example
Inventory data store on DFD
E-R diagram showing how the Inventory
entity is related to other entities
Same E-R diagram showing attributes
(data elements)

CSC 2202 Systems Analysis and Design 56


VALIDATING AN ERD

CSC 2202 Systems Analysis and Design 57


Design Guidelines
Best practices rather than rules
Entities should have many occurrences
Avoid unnecessary attributes
Clearly label all components
Apply correct cardinality and modality
Break attributes into lowest level needed
Labels should reflect common business terms
Assumptions should be clearly stated
CSC 2202 Systems Analysis and Design 58
Normalization

Technique used to validate data


models
Series of rules applied to logical
data model to improve its
organization
Three normalization rules are
common

CSC 2202 Systems Analysis and Design 59


Normalization Steps

CSC 2202 Systems Analysis and Design 60


Unnormalized Entity
Begin with an entity from
the logical data model

CSC 2202 Systems Analysis and Design 61


First Normal Form (1NF)
Look for repeating groups of attributes and
remove them into separate entities

CSC 2202 Systems Analysis and Design 62


Second Normal Form (2NF)
entity has a concatenated identifier, look for attributes that
nd only on part of the identifier. If found, remove to new ent

CSC 2202 Systems Analysis and Design 63


Third Normal Form (3NF)
or attributes that depend only on another nonidentifying att
nd, remove to new entity. Also remove any calculated attribu

CSC 2202 Systems Analysis and Design 64


ERD after normalisation
*

CSC 2202 Systems Analysis and Design 65


Balancing ERDs with DFDs
All analysis activities are interrelated
Process models contain two data components
 Data flows and data stores

The DFD data components need to balance the ERD’s data


stores (entities) and data elements (attributes)
Many CASE tools provide features to check for imbalance
Check that all data stores and elements correspond between
models
 Data that is not used is unnecessary
 Data that has been omitted results in an incomplete system
Do not follow thoughtlessly -- check that the models make
sense!

CSC 2202 Systems Analysis and Design 66


Steps in Database Design
Four steps in database design
1. Create the initial ERD
2. Assign all data elements to entities
3. Create 3NF designs for all records, taking
care to identify all primary, secondary, and
foreign keys
4. Verify all data dictionary entries

CSC 2202 Systems Analysis and Design 67


Database Management
Potential problems in a file processing
environment
 Data redundancy
 Inconsistent data
 Inefficiency

Database management design


A database is a structure that can store data
about many entities and the relationships
among them
CSC 2202 Systems Analysis and Design 68
Database Management
Elements of database management
systems
A database management system (DBMS) is
used to create, access, and control a
database
 Data definition language (DDL)
 Data manipulation language (DML)
 Query language
 Data dictionary
 Utility programs
CSC 2202 Systems Analysis and Design 69
Database Management
Characteristics of database management
A DBMS is more powerful, complex, and
expensive than a file processing system
 Effective security, backup, and recovery
procedures are essential
 Data is a company-wide resource
 Data mining enables users to extract
information from anywhere in the
organization

CSC 2202 Systems Analysis and Design 70


File Access and Organization
Types of files
A file is a set of logical records that
contains data about an entity
 Master files
 Table files
 Transaction files
 Work files
 Security files
 History files

CSC 2202 Systems Analysis and Design 71


File Access
The file access method determines how
programs read and write records
Two main methods are used
 Sequential access method
 Program reads all records in order until the end of
the file is reached
 Random access method
 Program can read any logical record without
having to access all preceding records

CSC 2202 Systems Analysis and Design 72


File Media Types
Technology has enabled many
alternatives, each with advantages and
disadvantages
 Magnetic tape
 Floppy disk
 Hard disk (fixed disk, direct access storage
device or DASD)
 RAID (Redundant Array of Inexpensive
Disks)

CSC 2202 Systems Analysis and Design 73


File Media Types
Technology has enabled many
alternatives, each with advantages and
disadvantages
 Removable disk cartridge
 Optical disc (CD-ROM)
 WORM
 Mass storage system

CSC 2202 Systems Analysis and Design 74


File and Database Control
Control must include measures to ensure
that data is correct, complete, and secure
 Passwords

 Encryption

 Backup and recovery procedures


 Audit trail files

 Audit fields

CSC 2202 Systems Analysis and Design 75


Summary
The ERD is the most common technique for
drawing data models. The building blocks of
the ERD are:
 Entities describe people, places, or things
 Attributes capture information about the entity

 Relationships associate data across entities

The ERD must be balanced with the DFD.

CSC 2202 Systems Analysis and Design 76


Summary (cont)
E-R Modeling is used extensively in
database design.
Data analysis is often considered more
important than functional analysis
- in a Computer Based Information
System it is easier to change a
program (which represents a process)
than to change the underlying data
structures (i.e. file or database)
CSC 2202 Systems Analysis and Design 77
Review Questions
1. What is an entity-relationship diagram and what is
the role of this diagram in the systems development
process?
2. “Data is a valuable organisational resource and
needs to carefully analysed and stored
commensurate with organisational needs.” Discuss
this statement.
3. What is normalisation and what role does it play in
the data design process?
4. Describe the attributes of a data structure which is
in 3rd Normal Form (3NF).

CSC 2202 Systems Analysis and Design 78

Vous aimerez peut-être aussi