2 - Physical DB Design

From Modern Database Management, 11th Ed., Hoffer, Ramesh, Topi

Conceptual, Logical, and Physical Data Models
Conceptual Data Model (CDM)
- Includes high-level data constructs
- Uses non-technical names, so that executives and managers at all levels can understand the data basis of the Architectural Description
- Represents data from the viewpoint of the organization, independent of any technology
Logical Data Model (LDM)
- Includes entities (tables), attributes (columns/fields), and relationships (keys)
- Uses business names for entities and attributes
- Is independent of the technology platform; describes the data in terms of the data management technology that will be used
Physical Data Model (PDM)
- Includes tables, columns, keys, data types, validation rules, database triggers, and access constraints
- Uses more specific and less generic names for tables and columns, such as abbreviated column names, limited by the database management system (DBMS) and any company-defined standards
- Requires knowledge of the specific DBMS that will be used to implement the database
Physical Database Design
- Definition: translates the logical description of data into technical specifications for storing and retrieving data
- Goal: create a design for storing data that will provide data processing efficiency and ensure database integrity, security, and recoverability
- Output: technical specifications for use in the implementation phase of information systems construction
Key Decisions in Physical DB Design
1. Data type for each attribute (storage format) - must maximize data integrity and minimize storage space
2. Grouping of attributes - the grouping of attributes in a logical data model is not always the optimal grouping in a physical design
3. Choice of file organization - arranging similarly structured records in secondary memory so that records can be stored, retrieved, and updated rapidly
4. Selection of indexes and overall database architecture for efficiency of data retrieval
5. Proper handling of queries by the DBMS so that file organization and indexes will be optimized
1. Choosing Data Types
Choose the appropriate data type so that it will:
- Represent all possible values
  o Alphanumeric as needed, e.g. driver's license number
- Improve data integrity
  o Default value
  o Range control
  o Null-value control
  o Referential integrity
- Support all data manipulations
  o Numeric for numerical calculations (width should be enough)
  o Date for date computations
  o Character for parsing text
- Minimize storage space (width should be just enough)
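The four integrity controls listed above can be declared directly in a table definition. The following is a minimal sketch using Python's built-in sqlite3 module; the department and employee tables, their columns, and the sample license numbers are hypothetical and were chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY
);
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    license_no TEXT NOT NULL,                                  -- alphanumeric, null-value control
    status     TEXT NOT NULL DEFAULT 'active',                 -- default value
    age        INTEGER CHECK (age BETWEEN 18 AND 70),          -- range control
    dept_id    INTEGER NOT NULL REFERENCES department(dept_id) -- referential integrity
);
INSERT INTO department (dept_id) VALUES (10);
""")

def try_insert(sql):
    """Return True if the row is accepted, False if an integrity rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

cols = "INSERT INTO employee (emp_id, license_no, age, dept_id) VALUES "
accepted = try_insert(cols + "(1, 'N01-12-345678', 30, 10)")
bad_range = try_insert(cols + "(2, 'N02-34-567890', 12, 10)")  # age outside 18..70
bad_null = try_insert(cols + "(3, NULL, 30, 10)")              # missing license number
bad_fk = try_insert(cols + "(4, 'N03-45-678901', 30, 99)")     # department 99 does not exist

# The status column was never supplied, so the default value is stored.
default_status = conn.execute("SELECT status FROM employee WHERE emp_id = 1").fetchone()[0]
```

Declaring these rules in the schema lets the DBMS reject bad rows itself, instead of relying on every application to repeat the same checks.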
2. Grouping of attributes - the grouping of attributes in a logical data model is not always the optimal grouping in a physical design
Denormalization
Normalization Definitions
- Normalization involves decomposing relations to produce smaller, well-structured relations
- A formal process for deciding which attributes should be grouped together in a relation so that all anomalies are removed
- More specifically, if a relation is normalized (well-formed), rows can be inserted, deleted, or modified without creating anomalies
Normalization Goals
- Minimize data redundancy, thereby conserving space and avoiding anomalies
- Make it easier to maintain data
Disadvantages of Denormalization
- Redundant copies of the same data are often not updated in a synchronized way
- Extra programming is required to ensure that all copies of exactly the same business data are updated together
- More storage space is needed for raw data and for DB overhead (e.g. indexes)
Denormalization
- Mechanism used to improve the efficiency of data processing
- Quick access to stored data
- Motivation for denormalization: normalization often creates many tables, and joining tables slows DB processing
Opportunities for Denormalization
1. Two entities with a one-to-one relationship - even if one of the entities is an optional participant
2. A many-to-many relationship (associative entity) with non-key attributes
3. Reference data - exists on the one side of a one-to-many relationship, and this entity participates in no other database relationship
1. Two entities with a one-to-one relationship - even if one of the entities is an optional participant
Normalized:
student(studentid, campusaddress)
application(applicationid, applicationdate, qualification, studentid)
Denormalized:
student(studentid, campusaddress, applicationdate, qualification)
2. A many-to-many relationship (associative entity) with non-key attributes
Normalized:
vendor(vendorid, address, contactname)
item(itemid, description)
pricequote(vendorid, itemid, price)
Denormalized:
vendor(vendorid, address, contactname)
itemquote(vendorid, itemid, price, description)
3. Reference data - exists on the one side of a one-to-many relationship, and this entity participates in no other database relationship
Advantageous when there are few instances of the entity on the many side for each entity on the one side.
Normalized:
ITEM
itemid  itemdesc        instrid
A1      laptop          1
A2      tablet          1
A3      computer table  2
A4      cabinet         2
STORAGE
instrid  wherestore     containertype
1        ortigas depot  van
2        pasig depot    truck
Denormalized:
ITEM
itemid  itemdesc        wherestore     containertype
A1      laptop          ortigas depot  van
A2      tablet          ortigas depot  van
A3      computer table  pasig depot    truck
A4      cabinet         pasig depot    truck
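The ITEM and STORAGE example can be reproduced as a small sketch with Python's built-in sqlite3 module. The table name item_denorm and the replacement value 'lorry' are made up for illustration; the rows are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE storage (
    instrid       INTEGER PRIMARY KEY,
    wherestore    TEXT,
    containertype TEXT
);
CREATE TABLE item (
    itemid   TEXT PRIMARY KEY,
    itemdesc TEXT,
    instrid  INTEGER REFERENCES storage(instrid)
);
INSERT INTO storage VALUES (1, 'ortigas depot', 'van'), (2, 'pasig depot', 'truck');
INSERT INTO item VALUES ('A1', 'laptop', 1), ('A2', 'tablet', 1),
                        ('A3', 'computer table', 2), ('A4', 'cabinet', 2);

-- Denormalized copy: the reference columns are folded into every item row.
CREATE TABLE item_denorm AS
SELECT i.itemid, i.itemdesc, s.wherestore, s.containertype
FROM item AS i JOIN storage AS s ON s.instrid = i.instrid;
""")

# Reads no longer need a join.
rows = conn.execute("SELECT * FROM item_denorm ORDER BY itemid").fetchall()

# The cost: one changed storage fact now touches every item row that repeats it.
cur = conn.execute(
    "UPDATE item_denorm SET containertype = 'lorry' WHERE wherestore = 'pasig depot'"
)
rows_touched = cur.rowcount
```

In the normalized design the same change would update a single STORAGE row, which is why this form of denormalization pays off mainly when each reference row is repeated only a few times.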
Other Forms of Denormalization
- Partitioning - creation of more tables (horizontal, vertical, or record partitioning)
- Data replication
Horizontal Partitioning
- Places different rows into separate tables based on a common value
Three (3) forms of Horizontal Partitioning
- Range - Each partition is defined by a range of values (lower and upper key value limits)
  Example - Partition by range of dates
- Hash - Data are evenly spread across partitions, independent of any partition key value
  Example - A table of 1M records divided into 5 partitions will have about 200k records in each partition
- List - Partitions are defined based on a predefined list of values
  Example - Table partitioned based on region: all records of regions I, II, III are grouped together, while IV, V, VII are grouped together, etc.
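The three forms differ only in the rule that maps a row to a partition. The sketch below expresses each rule as a plain Python function; the partition names, the date boundaries, and the use of a simple modulo as the hash function are assumptions made for illustration, not how any particular DBMS implements partitioning.

```python
from datetime import date

def range_partition(order_date: date) -> str:
    """Range: each partition covers a lower and an upper bound of the key."""
    if order_date < date(2023, 1, 1):
        return "p_before_2023"
    if order_date < date(2024, 1, 1):
        return "p_2023"
    return "p_2024_onward"

def hash_partition(record_id: int, n_partitions: int = 5) -> int:
    """Hash: a hash of the key (here a simple modulo) picks the partition."""
    return record_id % n_partitions

# Hypothetical partition names for the region groups in the example.
REGION_LISTS = {
    "p_group_1": {"I", "II", "III"},
    "p_group_2": {"IV", "V", "VII"},
}

def list_partition(region: str) -> str:
    """List: each partition owns an explicit list of key values."""
    for name, regions in REGION_LISTS.items():
        if region in regions:
            return name
    raise ValueError(f"no partition lists region {region!r}")

# Spread 1M sequential record ids over 5 hash partitions and count each one.
counts = [0] * 5
for record_id in range(1_000_000):
    counts[hash_partition(record_id)] += 1
```

With sequential ids and a modulo hash, each of the 5 partitions receives exactly 200,000 of the 1M records. Real hash functions on arbitrary keys give an approximately even spread rather than an exact one.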
Data Replication - the same data are stored in multiple places for improved data access speed.
Advantages of Partitioning
1. Efficiency
2. Local optimization
3. Security
4. Recovery and uptime
5. Load balancing
Disadvantages of Partitioning
1. Inconsistent access speed
2. Complexity
3. Extra space and update time
Vertical Partitioning - distributes the columns of a logical relation into separate tables
Record Partitioning - a combination of horizontal and vertical partitioning; it is common for a database whose files are distributed across multiple computers.
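Vertical partitioning can be sketched with Python's built-in sqlite3 module: the columns of one logical relation are split across two tables that share the primary key, and a view joins them back together when all columns are needed. The part relation and its columns are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Frequently read columns
CREATE TABLE part_core (
    partid INTEGER PRIMARY KEY,
    name   TEXT,
    price  REAL
);
-- Rarely read columns, keyed by the same primary key
CREATE TABLE part_engineering (
    partid     INTEGER PRIMARY KEY REFERENCES part_core(partid),
    drawing_no TEXT,
    spec_notes TEXT
);
INSERT INTO part_core VALUES (1, 'bolt', 0.25);
INSERT INTO part_engineering VALUES (1, 'DRW-001', 'steel, M8');

-- The view reassembles the original logical relation.
CREATE VIEW part AS
SELECT c.partid, c.name, c.price, e.drawing_no, e.spec_notes
FROM part_core AS c JOIN part_engineering AS e ON e.partid = c.partid;
""")

full_row = conn.execute("SELECT * FROM part WHERE partid = 1").fetchone()
```

Queries that only need name and price read the narrower part_core table, while the view keeps the full relation available to applications that still expect it.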
3. Choice of file organization - arranging similarly structured records in secondary memory so that records can be stored, retrieved, and updated rapidly
File Organizations
- Sequential
- Indexed
- Hashed
File Organizations - Sequential
Sequential - Data are stored in sequence. Not used in databases except for backup purposes.
- Data Access - Scan the file from the beginning until the record is found
- Data Insertion, Update - Requires rewriting the file
- Data Deletion - Records are marked for deletion; requires reorganizing the file
File Organizations - Indexed
Indexed - An index is created and used to find the location of records. Extensively used with relational DBMSs.
- Data Access - Random retrieval is moderately fast; records are searched by index key
- Data Insertion - Easy, but requires maintenance of indexes
- Data Update - Easy, but requires maintenance of indexes
- Data Deletion - Easy, but requires maintenance of indexes
File Organizations - Hashed
Hashed - Uses a hashing algorithm to create a record address
- Data Access - Random retrieval is very fast
- Data Insertion - Very easy
- Data Update - Very easy
- Data Deletion - Very easy
File Organizations - In terms of storage space
- Sequential - No wasted space
- Indexed - No wasted space for data, but requires extra space for the index
- Hashed - Extra space may be needed to allow for addition and deletion of records after the initial set of records is loaded
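The three access methods can be compared with a small in-memory sketch. It models a file as a Python list of (key, payload) records; the record values, the bucket count, and the modulo hash are assumptions for illustration and say nothing about how a real DBMS lays out pages on disk.

```python
import bisect

records = [(k, f"row-{k}") for k in (17, 3, 42, 8, 25)]  # (key, payload) in arrival order

def sequential_find(file, key):
    """Sequential: scan from the beginning until the record is found."""
    for steps, (k, payload) in enumerate(file, start=1):
        if k == key:
            return payload, steps
    return None, len(file)

# Indexed: a sorted list of (key, position) entries points into the unsorted file.
index = sorted((k, pos) for pos, (k, _) in enumerate(records))
index_keys = [k for k, _ in index]

def indexed_find(file, key):
    i = bisect.bisect_left(index_keys, key)  # binary search on the index keys
    if i < len(index_keys) and index_keys[i] == key:
        return file[index[i][1]][1]          # follow the pointer to the record
    return None

# Hashed: a hashing algorithm turns the key into a bucket address.
N_BUCKETS = 8  # more buckets than records: the extra space a hashed file reserves
buckets = [[] for _ in range(N_BUCKETS)]
for k, payload in records:
    buckets[k % N_BUCKETS].append((k, payload))

def hashed_find(key):
    for k, payload in buckets[key % N_BUCKETS]:
        if k == key:
            return payload
    return None
```

Here sequential_find(records, 42) has to step through three records before it reaches key 42, while the indexed and hashed lookups go almost directly to the record. The price is the separate index structure in one case and the spare buckets in the other.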
4. Selection of indexes and overall database architecture for efficiency of data retrieval
Selecting indexes is one of the most important decisions in DB design
When to Use Indexes
o Indexes should be used generously for databases intended mainly for data retrieval, e.g. data warehouse applications
o Indexes should be used judiciously for databases with heavy transaction processing requirements, because every index must be maintained on each insert, update, and delete
Indexes make data retrieval and transactions faster if they are properly implemented.
Over-indexing may degrade performance, especially when inserting or updating records.
Types of Indexes:
- Clustered
- Non-Clustered
Source: https://technet.microsoft.com/en-us/library/ms190457(v=sql.110).aspx
Clustered Index
- Clustered indexes sort and store the data rows in the table based on their key values.
- There can be only one clustered index per table, because the data rows can be sorted in only one order.
Non-Clustered Index
- A non-clustered index contains the non-clustered index key values, and each key value entry has a pointer to the data row that contains the key value.
- The data in the index is stored in order based on the key value, while the data rows of the underlying table are not sorted.
- You can create multiple non-clustered indexes on a table.
Index Use Tips
1. Indexes are most useful in large tables
2. Specify a unique index for the primary key of each table
3. Indexes are most useful for columns that frequently appear in WHERE clauses
4. Use an index for attributes referenced in ORDER BY
5. Use an index if there is significant variety in the values of an attribute
6. Consider creating surrogate keys for index fields with long values
7. Check your DBMS for the limit, if any, on the number of indexes allowable per table
8. Be careful of indexing attributes that have null values; in many DBMSs, rows with a null value are not referenced in the index
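Tip 3 can be observed with Python's built-in sqlite3 module, which reports how a query will be executed through EXPLAIN QUERY PLAN. This is a sketch under stated assumptions: the orders table, its columns, and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM orders WHERE customer_id = 7"

plan_before = plan(query)  # no usable index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the optimizer can now search the new index

matching = conn.execute(query).fetchall()
```

Printing plan_before and plan_after shows the plan changing from a table scan to a search that names idx_orders_customer, while the query returns the same rows either way.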
5. Proper handling of queries by the DBMS so that file organization and indexes will be optimized
DBMS
- Parallel Query Processing - Use of multiple processors in DB servers
- Query Optimization - The optimizer determines which indexes, join operations, etc. to use
Guidelines for Better Query Design
- Write simple queries
- Retrieve only the data you need
- Use compatible data types for fields and literals in queries
- Break complex queries into multiple simple parts
- Don't nest one query inside another query
- Create temporary tables for groups of queries
- Understand how indexes are used in query processing
- In general, queries with equality criteria are more efficient
- Don't have the DBMS sort without an index
- If possible, avoid using self-joins
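Two of these guidelines, retrieving only the data you need and not sorting without an index, can be checked with Python's built-in sqlite3 module. The customer table, its columns, and the index name are hypothetical, and the plan text quoted in the comments is SQLite's own wording, which other DBMSs do not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [(f"name{i:03d}", f"city{i % 10}") for i in range(200)],
)

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Retrieve only the data you need: name two columns instead of using SELECT *.
query = "SELECT name, city FROM customer ORDER BY name"

sort_without_index = plan(query)  # SQLite reports a temporary B-tree built for the sort
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
sort_with_index = plan(query)     # rows can be read in index order, so no separate sort
```

Comparing the two plans before running a heavy query is a cheap way to confirm whether the DBMS is sorting on its own or taking the order from an index.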
Thank you!
