
Tutorial: Step by Step Database Design in SQL

Feb 22, 2015



Please check out my related article "How did the modern relational
database come to be?" which is currently trending in Big Data and
follow me for daily articles on technology, digital marketing,
psychology and pharmaceuticals.

Database design and implementation is applicable to whatever
industry you're in. Here is a step-by-step approach to designing and
implementing a database in your organisation, using specific data
from a sweet shop case study I implemented during my M.Sc. in
Software & Information Systems. By the end of this tutorial, you will
know about databases, the advantages of a database system over a
regular file system, the steps of the database design process, the
software development lifecycle, the qualities of a well-built
database, relations and relationships, data integrity, and more.
Databases are used in every industry, including the pharmaceutical
industry.
Background of Databases
The database system approach to data management overcomes
many of the shortcomings of the old-fashioned file system approach.
One of the key features of a database system is that data is stored
as a single logical unit. What this means is that although the data
may be spread across multiple physical files, the database presents
the data as being located in a single data repository. Organizing data
in a single logical repository allows for easy manipulation and
querying of the data, in contrast with traditional file systems where
the programmer must specify both what data to retrieve and how the
retrieval is done.
With database systems, only what must be done needs to be
specified; the DBMS (Database Management System) does the rest.
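To make the "what" versus "how" distinction concrete, here is a minimal sketch in Python using its built-in sqlite3 module. The employee names, wages and the table name are invented for illustration and are not taken from the case study.

```python
import csv
import io
import sqlite3

# Hypothetical employee records, invented for illustration.
ROWS = [("Alice", 1200.50), ("Bob", 950.00), ("Carol", 1430.25)]

# File-system approach: the program spells out HOW to find the data
# (read the file, parse each line, convert types, test each record).
flat_file = io.StringIO()
csv.writer(flat_file).writerows(ROWS)
flat_file.seek(0)
high_earners_file = []
for name, wage in csv.reader(flat_file):
    if float(wage) > 1000:
        high_earners_file.append(name)

# Database approach: the program states WHAT it wants and the DBMS
# decides how to retrieve it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT NOT NULL, wage NUMERIC NOT NULL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", ROWS)
high_earners_db = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE wage > 1000 ORDER BY name"
    )
]

print(high_earners_file, high_earners_db)
```

Both approaches find the same employees, but only the first one has to know how the records are laid out on disk.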
Another advantage of the database approach is that, because data
is located in one single database, data in different physical locations
need not be duplicated. The database software can interact with all
the data in the database. Avoiding duplication is one way of
maintaining the integrity of the data. When data is duplicated,
errors can arise if one instance of the data is altered while another
instance remains the same, and more maintenance and system
resources are required to keep the copies consistent.
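The following sketch shows how duplication can go wrong. It assumes a hypothetical flat purchases table that repeats a supplier's phone number on every row, and compares it with a design that stores the number once. All table names, column names and values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Duplicated design: the supplier's phone number is repeated on every purchase.
conn.execute(
    "CREATE TABLE purchases_flat ("
    "purchase_id INTEGER, supplier_name TEXT, supplier_phone TEXT)"
)
conn.executemany(
    "INSERT INTO purchases_flat VALUES (?, ?, ?)",
    [(1, "SugarCo", "555-0100"), (2, "SugarCo", "555-0100")],
)

# Only one of the two copies is updated, so the copies now disagree.
conn.execute(
    "UPDATE purchases_flat SET supplier_phone = '555-0199' WHERE purchase_id = 1"
)
conflicting_phones = conn.execute(
    "SELECT DISTINCT supplier_phone FROM purchases_flat "
    "WHERE supplier_name = 'SugarCo' ORDER BY supplier_phone"
).fetchall()

# Non-duplicated design: the phone number is stored exactly once.
conn.execute("CREATE TABLE supplier (supplier_name TEXT PRIMARY KEY, phone TEXT)")
conn.execute("INSERT INTO supplier VALUES ('SugarCo', '555-0100')")
conn.execute("UPDATE supplier SET phone = '555-0199' WHERE supplier_name = 'SugarCo'")
single_phone = conn.execute(
    "SELECT phone FROM supplier WHERE supplier_name = 'SugarCo'"
).fetchall()

print(conflicting_phones)  # two different numbers for the same supplier
print(single_phone)        # one number, updated in one place
```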
One of the greatest benefits of databases is that data can be shared
or secured among users or applications. There is more control and
accountability over how the data is managed because the data all
resides in one database.
If database systems have shortcomings, they are that much more
powerful and sophisticated software is needed to control the
database, and that designing the software and database can be
extremely time consuming. More extensive knowledge of how to use
the database is required, which makes a database system less user
friendly than a traditional file system. Since the database is one
logical repository, even a small error can damage the entire
database and reduce the integrity of the data. One benchmark of a
good database is that it is complete, integral, simple,
understandable, flexible and implementable. Batini et al. say that
database modelling strives for a non-redundant, unified
representation of all data managed in an organization. Following
the database software development lifecycle methodology and
using data models helps to fulfil these design ideals and to
minimize the disadvantages.

Databases & the Software Development Lifecycle


The steps in developing any application can be represented as a
linear sequence where each step in the sequence is a function,
which passes its output to its successor function. Adherence to a
waterfall development model is intended to produce quality software
that is complete, efficient, usable, consistent, correct and flexible.
These traits are also some of the core underpinnings of a well-built
database. The waterfall model can be applied to database design
as effectively as it is applied to other areas of software
engineering. The steps can be summarized as follows:

Requirements specification -> Analysis -> Conceptual design ->
Implementation design -> Physical schema design and optimisation

In consultation with all potential users of the database, a database
designer's first step is to draw up a data requirements document.
The requirements document contains a concise and non-technical
summary of what data items will be stored in the database, and how
the various data items relate to one another. Taking the data
requirements document, further analysis is done to give meaning to
the data items, e.g. to define the more detailed attributes of the
data and to define constraints if needed. The result of this analysis
is a preliminary specifications document. Taking the specifications
document, the database designer models how the information is
viewed by the database system, how it is processed and how it is
conveyed to the end user. In the implementation design phase, the
conceptual design is translated into a lower-level, DBMS-specific
design.
Data Models & Schemas as a Means of Capturing Data
The database design phases bring up the concept of
data models. Data models are diagrams or schemas, which are
used to present the data requirements at different levels of
abstraction. The first step in the Database Development Life Cycle is
to draw up a requirements document.

Figure 1: A basic example of a requirements document
The requirements document can then be analysed and turned into a
basic data set (as shown in Figure 2) which can be converted into a
conceptual model. The end result of the conceptual design phase is
a conceptual data model (Figure 3), which provides little information
about how the database system will eventually be implemented. The
conceptual data model is simply a high-level overview of the
database system.

Figure 2: A Database Data Set is the Result of Analyzing the
Information from the Requirements Phase. The Primary Keys are
Underlined.

Figure 3: A Normalized Entity-Relationship Diagram (ERD) in Crow's
Foot Notation is an Example of a Conceptual Data Model and provides
no information about how the database system will eventually be
implemented
In the implementation design phase, the conceptual data model is
translated into a logical representation of the database system.
The logical data model conveys the logical functioning and
structure of the database and describes how the data is stored
(e.g. what tables are used, what constraints are applied) but is not
specific to any DBMS. The logical data model is a lower-level
refinement of the conceptual model, which must in turn be translated
into a physical design.

Figure 4: In the implementation design phase, the conceptual data
model (ERD) is translated into a logical representation (logical
schema) of the database system: a data dictionary.
Physical modelling deals with the representational and operational
aspects of the database, i.e. the internal DBMS-specific operations
and processes, and how the DBMS interacts with the data, the
database and the user. The translation from logical design to
physical design assigns functions, such as storage and security, to
both the machine (the DBMS) and the user. Additional aspects such
as consistency (of data) and learnability are also dealt with in the
physical model/schema. Practically speaking, a physical schema is
the SQL code used to build the database.
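As a rough illustration of the step from a DBMS-independent logical description to DBMS-specific SQL, the sketch below holds one data-dictionary-style entry as plain Python data and renders it as a create table statement. The entry, the column names, the to_ddl helper and the type mapping are all hypothetical; the Oracle-style type names are chosen only because they match the ones discussed later in this tutorial.

```python
import sqlite3

# Hypothetical logical description of one table. It says what is stored
# and which rules apply, but uses no DBMS-specific type names.
LOGICAL_TABLE = {
    "name": "supplier",
    "columns": [
        # (column name, logical type, required?)
        ("supplier_id", "string(6)", True),
        ("supplier_name", "string(19)", True),
        ("balance", "decimal(6,2)", False),
    ],
    "primary_key": "supplier_id",
}

# Assumed mapping from logical types to Oracle-style physical types.
PHYSICAL_TYPES = {"string": "varchar2", "decimal": "number"}


def to_ddl(table):
    """Render a logical table description as a CREATE TABLE statement."""
    lines = []
    for col, logical_type, required in table["columns"]:
        # "string(6)" splits into kind="string" and size="6)".
        kind, _, size = logical_type.partition("(")
        sql_type = f"{PHYSICAL_TYPES[kind]}({size}"
        null_rule = " NOT NULL" if required else ""
        lines.append(f"  {col} {sql_type}{null_rule}")
    lines.append(
        f"  CONSTRAINT {table['name']}_pk PRIMARY KEY ({table['primary_key']})"
    )
    return f"CREATE TABLE {table['name']} (\n" + ",\n".join(lines) + "\n)"


ddl = to_ddl(LOGICAL_TABLE)
print(ddl)

# SQLite accepts these type names, so the generated schema can be built.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(supplier)")]
```

Changing the logical entry (for example adding a column) and regenerating the SQL is easier than editing a hand-written physical schema, which is the flexibility argument made for data models in this section.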

As noted earlier, a good database is complete, integral, simple,
understandable, flexible and implementable, and database modelling
strives for a non-redundant, unified representation of all data
managed in an organization. By following the above methodology, and
by using the data models, these database design ideals are fulfilled.
In conclusion, here are three examples of why using data models is
paramount to capturing and conveying the data requirements of the
information system:
1. By drawing up a logical model, extra data items can be added
more easily in that model than in the physical model. A database
design that can change easily according to the needs of the company
is important, because it ensures the final database system is
complete and up-to-date.

2. Another consideration is understandability. By initially creating
a conceptual model, both the designer and the organization are
able to understand the database design and decide if it is complete
or not. If there were no conceptual model, the organization would
not be able to conceptualize the database design and make sure
that it actually represents all the data requirements of the
organization.

3. By creating a physical model, the designers can have a low-level
overview of how the database system would operate before it
is actually implemented.

SQL Statements Implementing the Database

The final step is to physically implement the logical design which
was illustrated in Figure 4. To physically implement the database,
SQL can be used. These are the main steps in implementing the
database:
1. Create the Database Tables
The tables come directly from the information contained in the Data
Dictionary. The following blocks of code each represent a row in the
data dictionary and are executed one after another. The blocks of
create table code contain the details of all the data items
(COMPANY, SUPPLIER, PURCHASES, EMPLOYEE etc), their attributes
(names, ages, costs, numbers and other details), the Relationships
between the data items, the Keys and Data Integrity Rules. All of this
information is already detailed in the Data Dictionary, but now we
are actually converting it and implementing it in a physical database
system.

Explanation:

1. The create table statement indicates that you want a table to
be created.

2. The name of the table precedes the first '('.

3. The table attributes and Data Integrity Rules are defined within
the two parentheses. How the table relates to other tables is also
defined within the two parentheses (e.g. by defining FOREIGN KEYS).

4. The not null statement means that if you try to populate the
table with values, but leave the value of that attribute empty, you
will get an error.

5. The varchar2 (19) means a variable-length string of up to 19
characters.

6. The number (6,2) means a number which can have up to 6 digits
in total, 2 of them being after the decimal point, e.g. 1234.56. The
largest value it can hold is 9999.99.

7. The date means that that attribute will be represented as a
date within the database system.

8. The CONSTRAINT statement means that a constraint is being
defined. This statement will be used to describe which of the
attributes are primary keys and which (if any) of the attributes are
foreign keys (referencing another table).

9. The CONSTRAINT statement is in the form CONSTRAINT xxx
PRIMARY KEY(name_of_attribute_that_you_want_as_the_primary_key) or
CONSTRAINT yyy FOREIGN
KEY(name_of_attribute_that_references_another_table) REFERENCES
hhh(name_of_attribute_that_references_another_table) where xxx and
yyy are arbitrary constraint names of your choosing, and hhh is the
name of the base table being referenced. The attribute named after
hhh is the referenced attribute in that base table, usually its
primary key.
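As a minimal sketch of the kind of create table statement described above (not the case study's actual code), the block below defines two tables with the elements from the explanation list. SUPPLIER and PURCHASES are table names from the data dictionary, but the column names and constraint names are assumptions made for illustration. The SQL is run through Python's built-in sqlite3 module so that it can be tried anywhere; SQLite accepts the Oracle-style type names varchar2 and number but does not enforce their length or precision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Base table: every supplier is identified by its primary key.
conn.execute("""
CREATE TABLE supplier (
    supplier_id   varchar2(6)  NOT NULL,
    supplier_name varchar2(19) NOT NULL,
    CONSTRAINT supplier_pk PRIMARY KEY (supplier_id)
)""")

# Referencing table: each purchase must point at an existing supplier.
conn.execute("""
CREATE TABLE purchases (
    purchase_id   varchar2(6) NOT NULL,
    supplier_id   varchar2(6) NOT NULL,
    cost          number(6,2) NOT NULL,
    purchase_date date        NOT NULL,
    CONSTRAINT purchases_pk PRIMARY KEY (purchase_id),
    CONSTRAINT purchases_supplier_fk FOREIGN KEY (supplier_id)
        REFERENCES supplier (supplier_id)
)""")

# List the tables that now exist and the table that purchases refers to.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
fk_targets = [row[2] for row in conn.execute("PRAGMA foreign_key_list(purchases)")]
print(tables, fk_targets)
```

The base table is created first because the foreign key in the second table refers to it.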

2. Populate the tables
Use SQL statements to populate each table with specific data (such
as employee names, ages, wages etc).
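Populating a table could look like the sketch below. EMPLOYEE is a table name from the data dictionary, but the columns and the three rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    emp_id   varchar2(6)  NOT NULL,
    emp_name varchar2(19) NOT NULL,
    age      number(3)    NOT NULL,
    wage     number(6,2)  NOT NULL,
    CONSTRAINT employee_pk PRIMARY KEY (emp_id)
)""")

# A single row, written out as a plain SQL statement.
conn.execute(
    "INSERT INTO employee (emp_id, emp_name, age, wage) "
    "VALUES ('E001', 'Alice', 34, 1200.50)"
)

# Several rows at once, using placeholders instead of pasting values into SQL.
more_employees = [("E002", "Bob", 28, 950.00), ("E003", "Carol", 45, 1430.25)]
conn.executemany(
    "INSERT INTO employee (emp_id, emp_name, age, wage) VALUES (?, ?, ?, ?)",
    more_employees,
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(row_count)
```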

3. Query the database
Write SQL statements to obtain information and knowledge about
the company, e.g. how many employees are there, total profit etc.
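The two example questions could be answered with queries along the following lines. The sales table, the column names and the figures are hypothetical, and "profit" is assumed here to mean total sales minus total purchase costs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id varchar2(6) NOT NULL, emp_name varchar2(19) NOT NULL,
    CONSTRAINT employee_pk PRIMARY KEY (emp_id));
CREATE TABLE sales (
    sale_id varchar2(6) NOT NULL, amount number(6,2) NOT NULL,
    CONSTRAINT sales_pk PRIMARY KEY (sale_id));
CREATE TABLE purchases (
    purchase_id varchar2(6) NOT NULL, cost number(6,2) NOT NULL,
    CONSTRAINT purchases_pk PRIMARY KEY (purchase_id));
INSERT INTO employee VALUES ('E001', 'Alice'), ('E002', 'Bob'), ('E003', 'Carol');
INSERT INTO sales VALUES ('X001', 500), ('X002', 250);
INSERT INTO purchases VALUES ('P001', 120), ('P002', 80);
""")

# How many employees are there?
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Total profit, assumed to be total sales minus total purchase costs.
total_profit = conn.execute(
    "SELECT (SELECT SUM(amount) FROM sales) - (SELECT SUM(cost) FROM purchases)"
).fetchone()[0]

print(employee_count, total_profit)
```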

Keys & Data Integrity Rules


Data integrity rules are a core component of a data model. Integrity
rules implicitly or explicitly define the set of consistent database
state(s). So, integrity rules ensure that database states and
changes of state conform to specified rules. Data integrity rules are
of two types: entity integrity rules and referential integrity rules.
How do keys relate to ensuring that changes in database states
conform to specified rules?
Well, for example, you could ensure that the primary key of an entity
cannot be null. This is one way of ensuring entity integrity. If primary
keys were allowed to be null, then there would be no way of
ensuring that individual entities were uniquely identifiable. If you
cannot ensure that individual entities are uniquely identifiable then
you can't ensure that the database is integral, which is a core
property of a properly designed data model. So, by ensuring that
keys follow certain rules, you can ensure integrity of data.
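Entity integrity can be seen in action with a small sketch. The employee table and its rows are hypothetical, and try_insert is a helper written only for this example. The key column is declared not null explicitly, because SQLite (used here through Python's sqlite3 module) would otherwise allow null values in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    emp_id   varchar2(6)  NOT NULL,
    emp_name varchar2(19) NOT NULL,
    CONSTRAINT employee_pk PRIMARY KEY (emp_id)
)""")
conn.execute("INSERT INTO employee VALUES ('E001', 'Alice')")


def try_insert(emp_id, emp_name):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, emp_name))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


null_key_result = try_insert(None, "Bob")             # no key at all
duplicate_key_result = try_insert("E001", "Carol")    # key already in use
new_key_result = try_insert("E002", "Dave")           # unique, non-null key

print(null_key_result)
print(duplicate_key_result)
print(new_key_result)
```

Only the row with a unique, non-null key is stored, so every employee in the table remains individually identifiable.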

Another way of enforcing integrity of data via keys is to ensure that,
if two tables are related to each other, the value of an attribute in
one relation (the foreign key) must match a value of the primary
attribute (primary key) of the other one. Enforcing this rule ensures
referential integrity of data. So, we do need integrity rules, and
properly defining keys is a means of enforcing them.
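Referential integrity can be demonstrated in the same way. The sketch below assumes hypothetical supplier and purchases tables; attempt is a helper written only for this example. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which is a SQLite detail rather than part of standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.execute("""
CREATE TABLE supplier (
    supplier_id   varchar2(6)  NOT NULL,
    supplier_name varchar2(19) NOT NULL,
    CONSTRAINT supplier_pk PRIMARY KEY (supplier_id)
)""")
conn.execute("""
CREATE TABLE purchases (
    purchase_id varchar2(6) NOT NULL,
    supplier_id varchar2(6) NOT NULL,
    cost        number(6,2) NOT NULL,
    CONSTRAINT purchases_pk PRIMARY KEY (purchase_id),
    CONSTRAINT purchases_supplier_fk FOREIGN KEY (supplier_id)
        REFERENCES supplier (supplier_id)
)""")
conn.execute("INSERT INTO supplier VALUES ('S001', 'SugarCo')")


def attempt(sql, params=()):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# The purchase refers to a supplier that exists.
valid_purchase = attempt(
    "INSERT INTO purchases VALUES (?, ?, ?)", ("P001", "S001", 120)
)
# The purchase refers to a supplier that does not exist.
orphan_purchase = attempt(
    "INSERT INTO purchases VALUES (?, ?, ?)", ("P002", "S999", 80)
)
# Deleting the supplier would leave purchase P001 pointing at nothing.
delete_referenced = attempt("DELETE FROM supplier WHERE supplier_id = ?", ("S001",))

print(valid_purchase, orphan_purchase, delete_referenced, sep="\n")
```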

Relationships
When initially explaining the relational model, E.F. Codd proposed
that users should be abstracted from the internal representation of
the data, such that if the internal representation of that data were to
change (e.g. because of system growth), the way that the user
perceives the data should remain unchanged. This is why he
proposed that users should only interact with a collection of
time-varying relationships, i.e. a user should only know the name of
the relationship together with its domain (e.g. department is the
domain of employees, and the employees are owned by the
department) rather than the relation (table) itself.
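One way SQL systems can offer this kind of insulation today is a view. The sketch below is only an illustration of the idea, not Codd's own formulation: a hypothetical staff view is queried before and after the underlying tables are reorganized, and the user's query does not change. All table, view and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First internal representation: one table holding everything.
conn.execute("CREATE TABLE employee_v1 (emp_name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO employee_v1 VALUES (?, ?)",
    [("Alice", "Sales"), ("Bob", "Stock"), ("Carol", "Sales")],
)
conn.execute("CREATE VIEW staff AS SELECT emp_name, dept_name FROM employee_v1")

# The only thing the user knows about: the view name and its columns.
USER_QUERY = "SELECT emp_name, dept_name FROM staff ORDER BY emp_name"
before = conn.execute(USER_QUERY).fetchall()

# The internal representation changes: departments move to their own table.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee_v2 ("
    "emp_name TEXT, dept_id INTEGER REFERENCES department (dept_id))"
)
conn.execute(
    "INSERT INTO department (dept_name) SELECT DISTINCT dept_name FROM employee_v1"
)
conn.execute(
    "INSERT INTO employee_v2 "
    "SELECT e.emp_name, d.dept_id FROM employee_v1 e "
    "JOIN department d ON d.dept_name = e.dept_name"
)
conn.execute("DROP VIEW staff")
conn.execute("DROP TABLE employee_v1")
conn.execute(
    "CREATE VIEW staff AS "
    "SELECT e.emp_name, d.dept_name FROM employee_v2 e "
    "JOIN department d ON d.dept_id = e.dept_id"
)

# The same user query still works and returns the same rows.
after = conn.execute(USER_QUERY).fetchall()
print(before == after)
```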
In using the terms relation and table as synonyms, Codd must
have implied that a table should be viewed in terms of its
relationship with other tables. Relationships are what bind the
relations/tables in a database together, so proper understanding is
needed.

Incorrect understanding of relationships may lead to incorrectly
defined relationships between tables. Incorrectly defined
relationships between tables could lead to data not being updated
correctly in some tables, or could cause a data item to be
unnecessarily duplicated in another table. All this may lead to
incomplete information in the database, which in turn results in
incomplete knowledge.
