
Designing of Database

Entity-Relationship Model

By Parteek Bhatia, Faculty, Thapar University, Patiala

How to design the database?


There are two approaches:
E-R Modeling: identifying entities and relationships
Normalization: refinement of the database design

Entity-Relation Model

E-R Model

The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976. The ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.

Basic Constructs of E-R Modeling

A database can be modeled as:


a collection of entities,
relationships among entities.

An entity is an object that exists and is distinguishable from other objects.

Example: specific person, company, event, plant

Entities have attributes.

Example: people have names and addresses

An entity set is a set of entities of the same type that share the same properties.

Example: set of all persons, companies, trees, holidays


Entities

Entities are the principal data objects about which information is to be collected. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events, which have relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS, and INVOICES. An entity is analogous to a table in the relational model.

Relationships

A relationship represents an association between two or more entities. Relationships are classified in terms of degree, connectivity, cardinality, and existence. Examples of relationships:

Employees are assigned to projects
Projects have subtasks
Departments manage one or more projects

Entity Sets customer and loan

Relationship Set borrower

Attributes
Attributes describe the properties of the entity with which they are associated. We can classify attributes as follows:
Simple
Composite
Single-valued
Multi-valued
Derived
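The attribute categories above can be illustrated with a small Python sketch. The Employee and Name classes, their field names, and the sample values are hypothetical, chosen only to show one attribute of each kind; they are not taken from the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Name:
    # Composite attribute: made up of smaller component attributes.
    first: str
    last: str


@dataclass
class Employee:
    emp_id: int                  # simple, single-valued attribute
    name: Name                   # composite attribute
    date_of_birth: date          # simple, single-valued attribute
    phone_numbers: List[str] = field(default_factory=list)  # multi-valued attribute

    def age(self, today: date) -> int:
        # Derived attribute: computed from date_of_birth rather than stored.
        years = today.year - self.date_of_birth.year
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return years if had_birthday else years - 1


# Illustrative instance (made-up values).
e = Employee(1, Name("Asha", "Rao"), date(1990, 6, 15), ["111", "222"])
```

Here age is never stored; it is recomputed from date_of_birth whenever it is needed, which is what distinguishes a derived attribute from a stored one.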

Example

Degree of a Relationship

The degree of a relationship is the number of entities associated with the relationship. The n-ary relationship is the general form for degree n. Special cases are the binary and ternary relationships, where the degree is 2 and 3, respectively.

Connectivity and Cardinality


The connectivity of a relationship describes the mapping of associated entity instances in the relationship. The values of connectivity are "one" or "many". The cardinality of a relationship is the actual number of related occurrences for each of the two entities. The basic types of connectivity for relations are:
One to One (1:1)
One to Many (1:M)
Many to One (M:1)
Many to Many (M:M)
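As a rough illustration of the four connectivity types, the following Python sketch classifies a binary relationship from a sample of its instance pairs. The function name and the sample data are assumptions made for this example. Note that connectivity is really a design constraint; a sample of instances can only show what the data currently looks like.

```python
from collections import Counter
from typing import Iterable, Tuple


def connectivity(pairs: Iterable[Tuple[str, str]]) -> str:
    """Classify a binary relationship A-B from its observed (a, b) pairs."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)  # number of B's linked to each A
    per_b = Counter(b for _, b in pairs)  # number of A's linked to each B
    a_has_many = any(n > 1 for n in per_a.values())
    b_has_many = any(n > 1 for n in per_b.values())
    if a_has_many and b_has_many:
        return "M:M"
    if a_has_many:
        return "1:M"  # one A relates to many B's
    if b_has_many:
        return "M:1"  # many A's relate to one B
    return "1:1"


# Hypothetical instances: departments paired with the teachers who work in them.
works_in = [("D1", "T1"), ("D1", "T2"), ("D2", "T3")]
print(connectivity(works_in))
```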

Direction
The direction of a relationship indicates the originating entity of a relationship. The entity from which a relationship originates is the parent entity; the entity where the relationship terminates is the child entity. The type of the relation is determined by the direction of the line connecting the relationship component and the entity. To distinguish different types of relation, we draw either a directed line or an undirected line between the relationship set and the entity set. A directed line is used to indicate one occurrence and an undirected line is used to indicate many occurrences in a relation, as shown in the next example.

E-R Notation

Entities are represented by labeled rectangles. The label is the name of the entity. Entity names should be singular nouns.
Attributes are represented by ellipses.
A solid line connecting two entities represents a relationship. The name of the relationship is written above the line.
Relationship names should be verbs, and a diamond is used to represent a relationship set.
Attributes, when included, are listed inside the entity rectangle.
Attributes that are identifiers are underlined. Attribute names should be singular nouns.
Multi-valued attributes are represented by double ellipses.
A directed line is used to indicate one occurrence and an undirected line is used to indicate many occurrences in a relation.


Example

Customer-Loan Relationship

Exercise

Consider the following database:
S (S#, SSNAME, STATUS, CITY)
P (P#, PNAME, COLOR, WEIGHT, CITY)
J (J#, JNAME, CITY)
SPJ (S#, P#, J#, QTY)
Here, S indicates information of suppliers, P parts, J projects, and SPJ the supplied quantity details.
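One way to read this schema in E-R terms is that S, P and J are entity sets and SPJ is a ternary relationship among them with the descriptive attribute QTY. The sketch below writes that reading down as SQLite tables from Python. The column types are assumptions (the exercise does not give them), and the key columns are renamed s_no, p_no and j_no because # is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE S (s_no TEXT PRIMARY KEY, ssname TEXT, status INTEGER, city TEXT);
CREATE TABLE P (p_no TEXT PRIMARY KEY, pname TEXT, color TEXT, weight REAL, city TEXT);
CREATE TABLE J (j_no TEXT PRIMARY KEY, jname TEXT, city TEXT);
-- SPJ relates one supplier, one part and one project (degree 3).
CREATE TABLE SPJ (
    s_no TEXT REFERENCES S(s_no),
    p_no TEXT REFERENCES P(p_no),
    j_no TEXT REFERENCES J(j_no),
    qty  INTEGER,
    PRIMARY KEY (s_no, p_no, j_no)
);
""")
```

The primary key of SPJ combines the keys of all three participating entity sets, which is the usual way a many-to-many-to-many relationship is represented.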

Total Participation

Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set.

Example: participation of loan in borrower is total; every loan must have a customer associated to it via borrower.

Partial participation: some entities may not participate in any relationship in the relationship set.

Example: participation of customer in borrower is partial.
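The distinction between total and partial participation can be checked mechanically on instance data. The following Python sketch uses made-up customer and loan identifiers; the function name participation is an assumption for this example.

```python
from typing import Iterable, Set


def participation(entity_set: Set[str], participating: Iterable[str]) -> str:
    """Return 'total' if every entity appears in some relationship, else 'partial'."""
    return "total" if entity_set <= set(participating) else "partial"


# Hypothetical instance data for the borrower relationship set.
customers = {"C1", "C2", "C3"}
loans = {"L1", "L2"}
borrower = {("C1", "L1"), ("C2", "L2")}  # (customer, loan) pairs

loan_side = participation(loans, (loan for _, loan in borrower))
customer_side = participation(customers, (cust for cust, _ in borrower))
print(loan_side, customer_side)
```

In this data every loan appears in borrower, while customer C3 has no loan, matching the example in the text.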

Some More Examples

Representation of Cardinality in the E-R Diagram

The cardinality of a relationship is the actual number of related occurrences for each of the two entities. E-R diagrams also provide a way to indicate cardinality of the relationship. An edge between an entity set and a relationship set can have an associated minimum and maximum cardinality, shown in the form l..h, where l is the minimum and h the maximum cardinality. A minimum value of 1 indicates total participation of the entity set in the relationship set. A maximum value of 1 indicates that the entity participates in at most one relationship, while a maximum value * indicates no limit. The label 1..* on an edge is equivalent to a double line.

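[Figure: E-R diagram. Customer (C_name, Phone_no, Address, City) is connected to the relationship set Cust_Loan by an edge labeled 0..*; Loan (Loan_No, amount) is connected to Cust_Loan by an edge labeled 1..1.]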

For example, the edge between loan and Cust_Loan has a cardinality constraint of 1..1, meaning the minimum and the maximum cardinality are both 1. That is, each loan must have exactly one associated customer. The limit 0..* on the edge from customer to Cust_Loan indicates that a customer can have zero or more loans. Thus, the relationship Cust_Loan is one to many from customer to loan, and further the participation of loan in Cust_Loan is total.

It is easy to misinterpret the 0..* on the edge between customer and Cust_Loan and think that the relationship Cust_Loan is many to one from customer to loan; this is exactly the reverse of the correct interpretation.


If both edges from a binary relationship have a maximum value of 1, the relationship is one to one. If we had specified a cardinality limit of 1..* on the edge between customer and Cust_Loan, we would be saying that each customer must have at least one loan.
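Because the l..h labels are easy to read backwards, a small Python sketch of the interpretation rules may help. The function name describe and the use of None for * are assumptions made for this example; the rules themselves are the ones stated in the text.

```python
from typing import Optional, Tuple

Bound = Tuple[int, Optional[int]]  # (minimum, maximum); None stands for '*'


def describe(edge_a: Bound, edge_b: Bound) -> Tuple[str, str, str]:
    """Interpret the l..h labels on the edges A--R and B--R of a binary relationship R.

    Returns (connectivity from A to B, participation of A, participation of B).
    """
    (min_a, max_a), (min_b, max_b) = edge_a, edge_b
    # The maximum on B's edge limits how many A's one B can be related to,
    # so it decides whether the A side of the connectivity is "one" or "many".
    a_side = "one" if max_b == 1 else "many"
    # Likewise the maximum on A's edge decides the B side.
    b_side = "one" if max_a == 1 else "many"
    part_a = "total" if min_a >= 1 else "partial"
    part_b = "total" if min_b >= 1 else "partial"
    return (f"{a_side} to {b_side}", part_a, part_b)


# customer --0..*-- Cust_Loan --1..1-- loan
print(describe((0, None), (1, 1)))
```

For the customer and loan edges this yields one to many from customer to loan, with partial participation of customer and total participation of loan.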

Some More Examples


Car-insurance company

It has a set of customers, each of whom owns one or more cars. A car may be owned by any number of customers. Each car has associated with it zero to any number of recorded accidents. The system should store the date and location of each accident. The car insurance company will pay the damage amount for cars involved in accidents to the concerned driver.

Case Study of University Management System

Consider a university that contains many departments. Each department can offer any number of courses. Many teachers can work in a department. A teacher can work only in one department. For each department there is a Head. A teacher can be head of only one department. Each teacher can take any number of courses. A course can be taken by only one instructor. A student can enroll for any number of courses. Each course can have any number of students.

Steps to design E-R diagram

Step 1: Identify the entities
Step 2: Find relationships among these entities
Step 3: Identify the key attributes
Step 4: Identify other relevant attributes
Step 5: Draw the complete E-R diagram

First Step to Identify the Entities

To identify the entities, collect all the nouns in the requirement sheet that have some properties and are important for the system. We can identify the following nouns: University, Department, Course, Teacher, Student. Here, the database is of only one university. If an entity has a single instance, that entity is ignored. Thus, the final entities are:
DEPARTMENT
COURSE
TEACHER
STUDENT

Second Step to find relationships among these entities

Each department can offer any number of courses, and we can assume that each course belongs to only one department. Then the connectivity of the relationship between DEPARTMENT and COURSE is one to many. If a course can run in more than one department, then it is many to many.

Many teachers can work in a department and a teacher can work only in one department. Thus, the connectivity between DEPARTMENT and TEACHER is one to many.

For each department there is a Head and a teacher can be head of only one department. Hence, the connectivity is one to one.

Each teacher can take any number of courses and a course can be taken by only one instructor. Thus, the connectivity between TEACHER and COURSE is one to many.

A student can enroll for any number of courses and each course can have any number of students. Thus, the connectivity between STUDENT and COURSE is many to many.

Step 3 to identify the key attributes

Following are the primary key attributes for each entity set:
Dno (Department number) is the key attribute for the DEPARTMENT entity.
C_code (Course number) is the key attribute for the COURSE entity.
Roll_no (Roll number) is the key attribute for the STUDENT entity.
T_code (Teacher code) is the key attribute for the TEACHER entity.

Step 4 to identify other relevant attributes

Following are the other relevant attributes for each entity set:
DEPARTMENT: dname, loc
COURSE: c_name, credits
TEACHER: name, mob_no
STUDENT: name, address

Step 5 to draw the complete e-r diagram


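[Figure: complete E-R diagram. Entity sets: DEPARTMENT (dno, dname, loc), COURSE (c_code, c_name, credits), TEACHER (t_code, name, mob_no), STUDENT (roll_no, name, address). Relationship sets: offers (DEPARTMENT, COURSE), has (DEPARTMENT, TEACHER), heads (TEACHER, DEPARTMENT), teach (TEACHER, COURSE), enroll (STUDENT, COURSE).]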
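The five steps of the case study can be summarized by writing the resulting design as tables. The sketch below is one possible mapping to SQLite from Python, not the only one: the column types, the head_t_code column name, and the choice to fold the one-to-many and one-to-one relationships into foreign key columns are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE department (
    dno         INTEGER PRIMARY KEY,
    dname       TEXT,
    loc         TEXT,
    -- 'heads' (1:1): UNIQUE means a teacher heads at most one department
    head_t_code INTEGER UNIQUE REFERENCES teacher(t_code)
);
CREATE TABLE teacher (
    t_code INTEGER PRIMARY KEY,
    name   TEXT,
    mob_no TEXT,
    dno    INTEGER NOT NULL REFERENCES department(dno)  -- 'has' (1:M)
);
CREATE TABLE course (
    c_code  INTEGER PRIMARY KEY,
    c_name  TEXT,
    credits INTEGER,
    dno     INTEGER NOT NULL REFERENCES department(dno),  -- 'offers' (1:M)
    t_code  INTEGER REFERENCES teacher(t_code)            -- 'teach' (1:M)
);
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT,
    address TEXT
);
-- 'enroll' (M:M) needs its own table keyed by both participants
CREATE TABLE enroll (
    roll_no INTEGER REFERENCES student(roll_no),
    c_code  INTEGER REFERENCES course(c_code),
    PRIMARY KEY (roll_no, c_code)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.executescript(SCHEMA)
```

Only the many-to-many relationship enroll becomes a separate table; each one-to-many relationship is carried by a foreign key on the "many" side.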

Strong and Weak Entity Sets

An entity set that does not have sufficient attributes to form a primary key is called a weak entity set. An entity set that has a primary key is called a strong entity set. Consider an entity set Payment which has three attributes: payment_number, payment_date and payment_amount. Although each payment entity is distinct, payments for different loans may share the same payment number. Thus, this entity set does not have a primary key and it is a weak entity set. Each weak entity set must be part of a one-to-many relationship set.

A member of a strong entity set is called a dominant entity and a member of a weak entity set is called a subordinate entity. A weak entity set does not have a primary key, but we need a means of distinguishing among all those entities in the entity set that depend on one particular strong entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made. For example, payment_number acts as the discriminator for the payment entity set. It is also called the partial key of the entity set.

The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator. In the above example, {loan_number, payment_number} acts as the primary key for the payment entity set.

The relationship between a weak entity set and its strong entity set is called the identifying relationship. In the example, loan-payment is the identifying relationship for the payment entity set. A weak entity set is represented by a doubly outlined box and the corresponding identifying relationship by a doubly outlined diamond.
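The loan and payment example can be sketched as SQLite tables from Python to show how the discriminator and the owner's key combine. The column types, the ON DELETE CASCADE choice (used here to model existence dependence), and all sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.executescript("""
CREATE TABLE loan (
    loan_number TEXT PRIMARY KEY,
    amount      REAL
);
CREATE TABLE payment (
    -- key of the strong (owner) entity set
    loan_number    TEXT NOT NULL REFERENCES loan(loan_number) ON DELETE CASCADE,
    -- discriminator (partial key) of the weak entity set
    payment_number INTEGER NOT NULL,
    payment_date   TEXT,
    payment_amount REAL,
    PRIMARY KEY (loan_number, payment_number)
);
""")

# Made-up sample rows: the same payment_number appears under two different loans.
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [("L-1", 1000.0), ("L-2", 500.0)])
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?, ?)",
    [("L-1", 1, "2024-01-05", 100.0), ("L-2", 1, "2024-01-07", 50.0)],
)
```

Payment number 1 is accepted for both loans because uniqueness is required only for the pair {loan_number, payment_number}; a second payment number 1 for the same loan would be rejected.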

Case Study
Represent each of the following requirements with an ER diagram:

A regional council requires the design of a database system that can provide information on all schools in the region. The requirements collection and analysis phase of the database design process has provided the following data requirements for the schools database system.

(a) Every school has many pupils and many teachers. Each pupil is assigned to one school and each teacher works for one school only.

(b) Each teacher teaches more than one subject, but a subject may be taught by more than one teacher. The database should store the number of hours a teacher spent teaching a subject. Data held on each teacher includes his/her National Insurance Number (NIN), name (first and last), sex, and qualifications. The data held on each subject includes subject title and type.

(c) Each pupil can study more than one subject and a subject may be studied by more than one pupil. Data held on each pupil includes the pupil's code, name (first and last), sex, and date of birth.

(d) Each school is managed by one of its teachers. The database should keep track of the date he/she started managing the school. Data stored on each school includes the school's code, name, address (town, street, and post code) and phone.

Design Issues

Use of Entity Sets versus Attributes

Consider the entity set employee with attributes employee-name and telephone-number. It can easily be argued that a telephone is an entity in its own right with attributes telephone-number and location (the office where the telephone is located). If we take this point of view, we must redefine the employee entity set as:
The employee entity set with attribute employee-name
The telephone entity set with attributes telephone-number and location
The relationship set emp-telephone, which denotes the association between employees and the telephones that they have

What, then, is the main difference between these two definitions of an employee?

Treating a telephone as an attribute telephone-number implies that employees have precisely one telephone number each. Treating a telephone as an entity telephone permits employees to have several telephone numbers (including zero) associated with them. However, we could instead easily define telephone-number as a multivalued attribute to allow multiple telephones per employee.

The main difference, then, is that treating a telephone as an entity better models a situation where one may want to keep extra information about a telephone, such as its location, its type (mobile, video phone, or plain old telephone), or who shares the telephone. Thus, treating telephone as an entity is more general than treating it as an attribute and is appropriate when the generality may be useful.

In contrast, it would not be appropriate to treat the attribute employee-name as an entity; it is difficult to argue that employee-name is an entity in its own right (in contrast to the telephone). Thus, it is appropriate to have employee-name as an attribute of the employee entity set.

Two natural questions thus arise: What constitutes an attribute, and what constitutes an entity set? Unfortunately, there are no simple answers. The distinctions mainly depend on the structure of the real-world enterprise being modeled, and on the semantics associated with the attribute in question.

Some common mistakes made when designing an E-R diagram

A common mistake is to use the primary key of an entity set as an attribute of another entity set, instead of using a relationship. For example, it is incorrect to model customer-id as an attribute of loan even if each loan had only one customer. The relationship borrower is the correct way to represent the connection between loans and customers, since it makes their connection explicit, rather than implicit via an attribute. Another related mistake that people sometimes make is to designate the primary key attributes of the related entity sets as attributes of the relationship set. This should not be done, since the primary key attributes are already implicit in the relationship.

Use of Entity Sets versus Relationship Sets

It is not always clear whether an object is best expressed by an entity set or a relationship set. We assumed that a bank loan is modeled as an entity. An alternative is to model a loan not as an entity, but rather as a relationship between customers and branches, with loan-number and amount as descriptive attributes. Each loan is represented by a relationship between a customer and a branch.

If every loan is held by exactly one customer and is associated with exactly one branch, we may find satisfactory the design where a loan is represented as a relationship.

Problems

However, with this design, we cannot represent conveniently a situation in which several customers hold a loan jointly. To handle such a situation, we must define a separate relationship for each holder of the joint loan. Then, we must replicate the values for the descriptive attributes loan-number and amount in each such relationship. Each such relationship must, of course, have the same value for the descriptive attributes loan-number and amount.

Two problems arise as a result of the replication: (1) the data are stored multiple times, wasting storage space, and (2) updates potentially leave the data in an inconsistent state, where the values differ in two relationships for attributes that are supposed to have the same value. The issue of how to avoid such replication is treated formally by normalization theory.

The problem of replication of the attributes loan-number and amount is absent in the original design because there loan is an entity set. One possible guideline in determining whether to use an entity set or a relationship set is to designate a relationship set to describe an action that occurs between entities. This approach can also be useful in deciding whether certain attributes may be more appropriately expressed as relationships.

Generalization: A bottom-up design process

A generalization hierarchy is a form of abstraction that specifies that two or more entities that share common attributes can be generalized into a higher-level entity type called a supertype or generic entity. The lower-level entities become the subtypes, or categories, of the supertype. Subtypes are dependent entities.
Generalization is used to emphasize the similarities among lower-level entity sets and to hide differences. It makes the ER diagram simpler because shared attributes are not repeated. Generalization is denoted by a triangle component labeled ISA.

Specialization: A top-down design process

Specialization is the process of taking subsets of a higher-level entity set to form lower-level entity sets. It is a process of defining a set of subclasses of an entity type, which is called the superclass of the specialization. The subclasses are defined on the basis of some distinguishing characteristic of the entities in the superclass. For example, specialization of the Employee entity type may yield the subclasses Salaried_Employee and Hourly_Employee, based on the method of pay.

Difference between Specialization and Generalization

Specialization is the process of taking subsets of a higher-level entity set to form lower-level entity sets. Specialization emphasizes differences among entities within the set by creating distinct lower-level entity sets. These lower-level entity sets may have attributes, or may participate in relationships, that do not apply to all entities in the higher-level entity set.

Generalization proceeds from the recognition that a number of entity sets share some common features, that is, they are described by the same attributes and participate in the same relationship sets. Generalization is used to emphasize the similarities among lower-level entity sets and to hide differences.

Design Constraints on a Specialization/Generalization

Constraint on which entities can be members of a given lower-level entity set.

Condition-defined (or attribute-defined)

Example: all customers over 65 years are members of the senior-citizen entity set; senior-citizen ISA person.

Suppose all account entities are evaluated on the defining attribute account-type. Only those entities that satisfy the condition account-type = "savings account" are allowed to belong to the lower-level entity set savings-account. All entities that satisfy the condition account-type = "checking account" are included in checking-account. Since all the lower-level entities are evaluated on the basis of the same attribute (in this case, account-type), this type of generalization is said to be attribute-defined.

User-defined

User-defined lower-level entity sets are not constrained by a membership condition; rather, the database user assigns entities to a given entity set. For instance, let us assume that, after 3 months of employment, bank employees are assigned to one of four work teams. We therefore represent the teams as four lower-level entity sets of the higher-level employee entity set. A given employee is not assigned to a specific team entity automatically on the basis of an explicit defining condition. Instead, the user in charge of this decision makes the team assignment on an individual basis. The assignment is implemented by an operation that adds an entity to an entity set.

Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization.

Disjoint

A disjointness constraint requires that an entity belong to no more than one lower-level entity set. In our example, an account entity can satisfy only one condition for the account-type attribute; an entity can be either a savings account or a checking account, but cannot be both.

Overlapping

In overlapping generalizations, the same entity may belong to more than one lower-level entity set within a single generalization. For an illustration, consider the employee work team example, and assume that certain managers participate in more than one work team. A given employee may therefore appear in more than one of the team entity sets that are lower-level entity sets of employee. Thus, the generalization is overlapping. As another example, suppose generalization applied to entity sets customer and employee leads to a higher-level entity set person. The generalization is overlapping if an employee can also be a customer.

Design Constraints on a Specialization/Generalization (Cont.)

Completeness constraint -- specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.

Total: an entity must belong to one of the lower-level entity sets
Partial: an entity need not belong to one of the lower-level entity sets

Partial generalization is the default. We can specify total generalization in an E-R diagram by using a double line to connect the box representing the higher-level entity set to the triangle symbol. (This notation is similar to the notation for total participation in a relationship.) The account generalization is total: all account entities must be either a savings account or a checking account. Because the higher-level entity set arrived at through generalization is generally composed of only those entities in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually total. When the generalization is partial, a higher-level entity is not constrained to appear in a lower-level entity set. The work team entity sets illustrate a partial specialization. Since employees are assigned to a team only after 3 months on the job, some employee entities may not be members of any of the lower-level team entity sets. We may characterize the team entity sets more fully as a partial, overlapping specialization of employee.

The generalization of checking-account and savings-account into account is a total, disjoint generalization. The completeness and disjointness constraints, however, do not depend on each other. Constraint patterns may also be partial-disjoint and total-overlapping. We can see that certain insertion and deletion requirements follow from the constraints that apply to a given generalization or specialization. For instance, when a total completeness constraint is in place, an entity inserted into a higher-level entity set must also be inserted into at least one of the lower-level entity sets. With a condition-defined constraint, all higher-level entities that satisfy the condition must be inserted into that lower-level entity set. Finally, an entity that is deleted from a higher-level entity set also is deleted from all the associated lower-level entity sets to which it belongs.
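The disjointness and completeness constraints are independent of each other, which a short Python sketch can make concrete. The function name check_specialization and the entity identifiers are hypothetical; the two data sets mirror the account and work team examples from the text.

```python
from typing import Dict, Set


def check_specialization(higher: Set[str],
                         lower: Dict[str, Set[str]]) -> Dict[str, bool]:
    """Report whether a specialization is disjoint and whether it is total."""
    members = [m for subset in lower.values() for m in subset]
    covered = set(members)
    return {
        # Disjoint: no entity is counted in two lower-level sets.
        "disjoint": len(members) == len(covered),
        # Total: every higher-level entity is in some lower-level set.
        "total": higher <= covered,
    }


# account: a total, disjoint generalization (made-up identifiers)
accounts = {"A1", "A2", "A3"}
account_types = {"savings-account": {"A1", "A2"}, "checking-account": {"A3"}}

# work teams: a partial, overlapping specialization of employee
employees = {"E1", "E2", "E3"}
teams = {"team1": {"E1", "E2"}, "team2": {"E2"}}  # E3 not yet assigned

print(check_specialization(accounts, account_types))
print(check_specialization(employees, teams))
```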

Aggregation

One limitation of the E-R model is that it cannot express relationships among relationships.

The best way to model a situation like this is by the use of aggregation. Thus, the relationship set work_on, relating the entity sets Employee, Branch and Job, is treated as a higher-level entity set. Such an entity set is treated in the same manner as any other entity set. We can then create a binary relationship Manages between work_on and Manager to represent who manages what tasks.

Suppose a customer-loan pair may have a bank employee who is the loan officer for that particular pair.

[Figure: E-R diagram with entity sets CUST, LOAN and EMP and relationship sets BORROWER and LOANOFFICER.]

The best way to model the situation is to use aggregation. Aggregation is an abstraction through which relationships are treated as higher-level entities.

[Figure: E-R diagram in which the BORROWER relationship between CUST and LOAN is aggregated and related to EMP through LOAN-OFFICER.]
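One way to see what aggregation buys is to write the loan officer example as tables: the loan officer relationship refers to a borrower pair as a whole, not to a customer or a loan separately. The SQLite sketch below (from Python) is only an illustration; the column names are assumptions, and keying loan_officer on the pair assumes at most one loan officer per customer-loan pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.executescript("""
CREATE TABLE cust (cust_id TEXT PRIMARY KEY);
CREATE TABLE loan (loan_no TEXT PRIMARY KEY);
CREATE TABLE emp  (emp_id  TEXT PRIMARY KEY);

-- BORROWER: relationship set between CUST and LOAN
CREATE TABLE borrower (
    cust_id TEXT REFERENCES cust(cust_id),
    loan_no TEXT REFERENCES loan(loan_no),
    PRIMARY KEY (cust_id, loan_no)
);

-- LOAN-OFFICER: relates an aggregated BORROWER pair to an EMP
CREATE TABLE loan_officer (
    cust_id TEXT,
    loan_no TEXT,
    emp_id  TEXT NOT NULL REFERENCES emp(emp_id),
    PRIMARY KEY (cust_id, loan_no),
    FOREIGN KEY (cust_id, loan_no) REFERENCES borrower(cust_id, loan_no)
);
""")
```

Because loan_officer references borrower with a composite foreign key, a loan officer can be recorded only for a customer-loan pair that actually exists in borrower.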

E-R Diagram of Banking System

Case Study: Database Design for Banking Enterprise

Here are the major characteristics of the banking enterprise. The bank is organized into branches. Each branch is located in a particular city and is identified by a unique name. The bank monitors the assets of each branch.

References

Parteek Bhatia, Simplified Approach to DBMS, Kalyani Publishers.
