Gokaraju Rangaraju Institute of Engineering and Technology

Department of Computer Science and Engineering


Year/Semester : II / I Academic year: 2015-16
SUB: DBMS Tutorial Sheet: UNIT I-1
DATABASE SYSTEM APPLICATIONS
Short answer questions
1. Define the terms “Data” and “Information”. Give examples?
2. Define the term “Database” and “DBMS”?
3. List the applications of DBMS?
4. What are the advantages of DBMS?
5. What are the disadvantages of File processing system?
6. Define Data abstraction?
7. What are the three levels of data abstraction?
8. Define Instance and Schema?
9. What is Data Model? What are different data models?
10. What is data independence?
11. Differentiate between logical and physical independence?
12. Write about different types of Database users?
13. Write about the Conceptual schema?
14. Write about the External schema?
15. Write about the Physical schema?
Descriptive questions/Programs/Experiments
1. Distinguish between DBMS & File System?
2. Explain the different levels of data abstraction?
3. Define data model. Explain about ER model, Relational Model and other models?
4. Explain about data base system structure with neat sketch?
5. Explain about the DBA responsibilities?

Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology


Department of Computer Science and Engineering

Year/Semester : II / I Academic year: 2015-16
Tutorial Sheet: I-1 Question & Answers

SHORT ANSWER QUESTIONS

1. Define the terms “Data” and “Information”. Give examples.


Ans: Data are simply facts or figures — bits of information, but not information itself. When
data are processed, interpreted, organized, structured or presented so as to make them meaningful
or useful, they are called information. Information provides context for data.
For example, a list of dates — data — is meaningless without the information that makes
the dates relevant (for instance, that they are the dates of holidays).
"Data" and "information" are intricately tied together, whether one is recognizing them as
two separate words or using them interchangeably, as is common today. Whether they are used
interchangeably depends somewhat on the usage of "data" — its context and grammar.
Examples of Data and Information
• The history of temperature readings all over the world for the past 100 years is data. If this
data is organized and analyzed to find that global temperature is rising, then that is
information.
• The number of visitors to a website by country is an example of data. Finding out that traffic
from the U.S. is increasing while that from Australia is decreasing is meaningful
information.
2. Define the terms “Database” and “DBMS”.
Ans: A database is an organized collection of data. It is the collection of tables, queries,
reports, views and other objects. The data is typically organized to model aspects of reality in a
way that supports processes requiring information, such as modeling the availability of rooms in
hotels in a way that supports finding a hotel with vacancies.
Database management systems (DBMS) are computer software applications that
interact with the user, other applications, and the database itself to capture and analyze data. A
general-purpose DBMS is designed to allow the definition, creation, querying, update, and
administration of databases. Well-known DBMSs include MySQL, PostgreSQL, Microsoft SQL
Server, Oracle, Sybase and IBM DB2.

Database management systems are often classified according to the database model that
they support; the most popular database systems since the 1980s have all supported the relational
model as represented by the SQL language. Sometimes a DBMS is loosely referred to as a
'database'.
3. List the applications of DBMS.
Ans: Databases are widely used. Here are some representative applications:
• Banking: For customer information, accounts, and loans, and banking transactions.
• Airlines: For reservations and schedule information.
• Universities: For student information, course registrations, and grades.
• Credit card transactions: For purchases on credit cards and generation of monthly
statements.
• Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about the
communication networks.
• Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds.
• Sales: For customer, product, and purchase information.
• Manufacturing: For management of supply chain and for tracking production of items in
factories, inventories of items in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes and benefits,
and for generation of paychecks.
4. What are the advantages of DBMS?
Ans: Using a DBMS to manage data has many advantages:
• Data independence: Application programs should be as independent as possible
from details of data representation and storage.
• Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently.
• Data integrity and security: If data is always accessed through the DBMS, the DBMS can
enforce integrity constraints on the data. Also, the DBMS can enforce access controls that
govern what data is visible to different classes of users.

• Data administration: When several users share the data, centralizing the administration
of data can offer significant improvements.
• Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the
data in such a manner that users can think of the data as being accessed by only one user at a
time. Further, the DBMS protects users from the effects of system failures.
• Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS. This, in
conjunction with the high-level interface to the data, facilitates quick development of
applications.
5. What are the disadvantages of File processing system?
Ans: Keeping organizational information in a file-processing system has a number of major
disadvantages:
• Data redundancy and inconsistency: Since different programmers create the files and
application programs over a long period, the various files are likely to have different formats
and the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places (files). In addition, it may lead to data
inconsistency; that is, the various copies of the same data may no longer agree.
• Difficulty in accessing data: We have to write special programs to answer each question
that users may want to ask about the data. These programs are likely to be complex because
of the large volume of data to be searched.
• Data isolation: Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
• Integrity problems: The data values stored in the database must satisfy certain types of
consistency constraints. Developers enforce these constraints in the system by adding
appropriate code in the various application programs.
• Atomicity problems: A transaction must be atomic—it must happen in its entirety or not
at all. It is difficult to ensure atomicity in a conventional file-processing system.
• Concurrent-access anomalies: We must protect the data from inconsistent changes made
by different users accessing the data concurrently. If programs that access the data are
written with such concurrent access in mind, this adds greatly to their complexity.
• Security problems: Not every user of the database system should be able to access all the
data.
6. Define Data abstraction.
Ans: A major purpose of a database system is to provide users with an abstract view of the data.
That is, the system hides certain details of how the data are stored and maintained. For the
system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to
use complex data structures to represent data in the database. Since many database-systems users
are not computer trained, developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system. These levels are:
A. Physical Level
B. Logical or Conceptual Level
C. View or External Level
7. What are the three levels of data abstraction?
Ans: The data in a DBMS is described at three levels of abstraction, as illustrated in Figure. The
database description consists of a schema at each of these three levels of abstraction: the
conceptual or Logical, physical, and external or View schemas.

[Figure: the three levels of data abstraction — several external (view) schemas at the top, one
conceptual (logical) schema beneath them, one physical schema below that, and the disk at the
bottom.]

8. Define Instance and Schema.


Ans: Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema. Schemas are changed
infrequently, if at all.

A database schema corresponds to the variable declarations (along with associated type
definitions) in a program. Each variable has a particular value at a given instant. The values of
the variables in a program at a point in time correspond to an instance of a database schema.
Database systems have several schemas, partitioned according to the levels of abstraction. The
physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level. A database may also have several schemas at
the view level, sometimes called subschemas, that describe different views of the database.
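To make the schema/instance distinction concrete, here is a minimal SQL sketch (the table and
column names are illustrative, not taken from the text): the CREATE TABLE statement is the
schema; the rows present after the INSERTs form one instance.

-- Schema: the overall design, declared once and changed rarely.
CREATE TABLE student (
    sid  VARCHAR(10) PRIMARY KEY,
    name VARCHAR(50),
    gpa  NUMERIC(3, 2)
);

-- Instance: the collection of rows stored at this particular moment;
-- it changes with every INSERT, UPDATE, and DELETE.
INSERT INTO student VALUES ('S001', 'Ravi', 8.10);
INSERT INTO student VALUES ('S002', 'Latha', 7.45);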
9. What is a Data Model? What are the different data models?
Ans: Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. The various data
models that have been proposed fall into three different groups:
a) Object-based logical models
i) Entity Relationship model
ii) Object-oriented model
iii) Semantic data model
iv) Functional data model
b) Record-based logical models
i) Relational model
ii) Network model
iii) Hierarchical model
c) Physical Data model
10. What is data independence?
Ans: A very important advantage of using a DBMS is that it offers data independence. That is,
application programs are insulated from changes in the way the data is structured and stored.
Data independence is achieved through use of the three levels of data abstraction; in particular,
the conceptual schema and the external schema provide distinct benefits in this area.
11. Differentiate between logical and physical independence.
Ans: Physical data independence is the ability to modify the physical scheme without making
it necessary to rewrite application programs. Such modifications include changing from
unblocked to blocked record storage, or from sequential to random access files.
Logical data independence is the ability to modify the conceptual scheme without
making it necessary to rewrite application programs. Such a modification might be adding a field
to a record; an application program’s view hides this change from the program.
The two may be contrasted as follows:
• Logical data independence is concerned with the structure of the data or changes to the data
definition; physical data independence is concerned with the storage of the data.
• Logical data independence is harder to achieve, because retrieval of data depends heavily on
the logical structure of the data; physical changes are comparatively easy to accommodate.
• With logical data independence, application programs need not be changed when new fields
are added to or deleted from the database; physical data independence is concerned with
changes to the storage device.
• Logical data independence relates to the conceptual schema; physical data independence
relates to the internal schema.
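As a brief illustration of logical data independence, the following SQL sketch (reusing the
illustrative student table from the earlier sketch) shows how a view at the external level
insulates an application from a change to the conceptual schema:

-- External schema: the application queries this view, not the base table.
CREATE VIEW student_contact AS
SELECT sid, name FROM student;

-- A change at the conceptual level: a new column is added.
ALTER TABLE student ADD COLUMN email VARCHAR(80);

-- The application's query is unaffected; the view still returns (sid, name).
SELECT * FROM student_contact;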
12. Write about different types of Database users.
Ans: There are four different types of database-system users, differentiated by the way they
expect to interact with the system. Different types of user interfaces have been designed for the
different types of users.
• Naive users are unsophisticated users who interact with the system by invoking
one of the application programs that have been written previously.
• Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
• Sophisticated users interact with the system without writing programs. Instead, they form
their requests in a database query language. They submit each such query to a query
processor, whose function is to break down DML statements into instructions that the storage
manager understands.
• Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are
computer-aided design systems, knowledge base and expert systems.
13. Write about the Conceptual schema.
Ans: The conceptual schema (sometimes called the logical schema) describes the stored data in
terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all
relations that are stored in the database. In a sample university database, these relations contain
information about entities, such as students and faculty, and about relationships, such as students'
enrollment in courses. In fact, each collection of entities and each collection of relationships can
be described as a relation, leading to the following conceptual schema:
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Faculty(fid: string, fname: string, sal: real)
Courses(cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled(sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets_In(cid: string, rno: integer, time: string)
The choice of relations, and the choice of fields for each relation, is not always obvious, and
the process of arriving at a good conceptual schema is called conceptual database design.
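In SQL, such a conceptual schema would be declared with DDL statements. A minimal sketch
for three of the relations (the column types and sizes are assumptions chosen for illustration):

CREATE TABLE Students (
    sid   VARCHAR(10) PRIMARY KEY,
    name  VARCHAR(50),
    login VARCHAR(30),
    age   INTEGER,
    gpa   REAL
);

CREATE TABLE Courses (
    cid     VARCHAR(10) PRIMARY KEY,
    cname   VARCHAR(50),
    credits INTEGER
);

-- Enrolled records a relationship; its key combines the keys of the
-- participating entity sets, and the foreign keys enforce the link.
CREATE TABLE Enrolled (
    sid   VARCHAR(10) REFERENCES Students,
    cid   VARCHAR(10) REFERENCES Courses,
    grade VARCHAR(2),
    PRIMARY KEY (sid, cid)
);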
14. Write about the External schema.
Ans: External schemas allow data access to be customized (and authorized) at the level of
individual users or groups of users. Any given database has exactly one conceptual schema and
one physical schema because it has just one set of stored relations, but it may have several
external schemas, each tailored to a particular group of users. Each external schema consists of a
collection of one or more views and relations from the conceptual schema.
A view is conceptually a relation, but the records in a view are not stored in the DBMS.
The external schema design is guided by end user requirements. For example, we might want to
allow students to find out the names of faculty members teaching courses, as well as course
enrollments. This can be done by defining the following view:
Courseinfo(cid: string, fname: string, enrollment: integer)
A user can treat a view just like a relation and ask questions about the records in the view.
Even though the records in the view are not stored explicitly, they are computed as needed. We
did not include Courseinfo in the conceptual schema because we can compute Courseinfo from
the relations in the conceptual schema, and to store it in addition would be redundant. Such
redundancy, in addition to the wasted space, could lead to inconsistencies.
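A sketch of how such a view could be defined in SQL over the conceptual schema of question 13
(the join conditions assume the fid/cid keys shown there; an actual definition would depend on
the DBMS and the stored relations):

-- A view is computed on demand; its records are not stored separately.
CREATE VIEW Courseinfo AS
SELECT c.cid,
       f.fname,
       COUNT(e.sid) AS enrollment
FROM   Courses c
       JOIN Teaches t ON t.cid = c.cid
       JOIN Faculty f ON f.fid = t.fid
       LEFT JOIN Enrolled e ON e.cid = c.cid
GROUP BY c.cid, f.fname;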

15. Write about the Physical schema.
Ans: The physical schema specifies additional storage details. Essentially, the physical schema
summarizes how the relations described in the conceptual schema are actually stored on
secondary storage devices such as disks and tapes. We must decide what file organizations to use
to store the relations, and create auxiliary data structures called indexes to speed up data retrieval
operations.
A sample physical schema for the university database follows:
• Store all relations as unsorted files of records.
• Create indexes on the first column of the Students, Faculty, and Courses relations, the sal
column of Faculty, and the capacity column of Rooms.
The process of arriving at a good physical schema is called physical database design.
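In SQL, those physical design decisions reduce to index definitions such as the following sketch
(index names are illustrative, and the Faculty and Rooms relations are assumed from the sample
schema; many systems already index a PRIMARY KEY automatically, making the first-column
indexes implicit):

CREATE INDEX idx_faculty_sal    ON Faculty (sal);
CREATE INDEX idx_rooms_capacity ON Rooms (capacity);
-- First-column indexes, if not already created by the PRIMARY KEY:
CREATE INDEX idx_students_sid ON Students (sid);
CREATE INDEX idx_courses_cid  ON Courses (cid);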
LONG ANSWERS
1. Distinguish between DBMS & File System.
Ans: A file-processing system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records
from, and add records to, the appropriate files. Before database management systems (DBMSs)
came along, organizations usually stored information in such systems. Keeping organizational
information in a file-processing system has a number of major disadvantages:
• Data redundancy and inconsistency. Since different programmers create the files and
application programs over a long period, the various files are likely to have different
formats and the programs may be written in several programming languages. Moreover,
the same information may be duplicated in several places (files). For example, the
address and telephone number of a particular customer may appear in a file that consists
of savings-account records and in a file that consists of checking-account records. This
redundancy leads to higher storage and access cost. In addition, it may lead to data
inconsistency; that is, the various copies of the same data may no longer agree. For
example, a changed customer address may be reflected in savings-account records but not
elsewhere in the system.
• Difficulty in accessing data. Suppose that one of the bank officers needs to find out the
names of all customers who live within a particular postal-code area. The officer asks the
data-processing department to generate such a list. Because the designers of the original
system did not anticipate this request, there is no application program on hand to meet it.
There is, however, an application program to generate the list of all customers. The bank
officer has now two choices: either obtain the list of all customers and extract the needed
information manually or ask a system programmer to write the necessary application
program. Both alternatives are obviously unsatisfactory.
Suppose that such a program is written, and that, several days later, the same
officer needs to trim that list to include only those customers who have an account
balance of $10,000 or more. As expected, a program to generate such a list does not exist.
Again, the officer has the preceding two options, neither of which is satisfactory. The
point here is that conventional file-processing environments do not allow needed data to
be retrieved in a convenient and efficient manner. More responsive data-retrieval systems
are required for general use.
• Data isolation. Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve the appropriate data is
difficult.
• Integrity problems. The data values stored in the database must satisfy certain types
of consistency constraints. For example, the balance of a bank account may never fall
below a prescribed amount (say, $25). Developers enforce these constraints in the
system by adding appropriate code in the various application programs. However,
when new constraints are added, it is difficult to change the programs to enforce
them. The problem is compounded when constraints involve several data items from
different files.
• Atomicity problems. A computer system, like any other mechanical or electrical
device, is subject to failure. In many applications, it is crucial that, if a failure occurs,
the data be restored to the consistent state that existed prior to the failure. Consider a
program to transfer $50 from account A to account B. If a system failure occurs
during the execution of the program, it is possible that the $50 was removed from
account A but was not credited to account B, resulting in an inconsistent database
state. Clearly, it is essential to database consistency that either both the credit and
debit occur, or that neither occur. That is, the funds transfer must be atomic—it must
happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional
file-processing system.
• Concurrent-access anomalies. For the sake of overall performance of the system
and faster response, many systems allow multiple users to update the data
simultaneously. In such an environment, interaction of concurrent updates may result
in inconsistent data. Consider bank account A, containing $500. If two customers
withdraw funds (say $50 and $100 respectively) from account A at about the same
time, the result of the concurrent executions may leave the account in an incorrect (or
inconsistent) state. Suppose that the programs executing on behalf of each withdrawal
read the old balance, reduce that value by the amount being withdrawn, and write the
result back. If the two programs run concurrently, they may both read the value $500,
and write back $450 and $400, respectively. Depending on which one writes the
value last, the account may contain either $450 or $400, rather than the correct value
of $350. To guard against this possibility, the system must maintain some form of
supervision. But supervision is difficult to provide because data may be accessed by
many different application programs that have not been coordinated previously.
• Security problems. Not every user of the database system should be able to access
all the data. For example, in a banking system, payroll personnel need to see only that
part of the database that has information about the various bank employees. They do
not need access to information about customer accounts. But, since application
programs are added to the system in an ad hoc manner, enforcing such security
constraints is difficult.
2. Explain the different levels of data abstraction.
Ans: For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database-
systems users are not computer trained, developers hide the complexity from users through
several levels of abstraction, to simplify users’ interactions with the system:
• Physical level. The lowest level of abstraction describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes
the entire database in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex
physical-level structures, the user of the logical level does not need to be aware of this
complexity. Database administrators, who must decide what information to keep in the
database, use the logical level of abstraction.
• View level. The highest level of abstraction describes only part of the entire database.
Even though the logical level uses simpler structures, complexity remains because of the
variety of information stored in a large database. Many users of the database system do
not need all this information; instead, they need to access only a part of the database. The
View level of abstraction exists to simplify their interaction with the system. The system
may provide many views for the same database.
Figure 1.1 shows the relationship among the three levels of abstraction.

[Figure 1.1: the three levels of data abstraction — the view level (view 1, view 2, …, view n) at
the top, the logical level beneath it, and the physical level at the bottom.]

An analogy to the concept of data types in programming languages may clarify the
distinction among levels of abstraction. Most high-level programming languages support the
notion of a record type. For example, in a Pascal-like language, we may declare a record as
follows:
type customer = record
    customer-id : string;
    customer-name : string;
    customer-street : string;
    customer-city : string;
end;
This code defines a new record type called customer with four fields. Each field has a name and
a type associated with it. A banking enterprise may have several such record types, including
• account, with fields account-number and balance
• employee, with fields employee-name and salary
At the physical level, a customer, account, or employee record can be described as a
block of consecutive storage locations (for example, words or bytes). The language compiler
hides this level of detail from programmers. Similarly, the database system hides many of the
lowest-level storage details from database programmers.
Database Administrators, on the other hand, may be aware of certain details of the
physical organization of the data.
At the logical level, each such record is described by a type definition, as in the previous
code segment, and the interrelationship of these record types is defined as well. Programmers
using a programming language work at this level of abstraction. Similarly, database
administrators usually work at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide
details of the data types. Similarly, at the view level, several views of the database are defined,
and database users see these views. In addition to hiding details of the logical level of the
database, the views also provide a security mechanism to prevent users from accessing certain
parts of the database. For example, tellers in a bank see only that part of the database that has
information on customer accounts; they cannot access information about salaries of employees.
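The same analogy can be carried into SQL. A minimal sketch (table, column, and view names
are illustrative): the CREATE TABLE statement is the logical level, the DBMS hides the physical
block layout, and a view restricts what a teller may see.

-- Logical level: the customer record as a relation. How rows are laid
-- out in blocks on disk (the physical level) is hidden from the author
-- of this definition.
CREATE TABLE customer (
    customer_id     VARCHAR(10) PRIMARY KEY,
    customer_name   VARCHAR(50),
    customer_street VARCHAR(60),
    customer_city   VARCHAR(40)
);

-- View level: a teller-facing view exposes only part of the database.
CREATE VIEW teller_customer AS
SELECT customer_id, customer_name FROM customer;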
3. Define data model. Explain about ER model, Relational Model and other models.
Ans: Underlying the structure of a database is the data model: a collection of conceptual tools
for describing data, data relationships, data semantics, and consistency constraints.
The Entity-Relationship Model
The entity-relationship (E-R) data model is based on a perception of a real world that
consists of a collection of basic objects, called entities, and of relationships among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects. For
example, each person is an entity, and bank accounts can be considered as entities.
Entities are described in a database by a set of attributes. For example, the attributes
account-number and balance may describe one particular account in a bank, and they form
attributes of the account entity set. Similarly, attributes customer-name, customer-street address
and customer-city may describe a customer entity. An extra attribute customer-id is used to
uniquely identify customers (since it may be possible to have two customers with the same name,
street address, and city).

A unique customer identifier must be assigned to each customer. In the United States,
many enterprises use the social-security number of a person (a unique number the U.S.
government assigns to every person in the United States) as a customer identifier.
A relationship is an association among several entities. For example, a depositor
relationship associates a customer with each account that she has. The set of all entities of the
same type and the set of all relationships of the same type are termed an entity set and
relationship set, respectively.
The overall logical structure (schema) of a database can be expressed graphically by an
E-R diagram, which is built up from the following components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships
Each component is labeled with the entity or relationship that it represents. In addition to
entities and relationships, the E-R model represents certain constraints to which the contents of a
database must conform. One important constraint is mapping cardinalities, which express the
number of entities to which another entity can be associated via a relationship set. The entity-
relationship model is widely used in database design.
Relational Model
The relational model uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns, and each column has a unique
name. The figure presents a sample relational database comprising three tables:
One shows details of bank customers, the second shows accounts, and the third shows
which accounts belong to which customers.
The first table, the customer table, shows, for example, that the customer identified by
customer-id 192-83-7465 is named Johnson and lives at 12 Alma St. in Palo Alto.
The second table, account, shows, for example, that account A-101 has a balance of
$500, and A-201 has a balance of $900.
The third table shows which accounts belong to which customers.
For example, account number A-101 belongs to the customer whose customer-id is 192-
83-7465, namely Johnson, and customers 192-83-7465 (Johnson) and 019-28-3746 (Smith) share
account number A-201 (they may share a business venture).

The relational model is an example of a record-based model. Record-based models are so
named because the database is structured in fixed-format records of several types. Each table
contains records of a particular type. Each record type defines a fixed number of fields, or
attributes. The columns of the table correspond to the attributes of the record type. It is not hard
to see how tables may be stored in files. For instance, a special character (such as a comma) may
be used to delimit the different attributes of a record, and another special character (such as a
newline character) may be used to delimit records. The relational model hides such low-level
implementation details from database developers and users.
The relational data model is the most widely used data model, and a vast majority of
current database systems are based on the relational model. The relational model is at a lower
level of abstraction than the E-R model.
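To illustrate how the three sample tables are queried together, here is a small SQL sketch; it
assumes the tables are named customer, account, and depositor (the third name is an assumption,
since the text does not name the table that links customers to accounts):

-- Which customers hold which accounts, and with what balance.
SELECT c.customer_name, a.account_number, a.balance
FROM   customer  c
       JOIN depositor d ON d.customer_id    = c.customer_id
       JOIN account   a ON a.account_number = d.account_number
ORDER BY c.customer_name;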
Other Data Models
The object-oriented data model is another data model that has seen increasing attention.
The object-oriented model can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity.
The object-relational data model combines features of the object-oriented data model
and relational data model.
Semistructured data models permit the specification of data where individual data items
of the same type may have different sets of attributes. This is in contrast with the data models
mentioned earlier, where every data item of a particular type must have the same set of attributes.
The extensible markup language (XML) is widely used to represent semistructured data.
Historically, two other data models, the network data model and the hierarchical data
model, preceded the relational data model. These models were tied closely to the underlying
implementation, and complicated the task of modeling data. As a result they are little used now,
except in old database code that is still in service in some places.
4. Explain about data base system structure with neat sketch.
Ans: A database system is partitioned into modules that deal with each of the responsibilities of
the overall system. The functional components of a database system can be broadly divided into
the storage manager and the query processor components. The storage manager is important
because databases typically require a large amount of storage space. Corporate databases range
in size from hundreds of gigabytes to, for the largest databases, terabytes of data. A gigabyte is
1000 megabytes (1 billion bytes), and a terabyte is 1 million megabytes (1 trillion bytes). Since
the main memory of computers cannot store this much information, the information is stored on
disks. Data are moved between disk storage and main memory as needed. Since the movement of
data to and from disk is slow relative to the speed of the central processing unit, it is imperative
that the database system structure the data so as to minimize the need to move data between disk
and main memory.

The query processor is important because it helps the database system simplify and
facilitate access to data. High-level views help to achieve this goal; with them, users of the
system are not burdened unnecessarily with the physical details of the implementation of the
system. However, quick processing of updates and queries is important. It is the job of the
database system to translate updates and queries written in a nonprocedural language, at the
logical level, into an efficient sequence of operations at the physical level.
Storage Manager
A storage manager is a program module that provides the interface between the low level
data stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file manager. The raw data are
stored on the disk using the file system, which is usually provided by a conventional operating
system. The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in
the database.
The storage manager components include:
1) Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
2) Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
3) File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
4) Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which provide fast access to data items that hold particular values.

The Query Processor
The query processor components include
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. A query can
usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization, that is, it picks the lowest cost
evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
Figure shows these components and the connections among them.

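Most SQL systems also let you inspect the evaluation plan the DML compiler chose. A brief
sketch (EXPLAIN is PostgreSQL/MySQL syntax, and the tables are the assumed customer and
depositor tables from the relational-model example; other systems expose plans differently):

-- Ask the optimizer to display its chosen evaluation plan instead of
-- executing the query.
EXPLAIN
SELECT c.customer_name
FROM   customer  c
       JOIN depositor d ON d.customer_id = c.customer_id
WHERE  d.account_number = 'A-101';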
5. Explain about the DBA responsibilities.
Ans: One of the main reasons for using DBMSs is to have central control of both the data and
the programs that access those data. A person who has such central control over the system is
called a database administrator (DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of
data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access.
The authorization information is kept in a special system structure that the database system
consults whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
o Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
o Ensuring that enough free disk space is available for normal operations and upgrading
disk space as required.
o Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.
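The authorization function in particular is usually exercised through SQL GRANT and REVOKE
statements. A minimal sketch (the customer/account tables and the clerk/teller roles are
illustrative, not from the text):

-- Let clerks read customer data but not modify it.
GRANT SELECT ON customer TO clerk;

-- Let tellers read and update account balances.
GRANT SELECT, UPDATE ON account TO teller;

-- Withdraw a privilege that is no longer needed.
REVOKE UPDATE ON account FROM teller;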

Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16
SUB: DBMS Tutorial Sheet: I-2
Short answer questions
1. Define Entity and Entity set?
2. Define Attribute. What are the various types of attributes?
3. Define relationship and relationship set?
4. What are the different types of mapping cardinalities?
5. Define a Key. What are the different types of Keys?
6. Explain about Transaction Management?
7. What is an ER diagram? List the components in it. Give example.
8. List the various symbols used in ER diagram.
9. Draw ER diagram showing different types of relationships?
10. Draw an ER diagram depicting Composite, Multi valued, Derived attributes?
11. Differentiate between Weak entity set and Strong entity set?
12. How Roles are represented in ER diagram? Give Example.
13. What is Generalization? Give example
14. What is Specialization? Give Example.
15. What is Aggregation? Give example.
Descriptive questions/Programs/Experiments
1. Explain about the database users?
2. Explain the steps involved in Database design?
3. Describe Weak Entity Set and Strong Entity set with example?
4. Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors?
5. Draw an ER diagram for the given scenario?
Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16

SUB: DBMS Tutorial Sheet: I-2

SHORT ANSWER QUESTIONS


1. Define Entity and Entity set.
Ans: An Entity is a “thing” or “object” in the real world that is distinguishable from all other
objects. For example, in a school database, students, teachers, classes, and courses offered can be
considered as entities. All these entities have some attributes or properties that give them their
identity.
An Entity Set is a collection of similar types of entities. An entity set may contain entities
whose attributes share similar values. For example, a Students set may contain all the students of
a school; likewise a Teachers set may contain all the teachers of a school from all faculties.
Entity sets need not be disjoint.
2. Define Attribute. What are the various types of attributes?
Ans: Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, and age as attributes. There exists a
domain or range of values that can be assigned to attributes. For example, a student's name
cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.
Attributes are represented by means of ellipses. Every ellipse represents one attribute and
is directly connected to its entity (rectangle).
Types of Attributes:
• Simple attribute − Simple attributes are atomic values, which cannot be divided further.
For example, a student's phone number is an atomic value of 10 digits.
• Composite attribute − Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first_name and last_name.
• Derived attribute − Derived attributes are attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. For
example, average_salary in a department should not be saved directly in the database;
instead, it can be derived. As another example, age can be derived from date_of_birth.
• Single-value attribute − Single-value attributes contain a single value. For example,
Social_Security_Number.
• Multi-value attribute − Multi-value attributes may contain more than one value. For
example, a person can have more than one phone number, email_address, etc.
• Key attribute − A key attribute represents the main characteristic of an entity and is used
to represent the primary key. An ellipse with an underlined label represents a key attribute.
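To make the derived-attribute idea concrete, here is a small SQL sketch (the table and column
names are illustrative, and AGE/EXTRACT follow PostgreSQL conventions): age is computed at
query time rather than stored.

CREATE TABLE student (
    sid           VARCHAR(10) PRIMARY KEY,
    name          VARCHAR(50),
    date_of_birth DATE
);

-- The derived attribute age is not a stored column; it is computed
-- from date_of_birth whenever it is needed.
SELECT name,
       EXTRACT(YEAR FROM AGE(CURRENT_DATE, date_of_birth)) AS age
FROM   student;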
3. Define relationship and relationship set.
Ans: The association among entities is called a relationship. For example, an employee
“works_at” a department, a student “enrolls” in a course. Here, Works_at and Enrolls are called
relationships.
A set of relationships of similar type is called a relationship set. Like entities, a
relationship too can have attributes. These attributes are called descriptive attributes. The number
of participating entities in a relationship defines the degree of the relationship.
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n
Example:
Student (entity type) is related to Department (entity type) by MajorsIn (relationship
type).

4. What are the different types of mapping cardinalities?


Ans: Cardinality defines the number of entities in one entity set, which can be associated with
the number of entities of another set via a relationship set. This is most useful in describing binary
relationship sets. For a binary relationship set the mapping cardinality must be one of the
following:
• One to one
• One to many
• Many to one
• Many to many

• One-to-one – Only one entity of the first set is related to only one entity of the second set.
E.g., a teacher teaches a student: only one teacher is teaching only one student.
• One-to-many – One entity of the first set is related to multiple entities of the second set.
E.g., a teacher teaches students: only one teacher is teaching many students.
• Many-to-one – Multiple entities of the first set are related to only one entity of the second
set. E.g., teachers teach a student: many teachers are teaching only one student.
• Many-to-many – Multiple entities of the first set are related to multiple entities of the
second set. E.g., teachers teach students: in any school or college, many teachers are
teaching many students. This can be considered as a two-way one-to-many relationship.
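In a relational schema these cardinalities are typically enforced with key and foreign-key
constraints. A sketch under assumed, illustrative names (majors_in mirrors the MajorsIn example
from question 3):

CREATE TABLE department (
    dept_id VARCHAR(10) PRIMARY KEY,
    dname   VARCHAR(40)
);

-- Many-to-one: many students map to one department via a foreign key.
CREATE TABLE student (
    sid     VARCHAR(10) PRIMARY KEY,
    name    VARCHAR(50),
    dept_id VARCHAR(10) NOT NULL REFERENCES department
);

-- One-to-one: a UNIQUE foreign key allows at most one head per department.
CREATE TABLE dept_head (
    dept_id   VARCHAR(10) UNIQUE REFERENCES department,
    head_name VARCHAR(50)
);

-- Many-to-many: a separate relationship table with a composite key.
CREATE TABLE majors_in (
    sid     VARCHAR(10) REFERENCES student,
    dept_id VARCHAR(10) REFERENCES department,
    PRIMARY KEY (sid, dept_id)
);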
5. Define a Key. What are the different types of Keys?


Ans: A Key is an attribute or collection of attributes that uniquely identifies an entity within an
entity set. For example, the roll_number of a student makes him/her identifiable among students.
The different types of Keys are:
• Super Key − A set of attributes (one or more) that collectively identifies an entity in an
entity set.
• Candidate Key − A minimal super key is called a candidate key. An entity set may have
more than one candidate key.
• Primary Key − A primary key is one of the candidate keys chosen by the database designer
to uniquely identify the entity set.
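A small SQL sketch of the three notions (the table is illustrative): roll_number is the chosen
primary key, email is a second candidate key declared UNIQUE, and any superset of either, such
as (roll_number, name), is a super key.

CREATE TABLE student (
    roll_number VARCHAR(10) PRIMARY KEY,      -- chosen candidate key
    email       VARCHAR(80) UNIQUE NOT NULL,  -- another candidate key
    name        VARCHAR(50)
);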
6. Explain about Transaction Management.
Ans: A transaction is a collection of operations that performs a single logical function in a
database application. Each transaction is a unit of both atomicity ( either all the operations are
performed entirely or none) and consistency. Thus, we require that transactions do not violate
any database-consistency constraints. That is, if the database was consistent when a transaction
started, the database must be consistent when the transaction successfully terminates. It is the
programmer’s responsibility to define properly the various transactions, so that each preserves
the consistency of the database.
Ensuring the atomicity and durability properties is the responsibility of the database
system itself—specifically, of the transaction-management component. In the absence of
failures, all transactions complete successfully, and atomicity is achieved easily. However,
because of various types of failure, a transaction may not always complete its execution
successfully. If we are to ensure the atomicity property, a failed transaction must have no effect
on the state of the database. Thus, the database must be restored to the state in which it was
before the transaction in question started executing. The database system must therefore
perform failure recovery, that is, detect system failures and restore the database to the state that
existed prior to the occurrence of the failure.
Finally, when several transactions update the database concurrently, the consistency of
data may no longer be preserved, even though each individual transaction is correct. It is the
responsibility of the concurrency-control manager to control the interaction among the
concurrent transactions, to ensure the consistency of the database.
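As a concrete example, the bank funds transfer used elsewhere in these notes maps directly onto
an SQL transaction. A minimal sketch (BEGIN/COMMIT follow PostgreSQL conventions, and
the account table is illustrative):

BEGIN;

-- Both updates succeed together, or neither takes effect.
UPDATE account SET balance = balance - 50 WHERE account_number = 'A-101';
UPDATE account SET balance = balance + 50 WHERE account_number = 'A-201';

COMMIT;  -- on failure, ROLLBACK restores the pre-transaction state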

7. What is an ER diagram? List the components in it. Give example.
Ans: ER Model is represented by means of an ER diagram. Any object, for example, entities,
attributes of an entity, relationship sets, and attributes of relationship sets, can be represented
with the help of an ER diagram. An E-R diagram can express the overall logical structure of a
database graphically. E-R diagrams are simple and clear—qualities that may well account in
large part for the widespread use of the E-R model. Such a diagram consists of the following
major components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationship sets
• Lines, which link attributes to entity sets and entity sets to relationship sets
• Double ellipses, which represent multivalued attributes
• Dashed ellipses, which denote derived attributes
• Double lines, which indicate total participation of an entity in a relationship set
• Double rectangles, which represent weak entity sets
Consider the entity-relationship diagram, which consists of two entity sets, customer and
loan, related through a binary relationship set borrower. The attributes associated with customer
are customer-id, customer-name, customer-street, and customer-city. The attributes associated
with loan are loan-number and amount. Attributes of an entity set that are members of the
primary key are underlined.

8. List the various symbols used in ER diagram.
Ans: [Figure: a table of E-R diagram symbols — rectangle (entity set), double rectangle (weak
entity set), ellipse (attribute), double ellipse (multivalued attribute), dashed ellipse (derived
attribute), diamond (relationship set), and lines/double lines (partial/total participation).]
9. Draw ER diagram showing different types of relationships.
Ans: [Figure: E-R diagrams illustrating one-to-one, one-to-many, many-to-one, and
many-to-many relationship sets between two entity sets.]
10. Draw an ER diagram depicting Composite, Multivalued, Derived attributes.


Ans: [Figure: an E-R diagram depicting a composite attribute, a multivalued attribute (double
ellipse), and a derived attribute (dashed ellipse).]
11. Differentiate between Weak entity set and Strong entity set.
Ans: A strong entity set has a primary key of its own, while a weak entity set does not have
sufficient attributes to form a primary key and depends on an identifying (owner) entity set. A
weak entity set is drawn as a double rectangle, and its discriminator (partial key) is underlined
with a dashed line; its primary key is the owner's primary key plus the discriminator. (See Long
Answer 3 below for a detailed example.)
12. How are Roles represented in an ER diagram? Give Example.


Ans: The function that an entity plays in a relationship is called its role. Roles are normally
implicit and are not specified. They are useful when the meaning of a relationship set needs
clarification. For example, the entity sets of a relationship may not be distinct. The
relationship works-for might be ordered pairs of employees (first is manager, second is worker).
In the E-R diagram, this can be shown by labelling the lines connecting entities (rectangles) to
relationships (diamonds).

13. What is Generalization? Give example.
Ans: Generalization is a bottom-up approach in which two or more lower-level entities combine
to form a higher-level entity. In generalization, the higher-level entity can also combine with
other lower-level entities to form a still higher-level entity.

14. What is Specialization? Give Example.


Ans: Specialization is the opposite of Generalization. It is a top-down approach in which one
higher-level entity can be broken down into two or more lower-level entities. In specialization,
some higher-level entities may not have lower-level entity sets at all.

15. What is Aggregation? Give example.


Ans: Aggregation is a process in which a relationship between two entities is treated as a single
entity. Here, the relationship between Center and Course acts as an entity in a relationship with
Visitor.

LONG ANSWERS
1) Explain about the database users.
Ans: A primary goal of a database system is to retrieve information from and store new
information in the database. People who work with a database can be categorized as database
users or database administrators.
Database Users and User Interfaces
There are four different types of database-system users, differentiated by the way they
expect to interact with the system. Different types of user interfaces have been designed for the
different types of users.
• Naive users are unsophisticated users who interact with the system by invoking one of
the application programs that have been written previously. For example, a bank teller
who needs to transfer $50 from account A to account B invokes a program called transfer.
This program asks the teller for the amount of money to be transferred, the account from
which the money is to be transferred, and the account to which the money is to be
transferred. As another example, consider a user who wishes to find her account balance
over the World Wide Web. Such a user may access a form, where she enters her account
number. An application program at the Web server then retrieves the account balance,
using the given account number, and passes this information back to the user. The typical
user interface for naive users is a forms interface, where the user can fill in appropriate
fields of the form. Naive users may also simply read reports generated from the database.
• Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer
to construct forms and reports without writing a program. There are also special types of
programming languages that combine imperative control structures (for example, for
loops, while loops and if-then-else statements) with statements of the data manipulation
language.
These languages, sometimes called fourth-generation languages, often include special
features to facilitate the generation of forms and the display of data on the screen. Most
major commercial database systems include a fourth generation language.
• Sophisticated users interact with the system without writing programs. Instead, they
form their requests in a database query language. They submit each such query to a
query processor, whose function is to break down DML statements into instructions that
the storage manager understands. Analysts who submit queries to explore data in the
database fall in this category.
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view
summaries of data in different ways. For instance, an analyst can see total sales by region
(for example, North, South, East, andWest), or by product, or by a combination of region
and product (that is, total sales of each product in each region). The tools also permit the
analyst to select specific regions, look at data in more detail (for example, sales by city
within a region) or look at the data in less detail (for example, aggregate products
together by category). Another class of tools for analysts is data mining tools, which
help them find certain kinds of patterns in data.
• Specialized users are sophisticated users who write specialized database applications that
do not fit into the traditional data-processing framework. Among these applications are
computer-aided design systems, knowledge base and expert systems, systems that store
data with complex data types (for example, graphics data and audio data), and
environment-modeling systems.
2) Explain the steps involved in Database design.
Ans: The process of database design is divided into different parts. It consists of a series of steps.
They are:
• Requirement Analysis
• Conceptual Database Design (ER diagram)
• Logical Database Design (tables, normalization, etc.)
• Physical Database Design (table indexing, clustering, etc.)
Requirement Analysis
In this phase a detailed analysis of the requirement is done. The objective of this phase is
to get a clear understanding of the requirements. It makes use of various information-gathering
methods for this purpose. Some of them are:
• Interview
• Analyzing documents
• Survey
• Site visit
• Joint Applications Design (JAD) and Joint Requirements Analysis (JRA)
• Prototyping
Conceptual Database Design
The requirement analysis is modeled in this conceptual design. The ER Model is used at
the conceptual design stage of the database design. The ER diagram is used to represent this
conceptual design. ER diagram consists of Entities, Attributes and Relationships.
Logical Database Design
Once the relationships and dependencies are identified, the data can be arranged into
logical structures and mapped into database management system tables. Normalization is
performed to put the relations into appropriate normal forms.
Physical Database Design
It deals with the physical implementation of the database in a database management
system. It includes the specification of data elements, data types, indexing, etc. All this
information is stored in the data dictionary.
3) Describe Weak Entity Set and Strong Entity set with example.
Ans: An entity set may not have sufficient attributes to form a primary key. Such an entity set is
termed a weak entity set. An entity set that has a primary key is termed a strong entity set. As
an illustration, consider the entity set payment, which has the three attributes:
payment-number, payment-date, and payment-amount. Payment numbers are typically
sequential numbers, starting from 1, generated separately for each loan. Thus, although each
payment entity is distinct, payments for different loans may share the same payment number.
Thus, this entity set does not have a primary key; it is a weak entity set.
For a weak entity set to be meaningful, it must be associated with another entity set,
called the identifying or owner entity set. Every weak entity must be associated with an
identifying entity; that is, the weak entity set is said to be existence dependent on the identifying
entity set. The identifying entity set is said to own the weak entity set that it identifies. The
relationship associating the weak entity set with the identifying entity set is called the
identifying relationship. The identifying relationship is many to one from the weak entity set to
the identifying entity set, and the participation of the weak entity set in the relationship is total.
In our example, the identifying entity set for payment is loan, and a relationship loan-
payment that associates payment entities with their corresponding loan entities is the identifying
relationship.
Although a weak entity set does not have a primary key, we nevertheless need a means of
distinguishing among all those entities in the weak entity set that depend on one particular strong
entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to
be made. For example, the discriminator of the weak entity set payment is the attribute payment-
number, since, for each loan, a payment number uniquely identifies one single payment for that
loan. The discriminator of a weak entity set is also called the partial key of the entity set.
The primary key of a weak entity set is formed by the primary key of the identifying
entity set, plus the weak entity set’s discriminator. In the case of the entity set payment, its
primary key is {loan-number, payment-number}, where loan-number is the primary key of the
identifying entity set, namely loan, and payment-number distinguishes payment entities within
the same loan.
The identifying relationship set should have no descriptive attributes, since any required
attributes can be associated with the weak entity set. A weak entity set can participate in
relationships other than the identifying relationship. For instance, the payment entity could
participate in a relationship with the account entity set, identifying the account from which the
payment was made. A weak entity set may participate as owner in an identifying relationship
with another weak entity set. It is also possible to have a weak entity set with more than one
identifying entity set. A particular weak entity would then be identified by a combination of
entities, one from each identifying entity set. The primary key of the weak entity set would
consist of the union of the primary keys of the identifying entity sets, plus the discriminator of
the weak entity set.
In E-R diagrams, a doubly outlined box indicates a weak entity set, and a doubly outlined
diamond indicates the corresponding identifying relationship. In the below Figure, the weak
entity set payment depends on the strong entity set loan via the relationship set loan-payment.
The figure also illustrates the use of double lines to indicate total participation—the
participation of the (weak) entity set payment in the relationship loan-payment is total, meaning
that every payment must be related via loan-payment to some loan. Finally, the arrow from loan-
payment to loan indicates that each payment is for a single loan. The discriminator of a weak
entity set also is underlined, but with a dashed, rather than a solid, line.
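The weak entity construction translates directly into SQL table definitions. The following is a
minimal sketch, not part of the original text; the table and column names (loan, payment,
loan_number, and so on) are assumptions based on the example above:

-- Strong (identifying) entity set
CREATE TABLE loan (
    loan_number VARCHAR(10) PRIMARY KEY,
    amount      DECIMAL(10, 2)
);

-- Weak entity set: primary key = owner's primary key + discriminator
CREATE TABLE payment (
    loan_number    VARCHAR(10) NOT NULL,
    payment_number INT         NOT NULL,        -- discriminator (partial key)
    payment_date   DATE,
    payment_amount DECIMAL(10, 2),
    PRIMARY KEY (loan_number, payment_number),  -- {loan-number, payment-number}
    FOREIGN KEY (loan_number) REFERENCES loan (loan_number)
        ON DELETE CASCADE                       -- existence dependence on loan
);

The ON DELETE CASCADE clause mirrors the total participation of payment in loan-payment: a
payment cannot outlive the loan that identifies it.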

4) Construct an E-R diagram for a hospital with a set of patients and a set of medical
doctors. Associate with each patient a log of the various tests and examinations conducted.
Ans: (E-R diagram omitted in the source.)
5) Draw an ER diagram for the following scenario:
Ans: (Scenario and E-R diagram omitted in the source.)
Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16

SUB:DBMS Assignment Sheet: UNIT-I

Descriptive questions/Programs/Experiments

1. Define DBMS. Explain the applications of DBMS.


2. Compare and contrast DBMS with File processing System.
3. Explain the levels of data abstraction.
4. Define Instance and Schema? Explain about different Schemas.
5. What is data independence? Differentiate between Logical and physical independence.
6. What is data model? Explain about ER model?
7. Explain about Relational model with suitable examples.
8. Describe Database System Structure with neat diagram.
9. Explain about Database users with suitable examples.
10. Describe the functions of Database Administrator.
11. Explain about Transaction Management.
12. Explain the phases involved in Database design.
13. What is ER diagram? What are the components of ER diagram? List the Symbols and
their usage.
14. Define the terms a) entity b) relationship c) attribute with examples
15. Explain different types of attributes with examples.
16. Explain different types of relationships with examples.
17. Differentiate between weak entity set and strong entity set.
18. Explain about Generalization with example.
19. Explain about Specialization with example.
20. Construct an ER diagram for University database.

Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16
SUB: DBMS Tutorial Sheet: UNIT II-1
Relational Model
Short answer questions
1.Write a short note on the relational model.
2.What is a relation schema?
3.What is a relation instance?
4.Does the relational model, as seen by an SQL query writer, provide physical and logical data
independence? Explain.
5.What is a super key?
6.What is a Candidate key?
7.What is a primary key?
8.What is a foreign key?
9.Define relational algebra.
10.What are the operations in relational algebra?
11.What is a PROJECT operation?
12.Write short notes on tuple relational calculus.
13.Write short notes on domain relational calculus
14.Explain the statement that relational algebra operators can be composed. Why is the ability to
compose operators important?
15.Explain the two conditions needed for the set difference operation (union operation) to be
valid.
Descriptive questions
1.Discuss the differences between relational calculus and relational algebra.
2.Explain about various keys
3.Explain in detail about the set operations in relational algebra.
4.Write the following queries in tuple relational calculus for the following schema:
Sailors(Sid:Integer,sname:string,rating:integer,age:real)
Boats(bid:integer,bname:string,color:string)
Reserves(sid:integer,bid:integer,day:date)
a.Find the names of sailors who have reserved a red boat.

b.Find the names of the sailors who have reserved at least one boat.
c.Find the names of sailors who have reserved at least two boats.
d.Find the names of the sailors who have reserved all the boats.
5. Consider the following schema
Suppliers(sid:integer,sname:string,address:string)
Parts(pid:integer,pname:string,color:string)
Catalog(sid:integer,pid:integer,cost:real)
Write the following queries in tuple relational calculus
a.Find the names of the suppliers who supply some red part
b.Find the sids of suppliers who supply some red or green part.
c.Find the sid’s of suppliers who supply every red or green part.
d.Find the pids of parts supplied by at least two different suppliers.

Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology


Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16

SUB:DBMS
Tutorial Sheet: II-1 Question & Answers
SHORT ANSWER QUESTIONS
1.Write a short note on the relational model.
Ans: The relational model represents data in the form of two-dimensional tables. Each table
represents some real-world entity. The relational model is an example of a record-based model.
2.What is a Relation schema?
Ans: The schema specifies the relation's name, the name of each field, and the domain of each
field. A domain is referred to by a domain name and has a set of associated values.
3.What is a relation instance?
Ans: An instance of a relation is a set of tuples in which each tuple has the same number of
fields as the relation schema.
4. Does the relational model, as seen by an SQL query writer, provide physical and logical
data independence? Explain
Ans: The user of SQL has no idea how the data is physically represented in the machine. He or
she relies entirely on the relation abstraction for querying. Physical data independence is
therefore assured. Since a user can define views, logical data independence can also be achieved
by using view definitions to hide changes in the conceptual schema.
5. What is a super key?
Ans: A super key is a set of one or more attributes that allows us to identify uniquely an entity in
the entity set.

6. What is a Candidate key?


Ans: A super key may contain extraneous attributes, and we are often interested in the smallest
super key. A super key for which no proper subset is a super key is called a candidate key.
7. What is a primary key?
Ans: It is a candidate key that is chosen by the database designer as the principal means of
identifying entities within an entity set.
8. What is a foreign key?
Ans: An attribute or set of attributes within one relation that matches the candidate key of some
relation is called a foreign key.
9. Define relational algebra.
Ans: The relational algebra is a procedural query language. It consists of a set of operations that
take one or two relations as input and produce a new relation as output.
10. What are the operations in relational algebra?
Ans: In relational algebra, operations are divided into two groups.
The first group includes set operations such as union, intersection, set difference, and Cartesian
product. The second group was developed specially for relational databases and includes select,
project, rename, join, and division.
11. What is a PROJECT operation?
Ans: The project operation is a unary operation that returns its argument relation with certain
attributes left out.
12. Write short notes on Tuple relational calculus.
Ans: The tuple relational calculus is a non-procedural query language. It describes the desired
information without giving a specific procedure for obtaining that information. A query or
expression can be expressed in tuple relational calculus as {t | P(t)}, which means the set of all
tuples t such that predicate P is true for t.
13. Write short notes on domain relational calculus.
Ans: The domain relational calculus uses domain variables that take on values from an attribute
domain rather than values for entire tuple.
14. Explain the statement that relational algebra operators can be composed. Why is the
ability to compose operators important?
Ans: Every operator in relational algebra accepts one or more relation instances as arguments,
and the result is always a relation instance. So the argument of one operator could be the result
of another operator. This is important because it makes it easy to write complex queries by
simply composing the relational algebra operators.
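For instance, reusing the Sailors relation from the query exercises in this sheet, selection and
projection compose as π_sname(σ_age>35(Sailors)); the same composition can be sketched in SQL
by feeding one query's result to another (the table and column names are assumptions taken from
the exercises):

SELECT s.sname
FROM (SELECT * FROM sailors WHERE age > 35) s;  -- inner selection feeds the outer projection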
15. Explain the two conditions needed for the set difference operation (union operation) to
be valid.
Ans: The two conditions are:
Relations r and s must be of the same arity, i.e., they must have the same number of attributes.
The domains of the ith attribute of r and the ith attribute of s must be the same, for all i.
LONG ANSWERS
1.Discuss the differences between relational calculus and relational algebra
Ans: Differences between relational calculus and relational algebra:

Relational calculus                                  | Relational algebra
-----------------------------------------------------+----------------------------------------------------
It is a non-procedural language                      | It is a procedural language
It lets the programmer specify what data is to be    | It lets the programmer specify what data is
retrieved but not the way of retrieving it           | required and also the way of retrieving it
Every relational algebra query can be expressed      | It is equivalent to the safe queries of the
in relational calculus                               | calculus
It is mostly preferred by end users                  | It is mostly preferred by programmers
It is considered a user-friendly language            | It is not a user-friendly language
The level of abstraction is high                     | The level of abstraction is low
The order of evaluation is determined by the         | The order of evaluation is specified by the
compiler or interpreter                              | query itself
It is difficult to learn                             | It is easy to learn
Queries are represented in a logical form            | Queries are represented in an algebraic form
It defines a relation in terms of one or more        | It defines a process of building a relation from
relations                                            | one or more relations

2.Explain about various keys.
Ans: In order to specify how tuples within a given relation are distinguished, we use attributes.
That is, the attribute values of a tuple must be such that they can uniquely identify the tuple. In
other words, no two tuples in a relation are allowed to have exactly the same value for all
attributes.
A super key is a set of one or more attributes that, taken collectively, allow us to identify
uniquely a tuple in the relation. For example, the ID attribute of the relation instructor is
Sufficient to distinguish one instructor tuple from another. Thus, ID is a super key. The name
attribute of instructor, on the other hand, is not a super key, because several instructors might

43
have the same name.
Formally, let R denote the set of attributes in the schema of relation r. If we say that a
subset K of R is a super key for r , we are restricting consideration to instances of relations r in
which no two distinct tuples have the same values on all attributes in K. That is, if t1 and t2 are
in r and t1 != t2, then t1.K != t2.K.
A super key may contain extraneous attributes. For example, the combination of ID and
name is a super key for the relation instructor. If K is a super key, then so is any superset of K.
We are often interested in super keys for which no proper subset is a super key. Such minimal
super keys are called candidate keys.
It is possible that several distinct sets of attributes could serve as a candidate key.
Suppose that a combination of name and dept name is sufficient to distinguish among members
of the instructor relation. Then, both {ID} and {name, dept name} are candidate keys. Although
the attributes ID and name together can distinguish instructor tuples, their combination, {ID,
name}, does not form a candidate key, since the attribute ID alone is a candidate key. A primary
key is used to denote a candidate key that is chosen by the database designer as the principal
means of identifying tuples within a relation. A key (whether primary, candidate, or super) is a
property of the entire relation, rather than of the individual tuples. Any two individual tuples in
the relation are prohibited from having the same value on the key attributes at the same time. The
designation of a key represents a constraint in the real-world enterprise being modelled. Primary
keys must be chosen with care.
The primary key should be chosen such that its attribute values are never, or very rarely,
changed. For instance, the address field of a person should not be part of the primary key, since it
is likely to change. Social-security numbers, on the other hand, are guaranteed never to change.
A relation, say r1, may include among its attributes the primary key of another relation, say r2.
This attribute is called a foreign key from r1, referencing r2. The relation r1 is also called the
referencing relation of the foreign key dependency, and r2 is called the referenced relation of
the foreign key. For example, the attribute dept name in instructor is a foreign key from
instructor, referencing department, since dept name is the primary key of department. In any
database instance, given any tuple, say ta, from the instructor relation, there must be some tuple,
say tb, in the department relation such that the value of the dept name attribute of ta is the same
as the value of the primary key, dept name, of tb .
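The same vocabulary shows up in SQL table definitions. The sketch below is illustrative only;
the column types are assumptions, and the schema follows the instructor/department example above:

CREATE TABLE department (
    dept_name VARCHAR(30) PRIMARY KEY,    -- primary key of the referenced relation
    building  VARCHAR(30)
);

CREATE TABLE instructor (
    id        VARCHAR(5)  PRIMARY KEY,    -- the chosen candidate key
    name      VARCHAR(30) NOT NULL,       -- not a super key: names can repeat
    dept_name VARCHAR(30),
    UNIQUE (id, name),                    -- a super key, but not minimal
    FOREIGN KEY (dept_name) REFERENCES department (dept_name)
);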

3.Explain in detail about the set operations in relational algebra.
Ans: The following standard operations on sets are also available in relational algebra:
 Union (U)
 Intersection (∩)
 Set-difference(-)
 Cross-product (x).
Union: R U S returns a relation instance containing tuples that occur in either relation instance
R or relation instance S (or both). R and S must be union-compatible, and the schema of the
result is defined to be identical to the schema of R.
Two relation instances are said to be union-compatible if the following conditions hold:
 They have the same number of the fields, and
 Corresponding fields, taken in order from left to right, have the same domains.
Intersection: R ∩ S returns a relation instance containing all tuples that occur in both R and S.
The relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
Set-difference: R - S returns a relation instance containing all tuples that occur in R but not in S.
The relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
Cross-product: R x S returns a relation instance whose schema contains all the fields of R (in
the same order as they appear in R) followed by all the fields of S (in the same order as they
appear in S). The result of R x S contains one tuple, the concatenation of r and s, for each pair of
tuples r ∈ R, s ∈ S. The cross-product operation is sometimes called Cartesian product.
Example: Consider the following relations:
Employee:
Eid Name salary
1 John 10000
2 Ramesh 5000
3 Smith 8000
4 Jack 6000
5 Nile 15000
Student:
Sid Name fee
11 Smith 1000

45
22 Vijay 950
33 Gaurav 2000
44 Nile 1500
55 John 950

Now if we want to find the names of all employees and names of all students then we can write
π name(employee) U π name( student) and the result will be
Name
John
Ramesh
Smith
Jack
Nile
Vijay
Gaurav

If we want to find all the employees who are also students, then we write
π name(employee) ∩ π name( student) and the result will be

Name
John
Smith
Nile

If we want the names of employees that are not students then we can write
π name(employee) - π name( student) and the result will be

Name
Ramesh
Jack
If we apply the Cartesian product to the following relations, the result will be:
Publisher_information
Pid Name
P1 PHI
P2 TMH
P3 BPB
Book_Info
Bid Title

B1 DBMS
B2 COMPILER
The result will be:
Pid Name Bid Title
P1 PHI B1 DBMS
P2 TMH B1 DBMS
P3 BPB B1 DBMS
P1 PHI B2 COMPILER
P2 TMH B2 COMPILER
P3 BPB B2 COMPILER
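For reference, the same four operations can be written in SQL against the tables above (a sketch;
note that Oracle spells the set difference operator MINUS rather than EXCEPT):

-- Union: names of all employees and all students (duplicates removed)
SELECT name FROM employee
UNION
SELECT name FROM student;

-- Intersection: employees who are also students
SELECT name FROM employee
INTERSECT
SELECT name FROM student;

-- Set difference: employees who are not students
SELECT name FROM employee
EXCEPT
SELECT name FROM student;

-- Cartesian product of the two book-publishing tables
SELECT * FROM publisher_information CROSS JOIN book_info;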

4. Write the following queries in tuple relational calculus for the following schema
Sailors(Sid:Integer,sname:string,rating:integer,age:real)
Boats(bid:integer,bname:string,color:string)
Reserves(sid:integer,bid:integer,day:date)
a. Find the names of sailors who have reserved a red boat.
b. Find the names of the sailors who have reserved at least one boat.
c. Find the names of sailors who have reserved at least two boats.
d. Find the names of the sailors who have reserved all the boats.
Ans:
a. {P | ∃S ∈ Sailors ∃R ∈ Reserves (R.sid = S.sid ∧ P.sname = S.sname ∧
∃B ∈ Boats (B.bid = R.bid ∧ B.color = 'red'))}
b. {P | ∃S ∈ Sailors ∃R ∈ Reserves (S.sid = R.sid ∧ P.sname = S.sname)}
c. {P | ∃S ∈ Sailors ∃R1 ∈ Reserves ∃R2 ∈ Reserves (S.sid = R1.sid ∧ R1.sid = R2.sid ∧
R1.bid ≠ R2.bid ∧ P.sname = S.sname)}
d. {P | ∃S ∈ Sailors (∀B ∈ Boats ∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid) ∧
P.sname = S.sname)}
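As a cross-check, query (a) can also be written in SQL over the same schema (a sketch, using
lowercase table names as elsewhere in these sheets):

SELECT DISTINCT s.sname
FROM sailors s, reserves r, boats b
WHERE s.sid = r.sid
  AND r.bid = b.bid
  AND b.color = 'red';   -- sailors who have reserved a red boat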
5. Consider the following schema
Suppliers(sid:integer,sname:string,address:string)
Parts(pid:integer,pname:string,color:string)
Catalog(sid:integer,pid:integer,cost:real)
Write the following queries in tuple relational calculus
a.Find the names of the suppliers who supply some red part

b.Find the sids of suppliers who supply some red or green part.
c.Find the sids of suppliers who supply every red or green part.
d.Find the pids of parts supplied by at least two different suppliers.
Ans:
a. {T | ∃T1 ∈ Suppliers (∃X ∈ Parts (X.color = 'red' ∧ ∃Y ∈ Catalog (Y.pid = X.pid ∧
Y.sid = T1.sid)) ∧ T.sname = T1.sname)}
b. {T | ∃T1 ∈ Catalog (∃X ∈ Parts ((X.color = 'red' ∨ X.color = 'green') ∧ X.pid = T1.pid) ∧
T.sid = T1.sid)}
c. {T | ∃T1 ∈ Catalog (∀X ∈ Parts ((X.color ≠ 'red' ∧ X.color ≠ 'green') ∨ ∃T2 ∈ Catalog
(T2.pid = X.pid ∧ T2.sid = T1.sid)) ∧ T.sid = T1.sid)}
d. {T | ∃T1 ∈ Catalog (∃T2 ∈ Catalog (T2.pid = T1.pid ∧ T2.sid ≠ T1.sid) ∧ T.pid = T1.pid)}

Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16
SUB:DBMS Tutorial Sheet: UNIT II-2
Short answer questions
1. What are the types of storage devices?
2. Define cache.
3. Define file organization.
4. What is a search key?
5. What are the ways in which the variable-length records arise in database systems?
6. What are the two types of blocks in the fixed-length representation? Define them.
7. What is a heap file organization?
8. What is hashing file organization?
9. What is a sequential file organization ?
10. What is the relationship between files and indexes?
11. What is the difference between a primary index and a secondary index?
12. Differentiate between sparse and dense indices.
13. Why is a hash structure not the best choice for a search key on which range queries are
likely?
14. Explain the limitations of static hashing?
15. Explain B+ tree index structure.
Descriptive questions:
1. Why does DBMS store data on external storage and how the data is stored?
2. Distinguish between extendible hashing and linear hashing
3. Write a short note on two basic approaches for organizing data entries

4. List out the differences between ISAM and B+ trees
5. Differentiate between sequential and direct file organizations:
Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology


Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16

SUB:DBMS Tutorial Sheet: UNIT II-2


SHORT ANSWER QUESTIONS

1. What are the types of storage devices?


Ans:
 Primary storage devices
 Secondary storage devices
 Tertiary storage devices
 Volatile storage
 Non –volatile storage
2. Define cache
Ans: The cache is the fastest and most costly form of storage. Cache memory is small; its use is
managed by the hardware.
3. Define file organization
Ans: A file is organized logically as a sequence of records. These records are mapped onto disk
blocks. Files are provided as a basic construct in operating systems.
4. What is a search key?
Ans: An attribute or set of attributes used to look up records in a file is called a search key.
5. What are the ways in which variable-length records arise in database systems?
Ans:

 Storage of multiple records in a file
 Record types that allow variable lengths for one or more fields
 Record types that allow repeating fields.

6. What are the two types of blocks in the fixed length representation? Define them.
Ans: Anchor block – contains the first record of a chain
Overflow block - contains records other than those that are the first record of a chain
7. What is a heap file organization?
Ans: In the heap file organization any record can be placed anywhere in the file where there is
space for the record. There is no ordering of records. There is a single file for each relation.
8. What is a hashing file organization?
Ans: In the hashing file organization, a hash function is computed on some attribute of each
record. The result of the hash function specifies in which block of the file the record should be
placed
9. What is a sequential file organization?
Ans: In this organization records are stored in sequential order, based on the value of the search
key of each record
10. What is the relationship between files and indexes?
Ans: A file is a collection or sequence of records that can be built and destroyed, whereas an
index is a list of keys or keywords organized as a disk-based data structure. An index is built on
a file in order to speed up the searching and retrieval of records that satisfy search conditions on
the search key fields of the index.
11. What is the difference between a primary index and a secondary index?
Ans: The primary index is on the field which specifies the sequential order of the file. There
can be only one primary index, while there can be many secondary indices.
12. Differentiate between sparse and dense indexes.
Ans: Differences between sparse and dense indexes:

Sparse index                                     | Dense index
-------------------------------------------------+-----------------------------------------------
An index entry appears only for some of the      | An index entry appears for every search key
search key values in the file                    | value
It is complex                                    | It is simple
It locates the records indirectly                | It locates the records directly
It consumes more time                            | It consumes less time

13. Why is a hash structure not the best choice for a search key on which range queries are
likely?
Ans: A range query cannot be answered efficiently using a hash index; we would have to read all
the buckets. This is because key values in the range do not occupy consecutive locations in the
buckets; they are distributed uniformly and randomly throughout all the buckets.
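A small SQL illustration of this point (the emp table and esal column are assumed; the USING
HASH clause is PostgreSQL syntax, so treat it as a sketch):

-- A tree-structured (B+ tree) index supports range predicates:
CREATE INDEX emp_esal_btree ON emp (esal);
SELECT * FROM emp WHERE esal BETWEEN 10000 AND 20000;  -- can use the B+ tree

-- A hash index helps only equality predicates:
CREATE INDEX emp_esal_hash ON emp USING HASH (esal);
SELECT * FROM emp WHERE esal = 15000;                  -- can use the hash index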
14. Explain the limitations of static hashing.
Ans: The static hashing technique uses a fixed number of buckets, which degrades its
performance. It does not allow the file to be dynamically expanded or shrunk depending
on the number of records being stored in the file.
15. Explain B+ tree index structure
Ans: The B+ tree index structure is the most widely used of several index structures that
maintain their efficiency despite insertion and deletion of data. A B+ tree index takes the form of
a balanced tree in which every path from the root of the tree to a leaf of the tree is the same
length.
LONG ANSWERS
1) Why does a DBMS store data on external storage and how is the data stored?
Ans: A DBMS stores data on external storage because the quantity of data is vast and must
persist across program executions. Because the quantity of data is vast, it is stored on devices like
disks and tapes. Disks provide random access to data and tapes provide sequential access to
data. The cost of accessing data randomly is higher than accessing it sequentially. The
data stored on the disks is usually in the form of files. These files consist of records that
have a unique identifier known as a record id (rid). This identifier is used to determine the
address of the page. The unit of data read from or written to disk is a page, whose size is
typically 4 KB or 8 KB. The cost of page I/O is usually higher than the cost of typical
database operations. This cost can be reduced if an optimized database system is built. The
DBMS components that read and write data between disk and main memory are:
 Buffer manager
 Disk space manager.
Buffer manager: It is a software layer whose major responsibility is to bring pages from disk
into main memory whenever it receives a request from the files and access methods layer. Pages
are requested using the rids of the records on them. If the requested page is not already in main
memory, the buffer manager fetches it from the disk.
Disk space manager: It is a software layer that is used to allocate or deallocate disk pages
when new records in a file are to be written to disk, or to reclaim a page on the disk which is no
longer in use. The disk space manager keeps track of all the pages that are being used by the file
and access methods layer. Once a page is no longer needed, the corresponding disk space
becomes free and can be used again when a new page is stored on the disk.
2) Distinguish between extendible hashing and linear hashing
Ans: Differences between extendible hashing and linear hashing:

Extendible hashing                                 | Linear hashing
---------------------------------------------------+--------------------------------------------------
A directory structure is used                      | The directory structure is avoided by allocating
                                                   | primary bucket pages successively
It does not require overflow pages                 | It requires temporary overflow pages
Space utilization is higher                        | Space utilization is lower
When a bucket overflows, it is split by doubling   | When a bucket overflows, buckets are split in a
the directory structure                            | round-robin fashion
For uniform data distributions, the average cost   | For uniform data distributions, the average cost
of equality selections is higher                   | of equality selections is lower
For skewed data distributions, the performance of  | For skewed data distributions, the performance
extendible hashing is higher                       | of linear hashing is lower

3) Write a short note on two basic approaches for organizing data entries.
Ans: Indexing refers to the process of finding a particular record in a file using one or more
indexes, or of storing records in an order that supports such lookups. The two ways in which
data entries can be organized are:
1.Hash-based indexing
2.Tree based indexing
Hash-based indexing: In this approach, the records in a file are grouped in buckets, where a
bucket consists of a primary page and, possibly, additional pages linked in a chain. The bucket to
which a record belongs can be determined by applying a special function, called a hash function,
to the search key. Given a bucket number, a hash-based index structure allows us to retrieve the
primary page for the bucket in one or two disk I/Os. On inserts, the record is inserted into the
appropriate bucket, with 'overflow' pages allocated as necessary. To search for a record with a
given search key value, we apply the hash function to identify the bucket to which such records
belong and look at all pages in that bucket. If we do not have the search key value for the record
(for example, the index is based on sal and we want records with a given age value), we have to
scan all pages in the file.
Tree-based indexing: An alternative to hash-based indexing is to organize records using a tree
like data structure. The data entries are arranged in sorted order by search key value, and a
hierarchical search data structure is maintained that directs searches to the correct page of data
entries. The lowest level of the tree, called the leaf level, contains the data entries. This structure
allows us to efficiently locate all data entries with search key values in a desired range. All
searches begin at the topmost node, called the root, and the contents of pages in non-leaf levels
direct searches to the correct leaf page. Non-leaf pages contain node pointers separated by search
key values. The node pointer to the left of a key value k points to a subtree that contains only
data entries less than k. The node pointer to the right of a key value k points to a subtree that
contains only data entries greater than or equal to k.
4) List out the differences between ISAM and B+ trees
Ans: Differences between ISAM and B+ trees:

ISAM                                               | B+ trees
---------------------------------------------------+--------------------------------------------------
ISAM is a static indexing structure                | A B+ tree is a dynamic indexing structure
The leaf pages are allocated sequentially          | The leaf pages are allocated randomly
Because ISAM is static, overflow chains may        | Because B+ trees grow and shrink dynamically,
occur                                              | overflow chains do not occur
Only leaf pages can be modified                    | Leaf as well as index-level pages can be modified
Insertions can lead to long overflow chains        | Insertions are handled elegantly without overflow
                                                   | chains
ISAM is less efficient                             | A B+ tree is more efficient
The locking overhead of ISAM is less               | The locking overhead of a B+ tree is more
Scanning is done more efficiently                  | Scanning is done less efficiently

5) Differentiate between sequential and direct file organizations:


Ans: Differences between sequential and direct file organizations:

Sequential file organization                       | Direct file organization
---------------------------------------------------+--------------------------------------------------
Records are stored on sequential-access storage    | Records are stored on direct-access storage
devices                                            | devices
A required record is accessed by searching from    | A required record is searched for randomly
the beginning of the file until it is found        | using keys
Time consumption is more                           | Time consumption is less
Access speed is lower                              | Access speed is higher
This organization is cheaper when compared with    | This organization is more expensive when
direct file organization                           | compared with sequential file organization
Before processing transactions, records must be    | Before processing transactions, it is not
sorted in either ascending or descending order     | necessary to sort the records

Gokaraju Rangaraju Institute of Engineering and Technology


Department of Computer Science and Engineering

Year/Semester : II / I Academic year: 2015-16
SUB:DBMS Assignment Sheet: UNIT II
Descriptive questions
1. Describe the different types of file organization? Explain using a sketch of each of them with
their advantages and disadvantages
2. Describe the structure of a B+ tree and give the algorithm for search in the B+ tree with example
3. State various fundamental operations in the relational algebra and explain them with suitable
examples.
4. Explain the differences between extendible hashing and linear hashing
5. List out the differences between domain relational calculus and tuple relational calculus
6. Write a short note on various keys available with examples
7. Differentiate between sequential, direct and indexed sequential file organizations
8. Explain about tertiary storage media in detail.
9. Explain about fixed length file organization with an example.
10. Explain about byte string representation in detail
11. Write a short note on ISAM
12. Explain all the operations on B+ trees by taking a sample example
13. Explain in detail about static hashing
14. Consider the following
schema: Suppliers(sid,sname,saddress); Parts(pid,pname,color); Catalog(sid,pid,cost). Write the
following queries in relational algebra:(a)Find the names of suppliers who supply some blue
part.(b)Find the sids of suppliers who supply every red part
15. Explain about extendible hashing
16.Explain in detail about linear hashing
17.Write the differences between primary and secondary indexes
18.List out the properties of indexes.
19.When must we create a non-clustering index despite the advantage of a clustering index?
Explain
20.Describe all the variants of join operation.
Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology

Department of Computer Science and Engineering


Year/Semester: II / I Academic year: 2015-16
SUB:DBMS Tutorial Sheet: UNIT III-1

Short answer questions

1. What is the basic form of an SQL query?


2. Explain DML commands with examples.
3. a) Select the names and salaries of the employees whose salary is greater than 20000.
b) Find the names of employees who are working in ‘Accounting’ department.
4. Mention the steps to be followed for query evaluation.
5. Explain DDL commands with examples.
6. Explain about the operator used for pattern matching.
7. a) List the names of the employees whose name start with ‘H’ and have ‘r’ as the third
character.
b) Find all the employees whose department name starts with ‘pac’.
8. What are the set manipulation operators? Explain them.
9. a) Find the names of employees who are working for the departments 4 and 6 using set
manipulation operator.
b) Find the names of department who are controlling the project number 44 but not
project number 55.
10. What is correlated nested query? Explain with example.
11. a) Find the names of employees whose salary is greater than the salary of Henry.
b) List the names of employees whose salary is greater than every employee called
Henry.
12. What is a nested query? Explain with an example.
13. List the number of employees in the company.
14. a) Find the names of sailors who have reserved a red boat.
b) Find the colors of boats reserved by Lubber.
15. Find the ages of sailors whose name begins and ends with B and has at least 3 characters.

Descriptive questions/Programs/Experiments

1. Explain the query evaluation steps with an example.


2. What is a nested query? Explain.
3. Explain about database languages with examples.
4. What operations does SQL provide over sets of tuples and how would you use these in
writing queries?
5. Explain the basic form of a SQL query in detail with example.

Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology


Department of Computer Science and Engineering
Year/Semester: II / I Academic year: 2015-16
SUB:DBMS Tutorial Sheet: UNIT III-1
Short answer questions
1. What is the basic form of an SQL query?
Ans: The basic form of a SQL Query is as follows:
SELECT [DISTINCT] <select-list>
FROM <from-list>
WHERE <Qualification/Condition>;
In this SQL statement, DISTINCT and WHERE clauses are optional.
<select-list> specifies the list of columns to be retrieved from the table.
<from-list> specifies the list of tables along with their range variables.
<condition> is the condition to be satisfied in order to retrieve the columns-list from the
list of tables.
2. Explain DML commands with examples.
Ans: Data Manipulation Language (DML) statements are used for managing data in database.
DML commands include:
1. Insert
2. Update
3. Delete
1.Insert Command: Insert command is used to insert data into a table.
Syntax: INSERT INTO <table-name> VALUES (val1, val2, val3…);
INSERT INTO <table-name>(col1, col2, col3, …) VALUES (val1, val2, val3, …);
Example:INSERT INTO student VALUES (‘1A1’,’XXX’,’CSE’, 98, 20);
INSERT INTO student (sid, sname) VALUES (‘1A1’,’XXX’);
2.Update Command:Update command is used to update a row of a table.
Syntax:UPDATE <table-name> SET <modification> WHERE <condition>;
Example:Update student set marks=marks+2 where sid=2;
3. Delete Command: Delete command is used to delete data from a table. Delete
command can be used with a condition to delete a particular row.
Syntax: DELETE FROM <table-name> WHERE <condition>;
Example: Delete from student where sid=’1A1’;

3. a) Select the names and salaries of the employees whose salary is greater than
20000.
b) Find the names of employees who are working in ‘Accounting’ department.
Ans: a) select ename,esal from emp where esal>20000;
The above SQL statement displays the names and salaries of employees from emp table
whose salary is greater than 20000.
b) select e.ename from emp e,dept d where e.dno=d.dno and d.dname=’accounting’;
The above SQL statement displays names of employees from emp table. But emp table
does not have dname. So we need to join dept and emp tables using the common column
dno.
4. Mention the steps to be followed for query evaluation.
Ans: Execution of a SELECT statement can be explained in terms of a conceptual evaluation
strategy.
1. Compute the cross product of tables in the from-list.
2. Delete the rows that fail in the qualification conditions.
3. Delete all the columns that do not appear in the select-list.
4. If DISTINCT is specified, eliminate the duplicate rows.
Hence, the conceptual evaluation strategy makes explicit the rows that must be present
in the answer to the query.
5. Explain DDL commands with examples.
Ans: Data Definition Language (DDL) statements are used to define the database schema or
structure. DDL commands include:
1. Create
2. Alter
3. Drop
4. Truncate
5. Rename
1.Create Command:Used to create table’s structure.
Syntax:CREATE TABLE <table-name>(column1 Datatype1[,Column2 Datatype2,…..]
[DEFAULT expr][,….]);

Example:Create table student(sid varchar2(20), sname varchar2(20), branch
varchar2(20), marks number(4), age number(3));
2. Alter Command: Alter command is used for alteration of table structures. It is used
to add a column to existing table, to rename any existing column, to change data type of
any column or to modify its size, to drop a column.
Adding column to existing table:
Syntax: ALTER TABLE <table-name> ADD(column-name datatype);
Example:alter table student add(address char);
Adding Multiple Column to existing table:
Syntax: ALTER TABLE <table-name> ADD(column1 datatype1, column2 datatype2,
….);
Example:Alter table student add(fname varchar2(20),dob date);
Adding column with default value:
Syntax: ALTER TABLE <table-name> ADD(column1 datatyp1 DEFAULT data);
Example: alter table student add(dob date default ‘1-jan-99’);
Modifying an Existing Column:
Syntax: ALTER TABLE <table-name> MODIFY(column datatype);
Example:alter table student modify(address varchar2(30));
Rename a Column:
Syntax: ALTER TABLE <table-name> RENAME <old column name> TO <new
column name>;
Example: alter table student rename address to location;
Drop a Column:
Syntax:ALTER TABLE <table-name> DROP(column-name);
Example: alter table student drop(address);
3.Truncate Command:This command removes all records from a table but it will not
destroy the table’s structure.
Syntax: TRUNCATE TABLE <table-name>;
Example: Truncate table student;
4.DROP Command: It completely removes a table from the database. It will destroy the
table structure.

Syntax: DROP TABLE <table-name>;
Example: Drop table student;
5. Rename Command: It is used to rename a table.
Syntax:RENAME TABLE <old table name> TO <new table name>;
Example:Rename table student to student1;
6. Explain about the operator used for pattern matching.
Ans: The operator used for pattern matching is ‘LIKE’ operator. This is used to compare a
value to similar values using wildcard operators. There are two wildcards used in
conjunction with the LIKE operator:
1. The percent sign (%)
2. The underscore ( _ )
The percent sign represents zero, one or multiple characters. The underscore represents a
single number or character. These symbols are used in combination. LIKE operator is
used in where clause.
Syntax: SELECT <select-list> FROM <table-name> WHERE <column> LIKE ‘_%characters’;
Example: Find the employees whose name starts with H and ends with e.
SQL> Select * from emp where ename like ‘H%e’;
7. a) List the names of the employees whose names start with ‘H’ and have ‘r’ as the
third character.
b) Find all the employees whose department name starts with ‘pac’.
Ans: a) select ename,esal from emp where ename like ‘H_r%’;
The output of the above query results in displaying the name and salary of the
employees with H as the starting character and r as the third character.
Output: ENAME ESAL
Hari 10000
Haritha 16000
b) select e.eid,d.dname from emp e,dept d where e.dno=d.dno and d.dname like ‘pac%’;
The above SQL statement displays names of employees from emp table. But emp table
does not have dname. So we need to join dept and emp tables using the common column
dno. The dname should start with pac and can have any number of characters thereafter.

Output: EID DNAME
101 packaging
8. What are the set manipulation operators? Explain them.
Ans: There are three set manipulation constructs provided by SQL. Since the answer to a
query is a multiset of rows, it is required to perform UNION, INTERSECT and EXCEPT
operations.
1. UNION:It is used to combine the results of two or more select statements. It will eliminate
duplicate rows from its result set. The number of columns and data type must be same in both
the tables.
Example: Select name from first
UNION
select name from second;
UNION ALL shows duplicate values also.
2. INTERSECT:It is used to combine two select statements, but it only returns the records
which are common from both the select statements. The number of columns and data type
must be same.
Example: select name from first
INTERSECT
select name1 from second;
INTERSECT ALL shows duplicate values also.
3. EXCEPT: It combines the results of two select statements and returns only those rows that
belong to the first result set.
Example: select name from first
EXCEPT
select name1 from second;
EXCEPT ALL shows duplicate values also.
9. a) Find the names of employees who are working for the departments 4 and 6 using set
manipulation operator.
b) Find the names of department who are controlling the project number 44 but not
project number 55.
Ans: a) select e.ename from emp e where e.dno=4

INTERSECT
SELECT e1.ename from emp e1 where e1.dno=6;
b) select d.dname from dept d where d.pno=44
EXCEPT
select d1.dname from dept d1 where d1.pno=55;
10. What is correlated nested query? Explain with example.
Ans: A Correlated Subquery is one that is executed after the outer query is executed. So
correlated subqueries take an approach opposite to that of normal subqueries. The correlated
subquery execution is as follows:
1. The outer query receives a row.
2. For each candidate row of the outer query, the subquery (the correlated subquery) is executed
once.
3. The results of the correlated subquery are used to determine whether the candidate row should
be part of the result set.
4. The process is repeated for all rows.
Correlated Subqueries differ from the normal subqueries in that the nested SELECT statement
refers back to the table in the first SELECT statement.
Example: To find out the names of all the students who appeared in more than three papers of
their opted course, the SQL will be
Select name from student a where 3 < (select count(*) from result b where b.rollno = a.rollno);
In other words, a correlated subquery is one whose value depends upon some variable that
receives its value in some outer query.
11. a) Find the names of employees whose salary is greater than the salary of Henry.
b) List the names of employees whose salary is greater than every employee called Henry.
Ans: a) select e.ename from emp e where e.esal>ANY(select e1.esal from emp e1 where
e1.ename=’Henry’);
b) select e.ename from emp e where e.esal>ALL(select e1.esal from emp e1 where
e1.ename=’Henry’);

12. What is a nested query? Explain with an example.

Ans: Subqueries are nested, when the subquery is executed first, and its results are inserted into
Where clause of the main query.
Nested Subqueries:
A subquery is nested when a subquery appears in the WHERE or HAVING clause of another
query.
Example: Get the result of all the students who are enrolled in the same course as the student
with ROLLNO 12.
Select * From result where rollno in (select rollno from student where courseid = (select courseid
from student where rollno = 12));
The innermost subquery will be executed first and then based on its result the next subquery will
be executed and based on that result the outer query will be executed. The levels to which you
can do the nesting is implementation-dependent.
13. List the number of employees in the company.
Ans: select count(*) from emp;
The above function displays the number of rows in the table.
14. a) Find the names of sailors who have reserved a red boat.
b) Find the colors of boats reserved by Lubber.
Ans: a) select s.sname from sailors s,reserves r,boats b where s.sid=r.sid and r.bid=b.bid and
b.color=’red’;
b) select b.color from boats b,reserves r,sailors s where b.bid=r.bid and r.sid=s.sid and
s.sname=’Lubber’;
15. Find the ages of sailors whose name begins and ends with B and has atleast 3
characters.
Ans: select s.age from sailors s where s.sname like ‘B_%B’;

Descriptive questions/Programs/Experiments
1. Explain the query evaluation steps with an example.
Ans: Execution of a SELECT statement can be explained in terms of a conceptual evaluation strategy.
1. Compute the cross product of tables in the from-list.
2. Delete the rows that fail in the qualification conditions.
3. Delete all the columns that do not appear in the select-list.

4. If DISTINCT is specified, eliminate the duplicate rows.
Hence, the conceptual evaluation strategy makes explicit the rows that must be present
in the answer to the query.
Example: Find the names of sailors who have reserved boat number 103
select s.sname from sailors s, reserves r where s.sid=r.sid and r.bid=103;
For the above example, let us consider the tables as below:

Sailors table:
sid  sname    rating  age
22   Dustin   7       45.0
31   Lubber   8       55.5
64   Horatio  7       35.0

Reserves table:
sid  bid  day
22   101  10/10/98
22   102  10/10/98
22   103  10/8/98
31   103  11/6/98
64   101  9/5/98

Step-1: Compute the cross product of the sailors and reserves tables.

sid sname rating age sid bid day


22 Dustin 7 45.0 22 101 10/10/98
22 Dustin 7 45.0 22 102 10/10/98
22 Dustin 7 45.0 22 103 10/8/98
22 Dustin 7 45.0 31 103 11/6/98
22 Dustin 7 45.0 64 101 9/5/98
31 Lubber 8 55.5 22 101 10/10/98
31 Lubber 8 55.5 22 102 10/10/98
31 Lubber 8 55.5 22 103 10/8/98
31 Lubber 8 55.5 31 103 11/6/98
31 Lubber 8 55.5 64 101 9/5/98
64 Horatio 7 35.0 22 101 10/10/98
64 Horatio 7 35.0 22 102 10/10/98
64 Horatio 7 35.0 22 103 10/8/98
64 Horatio 7 35.0 31 103 11/6/98
64 Horatio 7 35.0 64 101 9/5/98

Step-2: Delete the rows that fail in the qualification conditions.

sid sname rating age sid bid day


22 Dustin 7 45.0 22 103 10/8/98
31 Lubber 8 55.5 31 103 11/6/98

Step-3: Delete all the columns that do not appear in the select-list.

sname
Dustin
Lubber
The final result has only these two rows in the output.
2. What is a nested query? Explain.
Ans: A nested query is a query that has another query embedded within it; the embedded query
is called a subquery. When writing a query, we sometimes need to express a condition that
refers to a table that must itself be computed. The query used to compute this subsidiary table
is a subquery and appears as part of the main query. A subquery typically appears within the
WHERE clause of a query. Subqueries can sometimes appear in the FROM clause or the
HAVING clause.
Example:Find the names of sailors who have reserved boat 103.
SELECT S.sname FROM Sailors S WHERE S.sid IN ( SELECT R.sid FROM Reserves R
WHERE R.bid = 103 );
The nested subquery computes the (multi)set of sids for sailors who have reserved boat
103 (the set contains 22, 31, and 74 on instances R2 and S3), and the top-level query retrieves
the names of sailors whose sid is in this set.
The IN operator allows us to test whether a value is in a given set of elements; an SQL query
is used to generate the set to be tested.
Example: Find the names of sailors who have reserved a red boat.
SELECT S.sname FROM Sailors S WHERE S.sid IN ( SELECT R.sid FROM Reserves R
WHERE R.bid IN ( SELECT B.bid FROM Boats B WHERE B.color = 'red' ) );
The innermost subquery finds the set of bids of red boats (102 and 104 on instance B1).
The subquery one level above finds the set of sids of sailors who have reserved one of these
boats. On instances B1, R2, and S3, this set of sids contains 22, 31, and 64. The top-level
query finds the names of sailors whose sid is in this set of sids. For the example instances, we
get Dustin, Lubber, and Horatio. To find the names of sailors who have not reserved a red
boat, we replace the outermost occurrence of IN by NOT IN.
Example: Find the names of sailors who have not reserved a red boat.

SELECT S.sname FROM Sailors S WHERE S.sid NOT IN ( SELECT R.sid FROM
Reserves R WHERE R.bid IN ( SELECT B.bid FROM Boats B WHERE B.color = 'red' ) );
This query computes the names of sailors whose sid is not in the set 22, 31, and 64.
3. Explain about database languages with examples.
Ans: A database system provides a DDL to specify the database schema and a DML to express
database queries and updates.
1. DDL Commands
2. DML Commands
1.DDL Commands: These commands update the special set of tables, called data dictionary
or data directory. Data dictionary contains metadata i.e. data about data. Data that describe
the properties of other data is known as metadata. It keeps the information, how and where
the other data is stored. Some of the properties include data definition, data structures and
rules or constraints. Metadata describes the properties of data.
Eg:Schema of database in the data dictionary.
Data Definition Language (DDL) statements are used to define the database schema or
structure. DDL commands include:
1. Create
2. Alter
3. Drop
4. Truncate
5. Rename
1.Create Command:Used to create table’s structure.
Syntax: CREATE TABLE <table-name> (column1 Datatype1[,Column2 Datatype2,…..]
[DEFAULT expr][,….]);
Example: Create table student (sid varchar2(20), sname varchar2(20), branch varchar2(20),
marks number(4), age number(3));
2. Alter Command: Alter command is used for alteration of table structures. It is used to add a
column to existing table, to rename any existing column, to change data type of any column or to
modify its size, to drop a column.
Adding column to existing table:
Syntax: ALTER TABLE <table-name> ADD (column-name data type);

Example: alter table student add(address char);
Adding Multiple Column to existing table:
Syntax: ALTER TABLE <table-name> ADD(column1 datatype1, column2 datatype2, ….);
Example: Alter table student add (fname varchar2 (20), dob date);
Adding column with default value:
Syntax: ALTER TABLE <table-name> ADD(column1 datatyp1 DEFAULT data);
Example: alter table student add(dob date default ‘1-jan-99’);
Modifying an Existing Column:
Syntax: ALTER TABLE <table-name> MODIFY(column datatype);
Example: alter table student modify(address varchar2(30));
Rename a Column:
Syntax: ALTER TABLE <table-name> RENAME <old column name> TO <new column
name>;
Example: alter table student rename address to location;
Drop a Column:
Syntax: ALTER TABLE <table-name> DROP(column-name);
Example: alter table student drop(address);
3. Truncate Command: This command removes all records from a table but it will not destroy
the table’s structure.
Syntax: TRUNCATE TABLE <table-name>;
Example: Truncate table student;
4. DROP Command: It completely removes a table from the database. It will destroy the table
structure.
Syntax: DROP TABLE <table-name>;
Example: Drop table student;
5. Rename Command: It is used rename a table.
Syntax: RENAME TABLE <old table name> TO <new table name>;
Example: Rename table student to student1;
2. DML Commands:These are of 2 types:
i) Procedural DML: It requires a user to specify what data are needed and how to get those data.

ii) Non-Procedural DML: Also called as declarative DML. It requires a user to specify what
data are needed without specifying how to get those data.
Data Manipulation Language (DML) statements are used for managing data in database.
DML commands include:
1. Insert
2. Update
3. Delete
1.Insert Command: Insert command is used to insert data into a table.
Syntax: INSERT INTO <table-name> VALUES (val1, val2, val3…);
INSERT INTO <table-name> (col1, col2, col3, …) VALUES (val1, val2, val3, …);
Example: INSERT INTO student VALUES (‘1A1’,’XXX’,’CSE’, 98, 20);
INSERT INTO student (sid, sname) VALUES (‘1A1’,’XXX’);
2.Update Command:Update command is used to update a row of a table.
Syntax: UPDATE <table-name> SET <modification> WHERE <condition>;
Example:Update student set marks=marks+2 where sid=2;
3. Delete Command: Delete command is used to delete data from a table. Delete
command can be used with a condition to delete a particular row.
Syntax: DELETE FROM <table-name> WHERE <condition>;
Example: Delete from student where sid=’1A1’;
4. What operations does SQL provide over sets of tuples and how would you use these in
writing queries?
Ans: There are three set manipulation constructs provided by SQL. Since the answer to a query
is a multiset of rows, it is required to perform UNION, INTERSECT and EXCEPT operations.
1. UNION: It is used to combine the results of two or more select statements. It will
eliminate duplicate rows from its result set. The number of columns and data type must be
same in both the tables.
Example: Select name from first
UNION
select name from second;
UNION ALL shows duplicate values also.

2. INTERSECT: It is used to combine two select statements, but it only returns the records
which are common from both the select statements. The number of columns and data type
must be same.
Example: select name from first
INTERSECT
select name1 from second;
INTERSECT ALL shows duplicate values also.
3. EXCEPT: It combines the results of two select statements and returns only those rows that
belong to the first result set.
Example: select name from first
EXCEPT
select name1 from second;
EXCEPT ALL shows duplicate values also.
5. Explain the basic form of a SQL query in detail with example.
Ans: The basic form of a SQL Query is as follows:
SELECT [DISTINCT] <select-list>
FROM <from-list>
WHERE <Qualification/Condition>;
In this SQL statement, DISTINCT and WHERE clauses are optional.
<select-list> specifies the list of columns to be retrieved from the table.
<from-list> specifies the list of tables along with their range variables.
<condition> is the condition to be satisfied in order to retrieve the columns-list from the list of
tables.
Example: Find the names of sailors whose age is greater than 35.
select sname from sailors where age>35.0;

sid  sname    rating  age
22   Dustin   7       45.0
31   Lubber   8       55.5
64   Horatio  7       35.0

The result of the query is:
sname
Dustin
Lubber
Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester: II/I Academic Year: 2015-16
SUB:DBMS Tutorial Sheet: UNIT IV-1
NORMALIZATION
Short answer questions
1. What is schema?

2. What is decomposition? Why is it used?

3. What is functional dependency?

4. What is normalization? List the types of normal forms?

5. What are the problems caused by redundancy?

6. What are the problems that can be caused by using decomposition?

7. When do we have to decompose a relation?

8. Define first normal form? Explain with an example?

9. Define second normal form? Explain with an example?

10. How do we decompose a non-2NF relation to 2NF?

11. Define full functional dependency, partial dependency?

12. Define third normal form?

13. What is transitive dependency?

14. For relation R, determine the candidate key and if a relation is not in BCNF then
decompose it into a collection of BCNF relations. R1 (A, C, B, D, E), FDs: A → B, C→
D.
15. What is a prime attribute and non-prime attribute?

Descriptive questions/Programs/Experiments

1.What is schema refinement? Explain problems caused by redundancy?


2.What is decomposition? Why is it used? What are the problems related to decomposition?
3.What is a functional dependency? Explain with an example? When is an FD implied by a set F
of FDs?
4.Explain about the closure of a set of FDs and the attribute closure?
5.What is normalization? Explain different normal forms based on FDs?
6. For the following relational schema, tell whether it is in 3NF or not: EMPLOYEE(E_code,
E_name, Dname, Salary, Project_No, Termination_date_of_project), where each project has a
unique Termination_date_of_project. If it is not in 3NF, bring it into 3NF through normalization?

Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester: II / I Academic year: 2015-16
SUB :DBMS Tutorial Sheet: UNIT IV-1
Short answer questions
1. What is schema?
Ans: A schema can be defined as a complete description of the database. The specification for
the database schema is provided during the database design stage, and this schema doesn't change
frequently.
2. What is decomposition? Why it is used?
Ans: Decomposition is the solution to the problems caused by data redundancy. Decomposition
means breaking up a large schema into multiple smaller schemas. Decomposition helps to
remove the insertion, deletion and update anomalies and also helps to maintain data
integrity.
3. What is functional dependency?
Ans: Functional dependency is one of the most important concepts related to
normalization. With normalization, a relation is converted to a standard form.
A functional dependency (FD) defines the relationship between different attributes of a
relation in a database. Consider
R = {X, Y, Z, K, L, M}
where R is a relation and X, Y, Z, K, L, M are attributes. All attributes are unique, i.e., they
have unique names in the database. An FD between two attributes is written as
FD: X → Y, which is read as "Y is functionally dependent on X."
Consider the relation EMPLOYEE, where EMP_ID uniquely identifies EMP_NAME and
EMP_SECTIONID. These FDs can be shown as:
FD: EMP_ID → EMP_NAME
FD: EMP_ID → EMP_SECTIONID
4. What is normalization? List the types of normal forms?
Ans: Normalization is the process of converting a relation to a standard form, i.e., we decompose a relation into smaller, more efficient tables/relations that reflect a good database design.
The different normal forms are,
(a) First Normal Form (1NF)
(b) Second Normal Form (2NF)
(c) Third Normal Form (3NF)
(d) Boyce-Codd Normal Form (BCNF)
5. What are the problems caused by redundancy?
Ans: The problems caused by redundancy are:
(i) Anomalous Updates
If an update operation is performed, for example, the emp_sectionID 268 is updated to 520 and this correction is made only to the first record of the database, this may lead to inconsistent data unless all the copies in the database are updated. This is referred to as an update anomaly: the change must be made to all copies of the data.
(ii) Anomalous Deletions
A delete anomaly refers to the condition wherein it is impossible to delete some information without losing some other information. For example, if we want to delete the grade entries where the grade is equal to 'A', then all the information for emp_sectionID 268 will be deleted/lost.
(iii) Anomalous Insertions
An insertion anomaly refers to the condition where it is compulsory to store some information in addition to some other information. For example, suppose a new employee record is being entered for an employee who has not yet been assigned an emp_id. If we assume that null values are not allowed, then it is impossible to enter the new record until the employee has been assigned an emp_id.
6. What are the problems that can be caused by using decomposition?
Ans: Two properties must be considered when using decomposition: the lossless-join property and dependency preservation.
(a) Lossless-Join Property
This property ensures that any instance of the original relation can be recovered from the corresponding instances of the smaller relations obtained after decomposition.
(b) Dependency Preservation
This property helps to enforce constraints on the smaller relations in order to enforce the constraints on the original relation.
7. When do we have to decompose a relation?
Ans: We use normal forms to decide when to decompose a relation. Every relation schema is in one of these normal forms, and the normal forms help to decide whether to decompose the relation schema further or not.
8. Define first normal form? Explain with an example?
Ans: A relation schema is said to be in First Normal Form if the values in the relation are atomic. In simple words, there should be no repeating groups in a particular column. A value is atomic if it does not contain a set of values.
A table contains atomic values if there is a single value of the data item at any given row and column intersection. For example, consider an EMPLOYEE relation with the additional attribute DEPENDENTS.
EMPLOYEE
EMP_ID  EMP_SECTIONID  EMP_NAME  EMP_ADDRESS  DEPENDENTS
0012    124            Ravi      Hyderabad    Father, Mother, Brother
0013    268            Nandan    Delhi        Wife, Mother, Son
0014    314            Kumar     Bangalore    Wife, Daughter
0015    315            Rajesh    Hyderabad    Brother, Sister
0016    316            Satish    Pune         Wife, Sister
9. Define second normal form? Explain with an example?
Ans: Second normal form (2NF) can be defined as follows: a relation is said to be in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.
Consider a simple example of a STUDENT relation:
Student (student_id, class_id, name, course_id, time)
Student_id and class_id together form the primary key.
A student can attend different courses in different classes at different times.
STUDENT
Student_ID  Class_ID  Name    Course_ID  Time/Day
0123        502       Ravi    312        10/10
0124        503       Kumar   313        10/07
0125        502       Mahesh  312        10/15
0126        504       Mehta   460        10/08
0126        505       Mehta   461        10/17
In this relation, two students can go to the same class and one student can attend two different courses. The name of the student is determined by student_id alone. Therefore, a non-key attribute (name) is functionally dependent on a part of the key (student_id), i.e., partially dependent. Hence this relation is not in second normal form (2NF).
10. How to decompose a non-2NF relation to 2NF?
Ans: For example, given R = ABCDEFGHIJ with
FD: AB —> C
A —> DE
B —> F
F —> GH
D —> IJ
the above relation is a non-2NF relation. To convert it into 2NF, we follow the steps below:
1. Find the key by calculating the attribute closure:
AB+ = ABCDEFGHIJ, so AB is the key.
2. The relation R has the partial dependencies
A —> DE
B —> F
F —> GH
D —> IJ
3. Decompose the relation R into three relations so that there are no partial dependencies:
R1 = ADEIJ
R2 = BFGH
R3 = ABC
A sketch that checks this worked example in code is shown below.
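The following Python sketch (illustrative, not from the tutorial) recomputes the key and flags the offending FDs; attribute sets are modelled as Python sets of single letters, and the helper names are assumptions.

FDS = [({'A', 'B'}, {'C'}),
       ({'A'}, {'D', 'E'}),
       ({'B'}, {'F'}),
       ({'F'}, {'G', 'H'}),
       ({'D'}, {'I', 'J'})]

def closure(attrs, fds):
    # Repeatedly absorb the RHS of every FD whose LHS is already covered.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

key = {'A', 'B'}
print(sorted(closure(key, FDS)))   # all ten attributes, so AB is a key

# FDs whose LHS is a proper subset of the key are partial dependencies;
# F -> GH and D -> IJ then chain off those partially dependent attributes.
print([(sorted(l), sorted(r)) for l, r in FDS if l < key])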
11. Define full functional dependency, partial dependency?
Ans: Full Functional Dependency
A full functional dependency holds when every non-key attribute depends on the whole primary key.
For a relation R and an FD X —> Y, Y is fully functionally dependent on X if there is no proper subset Z of X such that Z —> Y holds, i.e., no proper subset of X functionally determines Y. In that case X —> Y is a full functional dependency.
e.g., father —> son
e.g., cid —> cname
e.g., cid —> age
Partial Dependency
An attribute is partially dependent if its value can be determined from one or more attributes of the primary key, but not all of them.
For example, with the composite key (emp_id, job_id) and FD emp_id —> emp_name, the name can be determined with the help of emp_id alone, so emp_name is partially dependent on the composite key.
12. Define third normal form?
Ans: A relation is said to be in 3NF if it is in 2NF and has no transitive dependencies.
Equivalently, in the stricter form used here, a relation is in 3NF if every determinant is a key, i.e., for each functional dependency
FD: A —> B, A is a key.
If a relation is in 3NF, then by default that relation is in 2NF.
13. What is transitive dependency?
Ans: A transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes (or distinct collections of attributes) in the relation. Suppose all three of the following conditions hold:
1. A → B
2. It is not the case that B → A
3. B → C
Then the functional dependency A → C is a transitive dependency.
14. For relation R, determine the candidate key and if the relation is not in BCNF then decompose it into a collection of BCNF relations.
R1 (A, C, B, D, E)
FDs: A → B, C → D
Ans: First compute the keys for R1. The attributes A, C and E do not appear on the right-hand side of any functional dependency, therefore they must be part of every key. Starting from {A, C, E}, we find that this set determines all the attributes, so the key is {A, C, E}.
We have the dependencies A → B and C → D, whose left-hand sides are not superkeys, so the table is not in BCNF. Applying the BCNF decomposition algorithm to the non-BCNF dependency A → B, we create two relations (A, C, D, E) and (A, B). The first relation is still not in BCNF, since we have the non-BCNF dependency C → D; therefore we decompose further into (A, C, E) and (C, D). Now all relations are in BCNF and the final BCNF scheme is (A, C, E), (C, D), (A, B). A code sketch of this procedure follows.
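As a hedged illustration (the names and structure are my own, not from the tutorial), the decomposition walk-through above can be mechanized in Python: split off X together with the attributes it determines whenever an FD X —> Y has a non-superkey left-hand side.

FDS = [({'A'}, {'B'}), ({'C'}, {'D'})]

def closure(attrs, fds):
    # Standard attribute-closure loop, as in the earlier examples.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(schema, fds):
    for lhs, rhs in fds:
        if not lhs <= schema:
            continue                    # FD does not apply to this fragment
        rhs = rhs & schema
        if not rhs or rhs <= lhs:
            continue                    # trivial within this fragment
        if not schema <= closure(lhs, fds):   # lhs is not a superkey
            part = lhs | rhs                  # e.g. (A, B) for A -> B
            rest = schema - (rhs - lhs)       # e.g. (A, C, D, E)
            return bcnf_decompose(rest, fds) + bcnf_decompose(part, fds)
    return [schema]                     # no violation: already in BCNF

print(bcnf_decompose({'A', 'B', 'C', 'D', 'E'}, FDS))
# [{'A','C','E'}, {'C','D'}, {'A','B'}] up to ordering, matching the answer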
15. What is a prime attribute and non-prime attribute? Give an example.
Ans: An attribute which is present in any candidate key is known as a prime attribute. An attribute which is not present in any candidate key is known as a non-prime attribute.
For example, take a relation R (A, B, C, D, E, F) whose functional dependencies are C → F, E → A, EC → D, A → B. To identify the prime and non-prime attributes, we first find the candidate key. We start with the attributes which are not present on the right-hand side of any FD, so C and E must be in the candidate key. Now find their closure:
(CE)+ = CEFADB
All attributes are determined, so CE is a candidate key.
According to this example, C and E are prime attributes and F, A, D, B are non-prime attributes.
Long Answers:
1. What is schema refinement? Explain the problems caused by redundancy.
Ans: Schema Refinement
A schema can be defined as a complete description of the database. The specification for a database schema is provided during the database design stage, and this schema doesn't change frequently.
Consider an EMPLOYEE database that stores all the information regarding every employee.
EMPLOYEE
EMP_ID  EMP_SECTIONID  EMP_NAME  EMP_ADDRESS
0012    124            Ravi      Hyderabad
0013    268            Nandan    Delhi
0014    314            Kumar     Bangalore
SECTION
SECTION_ID  EMP_ID  JOB_SECTION  GRADE
268         0013    Manager      A
124         0012    Clerk        C
314         0014    Secretary    B
The schema diagram for this database is as follows:
EMPLOYEE (EMP_ID, EMP_SECTIONID, EMP_NAME, EMP_ADDRESS)
SECTION (SECTION_ID, EMP_ID, JOB_SECTION, GRADE)
Schema refinement uses refinement approaches based on decompositions to solve the problems which occur due to redundancy. Redundancy, in the simplest terms, is the duplication of data; it is the root cause of most problems in database design. One way to eliminate redundant data is to make use of decompositions. Decomposition means breaking a schema into smaller relation schemas, and this division is based on the functional and other dependencies which are specified by the database designer.
However, decomposition may give rise to problems of its own and should therefore be used with caution.
Problems Caused by Redundancy
EMPLOYEE
EMP_ID  EMP_NAME  EMP_SECTIONID  JOB_SECTION    GRADE
0012    Ravi      124            Clerk          C
0013    Nandan    268            Manager        A
0013    Nandan    268            Manager        A
0013    Nandan    268            Manager        A
0014    Kumar     314            Secretary      B
0016    Teja      059            Asst. Manager  D
0012    Ravi      124            Clerk          C
Consider the above database. The three tuples with emp_id 0013 repeat the same name and the same job-section information. This repetition wastes space and causes data inconsistency, i.e., the redundant data may lead to loss of data integrity. If, for example, an update operation is carried out, such as entering a new record for the employee with id 0013, it must be done multiple times, once for each copy that stores the employee's details. This leads to redundant storage, i.e., the same information is stored multiple times.
(i) Anomalous Updates
If an update operation is performed, for example, the emp_sectionID 268 is updated to 520 and this correction is made only to the first record of the database, this may lead to inconsistent data unless all the copies in the database are updated. This is referred to as an update anomaly: the change must be made to all copies of the data.
(ii) Anomalous Deletions
A delete anomaly refers to the condition wherein it is impossible to delete some information without losing some other information. For example, if we want to delete the grade entries where the grade is equal to 'A', then all the information for emp_sectionID 268 will be deleted/lost.
(iii) Anomalous Insertions
An insertion anomaly refers to the condition where it is compulsory to store some information in addition to some other information. For example, suppose a new employee record is being entered for an employee who has not yet been assigned an emp_id. If we assume that null values are not allowed, then it is impossible to enter the new record until the employee has been assigned an emp_id.
Problems Caused by Redundancy
Consider the above table, which is part of the database 'COMPANY'. This database contains the following tables:
(a) EMPLOYEE (Emp_ID, Emp_name, Emp_sectionID, Job_section, Grade)
(b) JOB (Job_ID, Job_section, Job_sectionID, Emp_ID)
(c) GRADE (Job_section, Job_sectionID, Grade)
As shown in the table, the tuple with Emp_ID 0013 is repeated several times, and the attribute Job_section is repeated in all the tables. If, for example, an update operation is performed, such as the employee with Emp_ID 0013 being promoted to the job of "Managing director", then this information must be updated in all the tables referring to Emp_ID 0013, and the following updates must be carried out:
Job_section = 'Managing director', Grade = 'A+' and Job_sectionID = '523'. This leads to redundant storage of the same data in multiple tables.
Due to redundancy, this update operation may lead to loss of data consistency (integrity). For example, suppose that in the EMPLOYEE relation the updates are carried out correctly, but in the JOB relation the job_sectionID is mistakenly set to the emp_sectionID, and in the GRADE relation the job_sectionID is set to the job_id. This causes loss of data consistency.
Consider a single tuple from each of the relations.
Before the update:
EMPLOYEE
emp_id  emp_name  emp_sectionid  job_section  grade
0013    Nandan    268            Manager      A
JOB
job_id  job_section  job_sectionid  emp_id
891     Manager      523            0013
GRADE
job_section  job_sectionid  grade
Manager      523            A
After the update:
EMPLOYEE
emp_id  emp_name  emp_sectionid  job_section        grade
0013    Nandan    268            Managing director  A+
JOB
job_id  job_section        job_sectionid  emp_id
891     Managing director  268            0013
GRADE
job_section        job_sectionid  grade
Managing director  891            A+
2. What is decomposition? Why is it used? What are the problems related to
decomposition?
Ans: Decomposition is the solution to the problem caused by data redundancy. Decomposition
means breaking up the large schema into smaller multiple schemas. Decomposition helps to
remove all the above mentioned anomalies and also helps to maintain data integrity.
We can restrict the redundancy in EMPLOYEE database by dividing it into two smaller
relations/schemas as follows:
EMPLOYEE
EMP_ID EMP_NAME JOB-SECTION GRADE
0012 RAVI CLERK C
0013 NANDAN MANAGER A
0013 NANDAN MANAGER A
0013 NANDAN MANAGER A
0014 KUMAR SECRETARY B
0016 TEJA ASST. MANAGER D
0012 RAVI CLERK C
SECTION
EMP-SECTIONID GRADE
124 C
268 A
314 B
059 D
Now we can easily update the section id in the schema SECTION without bothering about updates to the other tuples. To insert a new tuple, we can directly insert the new record into the schema SECTION (with the help of the section id) even if the new employee has not yet been assigned an emp_id. To delete the entry with the grade equal to 'A', we can do it directly on the SECTION schema, which doesn't lead to loss of other information. Thus, decomposition eliminates the problems caused by the different anomalies.
Note
 emp_id is the primary key for the relation EMPLOYEE.
 emp_sectionid is the primary key for the relation SECTION.
[Due to insertion anomalies, unless and until emp_id is entered we cannot enter the other attributes, but after decomposing the relation, even if emp_id is not assigned we can still enter the other data items.]
Problems Related to Decomposition
As already mentioned, the use of decomposition may lead to its own problems, therefore
one should be more careful with the use of decomposition. The questions that must be answered
in order to use decomposition are:
(a) What are the problems that can be caused by using decomposition and
(b) When do we have to decompose a relation.
To answer the first question, two properties of decompositions are considered:
(a) Lossless Join Property
This property helps to identify any instance of the original relation from the
corresponding instance of the smaller relation attained after decomposition.
(b) Dependency Preservation
This property helps to enforce constraint on smaller relation in order to enforce the
constraints on the original relation.
To answer the second question, a number of normal forms exist. Every relation schema is in one of these normal forms, and these normal forms help to decide whether to decompose the relation schema further or not.
One drawback of decomposition is that it forces us to join the decomposed relations in order to answer queries over the original relation. This may result in performance degradation if such queries are common. In order to improve performance, we may sometimes tolerate the problems caused by redundancy and choose not to decompose the relation.
3. What is a functional dependency? When is an FD implied by a set F of FDs?
Ans: Functional dependency is one of the most important concepts related to normalization, the process by which a relation is converted to a standard form.
A functional dependency (FD) defines a relationship between different attributes of a relation in a database. Consider
R = {X, Y, Z, K, L, M}
where R is a relation and X, Y, Z, K, L, M are attributes, each with a unique name in the database. An FD between two attributes is written as
FD: X —> Y, which is read as
"Y is functionally dependent on X".
Consider the relation EMPLOYEE where EMP_ID uniquely identifies EMP_NAME and EMP_SECTIONID. The FDs can be shown as:
FD: EMP_ID —> EMP_NAME
FD: EMP_ID —> EMP_SECTIONID
This means that if any two records in the relation have the same value for the attribute EMP_ID, then they must have the same value for the attribute EMP_NAME.
FD: X —> Y
[Figure (1): Y is functionally dependent on X.]
Example
 BRANCH —> Roll Number
branch CSIT —> 02561
The roll number is functionally dependent on the branch: each branch is associated with a unique roll number.
 STUDENT -x-> BRANCH
This indicates that Branch is not functionally dependent on Student: a student can belong to any branch (e.g., CSE or IT).
 FD: X —> Y
The attribute set 'X' on the left-hand side is called the determinant. It is the key for the relation, as it uniquely identifies the attribute values in a tuple/record.
'R' is a relation schema over which a set of functional dependencies (FDs) holds; in addition to the given FDs, several further FDs hold. Consider an example,
SALE (product, date, customer, vendor, vendorcity)
where product is the key.
FD1: Product —> Vendor
FD2: Vendor —> Vendorcity
If two tuples, i.e., two records in the relation SALE, have the same product value, then from FD1 they have the same vendor value; and since FD2 is given to hold, they must also have the same vendorcity value. By this reasoning, the FD Product —> Vendorcity holds on SALE:
Product —> Vendor (1)
Vendor —> Vendorcity (2)
Therefore, Product —> Vendorcity.
To summarize, we can say that a functional dependency 'f' is implied by a given set F of functional dependencies if 'f' holds whenever all the FDs in F hold.
4. Explain about the closure of a set of FDs and the attribute closure?
Ans: Closure of a Set of FDs
The notation for the closure of a set of FDs is F+, defined as the collection of all the FDs implied by the given set of FDs.
In order to find the closure of a given set F of FDs, Armstrong's Axioms are used. The following three rules are collectively known as Armstrong's Axioms; A, B and C denote sets of attributes over a relation R:
(a) Reflexivity: If A ⊇ B, then A —> B
(b) Augmentation: If A —> B, then AC —> BC for any C
(c) Transitivity: If A —> B and B —> C, then A —> C
Armstrong's axioms are said to be complete because all the FDs in the closure F+ can be computed by repeated application of these rules. They are sound because they generate no wrong dependencies: every dependency they generate is in the closure F+.
In addition to the above rules, some derived rules are used:
(d) Union: If A —> B and A —> C, then A —> BC
(e) Decomposition: If A —> BC, then A —> B and A —> C
(f) Pseudotransitivity: If A —> B and CB —> D, then AC —> D
To understand these rules, consider a SALE relation as follows:
SALE (product, date, customer, vendor, vendorcity)
The schema for this relation can be represented as PDCVV_C. The meaning of one record in this relation is that the sale of the product with product id (P) on date (D) to the customer with customer id (C) is made by the vendor whose id is (V) and who resides in the city given by vendor city (V_C).
The following functional dependencies hold:
1. The product id P is a key; all the attributes are dependent on the key.
P —> PDCVV_C
2. A vendor purchases a given product on a single date.
VP —> D
3. A customer purchases at most one product from a vendor.
VC —> P
4. Every vendor is associated with their own city.
V —> V_C
5. Every product belongs to one vendor.
P —> V
In addition to these FDs, several other FDs hold in the closure of the given FDs.
From P —> V, V —> V_C and transitivity, we can compute
P —> V_C
From V —> V_C and augmentation, we can compute
VP —> PV_C
From VP —> PV_C, VP —> D and union, we infer
VP —> PDV_C
From VC —> P, P —> PDCVV_C and transitivity, we can infer
VC —> PDCVV_C
With decomposition we can compute many other FDs; for example, from P —> PDCVV_C and decomposition we get
P —> D, P —> P, P —> C, P —> V, P —> V_C
Attribute Closure
It is always not necessary to compute the F+ in order to find out whether a dependency, A
—> B, is in the closure of set of FDs. Another method is to compute the attribute closure A + for
the given dependency with respect to F (set of functional dependency). Attribute closure can be
defined as a set consisting of attributes X such that X —>A can be computed using the
Armstrong axioms. The algorithm to compute attribute closure of a set A is as follows:
Closure - A
while (No change)
{
if(K —> L in F && K⊆ closure)
closure = closure ∪ K
}
With set A containing the single attribute, this algorithm can be changed to find other keys and
this algorithm can be stopped as soon as the closure set contains all the attributes.
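A runnable Python version of the loop (a sketch; the function name and the use of sets are my own), applied to the SALE FDs from this answer:

SALE_FDS = [({'P'}, {'V'}),
            ({'V'}, {'V_C'}),
            ({'V', 'P'}, {'D'}),
            ({'V', 'C'}, {'P'})]

def attribute_closure(attrs, fds):
    # Keep absorbing the RHS of any FD whose LHS is inside the closure.
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

print(sorted(attribute_closure({'P'}, SALE_FDS)))   # ['D', 'P', 'V', 'V_C']
# P+ lacks C under these four FDs alone; it is the additional key FD
# P -> PDCVV_C from the text that makes P a key of SALE.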
5. What is normalization? List the four types of normal forms based on FDs. Explain each of them briefly.
Ans: As mentioned earlier, normalization is the process of converting a relation to a standard form, i.e., we decompose a relation into smaller, more efficient tables/relations that reflect a good database design.
With the help of the primary key and candidate keys, different relations can be analyzed; this technique is called normalization. Normalization involves a sequence of rules that are applied to test individual relations so that the database can be normalized to any degree. The main objective of normalization is to refine the database design in order to remove data-maintenance anomalies, reduce data redundancy and eliminate data inconsistency.
The process of normalization is based on the concept of normal forms. Each normal form has its own set of properties and constraints, and a table/relation is said to be in a normal form if it satisfies all the properties of that normal form. These properties/conditions are usually applied to the attributes of the tables and also to the relationships among them.
Different levels of normal forms are used to specify different factors and help in reducing data-maintenance anomalies. The different normal forms are,
(a) First Normal Form (1NF)
(b) Second Normal Form (2NF)
(c) Third Normal Form (3NF)
(d) Boyce-Codd Normal Form (BCNF)
[Figure: a relation passes through the normalization process (1NF, 2NF, 3NF, BCNF) to yield smaller, more efficient relations.]
First Normal Form
A relation schema is said to be in First Normal Form if the values in the relation are atomic. In simple words, there should be no repeating groups in a particular column. A value is atomic if it does not contain a set of values.
A table contains atomic values if there is a single value of the data item at any given row and column intersection. For example, consider an EMPLOYEE relation with the additional attribute DEPENDENTS.
EMPLOYEE
EMP_ID  EMP_SECTIONID  EMP_NAME  EMP_ADDRESS  DEPENDENTS
0012    124            Ravi      Hyderabad    Father, Mother, Brother
0013    268            Nandan    Delhi        Wife, Mother, Son
0014    314            Kumar     Bangalore    Wife, Daughter
0015    315            Rajesh    Hyderabad    Brother, Sister
0016    316            Satish    Pune         Wife, Sister
Here, the column "dependents" has non-atomic values. In order to convert this relation into 1NF, we have to convert these non-atomic values to atomic values. The table below shows the relation EMPLOYEE in 1NF.
EMPLOYEE
EMP_ID  EMP_SECTIONID  EMP_NAME  EMP_ADDRESS  DEPENDENTS
0012    124            Ravi      Hyderabad    Father
0012    124            Ravi      Hyderabad    Mother
0012    124            Ravi      Hyderabad    Brother
0013    268            Nandan    Delhi        Wife
0013    268            Nandan    Delhi        Mother
0013    268            Nandan    Delhi        Son
0014    314            Kumar     Bangalore    Wife
0014    314            Kumar     Bangalore    Daughter
0015    315            Rajesh    Hyderabad    Brother
0015    315            Rajesh    Hyderabad    Sister
0016    316            Satish    Pune         Wife
0016    316            Satish    Pune         Sister
Now the relation EMPLOYEE is in 1NF, since the column dependents has atomic values. But the other attributes (emp_id, emp_sectionid, emp_name and emp_address) all repeat and form a repeating group, i.e., for each value of the attribute dependents these values are repeated. The rule of 1NF says that any repeating group in a relation must be eliminated, as it gives rise to data redundancy. So, in order to remove the repeating groups from the table, the table must be decomposed into smaller tables with a link to the decomposed table (the link specifies the parent table which is decomposed into child tables). For example, the above relation EMPLOYEE can be decomposed into two tables, namely:
(a) EMP
(b) EMP_DEPENDENTS
Each of these tables has its own primary key. For the table EMP the primary key is EMP_ID, and for the table EMP_DEPENDENTS the primary key is S.No. The attribute EMP_ID is present in both tables; it specifies the link between the two tables and the original table from which they are derived.
Relation "EMP"
EMP_ID  EMP_SECTIONID  EMP_NAME  EMP_ADDRESS
0012    124            Ravi      Hyderabad
0013    268            Nandan    Delhi
0014    314            Kumar     Bangalore
0015    315            Rajesh    Hyderabad
0016    316            Satish    Pune
Relation "EMP_DEPENDENTS"
S.No  EMP_ID  Dependents
1     124     Father
2     124     Mother
3     124     Brother
4     268     Wife
5     268     Mother
6     268     Son
7     314     Wife
8     314     Daughter
9     315     Brother
10    315     Sister
11    316     Wife
12    316     Sister
The most important point to remember is that a relation in a database must always be in at least first normal form.
Second Normal Form
Before we proceed with discussion of 2NF it is important to know the concepts of,
(a) Full functional dependency.
(b) Partially dependent.
(c) Transitive dependency.
(d) Trivial and Non-trivial functional dependency.
• Key refers to the primary key of the table
• Non-key refers to the other attributes of the table.
• Composite primary key refers to the primary key consisting of more than one attribute.
Full Functional Dependency
A full functional dependency holds when every non-key attribute depends on the whole primary key.
For a relation R and an FD X —> Y, Y is fully functionally dependent on X if there is no proper subset Z of X such that Z —> Y holds, i.e., no proper subset of X functionally determines Y. In that case X —> Y is a full functional dependency.
e.g., father —> son
e.g., cid —> cname
e.g., cid —> age
Partially Dependent
An attribute is partially dependent if its value can be determined from one or more attributes of the primary key, but not all of them.
For example, with the composite key (emp_id, job_id) and FD emp_id —> emp_name, the name can be determined with the help of emp_id alone, so emp_name is partially dependent on the composite key.
Transitive Dependency
A transitive dependency is a functional dependency which holds by virtue of transitivity. A
transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and
C designate three distinct attributes (or distinct collections of attributes) in the relation. Suppose
all three of the following conditions hold:
1. A → B
2. It is not the case that B → A
3. B → C
Then the functional dependency A → C is a transitive dependency.
Trivial and Non-trivial Dependency
Trivial: If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
Non-trivial: If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial
FD.
Second normal form (2NF) can now be defined: a relation is said to be in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.
Consider a simple example of a STUDENT relation:
Student (student_id, class_id, name, course_id, time)
Student_id and class_id together form the primary key.
A student can attend different courses in different classes at different times.
STUDENT
Student_ID  Class_ID  Name    Course_ID  Time/Day
0123        502       Ravi    312        10/10
0124        503       Kumar   313        10/07
0125        502       Mahesh  312        10/15
0126        504       Mehta   460        10/08
0126        505       Mehta   461        10/17
In this relation, two students can go to the same class and one student can attend two different courses. The name of the student is determined by student_id alone. Therefore, a non-key attribute (name) is functionally dependent on a part of the key (student_id), i.e., partially dependent.
This relation is not in second normal form (2NF). This may lead to several problems, such as:
(a) The name of the student is repeated every time he/she takes a different course.
(b) If the name of the student is updated, then every tuple of the student must be updated, giving rise to update anomalies.
(c) This results in loss of data integrity, as the relation could show different rows of information for the same student; hence data redundancy occurs.
(d) This also leads to insertion anomalies: if the student is not attending any classes, then there are no rows in which to keep the student's name.
So, in order to solve these problems, the STUDENT relation can be broken down into two sub-tables (children), both of which are in 2NF:
(a) student (student_id, class_id, course_id, day)
(b) student1 (student_id, name)
Foreign key: student_id references STUDENT.
Note: Underlined attributes are primary keys.
(student_id, class_id) —> composite key
REFERENCES is a keyword.
STUDENT
Student_id  Class_id  Course_id  Day
0123        502       312        10/10
0124        503       313        10/07
0125        502       312        10/15
0126        504       460        10/08
0126        505       461        10/17
STUDENT1
Student_id  Name
0123        Ravi
0124        Kumar
0125        Mahesh
0126        Mehta
These two relations are called projections of the original relation. A projection is a relation which selects certain attributes from the original relation and presents them in a new relation. These two projections are in 2NF and solve all the problems listed above.
The following steps are used to decompose a non-2NF relation into 2NF relations:
a) Create a new relation by using the attributes from the offending FD as the attributes of the new relation [i.e., eliminate "name" from the original table]:
student_id —> name (name is now fully functionally dependent on the key)
b) The determinant (the left-hand side of the FD), i.e., student_id, is the key of the new relation.
c) The attribute on the right-hand side (name) is deleted from the original relation.
d) Repeat steps (a), (b), (c) if more than one FD prevents the relation from being in 2NF.
e) If the determinant appears in more than one FD, place all the attributes functionally dependent on it as non-key attributes in one relation having that determinant as the key.
Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and has no transitive dependencies.
In the stricter form used here, a relation is in 3NF if every determinant is a key, i.e., for each functional dependency
FD: A —> B, A is a key.
If a relation is in 3NF, then by default that relation is in 2NF.
Consider the same relation STUDENT as discussed before, but with the additional attribute FEE [the fee for a particular course]:
Student (student_id, name, course_id, fee)
Consider the following FDs:
FD: student_id —> course_id
FD: student_id —> fee
These two FDs satisfy the 3NF criterion, i.e., the determinant is a primary key; they are therefore also in 2NF. Now consider another FD:
FD: course_id —> fee
The "fee" attribute is functionally dependent upon course_id, but course_id is not a primary key. Hence this FD violates the 3NF criterion, and therefore the relation STUDENT is not in 3NF (because, according to this definition of 3NF, for every FD: A —> B, A should be a key).
STUDENT
STUDENT_ID  NAME    COURSE_ID  FEE
0123        Ravi    312        5000
0124        Kumar   313        3500
0125        Mahesh  312        5000
0126        Mehta   460        4500
The following problems arise when a relation is not in 3NF:
(a) The attribute "fee" is repeated in every row where the course_id is the same. This leads to data redundancy and wastage of storage space.
(b) If the "fee" of a course is updated, then every such row must be updated, leading to update anomalies. If we delete a fee entry, we may lose other data, giving rise to delete anomalies.
(c) If there are no course_ids to be entered, then there are no rows in which to keep the attribute "fee", leading to insertion anomalies.
All the problems discussed above are similar to the problems of 2NF. In order to solve them, the relation must be converted to 3NF. The following are the steps involved in the conversion.
student (student_id, name, course_id, fee)
FD: student_id —> course_id
FD: student_id —> fee
FD: course_id —> fee
The last FD is also called a transitive dependency, which occurs when a non-key attribute (fee) is functionally dependent upon another non-key attribute (course_id, an attribute which is not a primary key).
The first step of the conversion is the removal of the attribute on the right-hand side of the FD that violates 3NF, i.e., eliminating the "fee" attribute from the original relation, which gives rise to the relation student1. Here course_id is a foreign key referencing student2.
student1 (student_id, name, course_id)
The next step is to form another relation, student2, which consists of the attributes of the FD that doesn't satisfy the 3NF criterion, i.e., student2 consists of the attributes of the FD
FD: course_id —> fee
The determinant of this FD is the primary key of the new relation:
student2 (course_id, fee)
STUDENT1
Student_id  Student_Name  Course_id
0123        Ravi          312
0124        Kumar         313
0125        Mahesh        312
0126        Mehta         460
STUDENT2
Course_id  Fee
312        5000
313        3500
460        4500
To summarize, if a relation is in 3NF then it follows that it is also in 2NF and in 1NF:
3NF —> 2NF —> 1NF
The stricter version of 3NF in which every determinant is a key is called Boyce-Codd normal form.
Third normal form can also be stated as follows: a relation is in 3NF if it has no transitive dependency. For example, take the student relation
Student (student_id, name, course_id, fee)
FD1: student_id —> course_id
FD2: student_id —> fee
FD3: course_id —> fee
FD3 is a transitive dependency, and therefore the relation Student is not in 3NF. The relation can be converted to 3NF by decomposing it as follows:
Student (student_id, name, course_id) and
Course (course_id, fee, student_id)
Note: student_id is a foreign key.
Every relation that is in BCNF is also in 3NF, therefore in 2NF, and hence in 1NF.
BCNF
Let 'R' be a relation schema, 'F' be the set of functional dependencies given to hold over 'R', 'X' be a subset of the attributes of 'R', and 'A' be an attribute of 'R'. 'R' is said to be in BCNF (Boyce-Codd Normal Form) if for every FD X —> A in F, one of the following is true:
(i) A ∈ X [i.e., it is a trivial FD], or
(ii) X is a super key.
In a BCNF relation, the only nontrivial dependencies are those in which a key determines one or more attributes. Thus, each tuple can be thought of as an entity or relationship, identified by a key and described by the remaining attributes. If we use ovals to represent attributes or sets of attributes and draw arcs to indicate FDs, then the structure of a BCNF relation is as shown in Figure (a).
[Figure (a): Functional dependencies in a BCNF relation: the key determines each non-key attribute.]
Boyce-Codd Normal Form (BCNF) ensures that no redundancy can be detected using FD information alone. If we take into account only FD information, then it is the most desirable normal form. The figure below shows an instance of a relation with three attributes X, Y and A.
X  Y   A
x  y1  a
x  y2  ?
Figure (b): Instance illustrating Boyce-Codd Normal Form
There exist two tuples with the same value in the X column. Suppose the relation satisfies an FD X —> A. One of the tuples has the value 'a' in the 'A' column; using the FD, we can conclude that the second tuple also has the value 'a' in this column. But this type of situation cannot arise in a BCNF relation: if the relation is in BCNF, then since A is distinct from X, X must be a key. If X is a key, then y1 = y2, i.e., the two tuples are identical. A relation is defined as a set of tuples, so we cannot have two copies of the same tuple, and the situation shown in Figure (b) cannot arise. Thus, if a relation is in BCNF, every field of every tuple records a piece of information that cannot be inferred from the values in all the other fields of the relation instance.
[Figure: the normal forms are nested: every BCNF relation is in 3NF, every 3NF relation is in 2NF, and every 2NF relation is in 1NF.]
6. For the following relational schema tell whether it is in 3NF or not: EMPLOYEE (E_code, E_name, Dname, Salary, Project_No, Termination_date_of_project), where each project has a unique Termination_date_of_project. If it is not in 3NF, bring it into 3NF through normalization?
Ans: A relation schema R is in 3NF if, for all nontrivial functional dependencies in F+ of the form X —> A, either X contains a key (i.e., X is a superkey) or A is a key attribute. 3NF is based on the concept of transitive dependency: a functional dependency X —> Y in a relation schema R is a transitive dependency if there exists a set of attributes Z which is neither a candidate key nor a subset of any key of R, and both X —> Z and Z —> Y hold. From Codd's rules, a relation schema R is in 3NF if it satisfies 2NF and no non-key attribute of R is transitively dependent on the primary key. Let us examine the EMPLOYEE relation.
E_code  E_name  Dname  Salary  Project_No  Termination_date_of_project
CSE_03  Raju    CSE    5000    CS 120      14-8-06
CSE_04  Ramu    CSE    5000    CS 140      16-8-06
ECE_19  Rajesh  ECE    4500    CS 180      12-8-06
This EMPLOYEE relation holds the following dependencies:
FD: E_code —> E_name, Dname, Salary
FD: E_code —> Project_No, Termination_date_of_project
FD: Project_No —> Termination_date_of_project
The 3NF rules are satisfied by the first two functional dependencies, since E_code is the primary key, but they are not satisfied by the last FD: Project_No is not a key, yet it determines Termination_date_of_project, so Termination_date_of_project is transitively dependent on the primary key. Thus the EMPLOYEE relation is not in 3NF.
We can transform the EMPLOYEE relation to satisfy 3NF by decomposing it into smaller relations, so that no non-key attribute is transitively dependent on the primary key. This is shown in the tables below; a code sketch of the 3NF test follows them.
E_code  E_name  Dname  Salary
CSE_03  Raju    CSE    5000
CSE_04  Ramu    CSE    5000
ECE_19  Rajesh  ECE    4500
E_code  Project_No  Termination_date_of_project
CSE_03  CS 120      14-8-06
CSE_04  CS 140      16-8-06
ECE_19  CS 180      12-8-06
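The following Python sketch (my own illustration; the attribute names follow the example, while the helper names and the hand-supplied candidate key are assumptions) applies the 3NF test: an FD X —> A violates 3NF when X is not a superkey and A is not a prime attribute.

ATTRS = {'E_code', 'E_name', 'Dname', 'Salary',
         'Project_No', 'Termination_date_of_project'}
FDS = [({'E_code'}, ATTRS),          # the key E_code determines everything
       ({'Project_No'}, {'Termination_date_of_project'})]
CANDIDATE_KEYS = [{'E_code'}]
PRIME = set().union(*CANDIDATE_KEYS)

def closure(attrs, fds):
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_3nf(lhs, rhs):
    if ATTRS <= closure(lhs, FDS):   # lhs is a superkey: no violation
        return False
    return any(a not in PRIME and a not in lhs for a in rhs)

for lhs, rhs in FDS:
    if violates_3nf(lhs, rhs):
        print('3NF violation:', sorted(lhs), '->', sorted(rhs - lhs))
# prints: 3NF violation: ['Project_No'] -> ['Termination_date_of_project']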
Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester: II/I Academic Year: 2015-16
SUB:DBMS Tutorial Sheet: UNIT IV-2
Short answer questions
1. Define lossless join dependency?
2. Define data redundancy?
3. What is multi-valued dependency?
4. Define 4NF?
5. What is a join dependency?
6. Explain BCNF?
7. Distinguish between 3NF and BCNF?
8. Normalize the relation R(A,B,C,D,E,F,G,H) into 3NF using the set of FDs
AB -> C
BC -> D
CDE -> ABH
BH -> A
D -> EF
9. What are the advantages and disadvantages of normalization?
10. Explain update anomalies?
11. Explain delete anomalies?
12. Explain insertion anomalies?
13. Explain trivial and non-trivial functional dependency?
14. Given the functional dependencies AB -> C, B -> D, D -> E in relation R, find AB+?
15. Mention the steps to calculate the closure set of an attribute?
Descriptive questions/Programs/Experiments
1. Explain BCNF briefly?
2. Explain lossless join decomposition?
3. Explain dependency preserving decomposition?
4. Explain schema refinement in database design?
5. Explain multi-valued dependencies briefly?
6. Explain 4NF briefly?

Tutor Faculty HOD
Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester: II / I Academic year: 2015-16
SUB:DBMS Tutorial Sheet: UNIT IV-2
Short answer questions
1. Define lossless join dependency?
Ans: Lossless join dependency can be defined as the one which generates no additional tuples
when the natural “join” operation is performed on the decomposed relation schemas.
2. Define Data Redundancy?
Ans: Data redundancy is the existence of data that is additional to the actual data and permits
correction of errors in stored or transmitted data.
3. What is multi-valued dependency?
Ans: A multivalued dependency on R, X →→ Y, says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation, i.e., for each value of X, the values of Y are independent of the values of R − X − Y.
4. Define 4NF?
Ans: Fourth normal form (4NF) is a level of database normalization where there are no non-trivial multi-valued dependencies other than those in which a candidate key determines the attributes.
It builds on the first three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF). It states that, in addition to meeting the requirements of BCNF, a relation must not contain more than one independent multi-valued dependency.
5. What is a join dependency?
Ans: A join dependency is a constraint on the set of legal relations over a database scheme. A
table T is subject to a join dependency if T can always be recreated by joining multiple tables
each having a subset of the attributes of T. If one of the tables in the join has all the attributes of
the table T, the join dependency is called trivial.
6. Explain BCNF?
Ans: Let ‘R’ be a relation schema, ‘F’ be the set of functional dependencies given to hold over
‘R’, ‘X’ be a subset of the attributes of ‘R’, and ‘A’ be an attribute of ‘R’. ‘R’ is said to be in
BCNF (Boyce-Codd Normal Form) if for every FD X —> A in F, one of the following is true.
(i) A ∈ X [i.e., it is a trivial FD], or
(ii) X is a super key.
7. Distinguish between 3NF and BCNF?
Ans:
3NF:
1. Its focus is on primary keys.
2. Redundancy is higher compared to BCNF.
3. A dependency of the form X —> Y is allowed in 3NF if X is a super key or Y is a part of some key.
BCNF:
1. Its focus is on candidate keys.
2. Redundancy is lower compared to 3NF.
3. An FD X —> Y is allowed in BCNF only if X is a super key.
8. Normalize the relation R(A,B,C,D,E,F,G,H) into 3NF using the set of FDs
AB->C, BC->D, CDE->ABH, BH->A, D->EF
Ans: Given relation is R(A,B,C,D,E,F,G,H)
Functional dependencies are,
AB—>C
BC—>D
CDE—>ABH
BH—>A
D—>EF
Since BC —> D and D —> EF, we can write BC —> EF (transitivity rule).
As CDE —> ABH, we can write
CDE —> A
CDE —> B
CDE —> H [i.e., by the decomposition rule]
or, equivalently,
CDE —> A
CDE —> BH
Since CDE —> BH and BH —> A, we can write CDE —> A (transitivity rule).
The decomposition is dependency preserving.
9. What are the advantages and disadvantages of normalization?
Ans:
Advantages of normalization:
1. It reduces redundancy and hence insertion, deletion and modification problems.
2. It can improve query retrieval performance, i.e., if the schema is properly normalized, we can get at the data quickly.
Disadvantages of normalization:
1. It can reduce query retrieval performance (too much normalization affects query retrieval, since queries need more joins).
2. Normalized tables may not carry obvious real-world meaning, i.e., the database design should have proper documentation in order to make users understand the normalized tables.
10. Explain update anomalies?
Ans: If an update operation is performed, for example, the emp_sectionID 268 is updated to 520 and this correction is made only to the first record of the database, this may lead to inconsistent data unless all the copies in the database are updated. This is referred to as an update anomaly: the change must be made to all copies of the data.
11. Explain delete anomalies?
Ans: A delete anomaly refers to the condition wherein it is impossible to delete some information without losing some other information. For example, if we want to delete the grade entries where the grade is equal to 'A', then all the information for emp_sectionID 268 will be deleted/lost.
12. Explain insertion anomalies?
Ans: An insertion anomaly refers to the condition where it is compulsory to store some information in addition to some other information. For example, suppose a new employee record is being entered for an employee who has not yet been assigned an emp_id. If we assume that null values are not allowed, then it is impossible to enter the new record until the employee has been assigned an emp_id.
13. Explain trivial and non-trivial functional dependency?
Ans: An FD is said to be trivial if its RHS is a subset of (or equal to) its LHS. Example: AB —> B; B can be determined from AB, so AB —> B always holds.
Non-trivial: An FD is said to be non-trivial if at least one attribute of the RHS is not in the LHS. Example: AB —> BC; this FD is non-trivial because B can be determined from AB but C cannot. A small code sketch of this test follows.
14. Given the functional dependencies in relation R
AB —> C
B —> D
D —> E, find AB+.
Ans: Include the attributes determined directly by A and B:
AB+ = ABCD
Then include the attributes determined by the attributes already in the closure:
AB+ = ABCDE
15. Mention the steps to calculate the closure set of an attribute?
Ans:
Step 1: Equate the attribute(s) whose closure needs to be identified to X.
Step 2: Take each FD and check whether its LHS is contained in X; if so, add its RHS attributes to X if they are not already present.
Step 3: Repeat step 2 as many times as possible, so as to cover all FDs.
Step 4: When no more attributes can be added to X, declare X the closure set of the attribute(s).
Long Answers
1. Explain BCNF briefly?
Ans:
BCNF
Let 'R' be a relation schema, 'F' be the set of functional dependencies given to hold over 'R', 'X' be a subset of the attributes of 'R', and 'A' be an attribute of 'R'. 'R' is said to be in BCNF (Boyce-Codd Normal Form) if for every FD X —> A in F, one of the following is true:
(i) A ∈ X [i.e., it is a trivial FD], or
(ii) X is a super key.
In a BCNF relation, the only nontrivial dependencies are those in which a key determines one or more attributes. Thus, each tuple can be thought of as an entity or relationship, identified by a key and described by the remaining attributes. If we use ovals to represent attributes or sets of attributes and draw arcs to indicate FDs, then the structure of a BCNF relation is as shown in Figure (a).
[Figure (a): Functional dependencies in a BCNF relation: the key determines each non-key attribute.]
Boyce-Codd Normal Form (BCNF) ensures that no redundancy can be detected using FD information alone. If we take into account only FD information, then it is the most desirable normal form. The figure below shows an instance of a relation with three attributes X, Y and A.
X  Y   A
x  y1  a
x  y2  ?
Figure (b): Instance illustrating Boyce-Codd Normal Form
There exist two tuples with the same value in the X column. Suppose the relation satisfies an FD X —> A. One of the tuples has the value 'a' in the 'A' column; using the FD, we can conclude that the second tuple also has the value 'a' in this column. But this type of situation cannot arise in a BCNF relation: if the relation is in BCNF, then since A is distinct from X, X must be a key. If X is a key, then y1 = y2, i.e., the two tuples are identical. A relation is defined as a set of tuples, so we cannot have two copies of the same tuple, and the situation shown in Figure (b) cannot arise. Thus, if a relation is in BCNF, every field of every tuple records a piece of information that cannot be inferred from the values in all the other fields of the relation instance.
2. Explain lossless join decomposition?
Ans: The definition of decomposition has already been discussed, but more formally a decomposition can be defined as follows:
D1 and D2 are two relations decomposed from the relation 'R' such that performing the natural join operation on D1 and D2 results in the original relation 'R'. In other words, 'R' is replaced by two smaller relations whose attribute sets are subsets of the attributes of R.
Consider the example of the relation SALE:
SALE (product, date, customer, vendor, vendorcity)
The following are the FDs for this relation:
FD1: P —> PDCVV_C, where
P - product, D - date, C - customer, V - vendor, V_C - vendorcity
FD2: V —> V_C
FD3: P —> V
FD2 violates the 3NF criterion, as V is not a key and V_C does not form any part of a key. To bring the above relation into BCNF, it can be decomposed into two relations as shown:
(a) Product (product, date, customer, vendor) and
(b) Vendor (vendor, vendorcity)
The FD P —> PDCV holds over the relation Product (PDCV), as product is its primary key.
The FD V —> V_C holds over the relation Vendor (VV_C), as vendor is its primary key.
Lossless Join Dependency
This is one of the properties of decompositions. As the name suggests, a join operation performed on the decomposed relations results in no loss of data. This dependency is also called a non-additive or non-loss join dependency. More specifically, a lossless join dependency can be defined as one which generates no additional tuples when the natural join operation is performed on the decomposed relation schemas.
Consider the STUDENT relation:
Student (std_id, name, location)
This relation can be broken down into two relations as follows:
(a) Location (std_id, location) and
(b) Name (std_id, name)
When we perform a natural join operation on these two schemas, the original Student relation is recovered:
Location ⋈ Name = Student
Student
Std_id  Location      Name
012     Hyderabad     Radha
013     Secunderabad  Meena
014     Hyderabad     Pinky
015     Secunderabad  Rani
Location
Std_id  Location
012     Hyderabad
013     Secunderabad
014     Hyderabad
015     Secunderabad
Name
Std_id  Name
012     Radha
013     Meena
014     Pinky
015     Rani
Location ⋈ Name = Student
Std_id  Location      Name
012     Hyderabad     Radha
013     Secunderabad  Meena
014     Hyderabad     Pinky
015     Secunderabad  Rani
No additional tuples are generated and neither data is duplicated nor data is lost.
Instead, if we decompose the relation Student into two schemas as follows:
(a) Location (std_id, location)
(b) Name (location, name)
Note: the attribute sets above differ from the previous decomposition.
then the join of these schemas results in a relation other than the original relation.
Location
Std_id  Location
012     Hyd
013     Sec
014     Hyd
015     Sec
Name
Location  Name
Hyd       Radha
Sec       Meena
Hyd       Pinky
Sec       Rani
Location ⋈ Name ≠ Student
Std_id  Location  Name
012     Hyd       Radha
012     Hyd       Pinky
013     Sec       Meena
013     Sec       Rani
014     Hyd       Radha
014     Hyd       Pinky
015     Sec       Meena
015     Sec       Rani
We can implement the simple test in order to check whether the decomposition is lossless
or not.
For a relation R which is defined for the set of FDs F, the closure of a set of FDs F must include
the
FD: R ∩ R, —>R1 or
FD: R1∩ R —>R2 where R1 and R2 are the two decomposed relations, then the
decompositions are said to be lossless.
For example, consider the “sale” relation which has attributes (PDCVV C) and the FD: Y
—>VC which violates the 3NF criterion. In order to convert the “sale” relation into 3NF we
109
decomposed it into:
(a) Product (PDCV) and
(b) Vendor (VV_C)
‘V’ attribute is common to both the decompositions and also VV_C holds. Therefore,
this is a lossless join dependency.
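The following Python sketch (illustrative names; the key FD of SALE is written out explicitly) implements the binary test above: the decomposition of R into R1 and R2 is lossless if (R1 ∩ R2)+ contains all of R1 or all of R2.

FDS = [({'P'}, {'P', 'D', 'C', 'V', 'V_C'}),   # P is the key of SALE
       ({'V'}, {'V_C'})]

def closure(attrs, fds):
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    # The common attributes must functionally determine one whole side.
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

product = {'P', 'D', 'C', 'V'}
vendor = {'V', 'V_C'}
print(is_lossless(product, vendor, FDS))   # True: {V}+ covers Vendor (VV_C)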
3. Explain dependency preserving decomposition?
Ans: Let R be a relational schema that is decomposed into schemas R1, R2, ..., Rn by applying the steps of normalization. Let F be the set of functional dependencies that hold over R, and F1, F2, ..., Fn be the sets of functional dependencies that hold over R1, R2, ..., Rn respectively; each Fi involves only the attributes of Ri.
Suppose R is decomposed into two relations R1 and R2 with attribute sets S1 and S2. The projection of F on S1, denoted FS1, is defined as the set of functional dependencies in the closure F+ that involve only attributes of S1; similarly, FS2 is the projection of F on S2.
The decomposition of R into R1 and R2 is said to be dependency preserving if
(FS1 ∪ FS2)+ = F+
i.e., the closure of the union of the two projections must be equal to the closure of the set of dependencies of the original relation.
In other words, we need to decompose a relation R into projections R1, R2, R3, ... in such a way that enforcing the constraints in F1, F2, ..., Fi is altogether equivalent to enforcing the constraints in the original set F. The decomposition is then said to be dependency preserving.
Consider an example where the relation Z, with attribute set PQR, is decomposed into two relations with attribute sets PQ and QR. The set of FDs that hold over Z is P —> Q, Q —> R and R —> P.
FPQ, the projection onto PQ, contains
P —> Q and Q —> P
FQR, the projection onto QR, contains
Q —> R and R —> Q
The original set of functional dependencies F consists of
(a) P —> Q
(b) Q —> R
(c) R —> P
Now, the union of the projections FPQ ∪ FQR includes
P —> Q, Q —> P, Q —> R, R —> Q
and with R —> Q, Q —> P and transitivity we get R —> P.
Every FD of F is implied by FPQ ∪ FQR, so the decomposition is said to preserve the dependencies. A code sketch of this check follows.
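The following Python sketch (the helper names are my own) checks this example mechanically. FDs are projected onto a fragment by taking, for every non-empty subset X of the fragment, the FD X —> (X+ ∩ fragment); this is exponential but adequate for small schemas.

from itertools import combinations

F = [({'P'}, {'Q'}), ({'Q'}, {'R'}), ({'R'}, {'P'})]

def closure(attrs, fds):
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project(fds, schema):
    # FDs X -> (X+ ∩ schema) - X for every non-empty subset X of schema.
    out = []
    attrs = sorted(schema)
    for n in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, n):
            rhs = (closure(set(lhs), fds) & schema) - set(lhs)
            if rhs:
                out.append((set(lhs), rhs))
    return out

projected = project(F, {'P', 'Q'}) + project(F, {'Q', 'R'})
# The decomposition preserves dependencies if every FD of F is implied
# by the union of the projections.
print(all(rhs <= closure(lhs, projected) for lhs, rhs in F))   # True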
4. Explain schema refinement in database design?
Ans: Any relation can be represented with an entity-relationship (ER) diagram, with attributes representing the columns of the tables and entities representing the tables themselves. Consider a simple ER diagram of a college.
The different entities used are
(a) Student (b) College (c) Lecturer
Attributes of these entities include,
Student —> name, student_id, class_id, course_id, address
Lecturer —> name, lecturer_id, class_id, course_id, age, address
College —> name, address, phone number, no. of departments, college_id
Relationships among them are
(a) goes to [a student goes to a college]
(b) teaches [a lecturer teaches a student]
(c) works in [a lecturer works in a college]
The ER diagram for this relation is as follows:
[Figure: ER diagram showing the different attributes of Student, College and Lecturer and also the relationships among them.]
This ER diagram can be converted into a relation, which can be further decomposed into smaller and more efficient relations. However, it is not guaranteed that a good ER diagram will always convert into a good set of relations: conversion of an ER diagram to a relation may lead to data redundancy, and an ER diagram cannot depict certain complicated conditions that apply to relations.
The relations produced from an ER diagram may need to be decomposed for the following reasons:
(a) Limitations of an Entity Set
The ER model doesn't express redundant storage. Consider the example of the relation "Employee":
EMPLOYEE (emp_id, emp_name, emp_sectionid, job_section, grade)
where emp_id is the primary key. The following are the FDs that hold over EMPLOYEE:
FD1: id —> name, section, job, grade
FD2: job —> grade
FD2 denotes that the grade is determined by the job section.
Due to this FD, there is redundant data in the table, which leads to the various anomalies. This FD cannot be expressed in an ER diagram; only the FDs in which the key uniquely determines the attributes of the relation can be.
We can design an ER model for this relation by decomposing the "employee" relation into two schemas with attributes as follows:
(a) employee (id, name, section_id), (b) job (job_section, grade)
The relationship between these entity sets is "is associated with".
[Figure: ER diagram representing the relation between the Employee (id, name, section_id) and Job (job_section, grade) entities.]
These techniques can be used for designing good databases.
(b) Limitations on Relationship Set
A relationship set is a set consisting of the relationships that exist among the different attributes of the relation/schema. Constraints defined on the relationships may lead to data redundancy. Consider an example: a "Bank" relation with an attribute set consisting of
Name of Bank, Number of Branches, customer name, customer id, account id, customer
address, amount.
Bank (N_bank, N_branch, C_Name, C_id, acc_id, C_address, amount)
where acc_id is the primary key:
acc_id —> N_bank, N_branch, C_Name, C_id, C_address, amount
A customer with id "C_id" and name "C_name" has an account "acc_id" in the branch "N_branch" of the bank "N_bank". C_address specifies the address of the customer and amount specifies the amount of money deposited in the bank.
Now, the same customer can have a number of accounts in the different branches of the same bank:
N_bank, C_id —> N_branch, acc_id.
This relationship increases the amount of data redundancy in the table. In order to solve
this problem, we may decompose the relation Bank into two schemas as follows:
(a) R1 with attributes
C_name, C_id, C_address, amount

(b) R2 with attributes
C_id, N_branch, acc_id.
(c) Recognizing Attribute Sets for Entities
In order to decide whether to decompose a relation that is formed by converting an ER diagram, it is important to distinguish between the different attributes of the entities. It may happen that, in an ER diagram, the same group of attributes refers to different entities. The inability to properly recognize the set of attributes leads to data redundancy. For example, consider the Bank and Customer relations, both of which are associated with the relationship set "deposits money".
Figure: ER Diagram for the Bank and Customer Entities
Two relations formed from this ER diagram are:
(a) Customer (cid, phone, acc_id, branch)
(b) Bank (cid, Name of bank, acc_id, address, amount)
Note: Underlined - Primary key
Now, it is possible that the same customer can have different accounts in the same branch
of the bank.
branch —> acc_id
This dependency may lead to redundant storage of the attribute branch. The solution to this problem is to decompose the relation Customer into two schemas:
(a) customer1 (cid, cname, acc_id)
(b) customer2 (branch, Name of the bank, cid)
Note: Underlined - Foreign key
The key for the customer2 relation is the same as the key for the Bank relation. In fact, both relations show the same record entry for a particular entity.
Figure: ER Diagram for Bank and Customer
Converting this ER diagram to two relations:
(a) Customer (cid, cname, acc_id)
(b) Bank (cid, acc_id, amount, branch, Name)
Thus, identifying the right set of attributes for the entities helps in the reduction of redundancy.
(d) Recognizing Entity Sets
Finally, the functional dependencies can be used as a source for the refinement of an ER model.
Consider an example of “online enrollment” relation with attributes stud_id, exam, date,
location.
Online enrollment (id, exam, date, location)
The description of this relation is: the student with id "id" has enrolled for an exam to be conducted on date 'd' at location 'L'. Now, for online enrollment, students have to pay the fee online using their respective credit cards. Every student has a unique credit card number. The FD for this relation can be shown as,
FD: sid —> C, where 'C' denotes the credit card number and sid = student id. 'C' is added as another attribute of the relation.
Every time the student enrolls for the online examination, he has to store his credit card number. This again leads to data redundancy. One way to eliminate the redundancy is the introduction of a new entity set "credit cards" with the credit card number as its only attribute.
We can relate the student and credit card entities with a relationship set "has a". The original relation "online enrollment" is then decomposed into two schemas,
(a) Enroll (id, exam, date, location)
(b) Card (id, C)
The relationship set "has a" can be used to relate card and credit cards.

Figure: ER Diagram Representing the Online Enrollment Entity

Figure: ER Diagram for the Credit Card Entity

Figure: ER Diagram Showing the Relationship Between Student and Credit Card

Figure: ER Diagram Showing the Card and Credit Card Relation


Instead of going through all these complicated processes, we can directly include the credit card number 'C' in the online enrollment relation and then, by using the FD information, refine the tables.
5. Explain Multi-valued dependencies briefly.
Ans: To understand the concept of multivalued dependencies, consider a relation,
Course Student Textbook
Chemistry Jack Principles of Science
Chemistry Jack ABC of chemistry
Chemistry John Principles of Science
Chemistry John ABC of chemistry
Physics Jack Principles of Science
Physics Jack Optics
Physics Jack Optical physics

Here, each tuple means that the course 'C' is taken by the student 'S' and the textbook 'T' is the one which is recommended. The attributes Student and Textbook are independent of each other: any number of students can refer to any textbook and can take any course. The composite key for this relation consists of (C, S, T).
Since all the attributes are part of the key, this relation is in BCNF and therefore there is no use in decomposing it further. We can also notice that much of the data is being repeated; for instance, the textbook for chemistry, ABC of chemistry, is repeated for each student.
This redundancy again gives rise to update anomalies. By decomposing this relation into two schemas with attributes CS and CT, we can deal with the redundancy.
It is worth noticing that the redundancy is due to the fact that students and textbooks are independent of each other. Such a constraint is an example of a multivalued dependency, or MVD.
CS:
Course          Student
Chemistry       Jack
Chemistry       John
Physics         Jack

CT:
Course          Textbook
Chemistry       ABC of chemistry
Chemistry       Principles of Science
Physics         Principles of Science
Physics         Optics
Physics         Optical physics

By decomposing the relation we can eliminate the update anomalies. For example, if we want to record that a new student is taking the chemistry course, then we need to enter only a single tuple in the relation CS.
MVDs are a generalization of functional dependencies. They can be represented as,
Course —>—> Student
Course —>—> Textbook
This is read as "Student is multi-dependent on Course" or "Course multi-determines Student".
The meaning of the MVD Course —>—> Student is:
there exists a set of students corresponding to each course C; i.e., for a course C and a textbook B, the set of students matching the pair (C, B) in CST depends on the value of C only and is independent of the value of B. A multivalued dependency can be defined as follows: if X, Y, Z are subsets of the attribute set of a relation A, then Y is said to be multi-dependent on X if, for every instance of A, the set of Y values matching a given (X value, Z value) pair depends only on the X value and does not depend on the Z value. This is written
X —>—> Y.
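This definition can be checked mechanically on a relation instance. The following Python sketch (our own illustration, with invented helper names) tests whether X —>—> Y holds by verifying that, for every pair of tuples agreeing on X, the tuple that takes its Y-part from the first and its Z-part from the second is also present; applied to the CST instance above, it confirms Course —>—> Student.

    def mvd_holds(rows, attrs, X, Y):
        # X ->-> Y holds iff mixing the Y-part of one tuple with the Z-part of
        # another X-matching tuple always yields a tuple that is in the relation.
        idx = {a: i for i, a in enumerate(attrs)}
        Z = [a for a in attrs if a not in X and a not in Y]
        present = set(rows)
        for t1 in rows:
            for t2 in rows:
                if all(t1[idx[a]] == t2[idx[a]] for a in X):
                    mixed = {a: t1[idx[a]] for a in list(X) + list(Y)}
                    mixed.update({a: t2[idx[a]] for a in Z})
                    if tuple(mixed[a] for a in attrs) not in present:
                        return False
        return True

    attrs = ('course', 'student', 'textbook')
    cst = [('Chemistry', 'Jack', 'Principles of Science'),
           ('Chemistry', 'Jack', 'ABC of chemistry'),
           ('Chemistry', 'John', 'Principles of Science'),
           ('Chemistry', 'John', 'ABC of chemistry'),
           ('Physics', 'Jack', 'Principles of Science'),
           ('Physics', 'Jack', 'Optics'),
           ('Physics', 'Jack', 'Optical physics')]
    print(mvd_holds(cst, attrs, X=('course',), Y=('student',)))   # True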
Every FD is an MVD; i.e., if X —> Y, then X —>—> Y.
Five rules are used to compute additional FDs and MVDs. These are
(a) MVD complementation: If A —>—> B, then A —>—> R − AB.
(b) MVD augmentation: If A —>—> B and C ⊆ D, then AD —>—> BC.
(c) MVD transitivity: If A —>—> B and B —>—> C, then A —>—> (C − B).
(d) Replication: If A —> B, then A —>—> B.
(e) Coalescence: If A —>—> B and there is a C such that B ∩ C is empty, C —> D and D ⊆ B, then A —> D.

6. Explain fourth normal form briefly.


Ans: A relation R is said to be in 4NF if it is in BCNF and doesn't have any multivalued dependencies. In other words, if R is a relational schema with F as a set of dependencies that includes all the FDs and MVDs of R, and A and B are two subsets of the attributes of R, then R is said to be in 4NF if, for every MVD A —>—> B in F, one of the following conditions holds:
1. B ⊆ A or AB = R (the MVD is trivial), or
2. A is a superkey.

Trivial MVDs always hold over R: an MVD A —>—> B is trivial if B ⊆ A or AB = R.

The "CST" relation discussed before is not in 4NF because it involves an MVD which is not an FD. When the relation is decomposed into two schemas with attributes CS and CT, these schemas are in 4NF. Also, if the natural join operation is performed on CS and CT, the resulting relation is CST again. Hence, this is a lossless decomposition.
In short, in order to achieve 4NF we must first eliminate the relation-valued attributes (RVAs), then reduce the schemas to BCNF, and finally decompose on any remaining nontrivial MVDs.
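A quick way to convince yourself of the lossless-join claim is to project and re-join the instance. A short Python sketch (our own illustration, reusing the CST data from the example):

    # CST instance as (course, student, textbook) triples.
    cst = {('Chemistry', 'Jack', 'Principles of Science'),
           ('Chemistry', 'Jack', 'ABC of chemistry'),
           ('Chemistry', 'John', 'Principles of Science'),
           ('Chemistry', 'John', 'ABC of chemistry'),
           ('Physics', 'Jack', 'Principles of Science'),
           ('Physics', 'Jack', 'Optics'),
           ('Physics', 'Jack', 'Optical physics')}
    cs = {(c, s) for (c, s, t) in cst}    # projection onto CS
    ct = {(c, t) for (c, s, t) in cst}    # projection onto CT
    joined = {(c, s, t) for (c, s) in cs for (c2, t) in ct if c == c2}
    print(joined == cst)   # True: the natural join of CS and CT gives back CST exactly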

Gokaraju Rangaraju Institute of Engineering and Technology

Department of Computer Science and Engineering


Year/Semester: II/I Academic Year: 2015-16
Assignment Sheet: IV

1. Consider a relation R=ABCDEFG and FD’s are A--->B, BC--->DE, AEG--->G. Compute
AC+.

2. Consider a relation R=ABCDE and FD’s are A---->BC, CD---->E, B---->D, E--->A.
Compute B+.

3. Consider a relation R=ABCDEF and FD’s are AB--->C, BC--->AD, D--->E, CF--->B.
Compute AB+.

4. Consider a relation R=ABCDEH and FD’s are A--->BC, CD--->E, E--->C, D--->AEH,
AEH--->BD, DH--->BC. Compute the candidate key?

5. Consider a relation R=ABCDE and FD’s are A--->B, BC--->E, ED--->A. Compute the
candidate key?

6. Consider a relation R=ABCDEFGHIJ and FD’s are AB--->C, A--->D, B--->F, F--->GH,
D--->IJ. Find the candidate key and decompose R in 2NF.

7. Consider a relation R=ABCDEF and FD’s are A--->FC, C--->D, B--->E. Find the
candidate key and normalize it in to 2NF.

8. Consider a relation R=ABCDE and FD’s are B--->E,C--->D, A--->B. Find the candidate
key and normalize it into 2NF.

9. Consider a relation R=ABCDEFGHIJ and FD’s are AB--->C, BD--->EF, AD--->GF, A---
>I, H--->J. Find the candidate key and normalize it into 2NF.

10. Consider a relation R=ABCDEFGH and FD’s are AB--->CEFGH, A--->D, F--->G, FB--->H, HBC--->ADEFG, FBC--->ADE. Normalize the relation R into BCNF.

11. Consider a relation R=ABCDEFG and FD’s are BCD--->A, BC--->E, A--->F, F---
>G,C--->D,A--->G. Normalize the relation R into BCNF.

12. Consider a relation R=ABCDE and FD’s are AB--->CDE, A--->C, D--->E. Normalize the
relation R into BCNF.

13. Consider a relation R=ABCDEFGHIJ and FD’s are AB--->C, A--->DE, B--->F, F--->GH,
D--->IJ. Find the candidate key and normalize the relation R into 3NF.

14. Consider a relation R=ABCDE and FD’s are B--->E, C--->D, A--->B. Find the candidate
key and normalize it into 3NF.

15. Consider a relation R=ABCDEFGHIJ and FD’s are AB--->C, BD--->EF, AD--->GH, A---
>I, H--->J. Find the candidate key and normalize it into 3NF.

16. Consider a relation scheme R = (A, B, C, D, E, H) on which the following functional


dependencies hold: {A–>B, BC–>D, E–>C, D–>A}. What are the candidate keys of R?

17. Let R= (A, B, C, D, E, F) be a relation scheme with the following dependencies: C->F, E-
>A, EC->D, A->B. Find the candidate key for R?

18. A table has fields F1, F2, F3, F4, and F5, with the following functional dependencies:
F1->F3
F2->F4
(F1,F2)->F5
in terms of normalization, this table is in which normal form?

19. The relation schema Student_Performance (name, courseNo, rollNo, grade) has the
following FDs:

name,courseNo->grade
rollNo,courseNo->grade
name->rollNo
rollNo->name
What is the highest normal form of this relation?

20. Consider the following relational schema:
Suppliers(sid:integer, sname:string, city:string, street:string)
Parts(pid:integer, pname:string, color:string)
Catalog(sid:integer, pid:integer, cost:real)
Assume that, in the Suppliers relation above, each supplier and each street within a city has a unique name, and (sname, city) forms a candidate key. No other functional dependencies are implied other than those implied by primary and candidate keys. Find out the normal form of the schema.

Gokaraju Rangaraju Institute of Engineering and Technology


Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16
SUB:DBMS
Tutorial Sheet: V-1
TRANSACTION MANAGEMENT

Short answer questions

1. Define transaction. Explain about transaction management with example?


2. Write about properties of transaction? (or) Write about ACID properties?
3. Mention different states of transaction?
4. Which technique is used for recovery management component? Explain it?
5. Write two good reasons for allowing concurrency Execution?
6. Define schedule with example?
7. Define serial schedule and explain with example?
8. Define concurrent schedule and explain with example?
9. Explain conflict serializability?
10. Explain view serializability?
11. What is mean by conflict equivalent?
12. Explain Recoverable Schedule?
13. Explain non-Recoverable Schedule?
14. Explain cascade schedule?
15. Explain cascade-less schedule?

Descriptive questions/Programs/Experiments

1. Define transaction management. Explain properties of transaction with example?
2. Write short notes on transaction? Write different states of transaction?
3. What do you mean by concurrency Execution transaction? Explain in detail?
4. Explain types of serializability in detail?
5. Explain shadow copy technique for atomicity and durability?

Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology

Department of Computer Science and Engineering


Year/Semester: II / I Academic year: 2015-16

SUB: DBMS Tutorial Sheet: V-1

SHORT ANSWER QUESTIONS


1.Define transaction. Explain about transaction management with example?
Ans: A transaction is a unit of program execution that accesses and possibly updates various
data items. The transaction consists of all operations executed between the begin transaction
and end transaction.
Transaction management:
Under transaction management, a collection of several operations on the database appears to be a single unit. For example, a transfer of funds from a checking account to a savings account is a single operation from the customer's standpoint; within the database system, however, it consists of several operations. Suppose money is transferred from account A to account B, where initially account A holds 1000/- and account B holds 500/-:
1. Login to account A
2. Register account B's details
3. Enter how much money is to be transferred (500)
4. Submit
5. Successfully transferred
2. Write about properties of transaction? (Or) Write about ACID properties?
Ans: Database system maintains the following properties of the transactions:
 Atomicity. Either all operations of the transaction are reflected properly in the database,
or none are.
 Consistency. Execution of a transaction in isolation (that is, with no other transaction
executing concurrently) preserves the consistency of the database.
 Isolation. Even though multiple transactions may execute concurrently, the system
guarantees that, for every pair of transactions Ti and Tj , it appears to Ti that either Tj
finished execution before Ti started, or Tj started execution after Ti finished. Thus, each
transaction is unaware of other transactions executing concurrently in the system.
 Durability. After a transaction completes successfully, the changes it has made to the
database persist, even if there are system failures.
These properties are often called the ACID properties; the acronym is derived from
the first letter of each of the four properties.
3. Mention different states of transaction?
Ans: Transaction States:
 Active: the initial state; the transaction stays in this state while it is executing
 Partially committed: after the final statement has been executed
 Failed: after the discovery that normal execution can no longer proceed
 Aborted: after the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction
 Committed: after successful completion
4. Which technique is used for recovery management component? Explain it?
Ans: The recovery-management component of a database system can support atomicity and durability by a variety of schemes. One simple scheme, the shadow-copy scheme, is based on making copies of the database, called shadow copies, and assumes that only one transaction is active at a time. The scheme also assumes that the database is simply a file on disk. A pointer called db-pointer is maintained on disk; it points to the current copy of the database.
In the shadow-copy scheme, a transaction that wants to update the database first
creates a complete copy of the database. All updates are done on the new database copy, leaving
the original copy, the shadow copy, untouched. If at any point the transaction has to be aborted, the system merely deletes the new copy. The old copy of the database has not been affected.
5. Write two good reasons for allowing concurrency Execution?
Ans: Transaction-processing systems usually allow multiple transactions to run concurrently.
There are two good reasons for allowing concurrency:
 Improved throughput and resource utilization: Throughput - the number of
transactions executed in a given amount of time. Correspondingly, the processor and disk
utilization also increase
 Reduced waiting time: Concurrent execution reduces the unpredictable delays in
running transactions. Moreover, it also reduces the average response time: the average
time for a transaction to be completed after it has been submitted.
6. Define schedule with example?
Ans: Such execution sequences are called schedules. They represent the chronological order in which instructions are executed in the system. A schedule for a set of transactions must consist of all instructions of those transactions, and must preserve the order in which the instructions appear in each individual transaction.
T1 T2

read(A)
A := A – 50
write (A)
read(B)
B := B + 50
write(B)
read(A)
temp := A * 0.1
A := A – temp
write(A)
read(B)
B := B + temp
write(B)

Schedule 1—a serial schedule in which T1 is followed by T2.


For example, in transaction T1, the instruction write (A) must appear before the instruction
read (B), in any valid schedule. In the following discussion, we shall refer to the first execution
sequence (T1 followed by T2) as schedule 1

7. Define serial schedule and explain with example?


Ans:
T1 T2
read(A)
A := A – 50
write (A)
read(B)
B := B + 50
write(B)
read(A)
temp := A * 0.1
A := A – temp
write(A)
read(B)
B := B + temp
write(B)

Schedule 1—A serial schedule in which T1 is followed by T2.


T1 T2
read(A)
temp := A * 0.1
A := A – temp
write(A)
read(B)
B := B + temp
write(B)
read(A)
A := A – 50
write (A)
read(B)
B := B + 50
write(B)

Schedule 2-A serial schedule in which T2 is followed by T1.

We can say that the above two schedules are serial schedules. In schedule 1, transaction T2 starts only after transaction T1 completes; in schedule 2, transaction T1 starts only after transaction T2 completes.
8. Define concurrent schedule and explain with example?
Ans: Transaction-processing systems usually allow multiple transactions to run concurrently.
When the database system executes several transactions concurrently, the corresponding
schedule no longer needs to be serial. If two transactions are running concurrently, the operating
system may execute one transaction for a little while, then perform a context switch, execute the
second transaction for some time, and then switch back to the first transaction for some time, and
so on. With multiple transactions, the CPU time is shared among all the transactions. One
possible schedule appears in Figure. After this execution takes place, we arrive at the same state
as the one in which the transactions are executed serially in the order T1 followed by T2. The
sum A + B is indeed preserved.

T1: read(A)
T1: A := A – 50
T1: write(A)
T2: read(A)
T2: temp := A * 0.1
T2: A := A – temp
T2: write(A)
T1: read(B)
T1: B := B + 50
T1: write(B)
T2: read(B)
T2: B := B + temp
T2: write(B)

Schedule 3-A concurrent schedule equivalent to schedule 1.


9. Explain conflict serializability?
Ans: If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent. The concept of conflict equivalence leads to the concept of conflict serializability. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
10. Explain view serializability?
Ans: The concept of view equivalence leads to the concept of view serializability. We say that a
schedule S is view serializable if it is view equivalent to a serial schedule. As an illustration,
suppose that we augment schedule 7 with transaction T6, and obtain schedule 9 shown below.
Schedule 9 is view serializable. Indeed, it is view equivalent to the serial schedule <T3, T4, T6>,
since the one read (Q) instruction reads the initial value of Q in both schedules, and T6 performs
the final write of Q in both schedules.
T3: read(Q)
T4: write(Q)
T3: write(Q)

Schedule 7

T3: read(Q)
T4: write(Q)
T3: write(Q)
T6: write(Q)

Schedule 9
11. What is meant by conflict equivalent?

Ans: If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.
12. Explain Recoverable Schedule?
Ans: A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads
a data item previously written by Ti, the commit operation of Ti appears before the commit
operation of Tj .
13. Explain Non-Recoverable Schedule?
Ans:
T8: read(A)
T8: write(A)
T9: read(A)
T8: read(B)

Schedule 11
T9 is a transaction that performs only one instruction: read (A). Suppose that the system allows
T9 to commit immediately after executing the read (A) instruction. Thus, T9 commits before T8
does. Now suppose that T8 fails before it commits. Since T9 has read the value of data item A
written by T8, we must abort T9 to ensure transaction atomicity. However, T9 has already
committed and cannot be aborted. Thus, we have a situation where it is impossible to recover
correctly from the failure of T8. Schedule 11, with the commit happening immediately after the read(A) instruction, is an example of a non-recoverable schedule.
14. Explain cascade schedule?
Ans:
T10: read(A)
T10: read(B)
T10: write(A)
T11: read(A)
T11: write(A)
T12: read(A)
Transaction T10 writes a value of A that is read by transaction T11.Transaction T11 writes
a value of A that is read by transaction T12. Suppose that, at this point, T10 fails. T10 must be
rolled back. Since T11 is dependent on T10, T11 must be rolled back. Since T12 is dependent on
T11, T12 must be rolled back. This phenomenon, in which a single transaction failure leads to a
series of transaction rollbacks, is called cascading rollback.
15. Explain cascade-less schedule?
Ans:
T10: read(A)
T10: read(B)
T10: write(A)
T11: read(A)
T11: write(A)
T12: read(A)

Schedule 12

A cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj. It is easy to verify that every cascadeless schedule is also recoverable.

LONG ANSWERS:
1. Define transaction management. Explain properties of transaction with example?
Ans: A transaction is a unit of program execution that accesses and possibly updates various
data items. The transaction consists of all operations executed between the begin transaction
and end transaction.
Transaction management:
Under transaction management, a collection of several operations on the database appears to be a single unit. For example, a transfer of funds from a checking account to a savings account is a single operation from the customer's standpoint; within the database system, however, it consists of several operations. Suppose money is transferred from account A to account B, where initially account A holds 1000/- and account B holds 500/-:
1. Login to account A
2. Register account B's details
3. Enter how much money is to be transferred (500)
4. Submit
5. Successfully transferred
ACID PROPERTIES:
Database system maintains the following properties of the transactions:
 Atomicity. Either all operations of the transaction are reflected properly in the database,
or none are.
 Consistency. Execution of a transaction in isolation (that is, with no other transaction
executing concurrently) preserves the consistency of the database.
 Isolation. Even though multiple transactions may execute concurrently, the system
guarantees that, for every pair of transactions Ti and Tj , it appears to Ti that either Tj
finished execution before Ti started, or Tj started execution after Ti finished. Thus, each
transaction is unaware of other transactions executing concurrently in the system.
 Durability. After a transaction completes successfully, the changes it has made to the
database persist, even if there are system failures.
These properties are often called the ACID properties; the acronym is derived from
the first letter of each of the four properties.
To gain a better understanding of ACID properties and the need for them, consider a simplified
banking system consisting of several accounts and a set of transactions that access and update
those accounts.
Transactions access data using two operations:
 Read(X), which transfers the data item X from the database to a local buffer belonging to
the transaction that executed the read operation.
 Write(X), which transfers the data item X from the local buffer of the transaction that
executed the write back to the database.
In a real database system, the write operation does not necessarily result in the immediate update
of the data on the disk; the write operation may be temporarily stored in memory and executed
on the disk later. For now, however, we shall assume that the write operation updates the
database immediately.
Let Ti be a transaction that transfers $50 from account A to account B.
This transaction can be defined as

Ti: Read (A);
A: = A − 50;
Write (A);
Read (B);
B: = B + 50;
Write (B).
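To make this concrete, here is a small hedged sketch using Python's built-in sqlite3 module (the account table and its schema are invented for illustration). The commit/rollback pair is what makes the two updates behave as the single all-or-nothing unit described above:

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)')
    conn.executemany('INSERT INTO account VALUES (?, ?)', [('A', 1000), ('B', 2000)])
    conn.commit()

    def transfer(amount):
        # Move `amount` from account A to account B as one transaction.
        try:
            conn.execute('UPDATE account SET balance = balance - ? WHERE id = ?', (amount, 'A'))
            conn.execute('UPDATE account SET balance = balance + ? WHERE id = ?', (amount, 'B'))
            conn.commit()      # both writes become visible and durable together
        except Exception:
            conn.rollback()    # atomicity: on failure, neither write survives
            raise

    transfer(50)
    print(conn.execute('SELECT id, balance FROM account').fetchall())
    # [('A', 950), ('B', 2050)] -- the sum A + B is preserved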
Let us now consider each of the ACID requirements.
 Consistency: The consistency requirement here is that the sum of A and B be unchanged
by the execution of the transaction. Without the consistency requirement, money could be
created or destroyed by the transaction. It can be verified easily that, if the database is
consistent before an execution of the transaction, the database remains consistent after the
execution of the transaction.
 Atomicity: Suppose that, just before the execution of transaction Ti the values of
accounts A and B are $1000 and $2000, respectively. Now suppose that, during the
execution of transaction Ti, a failure occurs that prevents Ti from completing its
execution successfully. Examples of such failures include power failures, hardware
failures, and software errors. Further, suppose that the failure happened after the write(A)
operation but before the write(B) operation. In this case, the values of accounts A and B
reflected in the database are $950 and $2000. The system destroyed $50 as a result of
this failure. In particular, we note that the sum A + B is no longer preserved.
Thus, because of the failure, the state of the system no longer reflects a real state of the world
that the database is supposed to capture. We term such a state an inconsistent state. We must
ensure that such inconsistencies are not visible in a database system. Note, however, that the
system must at some point be in an inconsistent state. Even if transaction Ti is executed to
completion, there exists a point at which the value of account A is $950 and the value of account
B is $2000, which is clearly an inconsistent state. This state, however, is eventually replaced by
the consistent state where the value of account A is $950, and the value of account B is $2050.
Thus, if the transaction never started or was guaranteed to complete, such an inconsistent
state would not be visible except during the execution of the transaction. That is the reason for
the atomicity requirement: If the atomicity property is present, all actions of the transaction are
reflected in the database, or none are. The basic idea behind ensuring atomicity is this: The database system keeps track (on disk) of the old values of any data on which a transaction
performs a write, and, if the transaction does not complete its execution, the database system
restores the old values to make it appear as though the transaction never executed. Ensuring
atomicity is the responsibility of the database system itself; specifically, it is handled by a
component called the transaction-management component.
 Durability: Once the execution of the transaction completes successfully, and the user
who initiated the transaction has been notified that the transfer of funds has taken place,
it must be the case that no system failure will result in a loss of data corresponding to this
transfer of funds. The durability property guarantees that, once a transaction completes
successfully, all the updates that it carried out on the database persist, even if there is a
system failure after the transaction completes execution.
We assume for now that a failure of the computer system may result in loss of data in main
memory, but data written to disk are never lost.
We can guarantee durability by ensuring that either
1. The updates carried out by the transaction have been written to disk before the transaction
completes.
2. Information about the updates carried out by the transaction and written to disk is sufficient to
enable the database to reconstruct the updates when the database system is restarted after the
failure.
Ensuring durability is the responsibility of a component of the database system called the
recovery-management component.
 Isolation: Even if the consistency and atomicity properties are ensured for each
transaction, if several transactions are executed concurrently, their operations may
interleave in some undesirable way, resulting in an inconsistent state.
For example, as we saw earlier, the database is temporarily inconsistent while the
transaction to transfer funds from A to B is executing, with the deducted total written to A and the
increased total yet to be written to B. If a second concurrently running transaction reads A and B
at this intermediate point and computes A+B, it will observe an inconsistent value. Furthermore,
if this second transaction then performs updates on A and B based on the inconsistent values that
it read, the database may be left in an inconsistent state even after both transactions have
completed.

A way to avoid the problem of concurrently executing transactions is to execute transactions serially, that is, one after the other. However, serial execution gives up the throughput and response-time benefits of concurrency; other solutions have therefore been developed that allow multiple transactions to execute concurrently. The isolation property of a transaction
ensures that the concurrent execution of transactions results in a system state that is equivalent to
a state that could have been obtained had these transactions executed one at a time in some order.
Ensuring the isolation property is the responsibility of a component of the database system called
the concurrency-control component.
2. Write short notes on transaction? Write different states of transaction?
Ans: A transaction may not always complete its execution successfully. Such a transaction is
termed aborted. If we are to ensure the atomicity property, an aborted transaction must have no
effect on the state of the database. Any changes that the aborted transaction made to the database must be undone.
Once the changes caused by an aborted transaction have been undone, we say that the transaction
has been rolled back. It is part of the responsibility of the recovery scheme to manage
transaction aborts. A transaction that completes its execution successfully is said to be
committed. A committed transaction that has performed updates transforms the database into a
new consistent state, which must persist even if there is a system failure.
Once a transaction has committed, we cannot undo its effects by aborting it. The only
way to undo the effects of a committed transaction is to execute a compensating transaction.
For instance, if a transaction added $20 to an account, the compensating transaction would
subtract $20 from the account. However, it is not always possible to create such a compensating
transaction. Therefore, the responsibility of writing and executing a compensating transaction is
left to the user, and is not handled by the database system. We need to be more precise about
what we mean by successful completion of a transaction. We therefore establish a simple abstract
transaction model.
A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed
• Failed, after the discovery that normal execution can no longer proceed
• Aborted, after the transaction has been rolled back and the database has been restored to its
state prior to the start of the transaction
• Committed, after successful completion

The state diagram corresponding to a transaction appears in below diagram. We say that a
transaction has committed only if it has entered the committed state. Similarly, we say that a
transaction has aborted only if it has entered the aborted state. A transaction is said to have
terminated if it has either committed or aborted. A transaction starts in the active state. When it
finishes its final statement, it enters the partially committed state. At this point, the transaction
has completed its execution, but it is still possible that it may have to be aborted, since the actual
output may still be temporarily residing in main memory, and thus a hardware failure may
preclude its successful completion. The database system then writes out enough information to
disk that, even in the event of a failure, the updates performed by the transaction can be re-
created when the system restarts after the failure. When the last of this information is written out,
the transaction enters the committed state. A transaction enters the failed state after the system
determines that the transaction can no longer proceed with its normal execution (for example,
because of hardware or logical errors). Such a transaction must be rolled back. Then, it enters the
aborted state. At this point, the system has two options:
Figure: State diagram of a transaction. A transaction begins in the Active state; after its final statement it enters the Partially committed state and then the Committed state; from the Active or Partially committed state it may enter the Failed state, and a Failed transaction is rolled back into the Aborted state.
 It can restart the transaction, but only if the transaction was aborted as a result of some hardware or software error that was not created through the internal logic of the transaction. A restarted transaction is considered to be a new transaction.
 It can kill the transaction. It usually does so because of some internal logical error that
can be corrected only by rewriting the application program, or because the input was
bad, or because the desired data were not found in the database.
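As an illustration (not from the original notes), the five states and the legal transitions between them can be encoded in a few lines of Python:

    # Allowed transitions from the state diagram of a transaction.
    TRANSITIONS = {
        'active':              {'partially committed', 'failed'},
        'partially committed': {'committed', 'failed'},
        'failed':              {'aborted'},
        'committed':           set(),   # terminal state
        'aborted':             set(),   # terminal state
    }

    class Transaction:
        def __init__(self):
            self.state = 'active'       # every transaction starts out active

        def move(self, new_state):
            if new_state not in TRANSITIONS[self.state]:
                raise ValueError(f'illegal transition {self.state} -> {new_state}')
            self.state = new_state

    t = Transaction()
    t.move('partially committed')
    t.move('committed')                 # ok; any further move would raise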
3. What do you mean by concurrency execution transaction? Explain in detail?
Ans: Transaction-processing systems usually allow multiple transactions to run concurrently.
Allowing multiple transactions to update data concurrently causes several complications with
consistency of the data.
• Improved throughput and resource utilization. A transaction consists of many steps. Some
involve I/O activity; others involve CPU activity. The CPU and the disks in a computer system
can operate in parallel. Therefore, I/O activity can be done in parallel with processing at the
CPU. The parallelism of the CPU and the I/O system can therefore be exploited to run multiple
transactions in parallel. While a read or write on behalf of one transaction is in progress on one
disk, another transaction can be running in the CPU, while another disk may be executing a read
or write on behalf of a third transaction. All of this increases the throughput of the system—that
is, the number of transactions executed in a given amount of time. Correspondingly, the
processor and disk utilization also increase; in other words, the processor and disk spend less
time idle, or not performing any useful work.
• Reduced waiting time. There may be a mix of transactions running on a system, some short
and some long. If transactions run serially, a short transaction may have to wait for a preceding
long transaction to complete, which can lead to unpredictable delays in running a transaction. If
the transactions are operating on different parts of the database, it is better to let them run
concurrently, sharing the CPU cycles and disk accesses among them. Concurrent execution
reduces the unpredictable delays in running transactions. Moreover, it also reduces the average
response time: the average time for a transaction to be completed after it has been submitted.
The motivation for using concurrent execution in a database is essentially the same as the
motivation for using multiprogramming in an operating system. When several transactions run
concurrently, database consistency can be destroyed despite the correctness of each individual
transaction. The database system must control the interaction among the concurrent transactions to prevent them from destroying the consistency of the database. It does so through a variety of
mechanisms called concurrency-control schemes. Let T1 and T2 be two transactions that
transfer funds from one account to another. Transaction T1 transfers $50 from account A to
account B. It is defined as
T1: read (A);
A: = A − 50;
Write (A);
Read (B);
B: = B + 50;
Write (B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It is defined as
T2: read (A);
Temp: = A * 0.1;
A: = A − temp;
Write (A);
Read (B);
B: = B + temp;
Write (B).
Suppose the current values of accounts A and B are $1000 and $2000, respectively.
When the database system executes several transactions concurrently, the corresponding
schedule no longer needs to be serial. If two transactions are running concurrently, the operating
system may execute one transaction for a little while, then perform a context switch, execute the
second transaction for some time, and then switch back to the first transaction for some time, and
so on. With multiple transactions, the CPU time is shared among all the transactions. Several execution sequences are possible, since the various instructions from both transactions may now be interleaved. In general, it is not possible to predict exactly how many instructions of a
transaction will be executed before the CPU switches to another transaction. Thus, the number of
possible schedules for a set of n transactions is much larger than n! .

Schedule 3- A concurrent schedule equivalent to schedule 1.
T1: read(A)
T1: A := A – 50
T1: write(A)
T2: read(A)
T2: temp := A * 0.1
T2: A := A – temp
T2: write(A)
T1: read(B)
T1: B := B + 50
T1: write(B)
T2: read(B)
T2: B := B + temp
T2: write(B)

Suppose that the two transactions are executed concurrently. One possible schedule
appears in schedule 3. After this execution takes place, we arrive at the same state as the one in
which the transactions are executed serially in the order T1 followed by T2. The sum A + B is
indeed preserved. Not all concurrent executions result in a correct state. To illustrate, consider schedule 4 below. After the execution of this schedule, we arrive at a state where the final values of
accounts A and B are $950 and $2100, respectively. This final state is an inconsistent state, since
we have gained $50 in the process of the concurrent execution. Indeed, the sum A + B is not
preserved by the execution of the two transactions.
Schedule 4—a concurrent schedule
T1: read(A)
T1: A := A – 50
T2: read(A)
T2: temp := A * 0.1
T2: A := A – temp
T2: write(A)
T2: read(B)
T1: write(A)
T1: read(B)
T1: B := B + 50
T1: write(B)
T2: B := B + temp
T2: write(B)

If control of concurrent execution is left entirely to the operating system, many possible
schedules, including ones that leave the database in an inconsistent state, such as the one just
described, are possible. It is the job of the database system to ensure that any schedule that gets
executed will leave the database in a consistent state. The concurrency-control component of the
database system carries out this task.
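The effect of the bad interleaving can be replayed directly. The following Python sketch (variable names are our own) executes the steps of schedule 4 against a shared dictionary and reproduces the inconsistent final state, A = 950 and B = 2100:

    db = {'A': 1000, 'B': 2000}
    t1, t2 = {}, {}                    # local buffers of T1 and T2

    t1['A'] = db['A']                  # T1: read(A)
    t1['A'] -= 50                      # T1: A := A - 50
    t2['A'] = db['A']                  # T2: read(A) -- still sees 1000
    t2['temp'] = t2['A'] * 0.1         # T2: temp := A * 0.1
    t2['A'] -= t2['temp']              # T2: A := A - temp
    db['A'] = t2['A']                  # T2: write(A) -> 900
    t2['B'] = db['B']                  # T2: read(B)
    db['A'] = t1['A']                  # T1: write(A) -> 950 (overwrites T2's write)
    t1['B'] = db['B']                  # T1: read(B)
    t1['B'] += 50                      # T1: B := B + 50
    db['B'] = t1['B']                  # T1: write(B) -> 2050
    db['B'] = t2['B'] + t2['temp']     # T2: B := B + temp; write(B) -> 2100
    print(db, db['A'] + db['B'])       # A = 950, B = 2100: the sum is 3050, $50 gained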
4. Explain types of serializability in detail?
Ans: Let T1 and T2 be two transactions that transfer funds from one account to another.
Transaction T1 transfers $50 from account A to account B. It is defined as
T1: read (A);
A: = A − 50;
Write (A);
Read (B);
B: = B + 50;
Write (B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It is defined as
T2: read (A);
Temp: = A * 0.1;
A: = A − temp;
Write (A);
Read (B);
B: = B + temp;
Write(B).
Suppose the current values of accounts A and B are $1000 and $2000, respectively.
Suppose also that the two transactions are executed one at a time in the order T1 followed by T2.

This execution sequence appears in Figure. In the figure, the sequence of instruction steps is in
chronological order from top to bottom, with instructions of T1 appearing in the left column and
instructions of T2 appearing in the right column. The final values of accounts A and B, after the
execution takes place, are $855 and $2145, respectively. Thus, the total amount of money in
accounts A and B—that is, the sum A + B—is preserved after the execution of both transactions.
Schedule 1
T1 T2

read(A)
A := A – 50
write (A)
read(B)
B := B + 50
write(B)
read(A)
temp := A * 0.1
A := A – temp
write(A)
read(B)
B := B + temp
write(B)

Similarly, if the transactions are executed one at a time in the order T2 followed by T1,
then the corresponding execution sequence is that of Figure. Again, as expected, the sum A + B is
preserved, and the final values of accounts A and B are $850 and $2150, respectively.

Schedule 2

T1 T2
read(A)
temp := A * 0.1
A := A – temp
write(A)
read(B)
B := B + temp
write(B)
read(A)
A := A – 50
write (A)
read(B)
B := B + 50
write(B)

The execution sequences just described are called schedules. They represent the
chronological order in which instructions are executed in the system. Clearly, a schedule for a set
of transactions must consist of all instructions of those transactions, and must preserve the order
in which the instructions appear in each individual transaction. For example, in transaction T1,
the instruction write (A) must appear before the instruction read (B), in any valid schedule. In the
following discussion, we shall refer to the first execution sequence (T1 followed by T2) as
schedule 1, and to the second execution sequence (T2 followed by T1) as schedule 2.
These schedules are serial: Each serial schedule consists of a sequence of instructions
from various transactions, where the instructions belonging to one single transaction appear
together in that schedule. We discuss different forms of schedule equivalence; they lead to the
notions of conflict serializability and view serializability.
Since transactions are programs, it is computationally difficult to determine exactly what operations a transaction performs and how operations of various transactions interact. For this reason, we shall not interpret the type of operations that a transaction can perform on a data item. Instead, we consider only two operations: read and write. We thus assume that, between a read (Q)
instruction and a write (Q) instruction on a data item Q, a transaction may perform an arbitrary sequence of operations on the copy of Q that is residing in the local buffer of the transaction. Thus,
the only significant operations of a transaction, from a scheduling point of view, are its read and
write instructions. We shall therefore usually show only read and write instructions in schedules,
as we do in schedule 3
Schedule 3 - Showing only the read and write instructions.
T1: read(A)
T1: write(A)
T2: read(A)
T2: write(A)
T1: read(B)
T1: write(B)
T2: read(B)
T2: write(B)

Conflict Serializability
Let us consider a schedule S in which there are two consecutive instructions Ii and Ij, of
transactions Ti and Tj, respectively (i ≠ j). If Ii and Ij refer to different data items, then we can
swap Ii and Ij without affecting the results of any instruction in the schedule. However, if Ii and
Ij refer to the same data item Q, then the order of the two steps may matter. Since we are dealing
with only read and write instructions, there are four cases that we need to consider are as follows
1. Ii = read (Q), Ij = read (Q). The order of Ii and Ij does not matter, since the same value of Q is
read by Ti and Tj , regardless of the order.
2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is
written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written by
Tj. Thus, the order of Ii and Ij matters.
3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to those of the
previous case.
4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of these
instructions does not affect either Ti or Tj. However, the value obtained by the next read (Q) instruction of S is affected, since the result of only the latter of the two write instructions is
preserved in the database. If there is no other write (Q) instruction after Ii and Ij in S, then the
order of Ii and Ij directly affects the final value of Q in the database state that results from
schedule S.
Thus, only in the case where both Ii and Ij are read instructions does the relative order of
their execution not matter. We say that Ii and Ij conflict if they are operations by different
transactions on the same data item, and at least one of these instructions is a write operation.
To illustrate the concept of conflicting instructions, we consider schedule 3. The write(A)
instruction of T1 conflicts with the read(A) instruction of T2. However, the write(A) instruction of
T2 does not conflict with the read(B) instruction of T1, because the two instructions access
different data items.
Let Ii and Ij be consecutive instructions of a schedule S. If Ii and Ij are instructions of
different transactions and Ii and Ij do not conflict, then we can swap the order of Ii and Ij to
produce a new schedule S'. We expect S to be equivalent to S', since all instructions appear in
the same order in both schedules except for Ii and Ij, whose order does not matter.
Since the write(A) instruction of T2 in schedule 3 does not conflict with the read(B)
instruction of T1, we can swap these instructions to generate an equivalent schedule, schedule 5.
Regardless of the initial system state, schedules 3 and 5 both produce the same final system state.
We continue to swap non conflicting instructions:
• Swap the read(B) instruction of T1 with the read(A) instruction of T2.
• Swap the write(B) instruction of T1 with the write(A) instruction of T2.
• Swap the write(B) instruction of T1 with the read(A) instruction of T2.
The final result of these swaps, schedule 6 of Figure, is a serial schedule. Thus, we have shown
that schedule 3 is equivalent to a serial schedule. This equivalence implies that, regardless of the
initial system state, schedule 3 will produce the same final state as will some serial schedule. If a
schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent. In our previous examples, schedule 1 is not conflict equivalent to schedule 2. However, schedule 1 is conflict equivalent to schedule 3,
because the read(B) and write(B) instruction of T1 can be swapped with the read(A) and write(A)
instruction of T2. The concept of conflict equivalence leads to the concept of conflict
serializability.

We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule. Thus, schedule 3 is conflict serializable, since it is conflict equivalent to the serial
schedule 1
Schedule 6 - a serial schedule that is equivalent to schedule 3.
T1: read(A)
T1: write(A)
T1: read(B)
T1: write(B)
T2: read(A)
T2: write(A)
T2: read(B)
T2: write(B)
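In practice, conflict serializability is usually tested not by swapping instructions but with the standard precedence-graph test: add an edge Ti -> Tj whenever an instruction of Ti conflicts with a later instruction of Tj; the schedule is conflict serializable if and only if the graph is acyclic. A hedged Python sketch of that test, applied to the read/write versions of schedules 3 and 4:

    def conflict_serializable(schedule):
        # schedule: list of (txn, op, item) with op in {'r', 'w'}.
        edges = set()
        for i, (ti, op_i, x) in enumerate(schedule):
            for tj, op_j, y in schedule[i + 1:]:
                if ti != tj and x == y and 'w' in (op_i, op_j):
                    edges.add((ti, tj))                 # Ti must precede Tj
        # Acyclicity check: repeatedly strip nodes that have no incoming edge.
        nodes = {t for t, _, _ in schedule}
        while nodes:
            sources = [n for n in nodes if not any(v == n for _, v in edges)]
            if not sources:
                return False                            # a cycle remains
            nodes -= set(sources)
            edges = {(u, v) for (u, v) in edges if u in nodes and v in nodes}
        return True

    s3 = [('T1','r','A'),('T1','w','A'),('T2','r','A'),('T2','w','A'),
          ('T1','r','B'),('T1','w','B'),('T2','r','B'),('T2','w','B')]
    s4 = [('T1','r','A'),('T2','r','A'),('T2','w','A'),('T2','r','B'),
          ('T1','w','A'),('T1','r','B'),('T1','w','B'),('T2','w','B')]
    print(conflict_serializable(s3), conflict_serializable(s4))   # True False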
View Serializability
In this section, we consider a form of equivalence that is less stringent than conflict
equivalence, but that, like conflict equivalence, is based on only the read and write operations of
transactions.
Schedule 8
T1: read(A)
T1: A := A – 50
T1: write(A)
T2: read(B)
T2: B := B – 10
T2: write(B)
T1: read(B)
T1: B := B + 50
T1: write(B)
T2: read(A)
T2: A := A + 10
T2: write(A)

Consider two schedules S and S', where the same set of transactions participates in both schedules. The schedules S and S' are said to be view equivalent if three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S', also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was produced by a write(Q) operation executed by transaction Tj, then the read(Q) operation of transaction Ti must, in schedule S', also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S'.
Conditions 1 and 2 ensure that each transaction reads the same values in both schedules
and, therefore, performs the same computation. Condition 3, coupled with conditions 1 and 2,
ensures that both schedules result in the same final system state. In our previous examples,
schedule 1 is not view equivalent to schedule 2, since, in schedule 1, the value of account A read
by transaction T2 was produced by T1, whereas this case does not hold in schedule 2. However,
schedule 1 is view equivalent to schedule 3, because the values of account A and B read by
transaction T2 were produced by T1 in both schedules.
The concept of view equivalence leads to the concept of view serializability. We say that a
schedule S is view serializable if it is view equivalent to a serial schedule
5. Explain shadow copy technique for atomicity and durability?
Ans: The recovery-management component of a database system can support atomicity and
durability by a variety of schemes. We first consider a simple, but extremely inefficient, scheme
called the shadow copy scheme. This scheme, which is based on making copies of the database,
called shadow copies, assumes that only one transaction is active at a time. The scheme also
assumes that the database is simply a file on disk. A pointer called db-pointer is maintained on
disk; it points to the current copy of the database.
In the shadow-copy scheme, a transaction that wants to update the database first creates a
complete copy of the database. All updates are done on the new database copy, leaving the
original copy, the shadow copy, untouched. If at any point the transaction has to be aborted, the
system merely deletes the new copy. The old copy of the database has not been affected.
If the transaction completes, it is committed as follows. First, the operating system is
asked to make sure that all pages of the new copy of the database have been written out to disk. (Unix systems use the flush command for this purpose.) After the operating system has written
all the pages to disk, the database system updates the pointer db-pointer to point to the new copy
of the database; the new copy then becomes the current copy of the database. The old copy of the
database is then deleted. Figure depicts the scheme, showing the database state before and after
the update.
Figure: The shadow-copy technique. (a) Before the update, db-pointer points to the old copy of the database; (b) after the update, db-pointer points to the new copy and the old copy is to be deleted.


The transaction is said to have been committed at the point where the updated db-pointer
is written to disk. We now consider how the technique handles transaction and system failures.
First, consider transaction failure. If the transaction fails at any time before db-pointer is updated,
the old contents of the database are not affected. We can abort the transaction by just deleting the
new copy of the database. Once the transaction has been committed, all the updates that it
performed are in the database pointed to by db pointer. Thus, either all updates of the transaction
are reflected, or none of the effects are reflected, regardless of transaction failure.
Now consider the issue of system failure. Suppose that the system fails at any time before
the updated db-pointer is written to disk. Then, when the system restarts, it will read db-pointer
and will thus see the original contents of the database, and none of the effects of the transaction
will be visible on the database. Next, suppose that the system fails after db-pointer has been
updated on disk. Before the pointer is updated, all updated pages of the new copy of the database
were written to disk. Again, we assume that, once a file is written to disk, its contents will not be
damaged even if there is a system failure. Therefore, when the system restarts, it will read db-
pointer and will thus see the contents of the database after all the updates performed by the
transaction.
The implementation actually depends on the write to db-pointer being atomic; that is, either all its bytes are written or none of its bytes are written. If some of the bytes of the pointer
were updated by the write, but others were not, the pointer is meaningless, and neither old nor
new versions of the database may be found when the system restarts. Luckily, disk systems
provide atomic updates to entire blocks, or at least to a disk sector. In other words, the disk
system guarantees that it will update db-pointer atomically, as long as we make sure that db-
pointer lies entirely in a single sector, which we can ensure by storing db-pointer at the beginning
of a block. Thus, the atomicity and durability properties of transactions are ensured by the
shadow-copy implementation of the recovery-management component.
As a simple example of a transaction outside the database domain, consider a text editing
session. An entire editing session can be modeled as a transaction. The actions executed by the
transaction are reading and updating the file. Saving the file at the end of editing corresponds to a
commit of the editing transaction; quitting the editing session without saving the file corresponds
to an abort of the editing transaction.
Many text editors use essentially the implementation just described, to ensure that an
editing session is transactional. A new file is used to store the updated file. At the end of the
editing session, if the updated file is to be saved, the text editor uses a file rename command to
rename the new file to have the actual file name. The rename, assumed to be implemented as an
atomic operation by the underlying file system, deletes the old file as well.
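The scheme can be mimicked with ordinary files. In the hedged Python sketch below (file and function names are invented), os.replace plays the role of the atomic db-pointer update, just as the atomic rename does in the text-editor example:

    import os, shutil

    DB = 'database.db'     # the current copy of the database

    def run_transaction(update):
        # Shadow-copy style update: work on a complete copy, then swap it in
        # atomically on commit; on abort the original copy stays untouched.
        new_copy = DB + '.new'
        shutil.copyfile(DB, new_copy)       # complete copy of the database
        try:
            update(new_copy)                # all updates go to the new copy
            with open(new_copy, 'rb+') as f:
                os.fsync(f.fileno())        # force the new pages to disk first
            os.replace(new_copy, DB)        # atomic rename = db-pointer switch
        except Exception:
            os.remove(new_copy)             # abort: merely delete the new copy
            raise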

Gokaraju Rangaraju Institute of Engineering and Technology


Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16
SUB:DBMS
Tutorial Sheet: UNIT V-2

Short answer questions

1. Write short notes on Locks?


2. What is meant by a compatibility function?
3. What is meant by deadlocks?
4. Write two phases in 2-phase locking protocol?
5. What is meant by strict 2- phase locking protocol?
6. What is meant by rigorous 2- phase locking protocol?
7. Explain the importance of lock manager?
8. Define timestamp? Write two methods for implementing timestamp?
9. Write difference between timestamp based and Thomas write rule?
10. Define intention lock mode?
11. Define log and log records? Write the different fields in a log record?
12. Write short notes on types of log records?
13. Write short notes on checkpoints?
14. Explain briefly transaction rollback?
15. Explain about buffer management?

Descriptive questions/Programs/Experiments

1. Explain lock based concurrency control in detail?


2. Explain Timestamp based protocol in detail? Explain Thomas write rule?
3. Explain about validation based protocol?
4. Explain about multiple granularity?
5. Explain types of log based recovery?

Tutor Faculty HOD

Gokaraju Rangaraju Institute of Engineering and Technology


Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16
SUB:DBMS
Tutorial Sheet: V-2

SHORT ANSWERS:

1. Write short notes on Locks?
Ans: While a transaction is accessing a data item, no other transaction should be able to modify that data item. The most common method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item.
Locks
There are various modes in which a data item may be locked. In this section, we restrict our
attention to two modes:
1. Shared. If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q, then
Ti can read, but cannot write, Q.
2. Exclusive. If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on item Q,
then Ti can both read and write Q.
2. What is meant by compatibility function?
Ans: Every transaction requests a lock in an appropriate mode on data item Q, depending on the
types of operations that it will perform on Q. The transaction makes the request to the
concurrency-control manager. The transaction can proceed with the operation only after the
concurrency-control manager grants the lock to the transaction.
Given a set of lock modes, we can define a compatibility function on them as follows.
Let A and B represent arbitrary lock modes. Suppose that a transaction Ti requests a lock of
mode A on item Q on which transaction Tj (Ti ≠ Tj) currently holds a lock of mode B. If
transaction Ti can be granted a lock on Q immediately, in spite of the presence of the mode B
lock, then we say mode A is compatible with mode B. Such a function can be represented
conveniently by a matrix.
        S       X
S       true    false
X       false   false

Lock-compatibility matrix comp.
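The matrix can be encoded directly as a lookup table; the following is a small illustrative
sketch in Python (the function and variable names are ours, not part of the standard treatment):

comp = {
    ("S", "S"): True,  ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}

def can_grant(requested, held_modes):
    # A lock request can be granted only if the requested mode is
    # compatible with every mode currently held on the item.
    return all(comp[(requested, held)] for held in held_modes)

# can_grant("S", ["S", "S"]) -> True: several readers may share an item
# can_grant("X", ["S"])      -> False: a writer must wait for the readers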


3. What is meant by deadlocks?
Ans: Consider the partial schedule for T3 and T4 shown below. Since T3 is holding an
exclusive-mode lock on B and T4 is requesting a shared-mode lock on B, T4 is waiting for T3 to
unlock B. Similarly, since T4 is holding a shared-mode lock on A and T3 is requesting an

exclusive-mode lock on A, T3 is waiting for T4 to unlock A. Thus, we have arrived at a state
where neither of these transactions can ever proceed with its normal execution. This situation is
called deadlock. When deadlock occurs, the system must roll back one of the two transactions. Once a
transaction has been rolled back, the data items that were locked by that transaction are unlocked.
These data items are then available to the other transaction, which can continue with its execution.
T3                      T4
lock-X(B)
read(B)
B := B − 50
write(B)
                        lock-S(A)
                        read(A)
                        lock-S(B)
lock-X(A)

Schedule

4. Write two phases in 2-phase locking protocol?


Ans: One protocol that ensures serializability is the two-phase locking protocol. This protocol
requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as
needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no
more lock requests.
5. What is meant by strict 2-phase locking protocol?
Ans: Cascading rollbacks can be avoided by a modification of two-phase locking called the
strict two-phase locking protocol. This protocol requires not only that locking be two phase,
but also that all exclusive-mode locks taken by a transaction be held until that transaction

commits. This requirement ensures that any data written by an uncommitted transaction are
locked in exclusive mode until the transaction commits, preventing any other transaction from
reading the data.
6. What is meant by rigorous 2-phase locking protocol?
Ans: Another variant of two-phase locking is the rigorous two-phase locking protocol, which
requires that all locks be held until the transaction commits.
Under rigorous two-phase locking, transactions can be serialized in the order in which they
commit. Most database systems implement either strict or rigorous two-phase locking.
7. Explain the importance of lock manager?
Ans: A lock manager can be implemented as a process that receives messages from transactions
and sends messages in reply. The lock-manager process replies to lock-request messages with
lock-grant messages, or with messages requesting rollback of the transaction (in case of
deadlocks). Unlock messages require only an acknowledgment in response, but may result in a
grant message to another waiting transaction.
The lock manager uses this data structure: For each data item that is currently locked, it
maintains a linked list of records, one for each request, in the order in which the requests arrived.
It uses a hash table, indexed on the name of a data item, to find the linked list (if any) for a data
item; this table is called the lock table. Each record of the linked list for a data item notes which
transaction made the request, and what lock mode it requested. The record also notes if the
request has currently been granted.
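A simplified sketch of this lock-table structure in Python (the class and field names are our
own illustration, not a prescribed implementation):

from collections import defaultdict, deque

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockRequest:
    def __init__(self, txn_id, mode):
        self.txn_id = txn_id   # which transaction made the request
        self.mode = mode       # requested mode: "S" or "X"
        self.granted = False   # set to True once the manager grants it

class LockTable:
    def __init__(self):
        # Hash table indexed on the data-item name; each entry is the
        # list of requests for that item, in arrival order.
        self.table = defaultdict(deque)

    def request(self, item, txn_id, mode):
        req = LockRequest(txn_id, mode)
        held = [r.mode for r in self.table[item] if r.granted]
        if all(COMPATIBLE[(mode, h)] for h in held):
            req.granted = True          # grant immediately
        self.table[item].append(req)    # otherwise the request waits, FIFO
        return req.granted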
8. Define timestamp? Write two methods for implementing timestamp?
Ans: Timestamps
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by
TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts
execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters
the system, then TS(Ti) < TS(Tj ). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is equal
to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a
transaction’s timestamp is equal to the value of the counter when the transaction enters the system.
The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) <

TS(Tj ), then the system must ensure that the produced schedule is equivalent to a serial
schedule in which transaction Ti appears before transaction Tj . To implement this scheme, we
associate with each data item Q two timestamp values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q)
successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q)
successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.
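Both implementation methods are easy to sketch (illustrative Python, with names of our own
choosing):

import itertools
import time

_counter = itertools.count(1)

def ts_from_clock():
    # Method 1: use the value of the system clock as the timestamp.
    return time.time()

def ts_from_counter():
    # Method 2: a logical counter incremented for each new transaction.
    return next(_counter)

With either method, a transaction that enters the system later receives a larger timestamp.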
9. Write the difference between timestamp-based protocol and Thomas' write rule?
Ans: The modification to the timestamp-ordering protocol, called Thomas’ write rule, is this:
Suppose that transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was previously needed,
and it had been assumed that the value would never be produced. Hence, the system rejects the
write operation and rolls Ti back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this
write operation can be ignored.
3. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
The difference between these rules and those of the timestamp-ordering protocol lies in the second rule. The timestamp-
ordering protocol requires that Ti be rolled back if Ti issues write(Q) and TS(Ti) < W-
timestamp(Q). However, here, in those cases where TS(Ti) ≥ R-timestamp(Q), we ignore the
obsolete write.

10. Define intention lock mode?


Ans: A more efficient way to gain this knowledge is to introduce a new class of lock modes,
called intention lock modes. If a node is locked in an intention mode, explicit locking is being
done at a lower level of the tree (that is, at a finer granularity). Intention locks are put on all the
ancestors of a node before that node is locked explicitly. Thus, a transaction does not need to
search the entire tree to determine whether it can lock a node successfully.
A transaction wishing to lock a node—say, Q—must traverse a path in the tree from the
root to Q. While traversing the tree, the transaction locks the various nodes in an intention mode.
There is an intention mode associated with shared mode, and there is one with exclusive mode. If

a node is locked in intention-shared (IS) mode, explicit locking is being done at a lower level
of the tree, but with only shared-mode locks. Similarly, if a node is locked in intention-exclusive
(IX) mode, then explicit locking is being done at a lower level, with exclusive-mode or shared-
mode locks. Finally, if a node is locked in shared and intention-exclusive (SIX) mode, the sub
tree rooted by that node is locked explicitly in shared mode, and that explicit locking is being
done at a lower level with exclusive-mode locks.
11. Define log and log records? Write the different fields in a log record?
Ans: The most widely used structure for recording database modifications is the log. The log is
a sequence of log records, recording all the update activities in the database. There are several
types of log records. An update log record describes a single database write. It has these fields:
 Transaction identifier is the unique identifier of the transaction that performed the write
operation.
 Data-item identifier is the unique identifier of the data item written. Typically, it is the
location on disk of the data item.
 Old value is the value of the data item prior to the write.
 New value is the value that the data item will have after the write.
12. Write short notes on types of log records?
Ans: Other special log records exist to record significant events during transaction processing,
such as the start of a transaction and the commit or abort of a transaction.
We denote the various types of log records as:
 <Ti start>. Transaction Ti has started.
 <Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj . Xj had value V1
before the write, and will have value V2 after the write.
 <Ti commit>. Transaction Ti has committed.
 <Ti abort>. Transaction Ti has aborted.
Whenever a transaction performs a write, it is essential that the log record for that
write be created before the database is modified. Once a log record exists, we can output the
modification to the database if that is desirable. Also, we have the ability to undo a modification
that has already been output to the database. We undo it by using the old-value field in log
records.
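These record types map naturally onto simple data structures; a sketch (the field names are
our own):

from collections import namedtuple

Start  = namedtuple("Start",  ["txn"])                        # <Ti start>
Update = namedtuple("Update", ["txn", "item", "old", "new"])  # <Ti, Xj, V1, V2>
Commit = namedtuple("Commit", ["txn"])                        # <Ti commit>
Abort  = namedtuple("Abort",  ["txn"])                        # <Ti abort>

# e.g. the record <T0, A, 1000, 950> becomes Update("T0", "A", 1000, 950)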
13. Write short notes on checkpoints?

Ans: When a system failure occurs, we must consult the log to determine those transactions that
need to be redone and those that need to be undone. There are two major difficulties with this
approach:
1. The search process is time consuming.
2. Most of the transactions that, according to our algorithm, need to be redone have already
written their updates into the database. Although redoing them will cause no harm, it will
nevertheless cause recovery to take longer.
To reduce these types of overhead, we introduce checkpoints. During execution, the system
maintains the log. In addition, the system periodically performs checkpoints, which require the
following sequence of actions to take place:
1. Output onto stable storage all log records currently residing in main memory.
2. Output to the disk all modified buffer blocks.
3. Output onto stable storage a log record <checkpoint>.
The presence of a <checkpoint> record in the log allows the system to streamline its
recovery procedure. Consider a transaction Ti that committed prior to the checkpoint. For such a
transaction, the <Ti commit> record appears in the log before the <checkpoint> record. Any
database modifications made by Ti must have been written to the database either prior to the
checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to perform
a redo operation on Ti.
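The three checkpoint actions can be sketched as follows (a simplified illustration; the data
structures passed in are assumptions, not part of the text):

def checkpoint(log_buffer, dirty_blocks, stable_log, disk):
    # 1. Output all log records currently in main memory to stable storage.
    stable_log.extend(log_buffer)
    log_buffer.clear()
    # 2. Output all modified buffer blocks to the disk.
    for block_id, data in dirty_blocks.items():
        disk[block_id] = data
    dirty_blocks.clear()
    # 3. Output a <checkpoint> log record onto stable storage.
    stable_log.append("<checkpoint>")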

14. Explain briefly transaction rollback?


Ans: We roll back a failed transaction, Ti, by using the log. The system scans the log backward;
for every log record of the form <Ti, Xj, V1, V2> found in the log, the system restores the data
item Xj to its old value V1. Scanning of the log terminates when the log record <Ti, start> is
found. Scanning the log backward is important, since a transaction may have updated a data item
more than once. As an illustration, consider the pair of log records
<Ti, A, 10, 20>
<Ti, A, 20, 30>
The log records represent a modification of data item A by Ti, followed by another modification
of A by Ti. Scanning the log backward sets A correctly to 10. If the log were scanned in the
forward direction, A would be set to 20, which is incorrect.

If strict two-phase locking is used for concurrency control, locks held by a transaction T
may be released only after the transaction has been rolled back as described. Once transaction T
(that is being rolled back) has updated a data item, no other transaction could have updated the
same data item, because of the concurrency-control requirements. Therefore, restoring the old
value of the data item will not erase the effects of any other transaction.
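A sketch of this backward scan (log records represented as plain tuples; an illustration only):

def rollback(txn, log, db):
    # Scan the log backward; restore each item this transaction wrote
    # to its old value, stopping at the transaction's start record.
    for rec in reversed(log):
        if rec == ("start", txn):
            break
        if rec[0] == "update" and rec[1] == txn:
            _, _, item, old_value, new_value = rec
            db[item] = old_value

# db = {"A": 30}
# log = [("start", "Ti"), ("update", "Ti", "A", 10, 20),
#        ("update", "Ti", "A", 20, 30)]
# rollback("Ti", log, db)   # sets db["A"] back to 10, not 20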
15. Explain about buffer management?
Ans: The cost of performing the output of a block to stable storage is sufficiently high that it is
desirable to output multiple log records at once. To do so, we write log records to a log buffer in
main memory, where they stay temporarily until they are output to stable storage. Multiple log
records can be gathered in the log buffer, and output to stable storage in a single output
operation. The order of log records in the stable storage must be exactly the same as the order in
which they were written to the log buffer. As a result of log buffering, a log record may reside in
only main memory (volatile storage) for a considerable time before it is output to stable storage.

LONG ANSWERS:
1. Explain lock based concurrency control in detail?
Ans:
Lock-Based Protocols
One way to ensure serializability is to require that data items be accessed in a mutually
exclusive manner; that is, while one transaction is accessing a data item, no other transaction can
modify that data item. The most common method used to implement this requirement is to allow
a transaction to access a data item only if it is currently holding a lock on that item.
Locks
There are various modes in which a data item may be locked. In this section, we restrict our
attention to two modes:
1. Shared. If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q, then
Ti can read, but cannot write, Q.
2. Exclusive. If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on item Q,
then Ti can both read and write Q.
Every transaction requests a lock in an appropriate mode on data item Q, depending
on the types of operations that it will perform on Q. The transaction makes the request to the

concurrency-control manager. The transaction can proceed with the operation only after the
concurrency-control manager grants the lock to the transaction.
Given a set of lock modes, we can define a compatibility function on them as follows.
Let A and B represent arbitrary lock modes. Suppose that a transaction Ti requests a lock of
mode A on item Q on which transaction Tj (Ti ≠ Tj) currently holds a lock of mode B. If
transaction Ti can be granted a lock on Q immediately, in spite of the presence of the mode B
lock, then we say mode A is compatible with mode B. Such a function can be represented
conveniently by a matrix.

        S       X
S       true    false
X       false   false

Lock-compatibility matrix comp.


The compatibility relation between the two modes of locking appears in the matrix
comp. An element comp(A, B) of the matrix has the value true if and only if mode A is
compatible with mode B.
Note that shared mode is compatible with shared mode, but not with exclusive mode. At
any time, several shared-mode locks can be held simultaneously (by different transactions) on a
particular data item. A subsequent exclusive-mode lock request has to wait until the currently
held shared-mode locks are released.
A transaction requests a shared lock on data item Q by executing the lock-S(Q)
instruction. Similarly, a transaction requests an exclusive lock through the lock-X(Q) instruction.
A transaction can unlock a data item Q by the unlock(Q) instruction.
To access a data item, transaction Ti must first lock that item. If the data item is already
locked by another transaction in an incompatible mode, the concurrency control manager will not
grant the lock until all incompatible locks held by other transactions have been released. Thus, Ti
is made to wait until all incompatible locks held by other transactions have been released.
T1: lock-X(B);
read(B);
B := B − 50;

write(B);
unlock(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(A).
Transaction T1.
T2: lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A + B).
Transaction T2
Transaction Ti may unlock a data item that it had locked at some earlier point. Note that
a transaction must hold a lock on a data item as long as it accesses that item. Moreover, for a
transaction to unlock a data item immediately after its final access of that data item is not always
desirable, since serializability may not be ensured.
The Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol. This protocol
requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as
needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no
more lock requests.
Whenever a transaction enters the system, it is in the growing phase, where it acquires
locks as required. When it releases a lock, the transaction enters the second phase, the
shrinking phase, in which no more lock requests may be issued. If lock conversion is allowed
in 2PL, then lock upgrading can be done during the growing phase and lock downgrading during
the shrinking phase.
T1                      T2                      Lock manager
lock-S(P)
                                                grant-S(P, T1)
read(P)
unlock(P)
lock-X(Q)
                                                grant-X(Q, T1)
read(Q)
Q := Q + P
write(Q)
unlock(Q)
                        lock-S(Q)
                                                grant-S(Q, T2)
                        read(Q)
                        unlock(Q)
                        lock-X(P)
                                                grant-X(P, T2)
                        read(P)
                        P := P + Q
                        write(P)
                        unlock(P)

SCHEDULE1

The transactions in schedule S1 (T1, T2) do not obey the 2PL protocol, since lock-X(Q) is
executed after the unlock(P) operation in T1; similarly, lock-X(P) is executed after the
unlock(Q) operation in T2.
On the other hand, the transactions in schedule S2 follow the 2PL protocol, since lock-X(Q)
is executed before unlock(P) and lock-X(P) is executed before unlock(Q).

T3                      T4                      Lock manager
lock-S(P)
                                                grant-S(P, T3)
read(P)
lock-X(Q)
                                                grant-X(Q, T3)
unlock(P)
read(Q)
Q := Q + P
write(Q)
unlock(Q)
                        lock-S(Q)
                                                grant-S(Q, T4)
                        read(Q)
                        lock-X(P)
                        unlock(Q)
                                                grant-X(P, T4)
                        read(P)
                        P := P + Q
                        write(P)
                        unlock(P)

SCHEDULE2
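The rule that these two schedules illustrate can be checked mechanically; a sketch (assuming a
transaction's operations are given in order as simple strings):

def obeys_two_phase_locking(ops):
    # Under 2PL, a transaction may not issue any lock request
    # after it has released a lock.
    shrinking = False
    for op in ops:
        if op.startswith("unlock"):
            shrinking = True
        elif op.startswith("lock") and shrinking:
            return False
    return True

# T1 of schedule S1: lock-X(Q) follows unlock(P), violating 2PL:
# obeys_two_phase_locking(["lock-S(P)", "unlock(P)", "lock-X(Q)"])  -> False
# T3 of schedule S2 acquires all its locks before its first unlock  -> True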

Cascading rollbacks can be avoided by a modification of two-phase locking called the


strict two-phase locking protocol. This protocol requires not only that locking be two phase,
but also that all exclusive-mode locks taken by a transaction be held until that transaction
commits. This requirement ensures that any data written by an uncommitted transaction are
locked in exclusive mode until the transaction commits, preventing any other transaction from
reading the data.
Another variant of two-phase locking is the rigorous two-phase locking protocol, which
requires that all locks be held until the transaction commits.

2. Explain Timestamp based protocol in detail? Explain Thomas' write rule?
Ans: Timestamp-Based Protocols
Another method for determining the serializability order is to select an ordering among
transactions in advance. The most common method for doing so is to use a timestamp-ordering
scheme.
Timestamps
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted
by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts
execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters
the system, then TS(Ti) < TS(Tj ). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is equal
to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a
transaction’s timestamp is equal to the value of the counter when the transaction enters the
system.
The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj ),
then the system must ensure that the produced schedule is equivalent to a serial schedule in
which transaction Ti appears before transaction Tj .
To implement this scheme, we associate with each data item Q two timestamp values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q)
successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q)
successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.
The Timestamp-Ordering Protocol
The timestamp-ordering protocol ensures that any conflicting read and write operations are
executed in timestamp order. This protocol operates as follows:
1. Suppose that transaction Ti issues read(Q).
a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten.
Hence, the read operation is rejected, and Ti is rolled back.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set

to the maximum of R-timestamp(Q) and TS(Ti).
2. Suppose that transaction Ti issues write(Q).
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously,
and the system assumed that that value would never be produced. Hence, the system rejects the
write operation and rolls Ti back.
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the
system rejects this write operation and rolls Ti back.
c. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
Thomas’ Write Rule
We now present a modification to the timestamp-ordering protocol that allows greater
potential concurrency than does the basic protocol, based on the observation that obsolete
write operations can be ignored under certain circumstances. The protocol rules for read
operations remain unchanged; the protocol rules for write operations, however, are slightly
different from those of the timestamp-ordering protocol. The modification, called Thomas’
write rule, is this:
Suppose that transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was previously needed,
and it had been assumed that the value would never be produced. Hence, the system rejects the
write operation and rolls Ti back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this
write operation can be ignored.
3. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
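The read and write tests above can be summarized in a short sketch (illustrative Python;
timestamps are plain numbers and the names are ours):

class Item:
    def __init__(self):
        self.r_ts = 0   # R-timestamp(Q)
        self.w_ts = 0   # W-timestamp(Q)

def read(ts, q):
    if ts < q.w_ts:
        return "rollback"    # Q was already overwritten by a younger transaction
    q.r_ts = max(q.r_ts, ts)
    return "ok"

def write(ts, q, thomas=False):
    if ts < q.r_ts:
        return "rollback"    # a younger transaction already read the old value
    if ts < q.w_ts:
        # Basic timestamp ordering rolls the writer back; under
        # Thomas' write rule the obsolete write is simply ignored.
        return "ignore" if thomas else "rollback"
    q.w_ts = ts
    return "ok"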
3. Explain about validation based protocol?
Ans: In cases where a majority of transactions are read-only transactions, the rate of conflicts
among transactions may be low. Thus, many of these transactions, if executed without the
supervision of a concurrency-control scheme, would nevertheless leave the system in a
consistent state.
A difficulty in reducing the overhead is that we do not know in advance which
transactions will be involved in a conflict. To gain that knowledge, we need a scheme for
monitoring the system. We assume that each transaction Ti executes in two or three different
phases in its lifetime, depending on whether it is a read-only or an update transaction.

The phases are, in order,
1. Read phase. During this phase, the system executes transaction Ti. It reads the values of the
various data items and stores them in variables local to Ti. It performs all write operations on
temporary local variables, without updates of the actual database.
2. Validation phase. Transaction Ti performs a validation test to determine whether it can copy
to the database the temporary local variables that hold the results of write operations without
causing a violation of serializability.
3. Write phase. If transaction Ti succeeds in validation (step 2), then the system applies the
actual updates to the database. Otherwise, the system rolls back Ti.
Each transaction must go through the three phases in the order shown. However, all
three phases of concurrently executing transactions can be interleaved.
To perform the validation test, we need to know when the various phases of transactions
Ti took place. We shall, therefore, associate three different timestamps with transaction Ti:
1. Start(Ti), the time when Ti started its execution.
2. Validation(Ti), the time when Ti finished its read phase and started its validation phase.
3. Finish(Ti), the time when Ti finished its write phase.
We determine the serializability order by the timestamp-ordering technique, using
the value of the timestamp Validation(Ti). Thus, the value TS(Ti) = Validation(Ti) and, if TS(Tj )
< TS(Tk), then any produced schedule must be equivalent to a serial schedule in which
transaction Tj appears before transaction Tk. The reason we have chosen Validation(Ti), rather
than Start(Ti), as the timestamp of transaction Ti is that we can expect faster response time
provided that conflict rates among transactions are indeed low.
The validation test for transaction Tj requires that, for all transactions Ti with TS(Ti) <
TS(Tj ), one of the following two conditions must hold:
1. Finish(Ti) < Start(Tj ). Since Ti completes its execution before Tj started, the serializability
order is indeed maintained.
2. The set of data items written by Ti does not intersect with the set of data items read by Tj, and
Ti completes its write phase before Tj starts its validation phase (Start(Tj ) < Finish(Ti) <
Validation(Tj )). This condition ensures that the writes of Ti and Tj do not overlap. Since the
writes of Ti do not affect the read of Tj , and since Tj cannot affect the read of Ti, the
serializability order is indeed maintained.

T14                     T15
read(B)
                        read(B)
                        B := B − 50
                        read(A)
                        A := A + 50
read(A)
<validate>
display(A + B)
                        <validate>
                        write(B)
                        write(A)

Schedule 5, a schedule produced by using validation.


As an illustration, consider again transactions T14 and T15. Suppose that TS(T14) <
TS(T15). Then the validation phase succeeds in schedule 5. Note that the writes to the actual
variables are performed only after the validation phase of T15. Thus, T14 reads the old values
of B and A, and this schedule is serializable.
The validation scheme automatically guards against cascading rollbacks, since the actual
writes take place only after the transaction issuing the write has committed. However, there is a
possibility of starvation of long transactions, due to a sequence of conflicting short transactions
that cause repeated restarts of the long transaction. To avoid starvation, conflicting transactions
must be temporarily blocked, to enable the long transaction to finish. This validation scheme is
called the optimistic concurrency control scheme since transactions execute optimistically,
assuming they will be able to finish execution and validate at the end. In contrast, locking and
timestamp ordering are pessimistic in that they force a wait or a rollback whenever a conflict is
detected, even though there is a chance that the schedule may be conflict serializable.
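The validation test itself is short; a sketch (assuming transaction objects that carry the three
timestamps together with their read and write sets):

def validate(tj, earlier):
    # Check tj against every transaction ti with TS(ti) < TS(tj).
    for ti in earlier:
        if ti.finish < tj.start:
            continue    # condition 1: ti finished before tj started
        if (not (ti.write_set & tj.read_set)
                and ti.finish < tj.validation):
            continue    # condition 2: no write/read overlap, and ti's write
                        # phase ended before tj's validation phase began
        return False    # neither condition holds: roll tj back
    return True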
4. Explain about multiple granularity?
Ans: Multiple Granularity:
So far, transactions could acquire a lock on only a single data item at a time. Suppose a
transaction wishes to lock every data item of the database; it must then send a separate lock
request for each item to the lock manager, and only once every request is granted does the
transaction hold locks on all the data items. The disadvantage of this locking mechanism is
that a transaction consumes a lot of time issuing these lock requests. This problem can be
avoided if the transaction sends the lock manager a single request that locks the entire
database. However, if another transaction wants to access only a few data items, it should not
lock the entire database, because doing so would sacrifice concurrency.
Therefore, a mechanism that allows the system to define multiple levels of granularity is
required. One such mechanism is to allow data items of various sizes and to define a
granularity hierarchy in which smaller granularities lie within larger ones. This granularity
hierarchy can be represented graphically as a tree, where every non-leaf node signifies the
data associated with its child nodes.
Consider a database hierarchy that consists of four levels. The topmost level signifies the
root node. The children of the root node are area nodes, each containing different file nodes
as its children. The lower-level nodes after the file nodes correspond to record nodes. Every
file node within an area node is unique, and every record node within a file node is unique.
Such a hierarchy is referred to as a containment hierarchy. An example of such a hierarchy is
a University database.

[Figure: a four-level granularity hierarchy tree for a University database, with the root
(university) node at the top, college nodes C1, C2, ... below it, course nodes Cr1a, Cr2b, ...
below the colleges, and student nodes S1a, S2b, ... as leaves.]
Granularity hierarchy for a University database
A University contains several colleges, each college offers many courses, and each course
has several students. A student can select a course of his choice in a particular college.
In the multiple-granularity mechanism, when an individual node acquires an explicit lock in
either shared or exclusive mode, all of its child nodes acquire an implicit lock in the same
lock mode. For example, if transaction T1 acquires an explicit lock on the course file Cr2b in
exclusive mode, then an implicit lock is acquired by all the student records that are
descendants of Cr2b.
Now, if another transaction T2 wants to acquire an explicit lock on the student record S2b,
it fails to do so, because transaction T1 already holds an explicit lock on its parent node,
Cr2b. To know whether T2 can acquire a lock on S2b, the system needs to traverse the tree from
the root node to the student node; if an incompatible lock mode is detected while traversing,
then transaction T2 must wait until T1 releases the lock on Cr2b.
If a transaction T3 wants to acquire an explicit lock on the entire database U, it can simply
send a single lock request to lock the root of the hierarchy. But the system cannot lock the
root node, since T1 already holds a lock on a part of the database; therefore, T3 must wait
until T1 releases the lock. The system can determine whether the root node can be locked by
traversing the entire tree, but such traversal requires considerable searching time. To avoid
this problem, a new class of lock modes is introduced, called the intention lock modes.
If a node is locked in an intention mode, explicit locking is being done at the lower levels
of the hierarchy. Intention locks are applied to all the ancestor nodes before a lock is
acquired explicitly on a child node. Instead of traversing the entire tree to determine whether
a node can be locked, the transaction simply traverses the path from the root node to the node
to be locked. While traversing, all the nodes on the path are locked in an intention mode.
Shared and exclusive modes can be associated with intention modes in the following ways:
1) Intention-shared mode (IS): if a node is locked in IS mode, explicit locking is being done
at a lower level, but with shared-mode locks only.
2) Intention-exclusive mode (IX): if a node is locked in IX mode, explicit locking is being
done at a lower level, with exclusive-mode or shared-mode locks.
3) Shared and intention-exclusive mode (SIX): if a node is locked in SIX mode, the subtree
rooted at that node is locked explicitly in shared mode, and explicit locking is being done at
a lower level with exclusive-mode locks.
        IS      IX      S       SIX     X
IS      1       1       1       1       0
IX      1       1       0       0       0
S       1       0       1       0       0
SIX     1       0       0       0       0
X       0       0       0       0       0

Compatibility Matrix
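This matrix, too, can be written as a lookup table; a sketch (mode names as in the text):

# For each requested mode, the set of held modes it is compatible with.
GRANULARITY_COMP = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

def can_lock(requested, held_modes):
    return all(h in GRANULARITY_COMP[requested] for h in held_modes)

# can_lock("IX", ["IS"]) -> True: finer-grained locking below may proceed
# can_lock("X", ["IS"])  -> False: some descendant is already locked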
5. Explain types of log based recovery?
Ans: The most widely used structure for recording database modifications is the log. The log is a
sequence of log records, recording all the update activities in the database. There are several
types of log records. An update log record describes a single database write. It has these fields:
• Transaction identifier is the unique identifier of the transaction that performed the write
operation.
• Data-item identifier is the unique identifier of the data item written. Typically, it is the
location on disk of the data item.
• Old value is the value of the data item prior to the write.
• New value is the value that the data item will have after the write.
Other special log records exist to record significant events during transaction processing,
such as the start of a transaction and the commit or abort of a transaction.
We denote the various types of log records as:
• <Ti start>. Transaction Ti has started.
• <Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj . Xj had value V1 before
the write, and will have value V2 after the write.

• <Ti commit>. Transaction Ti has committed.
• <Ti abort>. Transaction Ti has aborted.
Whenever a transaction performs a write, it is essential that the log record for that write
be created before the database is modified. Once a log record exists, we can output the
modification to the database if that is desirable. Also, we have the ability to undo a modification
that has already been output to the database. We undo it by using the old-value field in log
records.
Deferred Database Modification
The deferred-modification technique ensures transaction atomicity by recording all
database modifications in the log, but deferring the execution of all write operations of a
transaction until the transaction partially commits. Recall that a transaction is said to be partially
committed once the final action of the transaction has been executed. The version of the
deferred-modification technique that we describe in this section assumes that transactions are
executed serially.
When a transaction partially commits, the information on the log associated with the
transaction is used in executing the deferred writes. If the system crashes before the transaction
completes its execution, or if the transaction aborts, then the information on the log is simply
ignored. The execution of transaction Ti proceeds as follows. Before Ti starts its execution, a
record <Ti start> is written to the log. A write(X) operation by Ti results in the writing of a new
record to the log. Finally, when Ti partially commits, a record <Ti commit> is written to the log.
Observe that only the new value of the data item is required by the deferred-modification
technique. Thus, we can simplify the general update-log record structure that we saw in the
previous section by omitting the old-value field.
To illustrate, reconsider our simplified banking system. Let T0 be a transaction that
transfers $50 from account A to account B:
T0: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).

Let T1 be a transaction that withdraws $100 from account C:
T1: read(C);
C := C − 100;
write(C).
Suppose that these transactions are executed serially, in the order T0 followed by T1, and
that the values of accounts A, B, and C before the execution took place were $1000, $2000, and
$700, respectively. There are various orders in which the actual outputs can take place to both the
database system and the log as a result of the execution of T0 and T1.
<T0 start>
<T0 , A, 950>
<T0 , B, 2050>
<T0 commit>
<T1 start>
<T1 , C, 600>
<T1 commit>
Portion of the database log corresponding to T0 and T1.
Note that the value of A is changed in the database only after the record <T0, A, 950> has
been placed in the log. Using the log, the system can handle any failure that results in the loss of
information on volatile storage. The recovery scheme uses the following recovery procedure:
• redo(Ti) sets the value of all data items updated by transaction Ti to the new values. The set of
data items updated by Ti and their respective new values can be found in the log.
The redo operation must be idempotent; that is, executing it several times must be
equivalent to executing it once. This characteristic is required if we are to guarantee correct
behavior even if a failure occurs during the recovery process.
After a failure, the recovery subsystem consults the log to determine which
transactions need to be redone. Transaction Ti needs to be redone if and only if the log contains
both the record <Ti start> and the record <Ti commit>. Thus, if the system crashes after the
transaction completes its execution, the recovery scheme uses the information in the log to
restore the system to a previous consistent state after the transaction had completed.
As an illustration, let us return to our banking example with transactions T0 and T1
executed one after the other in the order T0 followed by T1. The figure below shows the log
that results from the complete execution of T0 and T1.
Log                             Database
<T0 start>
<T0 , A, 950>
<T0 , B, 2050>
<T0 commit>
                                A = 950
                                B = 2050
<T1 start>
<T1 , C, 600>
<T1 commit>
                                C = 600

State of the log and database corresponding to T0 and T1.

(a)                     (b)                     (c)
<T0 start>              <T0 start>              <T0 start>
<T0 , A, 950>           <T0 , A, 950>           <T0 , A, 950>
<T0 , B, 2050>          <T0 , B, 2050>          <T0 , B, 2050>
                        <T0 commit>             <T0 commit>
                        <T1 start>              <T1 start>
                        <T1 , C, 600>           <T1 , C, 600>
                                                <T1 commit>

The same log as that in the figure above, shown at three different times.
Let us suppose that the system crashes before the completion of the transactions, so that we can see how
the recovery technique restores the database to a consistent state. Assume that the crash occurs
just after the log record for the step
write(B)
of transaction T0 has been written to stable storage. The log at the time of the crash appears
in Figure (a). When the system comes back up, no redo actions need to be taken, since no

commit record appears in the log. The values of accounts A and B remain $1000 and $2000,
respectively. The log records of the incomplete transaction T0 can be deleted from the log.
Immediate Database Modification
The immediate-modification technique allows database modifications to be output to the
database while the transaction is still in the active state. Data modifications written by active
transactions are called uncommitted modifications. Before a transaction Ti starts its execution,
the system writes the record <Ti start> to the log. During its execution, any write(X) operation
by Ti is preceded by the writing of the appropriate new update record to the log. When Ti
partially commits, the system writes the record <Ti commit> to the log. As an illustration, let us
reconsider our simplified banking system, with transactions T0 and T1 executed one after the
other in the order T0 followed by T1. The portion of the log containing the relevant information
concerning these two transactions appears below, along with one possible order in which the
actual outputs took place in both the database system and the log as a result of the execution
of T0 and T1.
<T0 start>
<T0 , A, 1000, 950>
<T0 , B, 2000, 2050>
<T0 commit>
<T1 start>
<T1 , C, 700, 600>
<T1 commit>
Portion of the system log corresponding to T0 and T1.

Log                             Database
<T0 start>
<T0 , A, 1000, 950>
<T0 , B, 2000, 2050>
                                A = 950
                                B = 2050
<T0 commit>
<T1 start>
<T1 , C, 700, 600>
                                C = 600
<T1 commit>

State of the log and database corresponding to T0 and T1.
Notice that this order could not be obtained in the deferred-modification technique. Using the log, the
system can handle any failure that does not result in the loss of information in nonvolatile
storage. The recovery scheme uses two recovery procedures:
• undo(Ti) restores the value of all data items updated by transaction Ti to the old values.
• redo(Ti) sets the value of all data items updated by transaction Ti to the new values.
The set of data items updated by Ti and their respective old and new values can be found
in the log.
The undo and redo operations must be idempotent to guarantee correct behavior even if a failure
occurs during the recovery process. After a failure has occurred, the recovery scheme consults
the log to determine which transactions need to be redone, and which need to be undone:
• Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not contain
the record <Ti commit>.
• Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record
<Ti commit>.
(a)                     (b)                     (c)
<T0 start>              <T0 start>              <T0 start>
<T0 , A, 1000, 950>     <T0 , A, 1000, 950>     <T0 , A, 1000, 950>
<T0 , B, 2000, 2050>    <T0 , B, 2000, 2050>    <T0 , B, 2000, 2050>
                        <T0 commit>             <T0 commit>
                        <T1 start>              <T1 start>
                        <T1 , C, 700, 600>      <T1 , C, 700, 600>
                                                <T1 commit>

The same log as that in the figure above, shown at three different times.
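The undo and redo procedures can be combined into a single restart routine; a sketch (log
records as plain tuples of the form ("update", txn, item, old, new); an illustration only):

def recover(log, db):
    started, committed = set(), set()
    for rec in log:
        if rec[0] == "start":
            started.add(rec[1])
        elif rec[0] == "commit":
            committed.add(rec[1])
    incomplete = started - committed
    # Undo: scan backward, restoring old values written by incomplete transactions.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in incomplete:
            db[rec[2]] = rec[3]     # old value
    # Redo: scan forward, reinstalling new values written by committed transactions.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            db[rec[2]] = rec[4]     # new value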

Gokaraju Rangaraju Institute of Engineering and Technology
Department of Computer Science and Engineering
Year/Semester : II / I Academic year: 2015-16
DBMS Assignment Sheet: V

Descriptive questions/Programs/Experiments

1. Define Transaction. Explain the properties of Transaction.


2. Differentiate between Serial Execution and Concurrent Execution of Transactions.
3. How do you implement Atomicity and Durability using Shadow Copy Technique?
4. Explain about Serializability with suitable examples.
5. Describe Conflict Serializability with example.
6. Describe View Serializability with example.
7. Differentiate between Conflict Serializability and View Serializability?
8. Differentiate between Recoverable Schedule and Non Recoverable Schedule?
9. How do you test whether Schedules can be Serialized?
10. What is Concurrency Control? What are the various protocols used for Concurrency
Control?
11. Explain Lock Based Protocols with example?
12. Write about Two phase Locking Protocol with suitable example?
13. Differentiate Strict Two phase Locking Protocol and Rigorous Two phase Locking
Protocol?
14. Explain about Time Stamp based Protocol with example?
15. Explain about Thomas’ Write Rule?
16. Differentiate between Time Stamp based Protocol and Thomas’ Write Rule?
17. Explain about Validation based Protocol?
18. Describe Multiple Granularity with example?
19. What is Log-Based Recovery? Explain about the various types of Log-Based Recovery
Techniques?
20. Explain how Checkpoints are used in Transaction Management?
21. Explain about Buffer Management?

Tutor Faculty HOD

