Important Questions

Q. 1. What do you mean by database?
Ans. A database is a collection of occurrences of multiple record types, containing the relationships between records, data aggregates and data items. It may also be defined as a collection of interrelated data stored together, without harmful or unnecessary redundancy (duplicate data), to serve multiple applications. The data is stored so that it is independent of the programs that use it. A common, controlled approach is used for adding new data and for modifying, retrieving or deleting existing data within the database. A running database serves a function in a corporation, factory, government department or other organization. A database is used for searching data to answer queries, and it may be designed for batch processing, real-time processing or online processing.
DATABASE SYSTEM
A database system is an integrated collection of related files, along with the details of their definition, interpretation, manipulation and maintenance. It satisfies the data needs of the various applications in an organization without unnecessary redundancy. A database system is built around the data and is run by software called a DBMS (Database Management System). It also protects the data from unauthorized access.
Foundation Data Concepts
A hierarchy of several levels of data has been devised that differentiates between different groupings, or elements, of data. Data are logically organized into:
Character
It is the most basic logical data element. It consists of a single alphabetic, numeric, or other symbol.
Field
It consists of a grouping of characters. A data field represents an attribute (a characteristic or quality) of some entity (object, person, place, or event).
Record
The related fields of data are grouped to form a record. Thus, a record represents a collection of attributes that describe an entity. Fixed-length records contain a fixed number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths.
File
A group of related records is known as a data file, or table. Files are frequently classified by the application for which they are primarily used, such as a payroll file or an inventory file, or by the type of data they contain, such as a document file or a graphical image file. Files are also classified by their permanence, for example, a master file versus a transaction file. A transaction file contains records of all transactions occurring during a period, whereas a master file contains all the permanent records. A history file is an obsolete transaction or master file retained for backup purposes or for long-term historical storage, called archival storage.
Database
It is an integrated collection of logically related records or objects. A database consolidates records previously stored in separate files into a common pool of data records that provides data for many applications. The data stored in a database is independent of the application programs using it and of the type of secondary storage devices on which it is stored.
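The hierarchy above (fields grouped into records, records grouped into a file) can be made concrete with a small sketch. It uses Python's standard struct module to build fixed-length records. The student layout (a 4-byte roll number, a 20-byte name and a 10-byte city) and the sample values are assumptions chosen only for illustration; they do not come from these notes.

```python
import struct

# Hypothetical fixed-length student record made of three fixed-length fields.
# "=I20s10s": unsigned int (4 bytes), 20-byte name, 10-byte city, no padding.
RECORD_FORMAT = "=I20s10s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 4 + 20 + 10 bytes


def pack_record(roll_no, name, city):
    """Group three fields into one fixed-length record (a bytes object)."""
    return struct.pack(RECORD_FORMAT, roll_no,
                       name.encode("ascii"), city.encode("ascii"))


def unpack_record(raw):
    """Split one fixed-length record back into its fields."""
    roll_no, name, city = struct.unpack(RECORD_FORMAT, raw)
    # struct pads short strings with NUL bytes; strip the padding again.
    return (roll_no,
            name.rstrip(b"\x00").decode("ascii"),
            city.rstrip(b"\x00").decode("ascii"))


# A "file" is a group of related records laid end to end.
student_file = b"".join([
    pack_record(1, "Asha", "Pune"),
    pack_record(2, "Ravi", "Delhi"),
])

# Every record has the same size, so record i starts at byte i * RECORD_SIZE.
second = unpack_record(student_file[RECORD_SIZE:2 * RECORD_SIZE])
print(second)
```

Because the record length is fixed, any record can be located by arithmetic alone. A variable-length layout would need length prefixes or delimiters instead.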
Q. 2. What are the various characteristics of DBMS?
Ans. The major characteristics of the database approach are:
• Self-describing nature of a database system
• Insulation between programs and data, and data abstraction
• Support of multiple views of the data
• Sharing of data and multi-user transaction processing
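The first characteristic, the self-describing nature of a database system, means the database holds a description of its own structure alongside the data. A minimal sketch of this idea follows, assuming SQLite (through Python's built-in sqlite3 module) as the DBMS and a hypothetical student table; the notes themselves do not prescribe any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# The catalog is itself queried like any other table: sqlite_master is
# SQLite's built-in table of schema objects.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2])
           for row in conn.execute("PRAGMA table_info(student)")]

print(tables)
print(columns)
```

A program can therefore discover table and column definitions at run time instead of having them hard-coded, which is what separates this approach from file processing.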
Q. 3. What are the various characteristics of DBMS approach?
Ans. 1. Self-contained nature
A DBMS contains the data plus a full description of the data, called "metadata". Metadata is data about data: data formats, record structures, locations, how to access the data, and indexes. Metadata is stored in a catalog and is used by the DBMS software to know how to access the data. Contrast this with the file processing approach, where application programs need to know the structure and format of records and data.
2. Program-data independence
Data independence is the immunity of application programs to changes in storage structures and access techniques, e.g. adding a new field, changing an index structure, or changing a data format. In a DBMS environment these changes are reflected in the catalog, and applications are not affected. Traditional file processing programs would all have to change, possibly substantially.
3. Data abstraction
A DBMS provides users with a conceptual representation of data (for example, as objects with properties and inter-relationships). Storage details are hidden. The conceptual representation is provided in terms of a data model.
4. Support for multiple views
A DBMS may allow different users to see different "views" of the DB, according to the perspective each one requires.
• A view may be a subset of the data. For example, the people using the payroll system need not (and should not) see data about students and class schedules.
• A view may present data in a different form from the way it is stored. For example, someone interested in student transcripts might get a view formed by combining information from separate files or tables.
5. Centralized control of the data resource
The DBMS provides centralized control of data in an organization. This brings a number of advantages:
(a) reduces redundancy
(b) avoids inconsistencies
(c) data can be shared
(d) standards can be enforced
(e) security restrictions can be applied
(f) integrity can be maintained
a, b. Redundancy and inconsistencies
Redundancy is unnecessary duplication of data, for example if the accounts department and the registration department both keep student name, number and address. Redundancy wastes space and duplicates effort in maintaining the data. Redundancy also leads to inconsistency. Inconsistent data is data which contradicts itself, e.g. two different addresses for a given student number. Inconsistency cannot occur if data is represented by a single entry (i.e. if there is no redundancy).
Controlled redundancy: some redundancy may be desirable (for efficiency). A DBMS should be aware of it and take care of propagating updates to all copies of a data item. This is an objective that DBMSs have not always fully supported.
c. Sharing
• Needs concurrency control
• Multiple user views
d. Standards
E.g. data formats, record structures, naming, documentation. Standards may be international, organizational, departmental, and so on.
e. Security
Restricting unauthorized access. The DBMS should perform security checks on all accesses.
f. Integrity
Maintaining the validity of data, e.g.:
• employee numbers must be in some range
• every course must have an instructor
• student number must be unique
• hours worked cannot be more than 150
These things are expressed as constraints. The DBMS should perform integrity checks on all updates. Many DBMSs provide only limited integrity checks.
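The integrity rules listed under (f) can be declared once and then checked by the DBMS on every update. The following is a sketch only, assuming SQLite via Python's sqlite3 module. The employee table, the 1000 to 9999 range for employee numbers and the helper try_insert are hypothetical; the 150-hour limit is the one mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        -- PRIMARY KEY makes the number unique; CHECK keeps it in range
        emp_no       INTEGER PRIMARY KEY
                     CHECK (emp_no BETWEEN 1000 AND 9999),
        -- hours worked cannot be more than 150
        hours_worked INTEGER NOT NULL
                     CHECK (hours_worked BETWEEN 0 AND 150)
    )
""")


def try_insert(emp_no, hours):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_no, hours))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1001, 40),   # valid row
    try_insert(1001, 35),   # duplicate employee number
    try_insert(1002, 200),  # more than 150 hours
    try_insert(5, 40),      # employee number out of range
]
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(results, row_count)
```

Only the first row is stored. The application code contains no validation logic of its own; the checks live with the data definition.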
Q. 3. What are the various types of databases?
Ans. Types of Databases
Continuing developments in information technology and its business applications have resulted in the evolution of several major types of databases. Several major conceptual categories of databases that may be found in computer-using organizations include:
Operational Databases
These databases store detailed data needed to support the operations of the entire organization. They are also called subject area databases (SADB), transaction databases, and production databases. Examples are customer databases, personnel databases, inventory databases, and other databases containing data generated by business operations.
Distributed Databases
Many organizations replicate and distribute copies or parts of databases to network servers at a variety of sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or extranets, or on other company networks. Distributed databases may be copies of operational or analytical databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of databases is done to improve database performance and security.
External Databases
Access to external, privately owned online databases or data banks is available for a fee to end users and organizations from commercial online services, and with or without charge from many sources on the Internet, especially the Web.
Hypermedia Databases
A hypermedia database consists of hyperlinked pages of multimedia (text, graphics, photographic images, video clips, audio segments, etc.). From a database management point of view, the set of interconnected multimedia pages at a website is a database of interrelated hypermedia page elements, rather than interrelated data records.
Q. 4. What do you mean by DBMS?
Ans. A DBMS is best described as a collection of programs that manage the database structure and control shared access to the data in the database. Current DBMSs also store the relationships between the database components and take care of defining the required access paths to those components.
A database management system (DBMS) is the combination of data, hardware, software and users that helps an enterprise manage its operational data. The main function of a DBMS is to provide efficient and reliable methods of data retrieval to many users. Efficient data retrieval is an essential function of database systems. A DBMS must be able to deal with several users who try to access several items simultaneously and, most frequently, the same data item.
A DBMS is a set of programs used to store and manipulate data, which includes the following:
• Adding new data, for example adding details of a new student.
• Deleting unwanted data, for example deleting the details of students who have completed the course.
• Changing existing data, for example modifying the fee paid by a student.
A database is the information to be stored, whereas the database management system is the system used to manage the database. The structure of the stored data may be regarded in terms of its hardware implementation, called the physical structure, or independently of its hardware implementation, called the logical structure. In either case, the data structure is regarded as static because a database cannot process anything. The DBMS is regarded as dynamic because it is through the DBMS that all database processing takes place. How the DBMS presents data to the user is called the view structure.
There are two general modes of data use: queries and transactions. Both use the DBMS for processing. A query is processed for presentation in views, and none of this processing is written to the database. A transaction is processed to update values in the database, and these updates are written to the database.
A DBMS provides various functions such as data security, data integrity, data sharing, data concurrency, data independence and data recovery. However, the database management systems available in the market, such as Sybase, Oracle and MS-Access, do not all provide the same set of functions, though all are meant for data management.
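The difference between the two modes of use, queries and transactions, can be sketched as follows. This assumes SQLite via Python's sqlite3 module; the student table, the fee amounts and the helper fee_paid are hypothetical and serve only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, fee_paid INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1, 'Asha', 500)")
conn.commit()


def fee_paid():
    """Query: reads a value for presentation; nothing is written back."""
    return conn.execute(
        "SELECT fee_paid FROM student WHERE roll_no = 1").fetchone()[0]


before = fee_paid()

# Transaction: updates a stored value; the change is written on commit.
with conn:  # commits if the block succeeds, rolls back if it raises
    conn.execute("UPDATE student SET fee_paid = fee_paid + 250 WHERE roll_no = 1")
after = fee_paid()

# A transaction that fails part-way leaves the database unchanged.
try:
    with conn:
        conn.execute("UPDATE student SET fee_paid = 0 WHERE roll_no = 1")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass
final = fee_paid()

print(before, after, final)
```

The rolled-back update also hints at the data recovery function mentioned above: an incomplete change is never made permanent.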
Q. 5. What are the various components of DBMS?
Ans. Basic Components: A database system has four components, which are important for understanding and designing the database system. These are:
1. Data
2. Hardware
3. Software
4. Users
1. Data
As discussed above, data is raw, first-hand information that we collect. Data is made up of data items and data aggregates. A data item is the smallest unit of named data; it may consist of bits or bytes and is often referred to as a field or data element. A data aggregate is a collection of data items within a record, which is given a name and referred to as a whole. Data can be collected orally or in writing.
A database can be integrated and shared. Data stored in a system may be partitioned across one or more databases, so that if data is lost or damaged in one place it can still be accessed from another through the sharing facility of the database system. Shared data can also be reused according to the user's requirements. Data must also be in integrated form. Integration means that data is held in a unique form, i.e. collected in a well-defined manner with no redundancy. For example, roll numbers in a class are non-redundant, so each one is unique, but names in a class may be repeated and can create many problems later in using and accessing the data.
2. Hardware
Hardware is also a major and primary part of the database system; without hardware nothing can be done. Hardware can be defined as "that which we can touch and see", i.e. it has physical existence. All physical items fall into this category, for example the input/output and storage devices commonly used with a computer system: keyboard, mouse, scanner, monitor, and storage devices (hard disk, floppy disk, magnetic disk and magnetic drum).
3. Software
Software is another major part of the database system. Hardware and software are two sides of the same coin and go side by side. Software is subdivided into two categories: system software (operating systems, languages, system packages, etc.) and application software (payroll, electricity billing, hospital management, hostel administration, etc.). We can define software as that which we cannot touch or see; software can only be executed. By using software, data can be manipulated, organized and stored.
4. Users
Without users, all of the above components (data, hardware and software) are meaningless. Users collect the data and operate and handle the hardware. The operator also feeds in the data and arranges it in order by executing the software.
Other components
1. People - Database administrator; system developer; end user.
2. CASE tools - Computer-aided Software Engineering (CASE) tools.
3. User interface - Microsoft Access; PowerBuilder.
4. Application programs - PowerBuilder script language; Visual Basic; C++; COBOL.
5. Repository - Stores definitions of data (called METADATA), screen and report formats, menu definitions, etc.
6. Database - Stores the actual occurrences of data.
7. DBMS - Provides tools to manage all of this: create data, maintain data, control security access to data and to the repository, etc.
Q. 6. What are the various functions of DBMS?
Ans. These functions will include support for at least all of the following:
• Data definition: The DBMS must be able to accept data definitions (external schemas, the conceptual schema, the internal schema, and all associated mappings) in source form and convert them to the appropriate object form.
• Data manipulation: The DBMS must be able to handle requests from users to retrieve, update, or delete existing data in the database, or to add new data to the database. In other words, the DBMS must include a data manipulation language (DML) processor component.
• Data security and integrity: The DBMS must monitor user requests and reject any attempt to violate the security and integrity rules defined by the DBA.
• Data recovery and concurrency: The DBMS, or else some other related software component usually called the transaction manager, must enforce certain recovery and concurrency controls.
• Data dictionary: The DBMS must provide a data dictionary function. The data dictionary can be regarded as a database in its own right (but a system database, rather than a user database). The dictionary contains "data about the data" (sometimes called metadata), that is, definitions of other objects in the system, rather than just "raw data". In particular, all the various schemas and mappings (external, conceptual, etc.) will physically be stored, in both source and object form, in the dictionary. A comprehensive dictionary will also include cross-reference information, showing, for instance, which programs use which pieces of the database, which users require which reports, which terminals are connected to the system, and so on. The dictionary might even (in fact, probably should) be integrated into the database it defines, and thus include its own definition. It should certainly be possible to query the dictionary just like any other database, so that, for example, it is possible to tell which programs and/or users are likely to be affected by some proposed change to the system.
• Performance: It goes without saying that the DBMS should perform all of the functions identified above as efficiently as possible.
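The first two functions, data definition and data manipulation, can be illustrated with a short sketch. It assumes SQL as the definition and manipulation language and SQLite (via Python's sqlite3 module) as the DBMS; the student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the structure to the DBMS in source form.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, fee_paid INTEGER)"
)

# Data manipulation: add new data ...
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 500), (2, "Ravi", 0), (3, "Meena", 750)],
)
# ... update existing data ...
conn.execute("UPDATE student SET fee_paid = 600 WHERE roll_no = 2")
# ... delete existing data ...
conn.execute("DELETE FROM student WHERE roll_no = 3")
conn.commit()

# ... and retrieve data.
rows = conn.execute(
    "SELECT roll_no, name, fee_paid FROM student ORDER BY roll_no"
).fetchall()
print(rows)
```

Each request names what is wanted (tables, columns, conditions) and leaves it to the DBMS to decide how the stored data is located and changed.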
Q. 7. What are the advantages and disadvantages of a database approach?
Ans. ADVANTAGES OF DBMS
One of the major advantages of using a database system is that the organization's data can be handled easily, with centralized management and control over the data by the DBA. The main advantages of a DBMS are:
1. Controlling Redundancy
A DBMS keeps redundancy (duplicate data) under control. If duplicate data arises, the DBA can reorganize the data in a non-redundant way. Data is stored on the basis of a primary key, which is always unique and carries non-redundant information. For example, roll number is the primary key under which student data is stored. In traditional file processing, every user group maintains its own files; each group independently keeps its own files on, say, students. Therefore, much of the data is stored twice or more. Redundancy leads to several problems:
• Duplication of effort
• Storage space is wasted when the same data is stored repeatedly
• Files that represent the same data may become inconsistent (since updates are applied independently by each user group)
Where some duplication is useful, controlled redundancy can be used.
2. Restricting Unauthorized Access
A DBMS should provide a security and authorization subsystem.
• Some database users will not be authorized to access all information in the database (e.g., financial data).
• Some users are allowed only to retrieve data.
• Some users are allowed both to retrieve and to update the database.
3. Providing Persistent Storage for Program Objects and Data Structures
The data structures provided by the DBMS must be compatible with the programming language's data structures. For example, object-oriented DBMSs are compatible with programming languages such as C++ and Smalltalk, and the DBMS software automatically performs conversions between programming data structures and file formats.
4. Permitting Inferencing and Actions Using Deduction Rules
Deductive database systems provide capabilities for defining deduction rules for inferring new information from the stored database facts.
5. Inconsistency can be reduced
In a database system, data may to some extent still be stored in an inconsistent way. Inconsistency is a consequence of duplication. Suppose that the fact that an employee "Japneet" works in the department "Computer" is represented by two distinct entries in a database. In this way inconsistent data can come to be stored, and the DBA can remove it by using the DBMS.
6. Data can be shared
In a database system, data can easily be shared by different users. For example, student data can be shared by the teaching department, administrative block, accounts branch, laboratory, etc.
7. Standards can be enforced or maintained
By using a database system, standards can be maintained in an organization. The DBA is the overall controller of the database system. When data is handled manually standards are hard to enforce, but when the DBA uses a DBMS and the data is entered into the computer, standards can be enforced and maintained through the computerized system.
8. Security can be maintained
Passwords can be applied in a database system, and files can be secured by the DBA. A database system also offers different coding (encryption) techniques to protect the data from unauthorized access. It provides a login facility that helps secure the data against both accidental and intentional threats. Recovery procedures can also be maintained so that the data remains accessible through the DBMS.
9. Integrity can be maintained
In a database system, data can be stored in an integrated way. Integration means the unification and sequencing of data. In other words, integrity can be defined as "the data contained in the database is both accurate and consistent". Data can be accessed reliably if it is held in a unique form; a primary key and some secondary keys can be used for this purpose. Centralized control can also ensure that adequate checks are incorporated in the DBMS to provide data integrity.
10. Conflicts can be removed
In a database system, data is arranged in a well-defined manner by the DBA, so conflicting requirements between applications can be resolved. The DBA selects the best file structure and access strategy to get better overall performance for the representation and use of the data.
11. Providing Multiple User Interfaces
For example, query languages, programming language interfaces, forms, menu-driven interfaces, etc.
12. Representing Complex Relationships Among Data
A DBMS can represent complex relationships among data.
13. Providing Backup and Recovery
The DBMS also provides backup and recovery features.
DISADVANTAGES OF DBMS
A database management system has many advantages, but some major problems arise in using a DBMS, so it also has disadvantages. These are explained below:
1. Cost
A significant disadvantage of a DBMS is cost. In addition to the cost of purchasing or developing the software, the organization will also have to purchase or upgrade hardware, so the system becomes costly. Additional cost arises from migrating data from one DBMS environment to another.
2. Problems associated with centralization
Centralization means that data is accessible from a single source. Because the centralized data can be reached by every user, weak access controls expose all of it to unauthorized access, and data can be damaged or lost.
3. Complexity of backup and recovery
Backup and recovery are fairly complex in a DBMS environment. Taking a backup may affect a multi-user database system that is in operation. A damaged database can be restored from backup media, but reloading it into a concurrent multi-user system can introduce duplicate data.
4. Confidentiality, Privacy and Security
When information is centralized and made available to users from remote locations, the possibilities of abuse are often greater than in a conventional system. To reduce the chances of unauthorized users accessing sensitive information, it is necessary to take technical, administrative and, possibly, legal measures. Most databases store valuable information that must be protected against deliberate trespass and destruction.
5. Data Quality
Since the database is accessible to users remotely, adequate controls are needed over users updating data and over data quality. With an increased number of users accessing data directly, there are enormous opportunities for users to damage the data. Unless there are suitable controls, data quality may be compromised.
6. Data Integrity
Since a large number of users could be using a database concurrently, technical safeguards are necessary to ensure that the data remains correct during operation. The main threat to data integrity comes from several different users attempting to update the same data at the same time. The database therefore needs to be protected against inadvertent changes by users.
7. Enterprise Vulnerability
Centralizing all data of an enterprise in one database may mean that the database becomes an indispensable resource. The survival of the enterprise may depend on reliable information being available from its database. The enterprise therefore becomes vulnerable to the destruction of the database or to unauthorized modification of the database.
8. The Cost of using a DBMS
Conventional data processing systems are typically designed to run a number of well-defined, preplanned processes. Such systems are often "tuned" to run efficiently for the processes they were designed for. Although conventional systems are usually fairly inflexible, in that new applications may be difficult to implement and/or expensive to run, they are usually very efficient for the applications they are designed for. The database approach, on the other hand, provides a flexible alternative in which new applications can be developed relatively inexpensively. The flexible approach is not without its costs, and one of these is the additional cost of running the applications that the conventional system was designed for. Using standardized software is almost always less machine-efficient than using specialized software.
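Disadvantage 6 names several users updating the same data at the same time as the main threat to integrity. The sketch below simulates that interleaving by hand in plain Python, so it is deterministic; it is not real concurrency and not how any particular DBMS works internally. The balance and the two amounts are made-up values.

```python
def lost_update_demo():
    """Two users read the same balance, then both write: one update is lost."""
    balance = 100
    read_by_a = balance        # user A reads 100
    read_by_b = balance        # user B reads 100 before A has written
    balance = read_by_a + 50   # A writes 150
    balance = read_by_b + 30   # B overwrites with 130; A's +50 is lost
    return balance


def serialized_demo():
    """The same two updates applied one after the other, as concurrency
    control would arrange them."""
    balance = 100
    balance = balance + 50     # A's update completes first
    balance = balance + 30     # B then sees A's result
    return balance


print(lost_update_demo(), serialized_demo())
```

The technical safeguards referred to above (locking, transactions) exist to force the second ordering whenever two updates touch the same item.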
Q. 8. List five significant differences between a file-processing system and a DBMS.
Ans. Before differentiating between file and database systems, we need to understand the DBMS and its components. Consider an organization that holds a huge collection of data on its departments, employees, products, sales and purchase orders, and so on. Such data is accessed simultaneously by many employees, and some users run many queries and want answers quickly. If we try to manage this data by storing it in a collection of operating system files, processing becomes slow and a number of problems or drawbacks arise, as discussed below:
1. We do not have 1000 GB of main (primary) memory to hold the data, so it must be kept on a permanent (secondary) storage device such as magnetic disk or magnetic tape. A file-oriented system that relies on primary memory therefore fails, and a database management system is used to store the data files permanently.
2. Even if we had that much primary memory, a 16-bit or 32-bit computer system could not refer to all of the data by direct or random addressing in a file-based system; on a 32-bit system, for example, no more than 2 GB or 4 GB can be addressed directly at a time. A database program is therefore needed to identify the data.
3. Programs that store large amounts of data in operating system files become too lengthy and complex. A database system makes this simple and fast.
4. File-oriented data cannot be changed and accessed simultaneously, so a system is required that allows a large amount of data to be accessed concurrently.
5. File-oriented data cannot be recalled or recovered after a failure, whereas centralized database management solves this problem.
6. A file-oriented operating system provides only a password mechanism for security, which is not adequate when a number of users access the same data using the same login.
In the end we can say that a DBMS is a piece of software designed to make processing faster and easier.
Q. 9. Describe the major advantages of a database system over a file system. Or: Discuss the DBMS and file processing system. Also give the limitations of the file processing system.
Ans. TRADITIONAL FILE PROCESSING
Data are organized, stored, and processed in independent files of data records. In the traditional file processing approach, each business application was designed to use one or more specialized data files containing only specific types of data records.
TRADITIONAL FILE SYSTEM OR FILE ORIENTED APPROACH
The business computers of the 1980s were used to process business records and produce information using the file-oriented approach, or file processing environment. At that time this system was reliable and faster than the manual system of record keeping and processing. In this system the data is organized in the form of different files. Since the system was a collection of files, we can say it was a file-oriented system. The following terms were commonly used in this approach, and describe the features of a file-oriented system:
1. Master file
A file that is created only once, i.e. at the start of computerization, or a file that rarely changes. For example, in a bank master file the account number, name and balance are entered only once and change less frequently.
2. File activity ratio
The number of records processed in one run divided by the total number of records. For example, if we change 100 records in a bank file containing 200 records, the file activity ratio is 100/200 = 0.5. Note that this ratio is low for a master file.
3. Transaction file
A file that is created repeatedly at regular intervals of time. For example, the payroll file of employees is updated at the end of every month.
4. File volatility ratio
The number of records updated in a transaction file divided by the total number of records. The file volatility ratio of a transaction file is very high.
5. Work file
A temporary file that helps in sorting and merging records from one file to another.
6. File organization
The arrangement of records in a particular order. There were three types of file organization:
1. Sequential
2. Direct
3. Indexed sequential
7. Data island
In this system each department has its own files designed for local applications. Each department has its own data processing staff, set of policies, working rules and report formats. This means programs depended on the structure or format of the file: if the structure of a file changes, the program also has to be changed.
The file-oriented approach is still used these days, but it has the following limitations.
LIMITATIONS OF FILE ORIENTED APPROACH
• Duplicate data
Since all the files are independent of each other, some fields or files are stored more than once. Hence duplication is greater in the file approach, whereas a DBMS has controlled duplication.
• Separated and isolated data
To make a decision, a user might need data from two separate files. First, analysts and programmers evaluated the files to determine the specific data required from each file and the relationships between the data. Then applications could be written in a third-generation language to process and extract the needed data. Imagine the work involved if data from several files was needed!
• Inconsistency
In this system, data is not consistent. If a data item is changed, all the files containing that data item need to be changed and updated properly. If they are not all updated properly, there is a high risk of inconsistency. A DBMS maintains data consistency.
• Poor data integrity
A file is said to have data integrity when, among other things, no item is stored in duplicate. File-oriented systems have been found to have poor data integrity control. Data integrity is achieved in a DBMS.
• Every operation must be programmed
Processing tasks such as searching, editing and deletion each need a separate program, because no built-in functions are available for these operations. A DBMS has ready-made commands for such operations.
• Data inflexibility
Program-data interdependency and data isolation limited the flexibility of file processing systems in providing users with answers to ad hoc information requests. Because designing applications was so programming-intensive, MIS department staff usually restricted information requests. Therefore, users often resorted to manual methods to obtain needed information.
• Concurrency problem
This means several users using the same record at the same time. The problem was common in the file approach but can be controlled in a DBMS.
• Application programs are dependent on the file format
In a file processing system, the physical formats of the files are coded into the programs. A change in the file means a change in the program, and vice versa. There is no such problem in a DBMS.
• Poor data security
All the files are stored as flat files or text files. These files can easily be located and tampered with, because the file approach has no data security.
• Difficult to represent complex objects
Some objects require variable-length records, which cannot easily be computerized using this approach. A DBMS can handle fixed-length as well as variable-length records.
• Cannot support heavy databases
Large databases such as those on the Internet cannot be handled by a file system; a DBMS such as Oracle is used for heavy database applications.
• Difficulty in representing data from the user's view
To create useful applications for the user, data from various files must often be combined. In file processing it was difficult to determine relationships between isolated data in order to meet user requirements.
A DBMS, on the other hand, is designed to avoid these limitations.
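The limitation "application programs are dependent on the file format" can be sketched as follows. The fixed-width layout (a 4-character roll number followed by a 10-character name), the inserted city code and the student table are all hypothetical, and SQLite via Python's sqlite3 module stands in for the DBMS.

```python
import sqlite3

# File-processing style: the program hard-codes the record layout.
OLD_LINE = "0001" + "Asha".ljust(10)            # roll_no + name
NEW_LINE = "0001" + "PUNE" + "Asha".ljust(10)   # a city code was inserted


def read_name_from_file_record(line):
    """Assumes the name occupies characters 4 to 13, as in the old layout."""
    return line[4:14].strip()


old_result = read_name_from_file_record(OLD_LINE)  # correct
new_result = read_name_from_file_record(NEW_LINE)  # silently wrong

# DBMS style: the program names the column it wants and the DBMS finds it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.commit()


def read_name_from_db():
    return conn.execute(
        "SELECT name FROM student WHERE roll_no = 1").fetchone()[0]


name_before = read_name_from_db()
conn.execute("ALTER TABLE student ADD COLUMN city TEXT")  # structure changes
name_after = read_name_from_db()                          # program unchanged

print(old_result, new_result, name_before, name_after)
```

After the layout change the file-based reader returns part of the city code mixed into the name, while the query keeps working without any edit to the program.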
PROBLEMS OF FILE PROCESSING
The file processing approach finally became too cumbersome, costly, and inflexible to supply the information needed to manage modern businesses. It was replaced by the database management approach. File processing systems had the following major problems:
• Data Redundancy
Independent data files included a lot of duplicated data; the same data was recorded and stored in several files. This data redundancy caused problems when data had to be updated, since separate file maintenance programs had to be developed and coordinated to ensure that each file was properly updated. Unfortunately, a lot of inconsistencies occurred among data stored in separate files.
• Lack of Data Integration
Having independent files made it difficult to provide end users with information for ad hoc requests that required accessing data stored in several different files. Special computer programs had to be written to retrieve data from each independent file. This was so difficult, time-consuming, and costly for some organizations that it was impossible to provide end users or management with such information.
• Data Dependence
In file processing systems, major components of the system (the organization of files, their physical locations on storage hardware, and the application software used to access those files) depended on one another in significant ways. Changes in the format and structure of data and records in a file required that changes be made to all of the programs that used that file. This program maintenance effort was a major burden of file processing systems.
• Other Problems
It was easy for data elements to be defined differently by different end users and applications. Integrity of the data was suspect because there was no control over their use and maintenance by authorized end users.
Q. 10. What are the various types of database users?
Ans. Without users, all of the above components (data, hardware & software) are meaningless. Users collect the data and operate the hardware. The operator also feeds in the data and arranges it in order by executing the software. Users are mainly of four types. These are:
(a) Naïve user
A naïve user has no knowledge of the database system or any of its supporting software. Such users work at the end-user level. They are like laymen who have only a little knowledge of the computer system. These users mainly collect data in notebooks or on pre-designed forms. Users of automated teller machines (ATMs) fall in this category. A naïve user can work on any simple GUI-based, menu-driven system. Non-technical people who use the Internet are also in this category.
(b) End User or Data Entry Operators
Data entry operators are basic computer users. The function of a data entry operator is only to operate the computer (start/stop the computer), feed or type the collected information (data) into a menu-driven application program, and execute it according to the analyst's requirements. These users are also called on-line users. They communicate with the database directly via an on-line terminal or indirectly via a user interface. These users require a certain amount of expertise in the computer programming language, but require complete knowledge of computer operations.
(c) Application programmer
He is also called a simple programmer. The work of an application programmer is to develop a new project, i.e. a program for a particular application, or to modify an existing program. The application programmer works according to instructions given by the database administrator (DBA). An application programmer can handle programming languages like Fortran, Cobol, dBase etc.
(d) DBA (Data Base Administrator)
The DBA is a major user. The DBA is either a single person or a group of persons. The DBA is only the custodian of the data of the business firm or organization, not the owner of the organization, just as a bank manager takes care of the bank's money but does not use it. Only the DBA can handle the information collected by end users and give instructions to the application programmer for developing a new program or modifying an existing one. The DBA is also called the overall controller of the organization. In the computer department of a firm, either a system analyst or an EDP (Electronic Data Processing) Manager works as the DBA. In other words, the DBA is the overall controller of the complete hardware and software.
RESPONSIBILITIES OF DBA
As the DBA is the overall commander of a computer system, he/she has a number of duties. Some of the major responsibilities are as follows:
1. The DBA controls the data, hardware and software and gives instructions to the application programmer, end user and naïve user.
2. The DBA decides the information contents of the database and the suitable database file structure for arranging the data, using the proper DDL techniques.
3. The DBA compiles the whole data in a particular order and sequence.
4. The DBA decides where data is stored, i.e. takes decisions about the storage structure.
5. The DBA decides which access strategy and technique should be used for accessing the data.
6. The DBA communicates and co-operates with users through appropriate meetings.
7. The DBA defines and applies authorization checks and validation procedures.
8. The DBA takes backups of the data on a backup storage device so that if data is lost it can be recovered and compiled again. The DBA also recovers damaged data.
9. The DBA changes the environment according to user or industry requirements and monitors performance.
10. The DBA should be a good decision-maker. The decisions taken by the DBA should be correct, accurate & efficient.
11. The DBA should have leadership quality.
12. The DBA liaises with users in the business to assure customers about the availability of data.
Q. 11. Discuss the architecture of database management system.
Ans. DBMS ARCHITECTURE
Many different frameworks have been suggested for the DBMS over the last several years. The generalized architecture of a database system is called the ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee) model. ANSI/SPARC set up its study group on database systems in 1972, and the group's reports suggested three levels of a database system:
• External view (individual user view)
• Conceptual view (global or community user view)
• Internal level (physical or storage view)
For the system to be usable, it must retrieve data efficiently. This concern has led to the design of complex data structures for the representation of data in the database. Since many database system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users' interactions with the system. These three views or levels of the architecture are shown in the following diagram:
OBJECTIVES OF THREE LEVEL ARCHITECTURE
The database views were suggested for the following reasons, which are the objectives of the levels of a database:
1. Make changes to the database easy when the environment requires them.
2. The external or user views should not depend on changes made in other views. For example, changes in hardware, operating system or internal view should not change the external view.
3. The users of the database should not have to worry about the physical implementation and internal working of the database system.
4. The data should reside in one place, and all users can access it as per their requirements.
5. The DBA can change the internal structure without affecting the user's view.
6. The database should be simple and changes should be easy to make.
7. It is independent of all hardware and software.
All the three levels are described below.
External/View level
The highest level of abstraction, where only those parts of the entire database are included which are of concern to a user. Despite the use of simpler structures at the logical level, some complexity remains because of the large size of the database. Many users of the database system will not be concerned with all this information. Instead, such users need to access only a part of the database. The view level of abstraction is defined so that their interaction with the system is simplified. The system may provide many views for the same database.
Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Schemas are changed infrequently, if at all. Database systems have several schemas, partitioned according to the levels of abstraction discussed above. At the lowest level is the physical schema, at the intermediate level is the logical schema, and at the highest level is a subschema.
The features of this view are:
• The external or user view is at the highest level of the database architecture.
• Here only one portion of the database is given to the user.
• One portion may have many views.
• Many users and programs can use the part of the database that interests them.
• By creating separate views of the database, we can maintain security.
• Only limited access (read only, write only etc.) can be provided in this view.
For example: the head of the accounts department is interested only in accounts and not in library information, while the library department is interested only in books, staff and students. But all such data (students, books, accounts, staff etc.) is present in one place and every department can use it as per need.
Conceptual/Logical level
Database administrators, who must decide what information is to be kept in the database, use this level of abstraction. One conceptual view represents the entire database. There is only one conceptual view per database. The description of data at this level is in a format independent of its physical representation. It also includes features that specify the checks needed to retain data consistency and integrity.
The features are:
• The conceptual or logical view describes the structure of the data for many users.
• Only the DBA can define it.
• It is the global view seen by many users.
• It is the middle level of the three-level architecture.
• It is defined by giving the name, type and length of each data item.
The CREATE TABLE command of Oracle creates this view.
• It is independent of all hardware and software.
Internal/Physical level
The lowest level of abstraction describes how the data are actually stored in the database. At the logical level, by contrast, the entire database is described in terms of a small number of relatively simple structures. Although implementing these simple structures may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity.
The features are:
• It describes the actual or physical storage of data.
• It stores the data on hardware so that it can be stored and accessed in optimal time.
• It is the third level in the three-level architecture.
• It covers concepts like:
  o B-tree and hashing techniques for storage of data.
  o Primary keys, secondary keys, pointers and sequences for data search.
  o Data compression techniques.
• It is represented as
FILE EMP ( INDEX ON EMPNO FIELD = { (EMPNO) BYTE (4), ENAME BYTE(25))}
Mapping between views
• The conceptual/internal mapping:
  o defines the correspondence between the conceptual and internal views
  o specifies the mapping from conceptual records to their stored counterparts
• An external/conceptual mapping:
  o defines the correspondence between a particular external view and the conceptual view
• A change to the storage structure definition means that the conceptual/internal mapping must be changed accordingly, so that the conceptual schema may remain invariant, achieving physical data independence.
• A change to the conceptual definition means that the conceptual/external mapping must be changed accordingly, so that the external schema may remain invariant, achieving logical data independence.
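The external and conceptual levels described above can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions: the table student, the views accounts_view and library_view, and the column books_issued are hypothetical names chosen for the example, and SQLite stands in for whichever DBMS is actually used.

```python
import sqlite3


def build_college_db():
    """Create an in-memory database with one conceptual table and two external views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Conceptual level: one description of all the student data.
        CREATE TABLE student (
            roll_no      INTEGER PRIMARY KEY,
            name         TEXT NOT NULL,
            fees_paid    INTEGER NOT NULL,
            books_issued INTEGER NOT NULL   -- hypothetical column
        );
        -- External level: each department sees only its own portion.
        CREATE VIEW accounts_view AS
            SELECT roll_no, name, fees_paid FROM student;
        CREATE VIEW library_view AS
            SELECT roll_no, name, books_issued FROM student;
        INSERT INTO student VALUES (15, 'Amrita', 1500, 2);
        """
    )
    return conn


def columns_of(conn, relation):
    """Return the column names a user sees when querying a table or view."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [column[0] for column in cursor.description]


if __name__ == "__main__":
    db = build_college_db()
    print(columns_of(db, "accounts_view"))
    print(columns_of(db, "library_view"))
```

The accounts department never sees books_issued and the library never sees fees_paid, although both views draw on the same stored data.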
Q. 12. Write a note on Database Language And Interfaces.
Ans. Some main types of languages and facilities are provided by a DBMS:
1. Programming Language
2. Data Manipulation Language
3. Data Definition Language
4. Schema Description Language
5. Sub-Schema Description Language
6. SQL (Structured Query Language)
1. Programming Language
All programming languages like Cobol, Fortran, C, C++, Pascal etc. have a syntax and semantics. They all have a structured and logical form, so they are commonly used to solve general and scientific problems. Business-oriented problems can be solved with third-generation (3GL) and fourth-generation (4GL) languages.
2. DML
A language that gives instructions to the programming language and other languages is called a data manipulation language (DML). This language creates an interface (linkage) between the user and the application program. It is an extension of the host programming language, used to manipulate data in the database. DML involves retrieval of data from the database, insertion of new data into the database, and deletion or modification of existing data. Some data manipulation operations are also called QUERY or QUERY OPERATIONS. A query is a statement in the DML that requests the retrieval of data from the database, i.e. a search for data according to the user's requirement. The subset of the DML used to pose queries is known as the Query Language. DML provides commands to select & retrieve data from the database, and to insert, update & delete records. The commands have different syntax in different programming languages; for example, Fortran, Cobol, C etc. provide this facility with the help of the database management system. The data manipulation functions provided by a DBMS can be invoked in an application program directly by procedure calls or by preprocessor statements, which are handled by the compiler. A DML may be procedural, where the user states what data is needed and how to get it, or non-procedural, where the user indicates only what is to be retrieved. In both cases the DBMS works out an efficient way to produce the answer.
3. DDL
A database management system provides a facility known as the Data Definition Language or data description language (DDL). DDL can be used to define the conceptual (global) schema and also to give some details about how to implement this schema on the physical devices used to store the data. The definition includes all the entity sets and their associated attributes, as well as the relationships among the entity sets. The definitions also include constraints which are used by the DML. DDL also produces meta-data (data about the data in the database). Meta-data is held in the data dictionary, directory or system catalog. The dictionary contains information about the data stored in the database, and the DBMS consults it before any data manipulation operation. The DBMS maintains information on the file structure and uses access methods to access the data efficiently. DDL thus supports the DML. There is also the Data Sub Language (DSL), which is the combination of both DML and DDL:
DSL = DML + DDL
4. Schema Description Language (SDL) or Schema
It is necessary to describe the organization of the data in a formal manner.
The logical and physical database descriptions are used by the DBMS software. The complete and overall description of data is referred to as the schema. The words schema and subschema were brought into DBMS by CODASYL (Conference on Data Systems Languages) and by CODASYL's Data Base Task Group. The schema is also referred to as the conceptual model or global view (community view) of data. For example, a complete description of the data collected by a college, covering all classes and student data, all employee (teaching & non-teaching) data and other data related to the college, is called the schema of the college. We can say that the schema relates the whole of the college data logically.
5. Sub-Schema Description Language
The term schema is used to mean an overall chart of the data items, types and record types stored in a database. The term sub-schema refers to an application programmer's view of the data he uses. A sub-schema is a part of the schema. Many different sub-schemas can be derived from one schema. An application programmer does not use the whole data, i.e. the full schema. For example, in an organization, the purchase-order data for the maintenance department is a sub-schema of the whole schema of the purchase department in the whole industry. Two or more application programmers use different sub-schemas. Programmer A uses the sub-schema purchase-order whereas programmer B uses the sub-schema supplier. Their operations and views differ according to their own sub-schemas, but the two sub-schemas can be combined on the basis of a common key.
6. Structured Query Language (SQL)
SQL originated with System R, IBM's experimental relational database system, so it is a relational language. This language was developed in 1974 at IBM's San Jose Research Center. Its purpose is to provide non-procedural commands for validating the data and for searching the data. By using this language we can pose any query about the data. SQL grew out of earlier IBM languages named SQUARE and SEQUEL. It served as both the DDL and the DML for System R. SQL is used in commercial RDBMSs such as ORACLE, INGRES, SYBASE etc. SQL resembles relational algebra and relational calculus in the relational system approach.
DBMS INTERFACES
Types of interfaces provided by the DBMS include:
Menu-Based Interfaces for Web Clients or Browsing
• Present users with lists of options (menus).
• Lead the user through the formulation of a request.
• The query is composed from the options selected from the menus displayed by the system.
Forms-Based Interfaces
• Display a form to each user.
• The user can fill out the form to insert new data, or fill out only certain entries.
• Designed and programmed for naïve users as interfaces to canned transactions.
Graphical User Interfaces
• Display a schema to the user in diagram form. The user can specify a query by manipulating the diagram. GUIs use both forms and menus.
Natural Language Interfaces
• Accept requests in written English or other languages and attempt to understand them.
• The interface has its own schema and a dictionary of important words. It uses the schema and dictionary to interpret a natural language request.
Interfaces for Parametric Users
• Parametric users have a small set of operations they perform.
• Analysts and programmers design and implement a special interface for each class of naïve users.
• Often a small set of commands is included to minimize the number of keystrokes required (i.e. function keys).
Interfaces for the DBA
• Systems contain privileged commands only for DBA staff.
• These include commands for creating accounts, setting parameters, authorizing accounts, changing the schema, reorganizing the storage structures etc.
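The distinction between DDL and DML drawn in this answer can be made concrete with a short sketch. It assumes SQLite accessed through Python's standard sqlite3 module; the table emp and its rows are hypothetical sample data, and the SQL syntax of other systems such as Oracle may differ in detail.

```python
import sqlite3


def run_ddl_and_dml():
    """Run one DDL statement followed by the four basic DML operations."""
    conn = sqlite3.connect(":memory:")

    # DDL: define the structure (name, type and constraints of each data item).
    conn.execute(
        "CREATE TABLE emp ("
        " empno INTEGER PRIMARY KEY,"
        " ename TEXT NOT NULL,"
        " salary INTEGER)"
    )

    # DML: insert new records.
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?)",
        [(1, "Asha", 3000), (2, "Ravi", 2500), (3, "Meena", 4000)],
    )
    # DML: modify an existing record.
    conn.execute("UPDATE emp SET salary = salary + 500 WHERE empno = 2")
    # DML: delete a record.
    conn.execute("DELETE FROM emp WHERE empno = 3")
    # DML (query): non-procedural, it states what is wanted, not how to find it.
    rows = conn.execute(
        "SELECT empno, ename, salary FROM emp ORDER BY empno"
    ).fetchall()

    conn.close()
    return rows


if __name__ == "__main__":
    print(run_ddl_and_dml())
```

The CREATE TABLE statement changes the schema, while INSERT, UPDATE, DELETE and SELECT only change or read the instance of the database.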
Q. 13. Describe the Classification of Database Management Systems.
Ans. Categories of DBMS
DBMS (Database Management System)
It is software to manage databases. A DBMS is a software component or logical tool for handling databases. All queries from users about the data stored in the database are handled by the DBMS. There are many DBMSs available in the market, like dBase, FoxBASE, FoxPro, Oracle, Unify, Access etc.
RDBMS (Relational Data Base Management System)
Each database system uses an approach to store and maintain the data. For this purpose three data models were developed: the Hierarchical Model, the Network Model and the Relational Model. In the hierarchical model the data is arranged in the form of trees, in the network model the data is arranged in the form of pointers and networks, and in the relational model the data is arranged in the form of tables. Data stored in the form of tables is easy to store, maintain and understand. Many DBMSs have been developed using the hierarchical and network models. Any DBMS that uses the relational data model for data storage and modeling is called an RDBMS. In an RDBMS we can create relations among tables and access the information from the tables, while the tables are stored in separate files and may or may not have identical structures. The RDBMS is based upon the rules given by Dr. Codd, known as Dr. Codd's Rules.
HDBMS (Heterogeneous DBMS)
In an RDBMS we store information related to the same kind of data, like student data, teacher data, employee data etc. In an HDBMS we store data in the database which is entirely different in kind.
DDBMS (Distributed DBMS)
During the 1950s & 1960s there was a trend towards independent or decentralized systems. There was duplication of hardware and facilities. In a centralized database system, the DBMS & data reside at a single place and all control is limited to a single location, but the PCs are distributed geographically. A distributed system is parallel computing using multiple independent computers communicating over a network to accomplish a common objective or task. The type of hardware, programming languages, operating systems and other resources may vary drastically. It is similar to computer clustering, with the main difference being a wide geographic dispersion of the resources. For example, an organization may have an office in one building and many sub-buildings that are connected using a LAN. The current trend is towards distributed systems. Such a system is a central system connected to intelligent remote sites. Each remote site has its own storage and processing capabilities, whereas in a centralized system there is a single storage.
OODBMS (Object Oriented DBMS)
Object-Oriented Database Management Systems (OODBMSs) have been developed to support new kinds of applications for which semantics and content are represented more efficiently with the object model. However, OODBMSs present two main problems:
• Impedance mismatch: It is basically due to two reasons. Firstly, operating systems lack suitable abstractions, so when a client object has to invoke a method offered by a server object and both objects are not in the same address space, it is necessary to use the mechanisms offered by the operating system. These mechanisms do not fit the object-oriented paradigm, since they are oriented towards communication between processes. In order to solve this problem, intermediate software is included (e.g. COM or CORBA). Secondly, an impedance mismatch is also caused every time the object-oriented applications need to use operating system services.
• Interoperability problem between object models: Although different system elements use the object-oriented paradigm, an interoperability problem can exist between them. An application implemented in the C++ language, with the C++ object model, can easily interact with its own objects, but when it wants to use objects that have been created with another programming language or another object-oriented database, an interoperability problem appears.
Programming languages like C, FORTRAN and PASCAL use the POP (Procedure Oriented Programming) approach to develop applications, but the current trend is towards OOP (Object Oriented Programming). Languages like C++, Java, C# (C Sharp) and Visual Basic 6 use this approach. Many databases have been developed that follow this approach (the OO approach), like Oracle. A DBMS which follows the OOP approach is called an OODBMS.
Q. 14. Explain the difference between physical and logical data independence.
Ans. One of the biggest advantages of a database is data independence. It means we can change the schema at one level without affecting the schema at the next higher level. In other words, we can change the structure of a database without affecting the data required by users and programs. This feature was not available in the file-oriented approach. There are two types of data independence:
1. Physical data independence
2. Logical data independence
Data Independence
The ability to modify a schema definition in one level without affecting a schema definition in the next higher level is called data independence. There are two levels of data independence:
1. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. It means we change the physical storage level without affecting the conceptual or external view of the data. The new changes are absorbed by mapping techniques.
2. Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten. Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to a banking system). Logical data independence means that if we add some new columns to a table or remove some columns from it, the user views and programs should not change.
For example: consider two users A & B. Both select empno and ename. If user B adds a new column salary to his view/table, this does not affect the external view of user A, although the underlying structure of the database has changed for both users A & B. User A can now also print the salary if he asks for it.
User A's external view (view before adding a new column): empno, ename
User B's external view (view after adding the new column salary): empno, ename, salary
It means that if we change one view, programs which use the other views need not be changed. Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data that they access. Physical data independence, by contrast, means we change the physical storage level without affecting the conceptual or external view of the data; mapping techniques absorb the new changes.
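The example of users A and B can be tried out as a small sketch. It assumes SQLite through Python's standard sqlite3 module; the view names user_a_view and user_b_view and the sample rows are hypothetical. The query that user A's program runs is identical before and after the salary column is added.

```python
import sqlite3


def demo_logical_independence():
    """Add a column to the logical schema and show user A's view is unaffected."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT NOT NULL);
        INSERT INTO emp VALUES (1, 'Asha'), (2, 'Ravi');
        CREATE VIEW user_a_view AS SELECT empno, ename FROM emp;
        """
    )
    # The query used by user A's program; it is never rewritten.
    query_a = "SELECT empno, ename FROM user_a_view ORDER BY empno"
    before = conn.execute(query_a).fetchall()

    # Change to the logical schema: user B needs a salary column.
    conn.executescript(
        """
        ALTER TABLE emp ADD COLUMN salary INTEGER;
        UPDATE emp SET salary = 3000 WHERE empno = 1;
        UPDATE emp SET salary = 2500 WHERE empno = 2;
        CREATE VIEW user_b_view AS SELECT empno, ename, salary FROM emp;
        """
    )
    after = conn.execute(query_a).fetchall()
    user_b = conn.execute(
        "SELECT empno, ename, salary FROM user_b_view ORDER BY empno"
    ).fetchall()
    conn.close()
    return before, after, user_b


if __name__ == "__main__":
    print(demo_logical_independence())
```

User A's results are the same in both runs, while user B's view exposes the new column.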
Q. 15. What is physical data independence?
Ans. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. It means we change the physical storage level without affecting the conceptual or external view of the data. The new changes are absorbed by mapping techniques.
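A typical physical-level change made to improve performance is adding an index. The sketch below, which assumes SQLite through Python's standard sqlite3 module and uses a hypothetical table emp and index idx_emp_ename, runs the same query before and after the index is created and returns both results for comparison.

```python
import sqlite3


def demo_physical_independence():
    """Change the storage structure (add an index) without touching the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [(n, f"emp{n:03d}") for n in range(1, 201)],
    )

    # The application's query; it stays exactly the same throughout.
    query = "SELECT empno, ename FROM emp WHERE ename = ?"
    before = conn.execute(query, ("emp042",)).fetchall()

    # Physical-level change: a new access path for lookups by name.
    conn.execute("CREATE INDEX idx_emp_ename ON emp (ename)")
    after = conn.execute(query, ("emp042",)).fetchall()

    # The catalog now records the index, but no table or view definition changed.
    index_names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    ]
    conn.close()
    return before, after, index_names


if __name__ == "__main__":
    print(demo_physical_independence())
```

Only the way the DBMS locates the rows may change; the conceptual and external views, and the program that uses them, are untouched.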
Q. 16. What do you mean by data redundancy?
Ans. Redundancy is unnecessary duplication of data, for example when the accounts department and the registration department both keep the student name, number and address. Redundancy wastes space and duplicates the effort of maintaining the data. Redundancy also leads to inconsistency. Inconsistent data is data which contradicts itself, e.g. two different addresses for a given student number. Inconsistency cannot occur if data is represented by a single entry (i.e. if there is no redundancy).
Controlled redundancy
Some redundancy may be desirable (for efficiency). A DBMS should be aware of it, and take care of propagating updates to all copies of a data item. This is an objective, not yet currently supported.
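How redundancy turns into inconsistency can be shown with a few lines of Python. This is a toy sketch: the two dictionaries stand in for the separate files of the accounts and registration departments, and the student number and addresses are made-up sample values.

```python
def find_inconsistencies(accounts_file, registration_file):
    """Return the student numbers whose address differs between the two files."""
    common = accounts_file.keys() & registration_file.keys()
    return sorted(
        number
        for number in common
        if accounts_file[number]["address"] != registration_file[number]["address"]
    )


# Two departments each keep their own copy of the same student record.
accounts = {15: {"name": "Amrita", "address": "101, Kashmir Avenue"}}
registration = {15: {"name": "Amrita", "address": "101, Kashmir Avenue"}}

# The student moves, but only one department's file is updated.
registration[15]["address"] = "22, Mall Road"

if __name__ == "__main__":
    print(find_inconsistencies(accounts, registration))
```

With a single shared entry for each student there would be nothing to compare, so this kind of contradiction could not arise.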
Q. 17. What do you mean by database schema?
Ans. It is necessary to describe the organization of the data in a formal manner. The logical and physical database descriptions are used by the DBMS software. The complete and overall description of data is referred to as the schema. The words schema and subschema were brought into DBMS by CODASYL (Conference on Data Systems Languages) and by CODASYL's Data Base Task Group. The schema is also referred to as the conceptual model or global view (community view) of data. For example, a complete description of the data collected by a college, covering all classes and student data, all employee (teaching & non-teaching) data and other data related to the college, is called the schema of the college. We can say that the schema relates the whole of the college data logically.
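Most relational systems keep the schema itself as data in a system catalog that can be queried. The sketch below assumes SQLite, whose catalog table is sqlite_master, accessed through Python's standard sqlite3 module; the college tables and the view teaching_staff are hypothetical.

```python
import sqlite3


def build_college_schema():
    """Create a tiny, hypothetical college schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, class TEXT);
        CREATE TABLE staff (emp_no INTEGER PRIMARY KEY, name TEXT, teaching INTEGER);
        CREATE VIEW teaching_staff AS SELECT emp_no, name FROM staff WHERE teaching = 1;
        """
    )
    return conn


def describe_schema(conn):
    """List every table and view recorded in SQLite's catalog."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY name"
    ).fetchall()


def column_names(conn, table):
    """Return the column names of one table, read from the catalog."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    db = build_college_schema()
    print(describe_schema(db))
    print(column_names(db, "student"))
```

No rows have been inserted here, yet the description of the data is complete: the schema exists independently of any instance of the database.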
Q. 18. Explain the distinctions among the terms primary key, candidate key and superkey. Or What is the significance of foreign key? Or What are the various keys?
Ans. Keys: A number of keys can be defined; the main, commonly used keys are explained below:
1. Primary Key
A key is a single attribute, or a combination of two or more attributes, of an entity set that is used to identify one or more instances of the set. The attribute Roll # uniquely identifies an instance of the entity set STUDENT. On the basis of Roll No. 15 we learn that the student is Amrita, with address 101, Kashmir Avenue and phone no. 112746, who has paid fees of 1500. The value 15 is unique and identifies exactly one student. So here Roll No. is a unique attribute, and such a unique identifier is called the Primary Key. A primary key cannot be duplicated.
From the definition of candidate key, it should be clear that each relation must have at least one candidate key, even if it is the combination of all the attributes in the relation, since all tuples in a relation are distinct. Some relations may have more than one candidate key.
The primary key of a relation is an arbitrarily but permanently selected candidate key. The primary key is important since it is the sole identifier for the tuples in a relation. Any tuple in a database may be identified by specifying the relation name, the primary key and its value. Also, for a tuple to exist in a relation, it must be identifiable and therefore it must have a primary key. The relational data model therefore imposes the following two integrity constraints:
(a) No component of a primary key value can be null;
(b) Attempts to change the value of a primary key must be carefully controlled.
The first constraint is necessary because if we want to store information about some entity, then we must be able to identify it, otherwise difficulties are likely to arise. For example, if a relation CLASS (STUNO, LECTURER, CNO) has (STUNO, LECTURER) as the primary key, then allowing tuples like

STUNO   LECTURER   CNO
3123    NULL       CP302
NULL    SMITH      CP302

is going to lead to ambiguity, since the two tuples above may or may not be identical, and the integrity of the database may be compromised. Many early commercial database systems did not support the concept of primary key, so it was possible to reach a database state in which the integrity of the database was violated.
The second constraint above deals with changing primary key values. Since the primary key is the tuple identifier, changing it needs very careful controls. Codd has suggested three possible approaches:
Method 1: Only a select group of users is authorised to change primary key values.
Method 2: Updates on primary key values are banned. If it is necessary to change a primary key, the tuple is first deleted and then a new tuple with the new primary key value but the same other values is inserted. Of course, this requires that the old attribute values be remembered and reinserted in the database.
Method 3: A different command is made available for updating primary keys. Making a distinction between altering the primary key and altering another attribute of a relation reminds users that care needs to be taken in updating primary keys.

2. Secondary Key
A key that does not give unique identification and may hold duplicate information is called a secondary key. For example, in a STUDENT entity, if Roll Number is the primary key, then the name of the student, the address of the student, the phone number of the student and the fees paid by the student are all secondary keys. A secondary key is an attribute or combination of attributes that is not the primary key and may contain duplicate data. In other words, a secondary key is used after the primary key has been identified. Data can also be located using a combination of secondary keys.

3. Super Key
If we add additional attributes to a primary key, the resulting combination still uniquely identifies an instance of the entity set. Such keys are called super keys. A primary key is therefore a minimal super key. For example, if Roll Number is the primary key of STUDENT, then the combination of Roll Number and Name is a super key: it still identifies each student uniquely, but it contains an attribute that is not needed for identification. Super keys are rarely used directly in a small database file; their main importance is that every candidate key is obtained by reducing a super key to its minimal form.

4. Candidate Key
There may be two or more attributes or combinations of attributes that uniquely identify an instance of an entity set. These attributes or combinations of attributes are called candidate keys. Each candidate key gives unique identification and is minimal, that is, no attribute can be removed from it without losing uniqueness. The primary key is chosen from among the candidate keys. For example, if Roll No. identifies a student uniquely and no two students share the same Name, then Roll No. and Name are each candidate keys. The combination Roll No. & Name is also unique, but because Roll No. alone is sufficient, the combination is a super key rather than a candidate key.

5. Alternative Key
A candidate key which is not the primary key is called an alternative key. For example, if Roll No. and Name are the candidate keys and Roll No. is chosen as the primary key, the other candidate key is Name. The Name attribute works as the alternative key.

6. Foreign Key
Suppose there are relations SP (S#, P#, QTY), S (S#, SName, Status, City) and P (P#, PName, Color, Weight, City). SP is defined as the relationship between relation S and relation P. S# is the primary key of relation S and P# is the primary key of relation P. In relation SP neither attribute is unique on its own, so the primary key of SP is the combination (S#, P#). An attribute of one relation that refers to the primary key of another relation is called a foreign key. Thus in SP, S# is a foreign key referring to S and P# is a foreign key referring to P. Similarly, in a relation ASSIGNMENT with attributes Emp#, Prod# and Job#, each attribute that refers to the primary key of another relation is a foreign key.
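The distinction between super keys and candidate keys can be made concrete with a small check over sample rows. The following is a minimal sketch, not a definitive method: the helper names `is_superkey` and `candidate_keys` and the student rows are hypothetical, and the functions only inspect the rows they are given, whereas a real key is a rule about every possible state of the relation.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Return the minimal super keys for this particular set of rows."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A combination containing a smaller key is a super key, not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical STUDENT rows: roll numbers are unique, names and cities repeat.
students = [
    {"roll_no": 1, "name": "Asha", "city": "Delhi"},
    {"roll_no": 2, "name": "Ravi", "city": "Delhi"},
    {"roll_no": 3, "name": "Asha", "city": "Pune"},
    {"roll_no": 4, "name": "Asha", "city": "Delhi"},
]

keys = candidate_keys(students, ["roll_no", "name", "city"])
```

With these rows, `roll_no` alone is the only candidate key, while `("roll_no", "name")` is a super key because it is unique but not minimal.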
Q. 19. What are the major functions of a database administrator?
Ans. RESPONSIBILITIES OF DBA
The DBA (database administrator) is the overall controller of the database system and so has a number of duties. Some of his/her major responsibilities are as follows:
1. The DBA controls the data, hardware and software, and gives instructions to the application programmer, the end user and the naive user.
2. The DBA decides the information content of the database. He/she decides the suitable database file structure for the arrangement of data and uses the proper DDL techniques.
3. The DBA compiles the whole data in a particular order and sequence.
4. The DBA decides where data is to be stored, i.e. takes decisions about the storage structure.
5. The DBA decides which access strategy and technique should be used for accessing the data.
6. The DBA communicates with users through appropriate meetings and co-operates with them.
7. The DBA defines and applies authorization checks and validation procedures.
8. The DBA takes backups of the data on a backup storage device so that if data is lost it can be recovered and compiled again. The DBA also recovers damaged data.
9. The DBA changes the environment according to user or industry requirements and monitors performance.
10. The DBA should be a good decision-maker. The decisions taken by the DBA should be correct, accurate and efficient.
11. The DBA should have leadership qualities.
12. The DBA liaises with users in the business to win the confidence of the customer about the availability of data.
Q. 20. What do you mean by relationships? Explain different types of relationships.
Ans. Relationships: One table (relation) may be linked with another in what is known as a relationship. Relationships may be built into the database structure to facilitate the operation of relational joins at runtime.
1. A relationship is between two tables in what is known as a one-to-many, parent-child or master-detail relationship, where an occurrence on the "one" or "parent" or "master" table may have any number of associated occurrences on the "many" or "child" or "detail" table. To achieve this, the child table must contain fields which link back to the primary key of the parent table. These fields on the child table are known as a foreign key, and the parent table is referred to as the foreign table (from the viewpoint of the child).
2. It is possible for a record on the parent table to exist without corresponding records on the child table, but it should not be possible for an entry on the child table to exist without a corresponding entry on the parent table.
3. A child record without a corresponding parent record is known as an orphan.
4. It is possible for a table to be related to itself. For this to be possible it needs a foreign key which points back to the primary key. Note that these two keys cannot be comprised of exactly the same fields, otherwise the record could only ever point to itself.
5. A table may be the subject of any number of relationships, and it may be the parent in some and the child in others.
6. Some database engines allow a parent table to be linked via a candidate key, but if this were changed it could result in the link to the child table being broken.
7. Some database engines allow relationships to be managed by rules known as referential integrity or foreign key constraints. These will prevent entries on child tables from being created if the foreign key does not exist on the parent table, or will deal with entries on child tables when the entry on the parent table is updated or deleted.
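The referential integrity rules described above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table names `parent` and `child` and their rows are hypothetical, and `ON DELETE CASCADE` is only one of several ways an engine can "deal with" child rows when the parent is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL REFERENCES parent(id) ON DELETE CASCADE)"
)

conn.execute("INSERT INTO parent VALUES (1, 'master row')")
conn.execute("INSERT INTO child VALUES (10, 1)")  # valid: parent 1 exists

# An orphan (child pointing at a non-existent parent) is rejected.
try:
    conn.execute("INSERT INTO child VALUES (11, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent removes its child rows because of ON DELETE CASCADE.
conn.execute("DELETE FROM parent WHERE id = 1")
remaining_children = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

Here the attempted orphan insert raises an integrity error, and no child rows remain once the parent row has been deleted.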
Relational Joins
The join operator is used to combine data from two or more relations (tables) in order to satisfy a particular query. Two relations may be joined when they share at least one common attribute. The join is implemented by considering each row in an instance of each relation. A row in relation R1 is joined to a row in relation R2 when the value of the common attribute(s) is equal in the two relations. The join of two relations is often called a binary join. The join of two relations creates a new relation. The notation "R1 x R2" indicates the join of relations R1 and R2. For example, consider the following:
Note that the instances of relations R1 and R2 contain the same data values for attribute B. Data normalisation is concerned with decomposing a relation (e.g. R(A,B,C,D,E)) into smaller relations (e.g. R1 and R2). The data values for attribute B in this context will be identical in R1 and R2. The instances of R1 and R2 are projections of the instances of R(A,B,C,D,E) onto the attributes (A,B,C) and (B,D,E) respectively. A projection will not eliminate data values: duplicate rows are removed, but this does not remove a data value from any attribute. The join of relations R1 and R2 is possible because B is a common attribute. The result of the join is:
The row (2 4 5 7 4) was formed by joining the row (2 4 5) from relation R1 to the row (4 7 4) from relation R2. The two rows were joined since each contained the same value for the common attribute B. The row (2 4 5) was not joined to the row (6 2 3) since the values of the common attribute (4 and 6) are not the same. The relations joined in the preceding example shared exactly one common attribute. However, relations may share multiple common attributes. All of these common attributes must be used in creating a join. For example, the instances of relations R1 and R2 in the following example are joined using the common attributes B and C:
Before the join:
After the join:
The row (6 1 4 9) was formed by joining the row (6 1 4) from relation R1 to the row (1 4 9) from relation R2. The join was created since the common set of attributes (B and C) contained identical values (1 and 4). The row (6 1 4) from R1 was not joined to the row (1 2 1) from R2 since the common attributes did not share identical values: (1 4) in R1 and (1 2) in R2. The join operation provides a method for reconstructing a relation that was decomposed into two relations during the normalisation process. The join of two rows, however, can create a new row that was not a member of the original relation. Thus invalid information can be created during the join process.
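The join rule described above (rows are combined only when every common attribute has equal values) can be sketched in a few lines of Python. This is an illustrative sketch, not a database engine's algorithm: the function name `natural_join` is hypothetical, relations are modelled as lists of dictionaries, and only the rows quoted in the text are used because the original tables are not reproduced here.

```python
def natural_join(r1, r2):
    """Join two relations (lists of dicts) on all attributes they share."""
    if not r1 or not r2:
        return []
    common = [a for a in r1[0] if a in r2[0]]
    result = []
    for row1 in r1:
        for row2 in r2:
            # Rows are joined only if every common attribute matches.
            if all(row1[a] == row2[a] for a in common):
                merged = {**row1, **row2}
                if merged not in result:  # a relation holds no duplicate rows
                    result.append(merged)
    return result

# One common attribute (B): rows quoted in the first example.
r1 = [{"A": 2, "B": 4, "C": 5}]
r2 = [{"B": 4, "D": 7, "E": 4}, {"B": 6, "D": 2, "E": 3}]
joined_on_b = natural_join(r1, r2)

# Two common attributes (B and C): rows quoted in the second example.
s1 = [{"A": 6, "B": 1, "C": 4}]
s2 = [{"B": 1, "C": 4, "D": 9}, {"B": 1, "C": 2, "D": 1}]
joined_on_bc = natural_join(s1, s2)
```

The first join yields the single row (2 4 5 7 4) and the second yields (6 1 4 9), matching the rows discussed in the text. Note that when two relations share no attribute at all, this sketch degenerates into a cartesian product.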
Now suppose that a list of courses with their corresponding room numbers is required. Relations R1 and R4 contain the necessary information and can be joined using the attribute HOUR. The result of this join is:
This join creates the following invalid information (denoted by the coloured rows):
• Smith, Jones, and Brown take the same class at the same time from two different instructors in two different rooms.
• Jenkins (the Maths teacher) teaches English.
• Goldman (the English teacher) teaches Maths.
• Both instructors teach different courses at the same time.
Another possibility for a join is R3 and R4 (joined on INSTRUCTOR). The result would be:
This join creates the following invalid information:
• Jenkins teaches Math I and Algebra simultaneously at both 8:00 and 9:00.
A correct sequence is to join R1 and R3 (using COURSE) and then join the resulting relation with R4 (using both INSTRUCTOR and HOUR). The result would be:
Extracting the COURSE and ROOM attributes (and eliminating the duplicate row produced for the English course) would yield the desired result:
The correct result is obtained since the sequence (R1 x R3) x R4 satisfies the lossless (gainless?) join property. A relational database is in 4th normal form when the lossless join property can be used to answer unanticipated queries. However, the choice of joins must be evaluated carefully. Many different sequences of joins will recreate an instance of a relation. Some sequences are more desirable since they result in the creation of less invalid data during the join operation. Suppose that a relation is decomposed using functional dependencies and multi-valued dependencies. Then at least one sequence of joins on the resulting relations exists that recreates the original instance with no invalid data created during any of the join operations. For example, suppose that a list of grades by room number is desired. This question, which was probably not anticipated during database design, can be answered without creating invalid data by either of the following two join sequences:
The required information is contained in relations R2 and R4, but these relations cannot be joined directly. In this case the solution requires joining all 4 relations. The database may require a "lossless join" relation, which is constructed to assure that any ad hoc inquiry can be answered with relational operators. This relation may contain attributes that are not logically related to each other. This occurs because the relation must serve as a bridge between the other relations in the database. For example, the lossless join relation will contain all attributes that appear only on the left side of a functional dependency. Other attributes may also be required, however, in developing the lossless join relation.
Consider relational schema R (A, B, C, D), with functional dependencies A → B and C → D. Relations R1 (A, B) and R2 (C, D) are in 4th normal form. A third relation R3 (A, C), however, is required to satisfy the lossless join property. This relation can be used to join attributes B and D. This is accomplished by joining relations R1 and R3 and then joining the result to relation R2. No invalid data is created during these joins. The relation R3 (A, C) is the lossless join relation for this database design.
A relation is usually developed by combining attributes about a particular subject or entity. The lossless join relation, however, is developed to represent a relationship among various relations. The lossless join relation may be difficult to populate initially and difficult to maintain, a result of including attributes that are not logically associated with each other. The attributes within a lossless join relation often contain multi-valued dependencies. Consideration of 4th normal form is important in this situation. The lossless join relation can sometimes be decomposed into smaller relations by eliminating the multi-valued dependencies. These smaller relations are easier to populate and maintain.
Q. 21. What is an ER-diagram? Construct an ER diagram for a hospital with a set of patients and a set of doctors. Associate with each patient a log of the various tests and examinations conducted. Or Discuss in detail the ER diagram. Or What is one to many relationship? Give examples. Or Draw an ER diagram for a library management system, make suitable assumptions. Describe various symbols used in ER diagram. Or Construct an ER diagram for a university registrar's office. The office maintains data about each class, including the instructor, the enrollment and the time and place of the class meetings. For each student class pair, a grade is recorded. Also design a relational database for the said E.R. diagram.
Ans. The E-R model grew out of the exercise of using commercially available DBMS to model application databases. Earlier DBMS were based on the hierarchical and network approaches. E-R is a generalization of these models. Although it has some means of describing the physical database model, it is basically useful in the design of the logical database model. This analysis is then used to organize data as relations, normalize the relations and finally obtain a relational database model. The entity-relationship model for data uses three features to describe data. These are:
1. Entities, which specify distinct real-world items in an application.
2. Relationships, which connect entities and represent meaningful dependencies between them.
3. Attributes, which specify properties of entities and relationships.
We illustrate these terms with an example. A vendor supplying items to a company, for example, is an entity. The item he supplies is another entity. A vendor and an item are related in the sense that a vendor supplies an item. The act of supplying defines a relationship between a vendor and an item. An entity set is a collection of similar entities. We can thus define a vendor set and an item set. Each member of an entity set is described by some attributes. For example, a vendor may be described by the attributes: (vendor code, vendor name, address). An item may be described by the attributes: (item code, item name). A relationship can also be characterized by a number of attributes. We can think of the relationship between the vendor and item entities as supply. The relationship supply can be described by the attributes: (order no., date of supply).
Relationship between Entity Sets
The relationship between entity sets may be many-to-many (M:N), one-to-many (1:M), many-to-one (M:1) or one-to-one (1:1). The 1:1 relationship between entity sets E1 and E2 indicates that for each entity in either set there is at most one entity in the second set that is associated with it. The 1:M relationship from entity set E1 to E2 indicates that for an occurrence of the entity from the set E1, there could be zero, one or more entities from the entity set E2 associated with it. Each entity in E2 is associated with at most one entity in the entity set E1. In the M:N relationship between entity sets E1 and E2, there is no restriction on the number of entities in one set associated with an entity in the other set. The database structure employing the E-R model is usually shown pictorially using an entity-relationship (E-R) diagram. To illustrate these different types of relationships consider the following entity sets: DEPARTMENT, MANAGER, EMPLOYEE, and PROJECT.
The relationship between a DEPARTMENT and a MANAGER is usually one-to-one; there is only one manager per department and a manager manages only one department. This relationship between entities is shown in Figure. Each entity is represented by a rectangle and the relationship between them is indicated by a direct line. The relationship from MANAGER to DEPARTMENT and from DEPARTMENT to MANAGER is 1:1 in both directions. Note that a one-to-one relationship between two entity sets does not imply that for an occurrence of an entity from one set at any time there must be an occurrence of an entity in the other set. In the case of an organization, there could be times when a department is without a manager or when an employee who is classified as a manager may be without a department to manage. Figure shows some instances of one-to-one relationships between the entities DEPARTMENT and MANAGER.
A one-to-many relationship exists from the entity MANAGER to the entity EMPLOYEE because there are several employees reporting to the manager. As we just pointed out, there could be an occurrence of the entity type MANAGER having zero occurrences of the entity type EMPLOYEE reporting to him or her. A reverse relationship, from EMPLOYEE to MANAGER, would be many-to-one, since many employees may be supervised by a single manager. However, given an instance of the entity set EMPLOYEE, there could be only one instance of the entity set MANAGER to whom that employee reports (assuming that no employee reports to more than one manager). The relationship between the entities is illustrated in the first figure below, and the second figure shows some instances of this relationship.
Figure: 1:M Relationship
Figure: Instances of 1:M Relationship
The relationship between the entity EMPLOYEE and the entity PROJECT can be derived as follows: each employee could be involved in a number of different projects, and a number of employees could be working on a given project. This relationship between EMPLOYEE and PROJECT is many-to-many. It is illustrated in the first figure below, and the second figure shows some instances of such a relationship.
Figure: M:N Relationship
Figure: Instances of M:N Relationship
In the entity-relationship (E-R) diagram, entities are represented by rectangles, relationships by a diamond-shaped box and attributes by ellipses or ovals. The E-R diagram for vendor, item and their relationship is illustrated in Figure (a).
Figure (a): E-R diagram for vendor, item and their Relationship
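The 1:1, 1:M, M:1 and M:N cases described above can be told apart by counting how many partners each entity has on either side. The sketch below is only illustrative: the function name `cardinality` and the department, manager, employee and project names are hypothetical, and the function inspects sample instances, whereas cardinality in an E-R design is a rule stated by the designer.

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify (e1, e2) associations as '1:1', '1:M', 'M:1' or 'M:N'."""
    partners_of_left = defaultdict(set)
    partners_of_right = defaultdict(set)
    for e1, e2 in pairs:
        partners_of_left[e1].add(e2)
        partners_of_right[e2].add(e1)
    # Does any E1 entity relate to several E2 entities, and vice versa?
    left_has_many = any(len(s) > 1 for s in partners_of_left.values())
    right_has_many = any(len(s) > 1 for s in partners_of_right.values())
    if left_has_many and right_has_many:
        return "M:N"
    if left_has_many:
        return "1:M"
    if right_has_many:
        return "M:1"
    return "1:1"

# Hypothetical instances of the relationships discussed in the text.
manages = [("Dept1", "MgrA"), ("Dept2", "MgrB")]
supervises = [("MgrA", "Emp1"), ("MgrA", "Emp2"), ("MgrB", "Emp3")]
works_on = [("Emp1", "ProjX"), ("Emp1", "ProjY"), ("Emp2", "ProjX")]

kinds = [cardinality(manages), cardinality(supervises), cardinality(works_on)]
```

Reversing each pair in `supervises` gives the EMPLOYEE to MANAGER direction, which the same function reports as many-to-one.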
Representation of Entity Sets in the form of Relations
Entity-relationship diagrams are useful in representing the relationships among entities; they show the logical model of the database. E-R diagrams give an overview of the important entities for developing an information system and of their relationships. Having obtained the E-R diagrams, the next step is to replace each entity set and relationship set by a table or a relation. Each table has a name. The name used is the entity name. Each table has a number of rows and columns. Each row contains a member of the entity set. Each column corresponds to an attribute. Thus in the E-R diagram, the vendor entity is replaced by the table below.
Table: Table for the Entity Vendor
The above table is also known as a relation. Vendor is the relation name. Each row of a relation is called a tuple. The titles used for the columns of a relation are known as relation attributes. Each tuple in the above example describes one vendor. Each element of a tuple gives a specific property of that vendor. Each property is identified by the title used for an attribute column. In a relation the rows may be in any order. The columns may also be depicted in any order. No two rows can be identical. Since it is inconvenient to show the whole table corresponding to a relation, a more concise notation is used to depict a relation. It consists of the relation name and its attributes. The identifier of the relation is shown in bold face. A specified value of a relation identifier uniquely identifies a row of the relation. If a relationship is M:N, then the identifier of the relationship entity is a composite identifier, which includes the identifiers of the entity sets that are related. On the other hand, if the relationship is 1:N, then the identifier of the relationship entity is the identifier of one of the entity sets in the relationship. For example, the relations and identifiers corresponding to the E-R diagram of Figure are as shown:
Figure: E-R Diagram for Teacher, Student and their relationship
Teacher (Teacher-id, name, department, address)
Teaches (Teacher-id, Student-id)
Student (Student-id, name, department, address)
One may ask why an entity set is being represented as a relation. The main reasons are the ease of storing relations as flat files in a computer and, more importantly, the existence of a sound theory of relations, which ensures good database design. The raw relations obtained as a first step in the above examples are transformed into normal relations. The rules for these transformations, called normalization, are based on sound theoretical principles and ensure that the final normalized relations reduce duplication of data, ensure that no mistakes occur when data are added or deleted, and simplify retrieval of the required data.
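The Teacher, Teaches and Student relations listed above can be written down as tables to see the composite identifier of the M:N relationship at work. This is a sketch using Python's built-in `sqlite3` module; the column types, the underscore spelling of the identifiers and all inserted rows are assumptions made for illustration and do not come from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite check the references
conn.executescript("""
CREATE TABLE teacher (
    teacher_id INTEGER PRIMARY KEY, name TEXT, department TEXT, address TEXT
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY, name TEXT, department TEXT, address TEXT
);
-- M:N relationship: its identifier combines the identifiers of both entity sets.
CREATE TABLE teaches (
    teacher_id INTEGER NOT NULL REFERENCES teacher(teacher_id),
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    PRIMARY KEY (teacher_id, student_id)
);
""")

# Hypothetical rows.
conn.execute("INSERT INTO teacher VALUES (1, 'Teacher A', 'Maths', 'Address 1')")
conn.execute("INSERT INTO teacher VALUES (2, 'Teacher B', 'English', 'Address 2')")
conn.execute("INSERT INTO student VALUES (10, 'Student X', 'Maths', 'Address 3')")
conn.execute("INSERT INTO student VALUES (11, 'Student Y', 'Maths', 'Address 4')")
conn.executemany("INSERT INTO teaches VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])

# The same (teacher, student) pair cannot be recorded twice.
try:
    conn.execute("INSERT INTO teaches VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pair_count = conn.execute("SELECT COUNT(*) FROM teaches").fetchone()[0]
```

One teacher appears with several students and one student with several teachers, which a single-attribute identifier on `teaches` could not allow.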
Q. 22. Discuss relational approach of database management system? Explain with the help of suitable relational operations to demonstrate insert, delete and update functions. Or What is relational model? Compare and contrast it with network and hierarchical model.
Ans. Database models are collections of conceptual tools for describing data semantics and data constraints. A DBMS has a number of ways to represent data, but the important and commonly used models are of four types, among which three are mainly used. These are:
I. Relational Model or Relational Approach
II. Hierarchical Model or Hierarchical Approach
III. Network Model or Network Approach

I. Relational Data Model
The relational data model was developed through extensive research and was tested and refined through many stages. Its advantages are that it is simple to implement and easy to understand. Queries can be expressed using a query language in this model. In this model a relation is constructed by setting up the association among the attributes of an entity as well as the relationships among different entities. One of the main reasons for introducing this model was to increase the productivity of application programmers by eliminating the need to change application programs when a change is made to the database. The user need not know the exact physical structure. The data structure used in this model represents both entities and the relationships between them. We can explain the relational view of data with the following example. Suppose there are three tables in which data is organized. These tables are the Supplier table (S table or S relation), the Part table (P table or P relation) and the Shipment table (SP table or SP relation). The S table has the following fields or attributes: supplier number (S#), supplier name, status of the supplier and the city in which the supplier resides. Similarly the P table has the fields part number (P#), part name, part color, weight of the part and the location where the part is stored. The SP table contains the fields supplier number (S#), part number (P#) and the quantity which the supplier can ship. Each supplier has a unique supplier number S# and similarly each part has a unique part number P#. These three tables are called relational tables. The S table is also called the S-relation because it gives the relationship between different attributes. These attributes are the field names and appear as columns. Rows of such a table are called tuples. The pool of values for a particular attribute is called its domain. In other words, a domain is a pool of values from which the actual values appearing in a given column are drawn. For example, in the S table, S#, Sname and Status are attributes, and s1, s2, s3 are values drawn from the domain of S#. A relational table or relation can be defined as:
Definition: A relation represented by a table having n columns, defined on domains D1, D2, ..., Dn, is a subset of the cartesian product D1 x D2 x ... x Dn.
Another definition is: Given a collection of sets D1, D2, D3, ..., Dn, R is a relation on these n sets if it is a set of ordered n-tuples (d1, d2, ..., dn) such that each value di belongs to Di.
These three relations are represented by diagrams:
S table (Entity) or S Relation:
P table (Entity) or P Relation:
SP table (Entity) or SP Relation:
As in the S table, insertion, deletion and modification can be done easily.

II. Hierarchical Model
It is a tree structure. It has one root and many branches; we call this a parent-child relationship. In this model a single file is related to many files, so we can say that it is an arrangement of individual data with group data. In an organization chart the manager is the parent (root) and the employees working under the manager are the children. This model is represented by linking different tables. This type of representation suits linkages in which many records relate to one. Sometimes it creates ambiguity in designing and defining the associations and relationships.
In the hierarchical approach, insertion can be done only if a child has a parent, and insertion on the child side is easy. Deletion of a child is also easy, but a parent that has one or more children cannot be deleted without affecting those children. In the parent-child relationship, updating either the parent or the child is difficult.

III. Network Approach
It is a complex approach of DBMS. In it we link all the records by using chains or pointers. It supports many-to-many relationships. The network approach is used when there is more than one relationship in the database system. A chain starts from one point and, after connecting similar types of data, returns back to the same record. The network approach is more symmetric than the hierarchical structure. In the network model insertion at any point is very complex. We can insert only by creating a new record having a linkage with other records. Similarly deletion is complex: if we delete a record, the chain is disconnected and the structure is broken unless the links are repaired. Updating is also complex because a data record cannot be changed freely when it is connected with other records.

Difference between Relational, Hierarchical and Network Approaches:
(A) Relational Approach: The Relational Approach (RA) represents relationships between different entities and between the attributes of a particular entity. RA is in tabular form. RA can represent one-to-one, one-to-many and many-to-many relationships through common attribute values. Insertion, deletion and updating in a relational table are very easy. The language used in RA is SQL, and relational products include Ingres, Oracle and Sybase. RA is simple in nature. It is generally regarded as the best of the three approaches for representing data.
(B) Hierarchical Approach: The Hierarchical Approach (HA) creates a linkage between two or more entities. HA has a parent-child relationship. HA has one-to-many relationships. The HA relationship is defined in terms of parents and their children. Insertion, deletion and updating are a little more difficult than in RA. IMS is an example of a hierarchical system. It is complex in nature.
(C) Network Approach: The Network Approach (NA) has chains among many entities. NA uses a chaining or pointer technique. NA has many-to-many relationships. The NA relationship is fully symmetric because it is based on chains. Insertion, deletion and updating are very difficult. NA follows the DBTG (Database Task Group) proposal, which is built on sets having owners and members. It is more complex than RA and HA.
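The insert, update and delete functions asked for in the question can be demonstrated on the supplier relation S from the relational model section. The following is a minimal sketch using Python's built-in `sqlite3` module; the column names (`s_no` standing in for S#) and the two supplier rows are illustrative assumptions, not data taken from the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE s (s_no TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT)"
)

# INSERT: add two supplier tuples.
conn.executemany(
    "INSERT INTO s VALUES (?, ?, ?, ?)",
    [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris")],
)

# UPDATE: supplier S2 moves to another city.
conn.execute("UPDATE s SET city = ? WHERE s_no = ?", ("Athens", "S2"))

# DELETE: remove supplier S1.
conn.execute("DELETE FROM s WHERE s_no = ?", ("S1",))

rows = conn.execute(
    "SELECT s_no, sname, status, city FROM s ORDER BY s_no"
).fetchall()
```

Each operation names only the tuples it affects through attribute values; no pointers or parent records have to be navigated, which is the ease of use the comparison above attributes to the relational approach.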
Q. 23. What is the usage of unified modelling language (UML)?
Ans. UML is a graphical language for visualizing, specifying, constructing and documenting the artifacts of an object-oriented, software-intensive system.
Q. 24. What are graphical user interfaces?
Ans. A graphical user interface (GUI), sometimes pronounced "gooey", is a method of interacting with a computer through a metaphor of direct manipulation of graphical images and widgets in addition to text. GUIs display visual elements such as icons, windows and other gadgets.
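Q. 25. Define the term dangling pointer.
Ans. A pointer that refers to something that no longer exists, for example a record that has been deleted or storage that has been released, is called a dangling pointer.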
Q. 26. Write a short note on Mapping.
Ans. Mappings
• The conceptual/internal mapping defines the correspondence between the conceptual view and the internal view. It specifies the mapping from conceptual records to their stored counterparts.
• An external/conceptual mapping defines the correspondence between a particular external view and the conceptual view.
• A change to the storage structure definition means that the conceptual/internal mapping must be changed accordingly, so that the conceptual schema may remain invariant, achieving physical data independence.
• A change to the conceptual definition means that the external/conceptual mapping must be changed accordingly, so that the external schema may remain invariant, achieving logical data independence.
Q. 27. Distinguish between RDBMS and DBMS.
Ans.
Q. 1. What is relational algebra?
Ans. Relational Algebra and Relational Calculus are two approaches to specifying manipulations on relational databases. The distinction between them is somewhat analogous to that between procedural and declarative programming. Relational Algebra is equivalent to Relational Calculus, in that every expression in one has an equivalent expression in the other. Thus relational completeness of a database language can also be established by showing that it can define any relation expressible in Relational Algebra. Relational Algebra comprises a set of basic operations. An operation is the application of an operator to one or more source (or input) relations to produce a new relation as a result. More abstractly, we can think of such an operation as a function that maps arguments from specified domains to a result in a specified range. In this case, the domain and range happen to be the same, i.e. relations.
Q. 2. Define relational algebra. Explain the various traditional set operations and relational operations of it. Or Discuss the basic operations that can be performed using relational algebra and SQL.
Ans. Relational Algebra comprises a set of basic operations. An operation is the application of an operator to one or more source (or input) relations to produce a new relation as a result. This is illustrated in Figure 8.1 below. More abstractly, we can think of such an operation as a function that maps arguments from specified domains to a result in a specified range. In this case, the domain and range happen to be the same, i.e. relations. Relational Algebra is a procedural language. It specifies the operations to be performed on existing relations to derive result relations. Therefore, it defines the complete schema for each of the result relations. Each operation takes one or more relations as its operands and produces another relation as its result. The relational algebraic operations can be divided into basic set-oriented operations and relational-oriented operations. The former are the traditional set operations; the latter are those for performing joins, selection, projection and division. The most commonly used operations are join, selection and projection.

Basic Operations
Basic operations are the traditional set operations: union, difference, intersection and cartesian product. Three of these four basic operations (union, intersection and difference) require that the operand relations be union compatible. Two relations are union compatible if they have the same arity and a one-to-one correspondence of attributes, with corresponding attributes defined over the same domain. The cartesian product can be defined on any two relations. Two relations P and Q are said to be union compatible if both P and Q are of the same degree n and the domains of the corresponding n attributes are identical, i.e. if P = {P1, ..., Pn} and Q = {Q1, ..., Qn} then Dom (Pi) = Dom (Qi) for i = 1, 2, ..., n, where Dom (Pi) represents the domain of the attribute Pi.
Some basic operations used in Relational Algebra are:
Traditional Set Operations: The traditional set operations are subdivided as:
(a) UNION
(b) INTERSECTION
(c) DIFFERENCE
(d) CARTESIAN PRODUCT
Relational Set Operations: Similarly, the relational set operations are subdivided as:
(a) PROJECTION
(b) SELECTION
(c) JOIN
(d) DIVISION

Traditional Set Operations
(i) UNION (∪): The union of two relations A and B is written with the UNION command as: A UNION B. It is the set of all tuples belonging to either A or B or both. Let us consider sets A and B as:
Table A:
Table B:
If P and Q are two union-compatible relations, then the result R of the union operation is represented by: R = P ∪ Q
For example, let A be the set of supplier tuples for suppliers in London and B the set of supplier tuples for suppliers who supply part P1. Then A UNION B is the set of supplier tuples for suppliers who are either located in London or supply part P1 (or both). Union is denoted by the symbol ∪, so in mathematical form we write A ∪ B.
(ii) Intersection (∩): The intersection operation selects the common tuples from the two relations. It is denoted by the symbol ∩. The intersection of two relations A and B is defined as: A ∩ B or A INTERSECT B
For example, if A and B are two sets, then the intersection of the two, R1, contains the tuples common to both.
(iii) Difference (−): The difference operation removes from the first relation the tuples it has in common with the second. The difference between two relations A and B is defined as: A MINUS B. It is the set of all tuples belonging to A but not belonging to B. It is denoted by (−), so we can write it as A − B. For example, for the two sets A and B above, the difference between A and B is A − B, i.e. the tuples of A that do not appear in B.
(iv) Cartesian Product: It is denoted by '×' or 'x' (cross). The cartesian product of two relations A and B is defined as: A TIMES B or A × B
The extended cartesian product, or simply the cartesian product, of two relations is the concatenation of tuples belonging to the two relations. A new result relation schema is created consisting of all possible combinations of the tuples, represented as: R = P × Q
For example, let A be the set of all supplier numbers and B the set of all part numbers. Then A TIMES B is the set of all possible supplier number/part number pairs.
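The four traditional set operations can be tried out with Python's built-in sets. This is only a minimal sketch: the supplier and part values are made up for illustration, and a relation is modelled simply as a set of tuples.

```python
from itertools import product

# Hypothetical sample data: (supplier number, city) tuples.
A = {("S1", "London"), ("S4", "London")}  # suppliers located in London
B = {("S1", "London"), ("S2", "Paris")}   # suppliers who supply part P1

union = A | B          # A UNION B: tuples in A or B or both
intersection = A & B   # A INTERSECT B: tuples common to A and B
difference = A - B     # A MINUS B: tuples in A but not in B

# The cartesian product needs no union compatibility.
supplier_numbers = {"S1", "S2"}
part_numbers = {"P1", "P2", "P3"}
cartesian = set(product(supplier_numbers, part_numbers))  # A TIMES B

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(len(cartesian), "supplier/part pairs")
```

Because A and B have the same degree and the same attribute domains, they are union compatible, which is what makes the first three operations meaningful.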
Relational Set Operations: The basic set operations, which provide a very limited data manipulation facility, have been supplemented by the definition of the following operations:
(i) Projection (π): The projection of a relation is defined as the projection of all its tuples over some set of attributes, i.e. it yields a vertical subset of the relation. The projection operation is used either to reduce the number of attributes in the result or to reorder the attributes. For example, if P is a table, we can project it on the field Name, written π[Name](P), to get the projected result table; duplicate tuples are removed from the result.
(ii) Selection (σ): Selection picks out the tuples of a relation that satisfy some condition. It yields a horizontal subset of the relation and is denoted by σ. It reduces the number of tuples of a relation but leaves the attributes unchanged. For example, if P is the relation, then R is the result table after a selection on P whose condition is to select all tuples having roll no. (RN) < 105: R = σ[RN < 105](P)
(iii) Join (⋈): The join operator combines two relations to form a single new relation. Joins are of three types: (i) Theta Join (ii) Natural Join (iii) Equi Join.
Theta Join is the joining of two tables on the basis of an arbitrary comparison condition (=, <, > and so on) between their attributes. Equi Join is a theta join whose condition uses only equality. Natural Join is an equi join on all attributes that have the same name in both tables, with each common attribute kept only once in the result. For example, if S and P are two tables that both have a CITY field, they can be joined on the condition S.CITY = P.CITY.
(iv) Division (÷): The division operator divides a dividend relation A of degree m + n by a divisor relation B of degree n and produces a result relation of degree m. The result contains those values of the m attributes that appear in A in combination with every tuple of B. Suppose A is the relation of supplier number/part number pairs and B is the relation of part numbers; then A DIVIDE BY B gives the result table R of the suppliers that supply every part in B.
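The relational operations can be sketched in a few lines of Python as well. The helper names (project, select, natural_join, divide) and the sample supplier/part data are assumptions made for this illustration; a relation is modelled as a list of dictionaries, one per tuple.

```python
def _key(row):
    """Hashable form of a tuple (row), independent of attribute order."""
    return tuple(sorted(row.items()))

def dedupe(rows):
    """Relations are sets, so duplicate tuples are dropped."""
    seen, unique = set(), []
    for row in rows:
        if _key(row) not in seen:
            seen.add(_key(row))
            unique.append(row)
    return unique

def project(relation, attrs):
    """Vertical subset: keep only the named attributes."""
    return dedupe({a: row[a] for a in attrs} for row in relation)

def select(relation, predicate):
    """Horizontal subset: keep only the tuples satisfying the condition."""
    return [row for row in relation if predicate(row)]

def natural_join(left, right):
    """Equi-join on all same-named attributes, each kept once."""
    joined = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                joined.append({**l, **r})
    return dedupe(joined)

def divide(dividend, divisor):
    """Tuples of the dividend that are paired with every divisor tuple."""
    divisor_attrs = set(divisor[0])
    quotient_attrs = [a for a in dividend[0] if a not in divisor_attrs]
    present = {_key(row) for row in dividend}
    return [
        cand for cand in project(dividend, quotient_attrs)
        if all(_key({**cand, **d}) in present for d in divisor)
    ]

# Hypothetical sample relations.
S = [{"sno": "S1", "city": "London"}, {"sno": "S2", "city": "Paris"}]
P = [{"pno": "P1", "city": "London"}, {"pno": "P2", "city": "Paris"},
     {"pno": "P3", "city": "London"}]
SP = [{"sno": "S1", "pno": "P1"}, {"sno": "S1", "pno": "P2"},
      {"sno": "S2", "pno": "P1"}]

london_suppliers = select(S, lambda row: row["city"] == "London")
part_cities = project(P, ["city"])
same_city = natural_join(S, P)                       # joined on CITY
supplies_all = divide(SP, [{"pno": "P1"}, {"pno": "P2"}])

print(london_suppliers, part_cities, len(same_city), supplies_all)
```

Note how the division result has degree 1 (only sno), because the dividend SP has degree 2 and the divisor has degree 1.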
Q. 3. What are single-valued and multivalued attributes?
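Ans. A single-valued attribute holds exactly one value for a given entity (for example, a person's date of birth), whereas a multivalued attribute can hold several values for the same entity (for example, a person's phone numbers).
Multi-Valued Dependencies and Fourth Normal Form
Boyce-Codd Normal Form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF: every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.
Definition: A normalized relation scheme R<S, F> is in BCNF if, for every nontrivial FD in F of the form X → A, where X ⊆ S and A ∈ S, X is a super key of R.
BCNF is thus a special (stricter) case of 3NF. The two differ only when a relation has the following features:
1. The candidate keys are composite (no single attribute identifies a record).
2. There is more than one candidate key.
3. The candidate keys overlap, i.e. they have at least one attribute in common.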
A table in 3NF that does not have all three of these features is automatically in BCNF. If all three are present, 3NF does not guarantee BCNF, and every determinant has to be checked.
5. Course ID is such an overlapping attribute. Hence the relation shown is in 3NF and also in BCNF.
A relation schema R(S, F), where S is the set of attributes and F the set of functional dependencies, is in BCNF if, for every set of attributes X that is a subset of S and every attribute Y that belongs to S such that X → Y holds, one of the following two conditions holds:
(i) Either Y belongs to X, i.e. X → Y is a trivial dependency,
(ii) or X is a super key.
Here:
Trivial dependency: A dependency whose right-hand side is a subset of its left-hand side is known as a trivial dependency.
Super Key: A set of attributes that contains a candidate key, for example the primary key together with any other attributes, is known as a super key.
A relation is in BCNF if every determinant is a candidate key.
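The determinant test can be sketched in Python by computing attribute closures. The relation used here is an assumption: a student/adviser relation with attributes SID, Major and FName and the two dependencies {SID, Major} → FName and FName → Major. The function names are likewise hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Nontrivial FDs whose determinant is not a super key."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != attributes:
            violations.append((lhs, rhs))
    return violations

# Assumed example relation and dependencies (not taken from the notes).
ADVISER = frozenset({"SID", "Major", "FName"})
FDS = [
    (frozenset({"SID", "Major"}), frozenset({"FName"})),
    (frozenset({"FName"}), frozenset({"Major"})),
]

violations = bcnf_violations(ADVISER, FDS)
print(violations)  # FName determines Major but is not a super key

# Moving FName -> Major into a relation of its own removes the violation.
split_off = bcnf_violations(
    frozenset({"FName", "Major"}),
    [(frozenset({"FName"}), frozenset({"Major"}))],
)
print(split_off)
```

Checking only the listed dependencies is enough to test the original relation; testing a decomposed relation in general requires projecting the dependencies onto it first, which this sketch does not attempt.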
The example relation can be checked against each normal form in turn:
- It is in 1NF by definition.
- It is in 2NF since all non-key attributes are dependent on the entire key.
- It is in 3NF because it has no transitive dependencies.
- It is not in BCNF because it has a determinant, FName, that is not a candidate key.
It is therefore split into two relations:
STU-ADV Key: (SID, FName)
ADV-SUBJ Key: (FName)
Relations in BCNF
Now we can say that a relation is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant. Or, less formally, a relation is in BCNF if and only if the only determinants are candidate keys.
Q. 4. Define the term data manipulation language.
Ans. DML (Data Manipulation Language): The language used to retrieve, insert, update and delete the data held in the database is called the data manipulation language (DML). DML statements can be entered interactively or embedded in a general-purpose programming language.
DDL (Data Definition Language): Database management systems provide a facility known as the data definition language or data description language (DDL). DDL is used to define the conceptual (global) schema and also to give some details about how to implement this schema on the physical devices used to store the data.
Q. 5. What is RDBMS?
Ans. Relational Database Management Systems (RDBMS) are database management systems that maintain data records and indices in tables. Relationships may be created and maintained across and among the data and tables.

In recent years, database management systems (DBMS) have established themselves as the primary means of data storage for information systems ranging from large commercial transaction processing applications to PC-based desktop applications. At the heart of most of today's information systems is a relational database management system (RDBMS). RDBMSs have been the workhorse for data management operations for over a decade and continue to evolve and mature, providing sophisticated storage, retrieval and distribution functions to enterprise-wide data processing and information management systems. Compared to file systems, relational database management systems let organizations easily integrate and leverage massive amounts of operational data in meaningful information systems. The evolution of high-powered database engines such as Oracle7 has fostered the development of advanced "enabling" technologies including client/server, data warehousing and online analytical processing, all of which form the core of today's state-of-the-art information management systems.

A Relational Database Management System is a software package that manages a relational database and is optimized for rapid and flexible retrieval of data; it is also called a database engine. In other words, it is a computer program that lets you store, index and retrieve tables of data. The simplest way to look at an RDBMS is as a spreadsheet that multiple users can update. The most important thing an RDBMS does is provide transactions. An RDBMS is used to store, process and manage data arranged in relational tables, and is often used for transaction processing and data warehouses.

An RDBMS can access data organized in tabular files that can be related to each other by a common field (item). It can recombine the data items from different files, providing powerful tools for data usage. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways. Almost all full-scale database systems are RDBMSs.

An RDBMS (like Oracle) is a database management system in which the database is organized and accessed according to the relationships between data items. In a relational database, relationships between data items are expressed by means of tables. Interdependencies among these tables are expressed by data values rather than by pointers. This allows a high degree of data independence. Some of the best-known RDBMSs include Oracle, Informix, Sybase, PostgreSQL and Microsoft Access.

Characteristics of a Relational Database
• Relational databases consist of one or more tables; these can be 'joined' by the database software in queries.
• Each table consists of rows and fields.
• Each table is about one aspect (or subject) of the database. Thus contexts and finds are different subjects and are in different tables.
• Each row corresponds to one instance of the subject of the table. Thus each row is about one context.
• Each row must be unique. This is a logical result of the row being about one instance. If you have duplicate rows, the results of searching are unpredictable.
• Each field corresponds to a variable and is named to indicate its role. For example, finds have a name and a size/weight.
• Each cell (where the fields and rows intersect) contains only one value. This is important because otherwise it is not possible to search properly. If you find a need for two values per cell, the design has to be altered.
• If fields in different tables have the same range of values and are thus about the same object, there is an association between the fields and thus the tables; such fields are called 'keys'. The rows corresponding to matching values can be retrieved from different tables.
Q. 6. What do you mean by Relational Constraints?
Ans. The integrity of the data in a relational database must be maintained as multiple users access and change the data. Whenever data is shared, there is a need to ensure the accuracy of the values within database tables. The term data integrity has the following meanings:
1. The condition in which data is identically maintained during any operation, such as transfer, storage, and retrieval.
2. The preservation of data for their intended use.
3. Relative to specified operations, the a priori expectation of data quality.

Another aspect of data integrity is the assurance that data can only be accessed and altered by those authorized to do so. Data integrity means, in part, that you can correctly and consistently navigate and manipulate the tables in the database. There are two basic rules to ensure data integrity: entity integrity and referential integrity.
The entity integrity rule states that the value of the primary key can never be a null value (a null value is one that has no value and is not the same as a blank). Because a primary key is used to identify a unique row in a relational table, its value must always be specified and should never be unknown. The integrity rule requires that insert, update, and delete operations maintain the uniqueness and existence of all primary keys.
The referential integrity rule states that if a relational table has a foreign key, then every value of the foreign key must either be null or match a value in the relational table in which that foreign key is a primary key.
Types of Data Integrity
1. Null Rule: A null rule is a rule defined on a single column that allows or disallows inserts or updates of rows containing a null (the absence of a value) in that column.
2. Unique Column Values: A unique value rule defined on a column (or set of columns) allows the insert or update of a row only if it contains a unique value in that column (or set of columns).
3. Primary Key Values: A primary key value rule defined on a key (a column or set of columns) specifies that each row in the table can be uniquely identified by the values in the key.
4. Referential Integrity Rules: A referential integrity rule is a rule defined on a key (a column or set of columns) in one table that guarantees that the values in that key match the values in a key in a related table (the referenced value). Referential integrity also includes the rules that dictate what types of data manipulation are allowed on referenced values and how these actions affect dependent values.
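The referential integrity rule can be observed with Python's sqlite3 module. This is a sketch under assumptions: SQLite rather than Oracle, illustrative dept and emp tables with made-up rows, and a hypothetical attempt() helper. The CASCADE delete rule on emp.deptno is one example of a rule governing how an action on a referenced value affects dependent rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
conn.executescript("""
    CREATE TABLE dept (
        deptno INTEGER PRIMARY KEY,
        dname  TEXT NOT NULL
    );
    CREATE TABLE emp (
        empno  INTEGER PRIMARY KEY,
        ename  TEXT NOT NULL,
        mgr    INTEGER REFERENCES emp (empno),   -- foreign key into the same table
        deptno INTEGER REFERENCES dept (deptno) ON DELETE CASCADE
    );
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
    INSERT INTO emp VALUES (7839, 'KING', NULL, 10);
    INSERT INTO emp VALUES (7782, 'CLARK', 7839, 10);
    INSERT INTO emp VALUES (7566, 'JONES', 7839, 20);
""")

def attempt(sql):
    """Run one statement and report whether the constraints allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# A foreign key value must match a parent key ...
unknown_dept = attempt("INSERT INTO emp VALUES (7900, 'JAMES', 7839, 99)")
unknown_mgr = attempt("INSERT INTO emp VALUES (7901, 'FORD', 9999, 10)")
# ... or be null.
null_keys = attempt("INSERT INTO emp VALUES (7902, 'ADAMS', NULL, NULL)")
# A referenced row cannot be deleted while dependent rows still point at it.
delete_manager = attempt("DELETE FROM emp WHERE empno = 7839")
# With the CASCADE rule, deleting a department deletes its employees too.
conn.execute("DELETE FROM dept WHERE deptno = 20")
remaining = [row[0] for row in conn.execute("SELECT ename FROM emp ORDER BY ename")]

print(unknown_dept, unknown_mgr, null_keys, delete_manager, remaining)
```

The mgr column shows that a foreign key may refer to the primary key of its own table; the top-level manager simply has a null mgr.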
Rules for Referential Integrity
The rules associated with referential integrity are:
• Restrict: Disallows the update or deletion of referenced data.
• Set to Null: When referenced data is updated or deleted, all associated dependent data is set to NULL.
• Set to Default: When referenced data is updated or deleted, all associated dependent data is set to a default value.
• Cascade: When referenced data is updated, all associated dependent data is correspondingly updated. When a referenced row is deleted, all associated dependent rows are deleted.
• No Action: Disallows the update or deletion of referenced data. This differs from RESTRICT in that it is checked at the end of the statement, or at the end of the transaction if the constraint is deferred. (Oracle uses No Action as its default action.)
Complex Integrity Checking: Complex integrity checking is a user-defined rule for a column (or set of columns) that allows or disallows inserts, updates, or deletes of a row based on the value it contains for the column (or set of columns).
Integrity Constraints Description
An integrity constraint is a declarative method of defining a rule for a column of a table. Oracle supports the following integrity constraints:
• NOT NULL constraints for the rules associated with nulls in a column
• UNIQUE key constraints for the rule associated with unique column values
• PRIMARY KEY constraints for the rule associated with primary identification values
• FOREIGN KEY constraints for the rules associated with referential integrity. Oracle supports the use of FOREIGN KEY integrity constraints to define the referential integrity actions, including:
  o Update and delete No Action
  o Delete CASCADE
  o Delete SET NULL
• CHECK constraints for complex integrity rules
You cannot enforce referential integrity using declarative integrity constraints if child and parent tables are on different nodes of a distributed database. However, you can enforce referential integrity in a distributed database using database triggers (see next section).
Advantages of Integrity Constraints
Integrity constraints have advantages over the other ways of enforcing rules, which include:
• Enforcing business rules in the code of a database application
• Using stored procedures to completely control access to data
• Enforcing business rules with triggered stored database procedures
Types of Integrity Constraints
You can use the following integrity constraints to impose restrictions on the input of column values:
• NOT NULL Integrity Constraints
• UNIQUE Key Integrity Constraints
• PRIMARY KEY Integrity Constraints
• Referential Integrity Constraints
• CHECK Integrity Constraints
1. NOT NULL Integrity Constraints
By default, all columns in a table allow nulls. Null means the absence of a value. A NOT NULL constraint requires that a column of a table contain no null values. For example, you can define a NOT NULL constraint to require that a value be input in the last name column for every row of the employees table.
2. UNIQUE Key Integrity Constraints
A UNIQUE key integrity constraint requires that every value in a column or set of columns (key) be unique, that is, no two rows of a table have duplicate values in a specified column or set of columns.
Unique Keys: The columns included in the definition of the UNIQUE key constraint are called the unique key. Unique key is often incorrectly used as a synonym for the terms UNIQUE key constraint or UNIQUE index. However, note that key refers only to the column or set of columns used in the definition of the integrity constraint. If the UNIQUE key consists of more than one column, then that group of columns is said to be a composite unique key. For example, a UNIQUE key constraint on the combination of area code and telephone number lets you enter an area code or a telephone number any number of times, but the combination of a given area code and given telephone number cannot be duplicated in the table. This eliminates unintentional duplication of a telephone number.
UNIQUE Key Constraints and Indexes: Oracle enforces unique integrity constraints with indexes. For example, Oracle enforces the UNIQUE key constraint by implicitly creating a unique index on the composite unique key. Therefore, composite UNIQUE key constraints have the same limitations imposed on composite indexes: up to 32 columns can constitute a composite unique key.
Combining UNIQUE Key and NOT NULL Integrity Constraints: UNIQUE key constraints allow the input of nulls unless you also define NOT NULL constraints for the same columns. In fact, any number of rows can include nulls for columns without NOT NULL constraints, because nulls are not considered equal to anything. A null in a column (or in all columns of a composite UNIQUE key) always satisfies a UNIQUE key constraint. Columns with both unique keys and NOT NULL integrity constraints are common. This combination forces the user to enter values in the unique key and also eliminates the possibility that any new row's data will ever conflict with an existing row's data.
3. PRIMARY KEY Integrity Constraints
Each table in the database can have at most one PRIMARY KEY constraint. The values in the group of one or more columns subject to this constraint constitute the unique identifier of the row. In effect, each row is named by its primary key values. The Oracle implementation of the PRIMARY KEY integrity constraint guarantees that both of the following are true:
• No two rows of a table have duplicate values in the specified column or set of columns.
• The primary key columns do not allow nulls. That is, a value must exist for the primary key columns in each row.
Primary Keys: The columns included in the definition of a table's PRIMARY KEY integrity constraint are called the primary key. Although it is not required, every table should have a primary key so that:
• Each row in the table can be uniquely identified
• No duplicate rows exist in the table
PRIMARY KEY Constraints and Indexes: Oracle enforces all PRIMARY KEY constraints using indexes.
For example, a primary key constraint created for the deptno column is enforced by the implicit creation of:
• A unique index on that column
• A NOT NULL constraint for that column
Composite primary key constraints are limited to 32 columns, which is the same limitation imposed on composite indexes. The name of the index is the same as the name of the constraint. Also, you can specify the storage options for the index by including the ENABLE clause in the CREATE TABLE or ALTER TABLE statement used to create the constraint. If a usable index exists when a primary key constraint is created, then the primary key constraint uses that index rather than implicitly creating a new one.
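The NOT NULL, UNIQUE and PRIMARY KEY constraints described above can be tried out with Python's sqlite3 module. This is a sketch, not Oracle behaviour in every detail: the employees table, its columns, the sample rows and the try_insert() helper are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,   -- unique identifier of the row
        last_name   TEXT NOT NULL,         -- a value is required
        area_code   TEXT,
        phone       TEXT,
        UNIQUE (area_code, phone)          -- composite unique key
    )
""")

def try_insert(row):
    """Insert one row and report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert((1, "King", "212", "555-0100")),
    try_insert((2, None, "212", "555-0101")),       # null last name
    try_insert((3, "Kochhar", "212", "555-0100")),  # same area code + number
    try_insert((1, "De Haan", "415", "555-0100")),  # duplicate primary key
    try_insert((4, "Hunold", None, None)),          # nulls in the unique key
    try_insert((5, "Ernst", None, None)),           # nulls never collide
]
for outcome in results:
    print(outcome)
```

The last two rows are both accepted because nulls are not considered equal to anything, so they always satisfy the UNIQUE key constraint.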
4. Referential Integrity Constraints
Different tables in a relational database can be related by common columns, and the rules that govern the relationship of the columns must be maintained. Referential integrity rules guarantee that these relationships are preserved.
A referential integrity constraint requires that for each row of a table, the value in the foreign key matches a value in a parent key.
Self-Referential Integrity Constraints: Another type of referential integrity constraint is called a self-referential integrity constraint. This type of foreign key references a parent key in the same table. In the emp table, for example, such a constraint ensures that every value in the mgr column corresponds to a value that currently exists in the empno column of the same table, but not necessarily in the same row, because every manager must also be an employee. This integrity constraint eliminates the possibility of erroneous employee numbers in the mgr column.
Nulls and Foreign Keys: The relational model permits the value of foreign keys either to match the referenced primary or unique key value, or to be null. If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key.
Q. 7. What is the difference between the Relational algebra and the Relational Calculus?
Ans.
1. Relational algebra (RA) operations manipulate relations and express a query as a sequence of operations, whereas relational calculus (RC) queries are formed as expressions that describe the required result.
2. RA has operators like join, union, intersection, division, difference, projection, selection etc., whereas RC has tuple-oriented and domain-oriented expressions.
3. RA is a procedural language, whereas RC is a non-procedural query system.
4. The expressive power of RA and RC is equivalent. This means any query that can be expressed in RA can be expressed by a formula in RC.
5. Conversely, any (safe) RC formula can be translated into an algebraic query.
6. Queries are easier to modify in RA than in RC.
7. RA has a mathematical form and no specific query language of its own; RC also has a mathematical form but has a query language based on it, QUEL.
8. Relational algebra is easier to manipulate and understand than RC.
9. Neither is more powerful than the other (see point 4); RA is simply closer to the way a query is executed.
10. RC queries are formed from WFFs (well-formed formulas), whereas RA does not form any formula.
Q. 8. Write a note on SQL basic queries.
Ans. Structured Query Language (SQL) is the language used to manipulate relational databases. SQL is tied very closely to the relational model. In the relational model, data is stored in structures called relations or tables. Each table has one or more attributes or columns that describe the table. In relational databases, the table is the fundamental building block of a database application. Tables are used to store data on Employees, Equipment, Materials, Warehouses, Purchase Orders, Customer Orders, etc. Columns in the Employee table, for example, might be Last Name, First Name, Salary, Hire Date, Social Security Number, etc.
SQL statements are issued for the purpose of:
• Data definition - Defining tables and structures in the database (DB).
• Data manipulation - Inserting new data, updating existing data, deleting existing data, and querying the database (retrieving existing data from the database).
Another way to say this is that the SQL language is made up of 1) the Data Definition Language (DDL), used to create, alter and drop schema objects such as tables and indexes, and 2) the Data Manipulation Language (DML), used to manipulate the data within those schema objects.
SQL*Plus commands allow a user to manipulate and submit SQL statements. Specifically, they enable a user to:
• Enter, edit, store, retrieve, and run SQL statements
• List the column definitions for any table
• Format, perform calculations on, store, and print query results in the form of reports
• Access and copy data between SQL databases
The following is a list of SQL*Plus commands and their functions:
• / - Execute the current SQL statement in the buffer (same as RUN)
• ACCEPT - Accept a value from the user and place it into a variable
• APPEND - Add text to the end of the current line of the SQL statement in the buffer
• AUTOTRACE - Trace the execution plan of the SQL statement and gather statistics
• BREAK - Set the formatting behavior for the output of SQL statements
• BTITLE - Place a title on the bottom of each page in the printout from a SQL statement
• CHANGE - Replace text on the current line of the SQL statement with new text
• CLEAR - Clear the buffer
• COLUMN - Change the appearance of an output column from a query
• COMPUTE - Do calculations on rows returned from a SQL statement
• CONNECT - Connect to another Oracle database or to the same Oracle database under a different user name
• COPY - Copy data from one table to another in the same or different databases
• DEL - Delete the current line in the buffer
• DESCRIBE - List the columns with data types of a table (can be abbreviated as DESC)
• EDIT - Edit the current SQL statement in the buffer using an external editor such as vi or emacs
• EXIT - Exit the SQL*Plus program
• GET - Load a SQL statement into the buffer but do not execute it
• HELP - Obtain help for a SQL*Plus command (in some installations)
• HOST - Drop to the operating system shell
• INPUT - Add one or more lines to the SQL statement in the buffer
• LIST - List the current SQL statement in the buffer
• QUIT - Exit the SQL*Plus program
• REMARK - Place a comment following the REMARK keyword
• RUN - Execute the current SQL statement in the buffer
• SAVE - Save the current SQL statement to a script file
• SET - Set an environment variable to a new value
• SHOW - Show the current value of an environment variable
• SPOOL - Send the output from a SQL statement to a file
• START - Load a SQL statement located in a script file and then run that SQL statement
• TIMING - Used to time the execution of SQL statements for performance analysis
• TTITLE - Place a title on the top of each page in the printout from a SQL statement
• UNDEFINE - Delete a user-defined variable
Q. 9. What are the various features of SQL?
Ans. SQL Features:
1. It is meant to be an English-like language that uses set English phrases to manipulate the database. How well it achieves this is questionable.
2. It is non-procedural. You specify the information required, not the navigation and operations required to access the data. Each RDBMS has an inbuilt query optimiser which parses your SQL statements and works out the optimum path to the required data.
3. When you query data, all the rows affected by your statement are dealt with in one go as a set; they are not dealt with separately. The work area that holds the set is known as a CURSOR.
4. SQL encompasses a range of uses and users. DBAs, application programmers, management and end users can use SQL.
5. It provides commands for the following tasks:
• querying data
• inserting, updating and deleting data
• creating, modifying and deleting database objects
• controlling access to the database and database objects
• guaranteeing database consistency
• monitoring database performance and configuration
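The first three kinds of task can be illustrated with a few basic statements run through Python's sqlite3 module. The employee table, its columns and its rows are invented for this sketch, and SQLite is assumed in place of Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creating a database object (data definition).
conn.execute("""
    CREATE TABLE employee (
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL,
        salary     INTEGER,
        hire_date  TEXT
    )
""")

# Inserting, updating and deleting data (data manipulation).
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("Smith", "Ann", 52000, "2019-03-01"),
        ("Jones", "Raj", 48000, "2021-07-15"),
        ("Brown", "Lee", 39000, "2022-01-10"),
    ],
)
conn.execute("UPDATE employee SET salary = salary + 1000 WHERE last_name = 'Brown'")
conn.execute("DELETE FROM employee WHERE last_name = 'Jones'")

# Querying data: the statement says what is wanted, not how to find it.
rows = conn.execute(
    "SELECT last_name, salary FROM employee "
    "WHERE salary >= 40000 ORDER BY last_name"
).fetchall()
print(rows)
```

Each UPDATE, DELETE and SELECT acts on the whole set of qualifying rows in one statement, which is the set-at-a-time behaviour described in point 3.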
Q. 10. What is a trigger?
Ans. Triggers are special stored procedures that are executed when a table undergoes an INSERT, a DELETE, or an UPDATE operation. Triggers often enforce referential integrity and can also call other stored procedures.
Or
Triggers are parameter-less procedures that are triggered (fired) either before or after inserting, updating or deleting rows from a table. Because they are fired by the event and not by choice, they cannot have parameters.
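A pair of triggers can be sketched with Python's sqlite3 module. SQLite's trigger syntax is assumed here (other systems differ in detail), and the accounts and audit_log tables and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE audit_log (account_id INTEGER, action TEXT);

    -- Fired automatically after every insert; it takes no parameters.
    CREATE TRIGGER accounts_after_insert AFTER INSERT ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, 'INSERT');
    END;

    -- Fired automatically after every delete.
    CREATE TRIGGER accounts_after_delete AFTER DELETE ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, 'DELETE');
    END;
""")

# The application only touches accounts; the triggers write the log.
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("DELETE FROM accounts WHERE id = 1")

log = conn.execute(
    "SELECT account_id, action FROM audit_log ORDER BY rowid"
).fetchall()
print(log)
```

The trigger bodies read the affected row through NEW and OLD rather than through parameters, which reflects the point that triggers are fired by the event and not called by choice.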
Q. 11. What is the difference between a procedural and a non-procedural language?
Ans. Non-procedural (high-level) DML
• Can be used on its own to specify complex database operations.
• DBMSs allow DML statements to be entered interactively from a terminal, or to be embedded in a programming language. If the commands are embedded in a general-purpose programming language, the statements must be identified so they can be extracted by a pre-compiler and processed by the DBMS.
• High-level DMLs, such as SQL, can specify and retrieve many records in a single DML statement, and are called set-at-a-time or set-oriented DMLs.
• High-level languages are often called declarative, because the DML specifies what to retrieve rather than how to retrieve it.
Procedural (low-level) DML
• Must be embedded in a general-purpose programming language.
• Typically retrieves individual records or objects from the database and processes each separately.
• Therefore it needs to use programming language constructs such as loops.
• Low-level DMLs are also called record-at-a-time DMLs because of this.
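The contrast can be sketched by giving the same pay rise in both styles. This illustration assumes SQLite through Python's sqlite3 module, a works table with invented rows, and hypothetical helper names; the Python loop stands in for a record-at-a-time DML embedded in a host language.

```python
import sqlite3

def make_db():
    """Build a small works table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE works (person_name TEXT PRIMARY KEY, "
        "company_name TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO works VALUES (?, ?, ?)",
        [
            ("Alice", "First Bank Corporation", 10000),
            ("Bob", "First Bank Corporation", 9000),
            ("Carol", "Small Bank Corporation", 8000),
        ],
    )
    return conn

def snapshot(conn):
    return conn.execute(
        "SELECT person_name, salary FROM works ORDER BY person_name"
    ).fetchall()

# Procedural, record at a time: fetch each record, decide, update it.
procedural = make_db()
records = procedural.execute(
    "SELECT person_name, company_name, salary FROM works"
).fetchall()
for name, company, salary in records:
    if company == "First Bank Corporation":
        procedural.execute(
            "UPDATE works SET salary = ? WHERE person_name = ?",
            (salary * 110 // 100, name),
        )

# Non-procedural, set at a time: one statement says what should change.
declarative = make_db()
declarative.execute(
    "UPDATE works SET salary = salary * 110 / 100 "
    "WHERE company_name = 'First Bank Corporation'"
)

print(snapshot(procedural))
print(snapshot(declarative))
```

Both versions leave the table in the same state; the difference is that the first spells out the loop and the test, while the second leaves the access path to the DBMS.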
Q. 12. Consider the following employee database, where the primary keys are underlined.
Employee (person-name, street, city)
Works (person-name, company-name, salary)
Company (company-name, city)
Managers (person-name, manager-name)
Give an expression in SQL for each of the following queries.
(i) Find the names of all employees who work for First Bank Corporation and live in Las Vegas.
(ii) Find the names, street address and cities of residences of all employees who work for First Bank Corporation and earn more than $10000.
(iii) Find all employees who do not work for First Bank Corporation.
(iv) Find the company that has the smallest payroll.
(v) Find all employees in the database who do not live in the same cities and on the same streets as do their managers.
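Ans. In the SQL below the hyphens in the attribute names are written as underscores (person_name, company_name, manager_name), since a hyphen is not allowed in an SQL identifier.
(i) SELECT e.person_name FROM employee e, works w WHERE e.person_name = w.person_name AND w.company_name = 'First Bank Corporation' AND e.city = 'Las Vegas';
(ii) SELECT e.person_name, e.street, e.city FROM employee e, works w WHERE e.person_name = w.person_name AND w.company_name = 'First Bank Corporation' AND w.salary > 10000;
(iii) SELECT person_name FROM employee WHERE person_name NOT IN (SELECT person_name FROM works WHERE company_name = 'First Bank Corporation');
(iv) SELECT company_name FROM works GROUP BY company_name HAVING SUM(salary) <= ALL (SELECT SUM(salary) FROM works GROUP BY company_name);
(v) SELECT e.person_name FROM employee e, managers m, employee mgr WHERE e.person_name = m.person_name AND m.manager_name = mgr.person_name AND (e.city <> mgr.city OR e.street <> mgr.street);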
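The five queries can be checked against a small test database using Python's sqlite3 module. Everything in this sketch beyond the four table schemas is an assumption: the sample rows are invented, hyphens in attribute names are written as underscores, explicit JOINs are used, and query (iv) uses ORDER BY ... LIMIT 1 because SQLite has no ALL comparison (ties for the smallest payroll would need extra handling).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (person_name TEXT PRIMARY KEY, street TEXT, city TEXT);
    CREATE TABLE works (person_name TEXT PRIMARY KEY, company_name TEXT, salary INTEGER);
    CREATE TABLE company (company_name TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE managers (person_name TEXT PRIMARY KEY, manager_name TEXT);

    INSERT INTO employee VALUES
        ('Alice', '1 Main St', 'Las Vegas'), ('Bob', '2 Oak Ave', 'Reno'),
        ('Carol', '1 Main St', 'Las Vegas'), ('Dave', '9 Elm Rd', 'Reno');
    INSERT INTO works VALUES
        ('Alice', 'First Bank Corporation', 12000),
        ('Bob', 'First Bank Corporation', 9000),
        ('Carol', 'Small Bank Corporation', 8000),
        ('Dave', 'Small Bank Corporation', 7000);
    INSERT INTO company VALUES
        ('First Bank Corporation', 'Las Vegas'), ('Small Bank Corporation', 'Reno');
    INSERT INTO managers VALUES
        ('Alice', 'Carol'), ('Bob', 'Carol'), ('Dave', 'Bob');
""")

QUERIES = {
    "i": """SELECT e.person_name FROM employee e
            JOIN works w ON e.person_name = w.person_name
            WHERE w.company_name = 'First Bank Corporation'
              AND e.city = 'Las Vegas' ORDER BY e.person_name""",
    "ii": """SELECT e.person_name, e.street, e.city FROM employee e
             JOIN works w ON e.person_name = w.person_name
             WHERE w.company_name = 'First Bank Corporation'
               AND w.salary > 10000 ORDER BY e.person_name""",
    "iii": """SELECT person_name FROM employee
              WHERE person_name NOT IN (
                  SELECT person_name FROM works
                  WHERE company_name = 'First Bank Corporation')
              ORDER BY person_name""",
    "iv": """SELECT company_name FROM works
             GROUP BY company_name ORDER BY SUM(salary) LIMIT 1""",
    "v": """SELECT e.person_name FROM employee e
            JOIN managers m ON e.person_name = m.person_name
            JOIN employee mgr ON m.manager_name = mgr.person_name
            WHERE e.city <> mgr.city OR e.street <> mgr.street
            ORDER BY e.person_name""",
}

answers = {label: conn.execute(sql).fetchall() for label, sql in QUERIES.items()}
for label, rows in answers.items():
    print(label, rows)
```

In this sample data Alice lives at the same street and city as her manager Carol, so she is left out of (v), while Dave shares only the city with his manager Bob and is therefore included.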
Q. 13. Consider the following relational database and give an expression in relational algebra to express each of the following queries:
employee (person-name, street, city)
works (person-name, company-name, salary)
company (company-name, city)
managers (person-name, manager-name)
(a) Find the names of all employees who work for First Bank Corporation.
(b) Find the names and cities of residences of all employees who work for First Bank Corporation.
(c) Find the names of all employees who do not work for First Bank Corporation.
(d) Find names of all employees who earn more than $10000 per annum.
(e) Find names of all employees who earn more than every employee of Small Bank Corporation.
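Ans. Using π[attributes](R) for projection, σ[condition](R) for selection, ⋈ for join and ρ[name](R) for renaming:
(a) π[person-name](σ[company-name = "First Bank Corporation"](works))
(b) π[person-name, city](employee ⋈ σ[company-name = "First Bank Corporation"](works))
(c) π[person-name](employee) − π[person-name](σ[company-name = "First Bank Corporation"](works))
(d) π[person-name](σ[salary > 10000](works))
(e) π[person-name](works) − π[works.person-name](works ⋈[works.salary <= w2.salary AND w2.company-name = "Small Bank Corporation"] ρ[w2](works))
In (e) the join finds every employee who earns no more than some employee of Small Bank Corporation; subtracting these from all employees leaves those who earn more than every employee of Small Bank Corporation.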
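The last query is the least obvious one, so it may help to evaluate it step by step. The sketch below uses plain Python sets with invented rows for works and compares the difference-based formulation with a direct "greater than all" check; the variable names are hypothetical.

```python
# Hypothetical works relation: (person-name, company-name, salary).
works = {
    ("Alice", "First Bank Corporation", 12000),
    ("Bob", "First Bank Corporation", 9000),
    ("Carol", "Small Bank Corporation", 8000),
    ("Dave", "Small Bank Corporation", 7000),
}

# Theta join of works with a renamed copy of itself: employees who earn
# no more than at least one employee of Small Bank Corporation.
not_above_all = {
    w1[0]
    for w1 in works
    for w2 in works
    if w2[1] == "Small Bank Corporation" and w1[2] <= w2[2]
}

# Projection on person-name, then set difference.
all_names = {w[0] for w in works}
earn_more_than_all = all_names - not_above_all

# Direct check of the same condition, for comparison.
small_bank_salaries = [w[2] for w in works if w[1] == "Small Bank Corporation"]
direct = {w[0] for w in works if all(w[2] > s for s in small_bank_salaries)}

print(sorted(earn_more_than_all))
```

The two results agree, which is the reason the algebra can express "more than every" without a universal quantifier: it removes everyone for whom a counterexample exists.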
Q. 14. List any two procedural programming languages.
Ans. Procedural extensions of SQL include:
1. PL/pgSQL (PostgreSQL)
2. SQL PL (DB2)
3. PL/SQL (Oracle)
Q. 15. What are row triggers?
Ans. A row-level trigger is fired once for each row affected by the triggering statement. For example, if an UPDATE statement updates multiple rows of a table, a row trigger is fired once for each row affected by the UPDATE statement. If a triggering statement affects no rows, a row trigger is not executed at all.
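The once-per-row behaviour can be observed by letting a row trigger write a log entry each time it fires. This sketch assumes SQLite (through Python's sqlite3 module), whose triggers are all FOR EACH ROW; the emp and sal_log tables, the trigger name and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER, sal INTEGER);
    CREATE TABLE sal_log (empno INTEGER, old_sal INTEGER, new_sal INTEGER);

    CREATE TRIGGER emp_sal_row AFTER UPDATE OF sal ON emp
    FOR EACH ROW
    BEGIN
        INSERT INTO sal_log VALUES (OLD.empno, OLD.sal, NEW.sal);
    END;

    INSERT INTO emp VALUES (1, 10, 1000), (2, 10, 2000), (3, 20, 3000);
""")

def firings():
    """Number of times the row trigger has fired so far."""
    return conn.execute("SELECT COUNT(*) FROM sal_log").fetchone()[0]

# One statement that updates two rows: the trigger fires twice.
conn.execute("UPDATE emp SET sal = sal + 100 WHERE deptno = 10")
after_two_rows = firings()

# One statement that matches no rows: the trigger does not fire at all.
conn.execute("UPDATE emp SET sal = sal + 100 WHERE deptno = 99")
after_no_rows = firings()

print(after_two_rows, after_no_rows)
```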
Q. 16. Define the term DDL.
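Ans. DDL stands for Data Definition Language. It is the part of the database language used to define the schema, i.e. to create, alter and drop schema objects such as tables and indexes.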
Q. 17. Consider the following relational database
Employee (person-name, street, city)
Works (person-name, company-name, salary)
Company (company-name, city)
Manager (person-name, manager-name)
Ans. Similar Question: Chapter-2, Question No. 22.
Q. 18. Give relational algebra expression for each of the following queries:
(a) Find the names of all employees who work for First Bank Corporation.
(b) Find names, cities of residence of all employees who work for First Bank Corporation and earn more than $10,000.
Ans. Similar Question: Chapter-2, Question No. 13.
Q. 19. Define 'View'.
Ans. A view is a method of organising table data to meet a specific need. Views are based on SELECT statements, which derive their data from real tables.
CREATE VIEW
Creates a new view based on tables in the database. The table names must already exist. The new view name must not exist. CREATE VIEW has the following syntax:
CREATE VIEW view_name AS select_statement;
Additional information on the SELECT statement and SQL queries can be found in the next section. Note that an ORDER BY clause may not be added to the SQL SELECT statement when defining a view. In general, views are read-only. That is, one may query a view, but it is normally the case that views cannot be operated on with INSERT, UPDATE or DELETE. This is especially true where a view joins two or more tables together or contains an aggregate function.
DROP VIEW
Drops a view from the database. The view name must already exist in the database. The syntax for the DROP VIEW command is:
DROP VIEW view_name;
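As an illustration, here is a minimal sketch of creating, querying and dropping a view with Python's sqlite3 module. The table works, the view name fbc_staff and the sample rows are assumptions. The read-only behaviour shown is SQLite's (a view cannot be written to unless an INSTEAD OF trigger is defined); other systems allow updates through some simple views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE works (person_name TEXT, company_name TEXT, salary INTEGER);
INSERT INTO works VALUES ('Asha', 'First Bank Corporation', 12000),
                         ('Chen', 'Small Bank Corporation', 8000);
-- The view stores only the SELECT; its rows are derived from works on each query.
CREATE VIEW fbc_staff AS
    SELECT person_name, salary FROM works
    WHERE company_name = 'First Bank Corporation';
""")

before = conn.execute("SELECT person_name FROM fbc_staff").fetchall()

# A change to the base table is visible through the view immediately.
conn.execute("INSERT INTO works VALUES ('Ben', 'First Bank Corporation', 9000)")
after = conn.execute(
    "SELECT person_name FROM fbc_staff ORDER BY person_name").fetchall()

# Writing through the view is rejected by SQLite.
try:
    conn.execute("INSERT INTO fbc_staff VALUES ('Dev', 1)")
    view_is_read_only = False
except sqlite3.Error:
    view_is_read_only = True

conn.execute("DROP VIEW fbc_staff")
remaining_views = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
```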
Q. 20. Define entity and attribute. Or What do you mean by Entities and Attributes?
Ans. Entities and their Attributes: Entities are the basic units in modeling classes of concrete (real) or abstract objects. Entities either have concrete existence or represent ideas or concepts; e.g., a building, a room, a chair and an employee are all different entities. An entity type or entity set is a group of similar objects of an organization about which data is maintained. Examples of entity sets are transactions, job positions, employees, inventories of raw and finished products, students, academic staff, non-academic staff, managers etc.
An object can belong to different entity sets simultaneously. A person can be a student as well as a part-time employee. Consider the modeling of a flight crew. It consists of a group of individuals employed by an organization who belong to the entity sets EMPLOYEE and PERSON. The individual members of the flight crew have different skills and functions, so the entity set EMPLOYEE is given the attribute skill with its possible values. To store data on an entity set, we have to create a model for it. For example, employees of an organization are modeled by the entity set EMPLOYEE. We must choose the properties or characteristics of an employee that may be useful to the organization. Some of these properties are employee name, employee number, employee address, employee skill and employee pay. The properties that characterize an entity set are called its attributes. An attribute is also referred to by the terms data item, data element, data field, item, elementary item or object property.

Chapter 3 : Database Design Theory And Methodology (Part 1)
Q. 1. What is normalization? Discuss various Normal forms with the help of examples.
Ans. Normalization is a design technique that is widely used as a guide in designing relational databases. Normalization is essentially a two-step process that puts data in tabular form by removing repeating groups and then removes duplicated data from the relational tables. Normalization theory is based on the concept of normal forms. A relational table is said to be in a particular normal form if it satisfies a certain set of constraints. There are currently five normal forms that have been defined. In this section, we will cover the first three normal forms, which were defined by E. F. Codd.
Significance of Normalization
• Improves update efficiency.
• Removes many causes of anomalous data dependencies.
• Allows better checks for consistency.
• Is (usually) better for query handling.
• But there are computational penalties in some SQL operations.
Normalization is also significant for the following reasons:
1. To make it feasible to represent any relation in the database.
2. To obtain powerful relational retrieval using relational operators.
3. To free relations from undesirable insertion, update and deletion anomalies.
4. To reduce the need for restructuring the relations as new data types are introduced.
Normalization Avoids
• Duplication of Data: the same data is listed in multiple lines of the database.
• Insert Anomaly: a record about an entity cannot be inserted into the table without first inserting information about another entity. For example, we cannot enter a customer without a sales order.
• Delete Anomaly: a record cannot be deleted without deleting a record about a related entity. For example, we cannot delete a sales order without deleting all of the customer's information.
• Update Anomaly: information cannot be updated without changing it in many places. To update customer information, it must be updated for each sales order the customer has placed.
Before Normalization
1. Begin with a list of all of the fields that must appear in the database. Think of this as one big table.
2. Do not include computed fields.
3. One place to begin getting this information is from a printed document used by the system.
4. Additional attributes besides those for the entities described on the document can be added to the database.
Normal Forms
The normalization process, as first proposed by Codd (1972), takes a relation schema through a series of tests to "certify" whether it satisfies a certain normal form. The process, which proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms and decomposing relations as necessary, can thus be considered as relational design by analysis. Initially, Codd proposed three normal forms, which he called first, second and third normal form. A stronger definition of 3NF, called Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd. All these normal forms are based on the functional dependencies among the attributes of a relation. Later, 4NF and 5NF were proposed, based on the concepts of multivalued dependencies and join dependencies, respectively.
Need of Normalization
Normalization of data can hence be looked upon as a process of analyzing the given relation schemas, based on their FDs and primary keys, to achieve the desirable properties of:
1. Minimizing redundancy.
2. Minimizing the insertion, deletion and update anomalies.
Normal forms are based on the primary key.
Normalization: It is the process of restructuring an unstructured relation into a structured one with the purpose of removing redundancy and anomalies.
First Normal Form (1NF)
Definition: A relation schema is said to be in 1NF if the values in the domain of each attribute of the relation are atomic. In other words, only one value is associated with each attribute, and the value is not a set of values or a list of values. A database schema is in 1NF if every relation schema included in the database schema is in 1NF. A relation is in 1NF if and only if all underlying domains contain scalar values only. Here scalar means atomic: there should be a single value at the intersection of each row and column, as shown in the FIRST relation obtained from the original relations.
The functional dependencies in relation FIRST are as follows: S# → CITY, CITY → STATUS (so S# → STATUS also holds) and {S#, P#} → QTY.
But problems occur with each of the three update operations.
INSERT: We cannot insert the fact that a particular supplier is located in a particular city until that supplier supplies at least one part. The FIRST relation does not show that supplier S5 is located in Athens. The reason is that, until S5 supplies some part, we have no appropriate primary key value.
DELETE: If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment connecting that supplier to some part but also the information that the supplier is located in a particular city. For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is located in Paris.
UPDATE: The city value for a given supplier appears in FIRST many times, in general. This redundancy causes update problems. For example, if supplier S1 moves from London to Amsterdam, we are faced with either the problem of searching FIRST to find every tuple connecting S1 and London (and changing it) or the possibility of producing an inconsistent result (the city for S1 might be given as Amsterdam in one tuple and London in another). Therefore, to overcome this problem we move to 2NF.
Before proceeding to the next form let us denote: R = relation scheme, S = set of attributes, F = set of functional dependencies.
Second Normal Form (2NF)
Definition: A relation schema R<S, F> is in second normal form (2NF) if it is in 1NF and if all nonprime attributes are fully functionally dependent on the relation keys. A database schema is in 2NF if every relation schema included in the database schema is in 2NF.
Features:
1. A relation is in 2NF if it is in 1NF and every nonkey attribute is fully dependent on the key.
2. If the key is a single attribute, then the relation is automatically in 2NF.
Second Normal Form (definition assuming only one candidate key, which is thus the primary key): A relation is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key.
So we decompose the FIRST relation into two relations, SECOND and SP. It should be clear that the revised structure overcomes all the problems with update operations sketched earlier.
INSERT: We can insert the information that S5 is located in Athens, even though S5 does not currently supply any parts, by simply inserting the appropriate tuple into SECOND.
DELETE: We can delete the shipment connecting S3 and P2 by deleting the appropriate tuple from SP; we do not lose the information that S3 is located in Paris.
UPDATE: The S#-CITY redundancy has been eliminated. Thus we can change the city for S1 from London to Amsterdam by changing it once and for all in the relevant SECOND tuple.
Still, we have problems with these operations in the following ways:
INSERT: We cannot insert the fact that a particular city has a particular status (e.g., we cannot state that any supplier in Rome must have a status of 50) until we have some supplier actually located in that city.
DELETE: If we delete the only SECOND tuple for a particular city, we destroy not only the information for the supplier concerned but also the information that the city has that particular status. For example, if we delete the SECOND tuple for S5, we lose the information that the status for Athens is 30.
UPDATE: The status for a given city appears in SECOND many times, in general (the relation still contains some redundancy). Thus, if we need to change the status for London from 20 to 30, we are faced with either the problem of searching SECOND to find every tuple for London (and changing it) or the possibility of producing an inconsistent result (the status for London might be given as 20 in one tuple and 30 in another). Again, to overcome such problems we replace the original relation (SECOND, in this case) by two projections, giving 3NF.
Third Normal Form (3NF)
Definition: A relation schema R<S, F> is in 3NF if, for all nontrivial functional dependencies in F of the form X → A, either X contains a key (i.e., X is a superkey) or A is a prime attribute. A database schema is in 3NF if every relation schema included in the database schema is in 3NF.
Feature (definition assuming only one candidate key, which is thus the primary key): A relation R is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key. ("No transitive dependencies" implies no mutual dependencies.)
Relations SC and CS are both in 3NF.
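The two decomposition steps can be traced on a handful of tuples. The sketch below is plain Python over sets of tuples; the sample rows are assumptions chosen to agree with the values mentioned in the text (S1 in London, S3 in Paris, London with status 20), and the attribute layouts SECOND(S#, STATUS, CITY), SP(S#, P#, QTY), SC(S#, CITY) and CS(CITY, STATUS) are likewise assumed, since the original figures are not reproduced here.

```python
# FIRST(S#, STATUS, CITY, P#, QTY): assumed sample rows.
FIRST = {
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S2", 10, "Paris",  "P1", 300),
    ("S3", 10, "Paris",  "P2", 200),
    ("S4", 20, "London", "P2", 200),
}

# 2NF step: split off the attributes that depend on S# alone.
SECOND = {(s, status, city) for (s, status, city, p, qty) in FIRST}
SP = {(s, p, qty) for (s, status, city, p, qty) in FIRST}

# 3NF step: remove the transitive dependency S# -> CITY -> STATUS.
SC = {(s, city) for (s, status, city) in SECOND}
CS = {(city, status) for (s, status, city) in SECOND}

# Natural joins rebuild the original relations, so no information was lost.
second_rebuilt = {(s, status, city)
                  for (s, city) in SC
                  for (c, status) in CS if c == city}
first_rebuilt = {(s, status, city, p, qty)
                 for (s, status, city) in second_rebuilt
                 for (s2, p, qty) in SP if s2 == s}

# The insertion anomalies are gone: S5 can be recorded before it ships any
# part, and the status of Athens can be recorded independently of suppliers.
SC.add(("S5", "Athens"))
CS.add(("Athens", 30))
```

Here second_rebuilt equals SECOND and first_rebuilt equals FIRST, which is what "non-loss decomposition" means for this sample.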
Thus, by this decomposition we have removed the transitive dependency from relation SECOND.
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant. Or, less formally (informal definition): a relation is in BCNF if and only if the only determinants are candidate keys.
Relations FIRST and SECOND, which are not in 3NF, are not in BCNF either; relations SP, SC and CS, which are in 3NF, are also in BCNF. Relation FIRST contains three determinants, namely S#, CITY and {S#, P#}; of these, only {S#, P#} is a candidate key, so FIRST is not in BCNF. Similarly, SECOND is not in BCNF because the determinant CITY is not a candidate key. Relations SP, SC and CS, on the other hand, are each in BCNF, because in each case the (single) candidate key is the only determinant in the relation.
BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF: every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.
Definition: A normalized relation scheme R<S, F> is in BCNF if, for every nontrivial FD in F of the form X → A, where X ⊆ S and A ∈ S, X is a superkey of R. BCNF is a special case of 3NF.
Features: A relation that is in 3NF can fail to be in BCNF only when all three of the following hold:
1. The candidate keys are composite (there is no single attribute that identifies a record).
2. There is more than one candidate key.
3. The candidate keys overlap, i.e., they have at least one attribute in common.
A 3NF relation that lacks any of these features is automatically in BCNF.
Explanation: Consider the relation Advisor, in which a composite key works as a candidate key. A relation is in BCNF if every determinant is a candidate key.
• Advisor is in 1NF by definition.
• It is in 2NF, since all nonkey attributes are dependent on the entire key.
• It is in 3NF, because it has no transitive dependencies.
• It is not in BCNF, because it has a determinant, FName, that is not a candidate key.
Relations in BCNF, obtained by decomposing Advisor:
STU-ADV, Key: (SID, FName)
ADV-SUBJ, Key: (FName)
Multivalued Dependencies and Fourth Normal Form (4NF)
Definition: Given a relation schema R such that the set D of FDs and MVDs is satisfied, consider sets of attributes X and Y where X ⊆ R and Y ⊆ R. The relation schema R is in fourth normal form (4NF) if, for all multivalued dependencies of the form X →→ Y in D+, either X →→ Y is a trivial MVD or X is a superkey of R. A database schema is in 4NF if all relation schemas included in the database schema are in 4NF.
Join Dependencies and Fifth Normal Form
So far in this chapter we have assumed that the sole operation necessary or available in the further normalization process is the replacement of a relation in a non-loss way by exactly two of its projections. This assumption has successfully carried us as far as 4NF. It comes perhaps as a surprise, therefore, to discover that there exist relations that cannot be non-loss-decomposed into two projections but can be non-loss-decomposed into three (or more). To coin an ugly but convenient term, we will describe such a relation as "n-decomposable" (for some n > 2), meaning that the relation in question can be non-loss-decomposed into n projections but not into m for any m < n. A relation that can be non-loss-decomposed into two projections we will call "2-decomposable."
In short
1NF
• A relation is in 1NF if it contains no repeating groups.
• To convert an unnormalised relation to 1NF, either:
  • flatten the table and change the primary key, or
  • decompose the relation into smaller relations, one for the repeating groups and one for the non-repeating groups. Remember to put the primary key from the original relation into both new relations. This option is liable to give the best results.
2NF
• A relation is in 2NF if it contains no repeating groups and no partial key functional dependencies.
• Rule: A relation in 1NF with a single key field must be in 2NF.
• To convert a relation with partial functional dependencies to 2NF, create a set of new relations:
  • one relation for the attributes that are fully dependent upon the key;
  • one relation for each part of the key that has partially dependent attributes.
3NF
• A relation is in 3NF if it contains no repeating groups, no partial functional dependencies and no transitive functional dependencies.
• To convert a relation with transitive functional dependencies to 3NF, remove the attributes involved in the transitive dependency and put them in a new relation.
• Rule: A relation in 2NF with only one non-key attribute must be in 3NF.
• In a normalized relation, a non-key field must provide a fact about the key, the whole key and nothing but the key.
Relations in 3NF are sufficient for most practical database design problems. However, 3NF does not guarantee that all anomalies have been removed.
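The informal BCNF test quoted in this answer ("the only determinants are candidate keys") can be checked mechanically: a determinant is acceptable when the set of attributes it determines is the whole relation. The following Python sketch makes that check for the supplier example. The FD set used (S# → CITY, CITY → STATUS, {S#, P#} → QTY) is an assumption consistent with the determinants named in the text, and the function names closure and bcnf_violations are illustrative.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the (lhs, rhs) pairs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Nontrivial FDs whose determinant does not determine every attribute."""
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attrs)]

# Assumed FDs for the supplier example.
S_CITY = ({"S#"}, {"CITY"})
CITY_STATUS = ({"CITY"}, {"STATUS"})
SP_QTY = ({"S#", "P#"}, {"QTY"})

first = bcnf_violations({"S#", "STATUS", "CITY", "P#", "QTY"},
                        [S_CITY, CITY_STATUS, SP_QTY])
second = bcnf_violations({"S#", "STATUS", "CITY"}, [S_CITY, CITY_STATUS])
sp = bcnf_violations({"S#", "P#", "QTY"}, [SP_QTY])
sc = bcnf_violations({"S#", "CITY"}, [S_CITY])
cs = bcnf_violations({"CITY", "STATUS"}, [CITY_STATUS])
```

FIRST is reported with two offending determinants (S# and CITY), SECOND with one (CITY), and SP, SC and CS with none, in agreement with the discussion of BCNF above.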
Q. 2. What are multivalued dependencies? Or Define the term functional dependency.
Ans. FUNCTIONAL DEPENDENCIES
Introduction
Basically, a functional dependence (usually abbreviated FD) is a many-to-one relationship from one set of attributes to another within a given relation. In the shipments relation SP, for example, there is a functional dependence from the set of attributes {S#, P#} to the set of attributes {QTY}. What this means is that, for any given value of the attribute pair {S#, P#}, there is exactly one corresponding value of the attribute QTY, although many pairs may share the same QTY value. FDs provide a basis for a scientific attack on a number of practical problems. This is because FDs possess a rich set of interesting formal properties, which make it possible to treat the problems in question in a formal and rigorous manner. The terms functional dependence and functional dependency are used interchangeably in the technical literature. Customary English usage would suggest that the term "dependence" be used for the FD concept per se and would reserve the term "dependency" for "the object that depends." But we very frequently need to refer to FDs in the plural, and "dependencies" seems to trip off the tongue more readily than "dependences"; hence our use of both terms.
Basic definitions
In order to illustrate the ideas of the present section, we make use of a slightly revised version of the shipments relation, one that includes, in addition to the usual attributes S#, P# and QTY, an attribute CITY, representing the city for the relevant supplier. We will refer to this revised relation as SCP to avoid confusion. A possible tabulation of relation SCP is given in Fig. Now, it is very important in this area, as in so many others, to distinguish clearly between (a) the value of a given relation (i.e., relation variable) at a given point in time and (b) the set of all possible values that the given relation (variable) might assume at different times. In what follows, we will first define the concept of functional dependency as it applies to Case (a) and then extend it to apply to Case (b). Here then is the definition for Case (a). Let R be a relation, and let X and Y be arbitrary subsets of the set of attributes of R. Then we say that Y is functionally dependent on X, in symbols X → Y
(read "X functionally determines Y," or simply "X arrow Y"), if and only if each X-value in R has associated with it precisely one Y-value in R. In other words, whenever two tuples of R agree on their X-value, they also agree on their Y-value. For example, the tabulation of relation SCP shown in Fig. 9.1 satisfies the FD {S#} → {CITY}
because every SCP tuple with a given S# value also has the same CITY value. Indeed, it also satisfies several more FDs, the following among them:
Fig: The relation SCP (sample tabulation)
(Exercise: Check these.) The left-hand side and right-hand side of an FD are sometimes called the determinant and the dependent, respectively. As the definition states, the determinant and dependent are both sets of attributes. When the set contains just one attribute, however (i.e., when it is a singleton set), we will often drop the set brackets and write just, e.g., S# → CITY
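The Case (a) definition translates directly into a check on one relation value: group the tuples by their X-value and confirm that each group has a single Y-value. The Python sketch below does this; the function name satisfies and the sample rows for SCP are assumptions (the original tabulation is not reproduced here), chosen so that each supplier happens to ship the same quantity every time.

```python
def satisfies(rows, lhs, rhs):
    """True if, in this relation value, equal lhs-values imply equal rhs-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs-value seen for each lhs-value; any later
        # disagreement means the FD does not hold in this relation value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Assumed sample tabulation of SCP(S#, CITY, P#, QTY).
SCP = [
    {"S#": "S1", "CITY": "London", "P#": "P1", "QTY": 100},
    {"S#": "S1", "CITY": "London", "P#": "P2", "QTY": 100},
    {"S#": "S2", "CITY": "Paris",  "P#": "P1", "QTY": 200},
    {"S#": "S2", "CITY": "Paris",  "P#": "P2", "QTY": 200},
    {"S#": "S3", "CITY": "Paris",  "P#": "P2", "QTY": 300},
]

supplier_determines_city = satisfies(SCP, ["S#"], ["CITY"])
shipment_determines_qty = satisfies(SCP, ["S#", "P#"], ["QTY"])
city_determines_supplier = satisfies(SCP, ["CITY"], ["S#"])
supplier_determines_qty = satisfies(SCP, ["S#"], ["QTY"])
```

Note that S# → QTY is satisfied by these sample rows only by coincidence. A check of this kind says nothing about whether an FD holds for every legal value of the relation.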
As already explained, the foregoing definitions apply to "Case (a)", i.e., to individual relation values. However, when we consider relation variables (in particular, when we consider base relations) we are usually interested not so much in the FDs that happen to hold in the particular value that the variable happens to have at some particular time, but rather in those FDs that hold for all possible values of that variable. In the case of SCP, for example, the FD S# → CITY
holds for all possible values of SCP, because, at any given time, a given supplier has exactly one corresponding city, and so any two tuples appearing in SCP at the same time with the same supplier number must necessarily have the same city as well. In fact, the statement that this FD holds "for all time" (i.e., for all possible values of SCP) is an integrity constraint for SCP: it places limits on the values that SCP can legitimately assume. Here then is the "Case (b)" definition of functional dependency. Let R be a relation variable, and let X and Y be arbitrary subsets of the set of attributes of R. Then we say that Y is functionally dependent on X, in symbols X → Y
(read "X functionally determines Y," or simply "X arrow Y"), if and only if, in every possible legal value of R, each X-value has associated with it precisely one Y-value. In other words, in every possible legal value of R, whenever two tuples agree on their X-value, they also agree on their Y-value. Henceforth, we will usually take the term "functional dependency" to have this latter, more demanding, time-independent meaning (barring explicit statements to the contrary). Here are some time-independent FDs that apply to the relation variable SCP:
Notice in particular that the following FDs, which do hold in the sample tabulation of Fig. 3.8, do not hold "for all time":
In other words, the statement that (e.g.) "every shipment for a given supplier has the same shipment quantity" happens to be true for the sample values in Fig. 3.8, but it is not true for all possible legal values of SCP. It is worth pointing out that if X is a candidate key of relation R (in particular, if it is the primary key), then all attributes Y of relation R must necessarily be functionally dependent on X (this fact follows from the definition of candidate key). In the usual parts relation, for example, we must necessarily have:
In fact, if relation R satisfies the FD A → B and A is not a candidate key, then R will involve some redundancy. In the case of relation SCP, for example, the fact that a given supplier is located in a given city appears many times, in general (see Fig. 3.8). Now, even if we restrict our attention to FDs that hold "for all time," the set of FDs satisfied by all legal values of a given relation can still be very large, as the SCP example suggests. The objective is therefore to reduce that set to a manageable size. Why is this objective desirable? One reason is that (as already stated) FDs represent integrity constraints, and hence the DBMS needs to check them when updates are performed. Given a particular set S of FDs, therefore, it is desirable to find some other set T that is (ideally) much smaller than S and has the property that every FD in S is implied by the FDs in T. If such a set T can be found, it is sufficient that the DBMS enforce the FDs in T, and the FDs in S will then be enforced automatically. The problem of finding such a set T is thus of considerable practical interest.
Trivial and Nontrivial Dependencies
Note: In the remainder of this section, we will occasionally abbreviate "functional dependency" to just "dependency." Similarly for "functionally dependent on," "functionally determines," etc.
One obvious way to reduce the size of the set of FDs we have to deal with is to eliminate the trivial dependencies. A dependency is trivial if it cannot possibly fail to be satisfied. In fact, an FD is
trivial if and only if the right-hand side is a subset (not necessarily a proper subset) of the left-hand side. As the name implies, trivial dependencies are not very interesting in practice; we are usually more interested in nontrivial dependencies (which are, of course, precisely the ones that are not trivial), because these are the ones that correspond to "genuine" integrity constraints. When we are dealing with formal dependency theory, however, we cannot necessarily assume that all dependencies are nontrivial.
Closure of a Set of Dependencies
As already suggested, certain FDs imply others. As a simple example, the FD {S#, P#} → {CITY, QTY} implies both of the following FDs: {S#, P#} → CITY and {S#, P#} → QTY.
As a more complex example, suppose we have a relation R with three attributes A, B and C, such that the FDs A → B and B → C both hold in R. Then it is easy to see that the FD A → C also holds in R. The FD A → C here is an example of a transitive FD: C is said to depend on A transitively, via B. The set of all FDs that are implied by a given set S of FDs is called the closure of S, and is denoted S+. Clearly we need a way of computing S+ from S. The first attack on this problem appeared in a paper by Armstrong, which gave a set of rules of inference (more usually called Armstrong's axioms) by which new FDs can be inferred from given ones. Those rules can be stated in a variety of equivalent ways, one of the simplest of which is as follows (Armstrong's inference rules): Let A, B and C be arbitrary subsets of the set of attributes of the given relation R, and let us agree to write (e.g.) AB to mean the union of A and B. Then:
1. Reflexivity: If B is a subset of A, then A → B.
2. Augmentation: If A → B, then AC → BC.
3. Transitivity: If A → B and B → C, then A → C.
Each of these three rules can be directly proved from the definition of functional dependence (the first is just the definition of a trivial dependence, of course). Moreover, the rules are complete, in the sense that, given a set S of FDs, all FDs implied by S can be derived from S using the rules. They are also sound, in the sense that no additional FDs (i.e., FDs not implied by S) can be so derived. In other words, the rules can be used to derive precisely the closure S+. Several further rules can be derived from the three given above, the following among them.
These additional rules can be used to simplify the practical task of computing S+ from S. (D is another arbitrary subset of the set of attributes of R.)
4. Self-determination: A → A.
5. Decomposition: If A → BC, then A → B and A → C.
6. Union: If A → B and A → C, then A → BC.
7. Composition: If A → B and C → D, then AC → BD.
And Darwen proves the following rule, which he calls the General Unification Theorem:
8. If A → B and C → D, then A ∪ (C − B) → BD (where "∪" is union and "−" is set difference).
The name "General Unification Theorem" refers to the fact that several of the earlier rules can be seen as special cases. Example: Suppose we are given relation R with attributes A, B, C, D, E, F, and the FDs A → BC, B → E, CD → EF.
Observe that we are extending our notation slightly (though not incompatibly) by writing, e.g., BC for the set consisting of attributes B and C; previously BC would have meant the union of B and C, where B and C were sets of attributes. Note: If you would prefer a more concrete example, take A as employee number, B as department number, C as manager's employee number, D as project number for a project directed by that manager (unique within manager), E as department name, and F as percentage of time allocated by the specified manager to the specified project. We now show that the FD AD → F holds in R, and so is a member of the closure of the given set:
1. A → BC (given)
2. A → C (1, decomposition)
3. AD → CD (2, augmentation)
4. CD → EF (given)
5. AD → EF (3 and 4, transitivity)
6. AD → F (5, decomposition)
Closure of a Set of Attributes
We have not yet given an effective algorithm for computing the closure S+ of a given set S of FDs. However, in this section we give an effective way of determining whether a given (specified) FD is in that closure. We begin our discussion with the notion of a superkey. A superkey for a relation R is a set of attributes of R that includes at least one candidate key of R as a subset (not necessarily a proper subset, of course). The definition of "superkey" can thus be derived from that of "candidate key" by simply deleting the irreducibility requirement. It follows immediately that the superkeys for a given relation R are precisely those subsets K of the set of attributes of R such that the functional dependency K → A
holds true for every attribute A of R. Now suppose we know the FDs that hold for some given relation, and we need to determine the candidate keys for that relation. The candidate keys are, by definition, those superkeys that are irreducible. So determining whether or not a given set of attributes K is a superkey is a big step toward determining whether K is in fact a candidate key. To determine whether K is a superkey, we need to determine whether the set of all attributes functionally dependent on K is in fact the set of all attributes of R. So, given a set S of FDs that hold in R, we need a way of determining the set of all attributes of R that are functionally dependent on K: the so-called closure K+ of K under S. A simple algorithm for computing this closure is given in Fig. Example: Suppose we are given relation R with attributes A, B, C, D, E, F, and FDs A → BC, E → CF, B → E, CD → EF.
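The closure algorithm referred to above can be sketched in a few lines of Python. This is one common formulation (repeat passes over the FDs until a pass adds nothing), not necessarily the exact figure from the original. FDs are represented as (left-hand side, right-hand side) pairs of attribute sets; the helper fd, which builds an FD from strings of single-letter attribute names, and the function follows are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # outer loop: stop when a full pass adds nothing
        changed = False
        for lhs, rhs in fds:            # inner loop: one visit per given FD
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Hypothetical helper: build an FD from strings such as 'CD' and 'EF'."""
    return frozenset(lhs), frozenset(rhs)

# The FDs of the example: A -> BC, E -> CF, B -> E, CD -> EF.
FDS = [fd("A", "BC"), fd("E", "CF"), fd("B", "E"), fd("CD", "EF")]
R = set("ABCDEF")

ab_closure = closure("AB", FDS)
ab_is_superkey = ab_closure == R
ad_is_superkey = closure("AD", FDS) == R

def follows(lhs, rhs, fds):
    """X -> Y is implied by fds exactly when Y is a subset of the closure of X."""
    return set(rhs) <= closure(lhs, fds)
```

For this example the closure of {A, B} comes out as {A, B, C, E, F}, so {A, B} is not a superkey, whereas {A, D} determines every attribute. The function follows uses the same closure to test whether a specific FD is implied by the given set.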
We now compute the closure {A,B}+ of the set of attributes {A,B} under this set of FDs.
1. We initialize the result CLOSURE[K,S] to {A,B}.
2. We now go round the inner loop four times, once for each of the given FDs. On the first iteration (for the FD A → BC), we find that the left-hand side is indeed a subset of CLOSURE[K,S] as computed so far, so we add attributes (B and) C to the result. CLOSURE[K,S] is now the set {A,B,C}.
3. On the second iteration (for the FD E → CF), we find that the left-hand side is not a subset of the result as computed so far, which thus remains unchanged.
4. On the third iteration (for the FD B → E), we add E to CLOSURE[K,S], which now has the value {A,B,C,E}.
5. On the fourth iteration (for the FD CD → EF), CLOSURE[K,S] remains unchanged.
6. Now we go round the inner loop four times again. On the first iteration, the result does not change; on the second, it expands to {A,B,C,E,F}; on the third and fourth, it does not change.
7. Now we go round the inner loop four times again. CLOSURE[K,S] does not change, and so the whole process terminates, with {A,B}+ = {A,B,C,E,F}. Note,
therefore, that {A,B} is not a superkey (and hence, a fortiori, not a candidate key). An important corollary of the foregoing is as follows: given a set S of FDs, we can easily tell whether a specific FD X → Y follows from S, because that FD will follow if and only if Y is a subset of the closure X+ of X under S. In other words, we now have a simple way of determining whether a given FD X → Y is in the closure S+ of S.
Irreducible Sets of Dependencies
Let S1 and S2 be two sets of FDs. If every FD implied by S1 is implied by the FDs in S2, i.e., if S1+ is a subset of S2+, we say that S2 is a cover for S1. What this means is that if the DBMS enforces the constraints represented by the FDs in S2, then it will automatically be enforcing the FDs in S1. Next, if S2 is a cover for S1 and S1 is a cover for S2, i.e., if S1+ = S2+, we say
that S1 and S2 are equivalent. Clearly, if S1 and S2 are equivalent, then if the DBMS enforces the constraints represented by the FDs in S2, it will automatically be enforcing the FDs in S1, and vice versa. Now we define a set S of FDs to be irreducible if and only if it satisfies the following three properties:
1. The right-hand side (the dependent) of every FD in S involves just one attribute (i.e., it is a singleton set).
2. The left-hand side (the determinant) of every FD in S is irreducible in turn, meaning that no attribute can be discarded from the determinant without changing the closure S+ (i.e., without converting S into some set not equivalent to S). We will say that such an FD is left-irreducible.
3. No FD in S can be discarded from S without changing the closure S+ (i.e., without converting S into some set not equivalent to S).
For example, consider the familiar parts relation P. The following FDs (among others) hold in that relation:
This set of FDs is easily seen to be irreducible: the right-hand side is a single attribute in each case, the left-hand side is obviously irreducible in turn, and none of the FDs can be discarded without changing the closure (i.e., without losing some information). By contrast, the following sets of FDs are not irreducible.
We now claim that for every set of FDs, there exists at least one equivalent set that is irreducible. In fact, this is easy to see. Let the original set of FDs be S. Thanks to the decomposition rule, we can assume without loss of generality that every FD in S has a singleton right-hand side. Next, for each FD f in S, we examine each attribute A in the left-hand side of f; if S and the set of FDs obtained by eliminating A from the left-hand side of f are equivalent, we delete A from the left-hand side of f. Then, for each FD f remaining in S, if S and S − f are equivalent, we delete f from S. The final set S is irreducible and is equivalent to the original set S. Example: Suppose we are given relation R with attributes A, B, C, D, and FDs A → BC, B → C, A → B, AB → C, AC → D.
?e now compute an irreducible set of 'Ds that is e uivalent to this given set 3.The first step is to rewrite the 'Ds such that each one has. a singleton right,hand side.
We observe immediately that the FD A -> B occurs twice, so one occurrence can be eliminated.
2. Next, attribute C can be eliminated from the left-hand side (LHS) of the FD AC -> D, because we have A -> C, so A -> AC by augmentation, and we are given AC -> D, so A -> D by transitivity; thus the C on the left-hand side of AC -> D is redundant.
3. Next, we observe that the FD AB -> C can be eliminated, because again we have A -> C, so AB -> CB by augmentation, so AB -> C by decomposition.
4. Finally, the FD A -> C is implied by the FDs A -> B and B -> C, so it can also be eliminated. We are left with:
A -> B
B -> C
A -> D
This set is irreducible. A set I of FDs that is irreducible and is equivalent to some other set S of FDs is said to be an irreducible cover for S. Thus, given some particular set S of FDs that need to be enforced, it is sufficient for the system to find and enforce an irreducible cover I instead. We should make it clear, however, that a given set of FDs does not necessarily have a unique irreducible cover.
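The reduction procedure described above can be sketched in Python. This is a minimal illustration, not a production algorithm: the helper names (fd, closure, irreducible_cover) are hypothetical, and the input set of FDs is an assumption chosen to be consistent with the elimination steps of the worked example. The sketch may eliminate FDs in a different order than the worked steps, but it arrives at an equivalent irreducible set.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implied(fds, lhs, rhs):
    """True if lhs -> rhs follows from the FDs in `fds`."""
    return rhs <= closure(lhs, fds)


def irreducible_cover(fds):
    # Step 1: singleton right-hand sides, dropping exact duplicates.
    work = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            single = (lhs, frozenset([attr]))
            if single not in work:
                work.append(single)
    # Step 2: drop attributes that are redundant in a left-hand side.
    # (Simplification: an FD is never reduced to an empty left-hand side.)
    for i, (lhs, rhs) in enumerate(work):
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and implied(work, smaller, rhs):
                lhs = smaller
                work[i] = (lhs, rhs)
    work = list(dict.fromkeys(work))  # step 2 can create duplicates
    # Step 3: drop every FD that is implied by the remaining ones.
    cover = list(work)
    for item in work:
        rest = [g for g in cover if g != item]
        if implied(rest, item[0], item[1]):
            cover = rest
    return cover


# Assumed input, consistent with the steps of the worked example.
GIVEN = [fd("A", "BC"), fd("B", "C"), fd("A", "B"), fd("AB", "C"), fd("AC", "D")]

cover = irreducible_cover(GIVEN)
print(sorted(("".join(sorted(l)), "".join(r)) for l, r in cover))
# [('A', 'B'), ('A', 'D'), ('B', 'C')]
```

Because the result depends on the order in which attributes and FDs are examined, a different ordering can yield a different irreducible cover, which is why the cover is not necessarily unique.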
Q. 3. What is query optimization?
Ans. Query optimization refers to techniques that apply heuristic rules to modify the internal representation of a query, which is usually in the form of a query tree or a query graph data structure, in order to improve its expected performance. The parser of a high-level query first generates an initial internal representation, which is then optimized according to heuristic rules. Following that, a query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
One of the main heuristic rules is to apply SELECT and PROJECT operations before applying the JOIN or other binary operations. This is because the size of the file resulting from a binary operation, such as JOIN, is usually a multiplicative function of the sizes of the input files. The SELECT and PROJECT operations reduce the size of a file and hence should be applied before a join or other binary operation.
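The effect of the "SELECT before JOIN" heuristic can be illustrated with a small sketch. The tables, column names, and the nested-loop join below are assumptions made up for illustration; real optimizers work on query trees and cost estimates rather than Python lists.

```python
# Hypothetical sample data: 100 employees spread over 5 departments.
employees = [{"emp": i, "dept": i % 5} for i in range(100)]
departments = [{"dept": d, "city": "Paris" if d == 0 else "Rome"} for d in range(5)]


def nested_loop_join(left, right, key):
    """Join two lists of rows on `key`, counting row comparisons."""
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[key] == r[key]:
                out.append({**l, **r})
    return out, comparisons


# Plan A: join first, then select the Paris rows.
joined, cost_a = nested_loop_join(employees, departments, "dept")
result_a = [row for row in joined if row["city"] == "Paris"]

# Plan B: select the Paris departments first, then join.
paris = [d for d in departments if d["city"] == "Paris"]
result_b, cost_b = nested_loop_join(employees, paris, "dept")

print(cost_a, cost_b, result_a == result_b)  # 500 100 True
```

Both plans return the same rows, but applying the selection first shrinks one join input from 5 rows to 1, so the join does one fifth of the comparisons.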
Q. . What are the various guidelines for database design?
Ans. The process of database design can be stated as follows: Design the logical and physical structure of one or more databases to accommodate the information needs of the users in an organisation for a defined set of applications.
The goals of database design are multiple:
1. Satisfy the information content requirements of the specified users and applications.
2. Provide a natural and easy-to-understand structuring of the information.
3. Support processing requirements and any performance objectives such as response time, processing time, and storage space.
These goals are very hard to accomplish and measure, and they involve an inherent tradeoff: if one attempts to achieve more "naturalness" and "understandability" of the model, it may be at the cost of performance. The problem is aggravated because the database design process often begins with informal and poorly defined requirements. In contrast, the result of the design activity is a rigidly defined database schema that cannot be modified easily once the database is implemented.
We can identify six main phases of the database design process:
(i) Requirements collection and analysis
(ii) Conceptual database design
(iii) Choice of a DBMS
(iv) Data model mapping (also called logical database design)
(v) Physical database design
(vi) Database system implementation and tuning
Q. ! Discuss the concepts of normalization in detail.
Ans. NORMAL FORMS BASED ON PRIMARY KEYS
Normalization
In very simple words, normalization is a technique which helps to determine the most appropriate grouping of data items into records, segments or tuples. This is necessary because the data items are arranged in tables, which indicate the structure, relationships and integrity in a relational database.
Normal Forms
The normalization process, as first proposed by Codd (1972), takes a relation schema through a series of tests to "certify" whether it satisfies a certain normal form. The process, which proceeds in a top-down fashion by evaluating each relation against the criteria for the normal forms and decomposing relations as necessary, can thus be considered as relational design by analysis. Initially, Codd proposed three normal forms, which he called first, second and third normal form. A stronger definition of 3NF, called Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd. All these normal forms are based on the functional dependencies among the attributes of a relation. Later, 4NF and 5NF were proposed, based on the concepts of multivalued dependencies and join dependencies, respectively.
Need of Normalization
Normalization of data can hence be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
1. Minimizing redundancy.
2. Minimizing the insertion, deletion, and update anomalies.
Normal forms are based on the primary key.
Normalization: It is the process of restructuring an unstructured relation into a structured one with the purpose of removing redundancy and anomalies.
First Normal Form (1NF)
Definition: A relation schema is said to be in 1NF if the values in the domain of each attribute of the relation are atomic. In other words, only one value is associated with each attribute, and the value is not a set of values or a list of values. A database schema is in 1NF if every relation schema included in the database schema is in 1NF.
A relation is in 1NF if and only if all underlying domains contain scalar values only. Here scalar means atomic: there should be a single value at the intersection of each row and column, as shown in the FIRST relation obtained from the original relation.
The functional dependencies in relation FIRST are as follows:
But problems occur with each of the three update operations.
INSERT: We cannot insert the fact that a particular supplier is located in a particular city until that supplier supplies at least one part. The FIRST relation does not show that supplier S5 is located in Athens. The reason is that, until S5 supplies some part, we have no appropriate primary key value.
DELETE: If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment connecting that supplier to some part but also the information that the supplier is located in a particular city. For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is located in Paris.
UPDATE: The city value for a given supplier appears in FIRST many times, in general. This redundancy causes update problems. For example, if supplier S1 moves from London to Amsterdam, we are faced with either the problem of searching FIRST to find every tuple connecting S1 and London (and changing it) or the possibility of producing an inconsistent result (the city for S1 might be given as Amsterdam in one tuple and London in another).
Therefore, to overcome these problems we move to 2NF. Before proceeding to the next form, let us denote:
R = Relation scheme
S = Set of attributes
F = Set of all functional dependencies
Second Normal Form (2NF)
Definition: A relation schema R<S, F> is in second normal form (2NF) if it is in 1NF and if all nonprime attributes are fully functionally dependent on the relation key(s). A database schema is in 2NF if every relation schema included in the database schema is in 2NF.
Features:
1. A relation is in 2NF if it is in 1NF and every nonkey attribute is fully dependent on the key.
2. If the key is a single attribute, then the relation is automatically in 2NF.
Chapter 3 : Part 2
Second Normal Form (definition assuming only one candidate key, which is thus the primary key): A relation is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key.
So we decompose the FIRST relation into two relations, SECOND and SP. It should be clear that the revised structure overcomes all the problems with update operations sketched earlier.
INSERT: We can insert the information that S5 is located in Athens, even though S5 does not currently supply any parts, by simply inserting the appropriate tuple into SECOND.
DELETE: We can delete the shipment connecting S3 and P2 by deleting the appropriate tuple from SP; we do not lose the information that S3 is located in Paris.
UPDATE: The S#-CITY redundancy has been eliminated. Thus we can change the city for S1 from London to Amsterdam by changing it once and for all in the relevant SECOND tuple.
Still, we have problems with these operations on SECOND in the following ways:
INSERT: We cannot insert the fact that a particular city has a particular status (e.g., we cannot state that any supplier in Rome must have a status of 50) until we have some supplier actually located in that city.
DELETE: If we delete the only SECOND tuple for a particular city, we destroy not only the information for the supplier concerned but also the information that the city has that particular status. For example, if we delete the SECOND tuple for S5, we lose the information that the status for Athens is 30.
UPDATE: The status for a given city appears in SECOND many times, in general.
3. Course ID is such an attribute which is overlapping.
Hence the relation shown is in 3NF and also in BCNF.
A relation schema R(S, F) is in BCNF (S = set of attributes, F = set of all functional dependencies) if, for every set of attributes X which is a subset of S and every attribute Y which belongs to S such that X -> Y holds, one of the following two conditions holds:
i. Either Y belongs to X (X -> Y is a trivial dependency),
ii. Or X is a superkey.
Here:
Trivial dependency: A dependency in which the right-hand side is a subset of the left-hand side is known as a trivial dependency, e.g. (S#, P#) -> S#.
Super key: A primary key combined with any additional attributes is known as a super key.
A relation is in BCNF if every determinant is a candidate key.
Advisor
- It is in 1NF by definition.
- It is in 2NF since all nonkey attributes are dependent on the entire key.
- It is in 3NF because it has no transitive dependencies.
(The relation still contains some redundancy.) Thus, if we need to change the status for London from 20 to 30, we are faced with either the problem of searching SECOND to find every tuple for London (and changing it) or the possibility of producing an inconsistent result (the status for London might be given as 20 in one tuple and 30 in another).
Again, to overcome such problems we replace the original relation (SECOND, in this case) by two projections, giving 3NF.
Third Normal Form (3NF)
Definition: A relation schema R<S, F> is in 3NF if, for all nontrivial functional dependencies in F of the form X -> A, either X contains a key (i.e., X is a superkey) or A is a prime attribute. A database schema is in 3NF if every relation schema included in the database schema is in 3NF.
Feature: A relation R is in 3NF if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.
(3NF) (Definition assuming only one candidate key, which is thus the primary key): A relation is in 3NF if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key. ("No transitive dependencies" implies no mutual dependencies.)
Relations SC and CS are both in 3NF.
Figure: Functional dependencies in the relations SC and CS
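The two decomposition steps described so far (FIRST into SECOND and SP, then SECOND into SC and CS) can be sketched with Python sets. The sample tuples are assumptions chosen to agree with the values mentioned in the text (S3 in Paris supplying only P2, London with status 20); the attribute order (S#, STATUS, CITY, P#, QTY) is likewise assumed.

```python
# Assumed sample value of FIRST: (S#, STATUS, CITY, P#, QTY)
FIRST = {
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S2", 10, "Paris", "P1", 300),
    ("S3", 10, "Paris", "P2", 200),
    ("S4", 20, "London", "P2", 200),
}


def project(relation, positions):
    """Projection: keep the given column positions, dropping duplicates."""
    return {tuple(row[i] for i in positions) for row in relation}


# 1NF -> 2NF: separate supplier facts from shipment facts.
SECOND = project(FIRST, (0, 1, 2))   # (S#, STATUS, CITY)
SP = project(FIRST, (0, 3, 4))       # (S#, P#, QTY)

# 2NF -> 3NF: remove the transitive dependency S# -> CITY -> STATUS.
SC = project(SECOND, (0, 2))         # (S#, CITY)
CS = project(SECOND, (2, 1))         # (CITY, STATUS)

# Both steps are non-loss: joining the projections restores the original.
second_again = {(s, status, city)
                for (s, city) in SC
                for (city2, status) in CS if city == city2}
first_again = {(s, status, city, p, qty)
               for (s, status, city) in second_again
               for (s2, p, qty) in SP if s == s2}
print(second_again == SECOND, first_again == FIRST)  # True True

# The insert anomalies are gone: each fact can be recorded on its own.
SC.add(("S5", "Athens"))   # a supplier with no shipments yet
CS.add(("Rome", 50))       # a city status with no supplier yet
```

Each city's status is now stored exactly once in CS, so changing the status for London is a single-tuple update.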
Thus, by using these relations we have removed the transitive dependency from relation SECOND.
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant. Or, less formally,
BCNF (informal definition): A relation is in BCNF if and only if the only determinants are candidate keys.
Relations FIRST and SECOND, which are not in 3NF, are not in BCNF either; relations SP, SC and CS, which were in 3NF, are also in BCNF. Relation FIRST contains three determinants, namely S#, CITY, and {S#, P#}; of these, only {S#, P#} is a candidate key, so FIRST is not in BCNF. Similarly, SECOND is not in BCNF either, because the determinant CITY is not a candidate key. Relations SP, SC and CS, on the other hand, are each in BCNF, because in each case the (single) candidate key is the only determinant in the relation.
BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF, because every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.
Definition: A normalized relation scheme R<S, F> is in BCNF if, for every nontrivial FD in F of the form X -> A where X is a subset of S and A belongs to S, X is a superkey of R. BCNF is a special case of 3NF.
Features:
1. Key attributes (candidate keys) are composite (there is no single attribute which identifies a record).
2. There is more than one candidate key.
3. In each candidate key at least one attribute is overlapping.
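The determinant test for BCNF stated above can be checked mechanically with an attribute-closure computation. In this sketch the function names are hypothetical, and the FD sets for FIRST, SECOND and SP are assumptions chosen to match the determinants named in the text (S#, CITY, and {S#, P#}).

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(heading, fds):
    """Determinants of nontrivial FDs that are not superkeys of `heading`."""
    heading = frozenset(heading)
    return {lhs for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != heading}


def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


# Assumed FDs, consistent with the determinants named in the text.
first_fds = [fd({"S#"}, {"CITY"}), fd({"CITY"}, {"STATUS"}),
             fd({"S#", "P#"}, {"QTY"})]
second_fds = [fd({"S#"}, {"CITY"}), fd({"CITY"}, {"STATUS"})]
sp_fds = [fd({"S#", "P#"}, {"QTY"})]

print(bcnf_violations({"S#", "STATUS", "CITY", "P#", "QTY"}, first_fds))
print(bcnf_violations({"S#", "STATUS", "CITY"}, second_fds))
print(bcnf_violations({"S#", "P#", "QTY"}, sp_fds))
```

The first call reports the determinants S# and CITY, the second reports only CITY, and the third reports none, in line with the discussion of FIRST, SECOND and SP.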
These three features describe the situation in which a relation can be in 3NF and still fail to be in BCNF; a table that exhibits them must therefore be tested against the BCNF definition directly.
4. Explanation
1. It is not in BCNF because it has a determinant, FName, that is not a candidate key.
STU-ADV    Key: (SID, FName)
ADV-SUBJ   Key: (FName)
Relations in BCNF
Now we can say that a relation is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant. Or, less formally, a relation is in BCNF if and only if the only determinants are candidate keys.
Fourth Normal Form (4NF)
Definition: Given a relation schema R such that the set D of FDs and MVDs is satisfied, consider sets of attributes X and Y where X is a subset of R and Y is a subset of R. The relation schema R is in fourth normal form (4NF) if, for all multivalued dependencies of the form X ->> Y in D+, either X ->> Y is a trivial MVD or X is a superkey of R. A database schema is in 4NF if all relation schemas included in the database schema are in 4NF.
Multi-Valued Dependencies and Fourth Normal Form
The 3NF and BCNF normal forms serve the purpose well most of the time. However, there are occasions where higher normal forms must be considered. The next higher form of normalization is the fourth normal form. It makes use of a new kind of dependency, called a multi-valued dependency (MVD); MVDs are a generalization of FDs. Likewise, the definition of fifth normal form makes use of another new kind of dependency, called a join dependency (JD); JDs in turn are a generalization of MVDs.
Multi-valued dependence: Let R be a relation, and let A, B, and C be subsets of the attributes of R. Then we say that B is multi-dependent on A, in symbols
A ->> B
(read "A multi-determines B," or simply "A double arrow B"), if and only if, for every possible legal value of R, the set of B values matching a given (A value, C value) pair depends only on the A value and is independent of the C value.
To understand it, we will take an example. Suppose we are given a relation HCTX (H for "hierarchy") containing information about courses, teachers, and texts, in which the attributes corresponding to teachers and texts are relation-valued (see Fig.). As you can see, each HCTX tuple consists of a course name, plus a relation containing teacher names, plus a relation containing text names (two such tuples are shown in the figure). The intended meaning of such a tuple is that the specified course can be taught by any of the specified teachers and uses all of the specified texts as references. We assume that, for a given course, there can exist any number of corresponding teachers and any number of corresponding texts. Moreover, we also assume (perhaps not very realistically!) that teachers and texts are quite independent of one another; that is, no matter who actually teaches any particular offering of a given course, the same texts are used. Finally, we also assume that a given teacher or a given text can be associated with any number of courses.
Now suppose that we want to eliminate the relation-valued attributes. One way to do this is simply to replace relation HCTX by a relation CTX with three scalar attributes COURSE, TEACHER, and TEXT, as indicated in Fig. 3.10. As you can see from the figure, each tuple of HCTX gives rise to m * n tuples in CTX, where m and n are the cardinalities of the TEACHERS and TEXTS relations in that HCTX tuple. Note that the resulting relation CTX is "all key" (the sole candidate key for HCTX, by contrast, was just {COURSE}).
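The flattening of HCTX into CTX can be sketched as a Cartesian product per course. The course, teacher and text names below are assumptions made up for illustration (the text only tells us that the physics course has two teachers and two texts).

```python
from itertools import product

# Assumed sample value of HCTX: (COURSE, TEACHERS, TEXTS)
HCTX = [
    ("Physics", {"Prof. Green", "Prof. Brown"},
                {"Basic Mechanics", "Principles of Optics"}),
    ("Math", {"Prof. Green"},
             {"Basic Mechanics", "Vector Analysis", "Trigonometry"}),
]

# Each HCTX tuple with m teachers and n texts yields m * n CTX tuples.
CTX = {(course, teacher, text)
       for course, teachers, texts in HCTX
       for teacher, text in product(teachers, texts)}

print(len(CTX))  # 2*2 + 1*3 = 7
```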
Figure: Value for relation CTX corresponding to the HCTX value in Fig.
The meaning of relation CTX is basically as follows: A tuple (COURSE: c, TEACHER: t, TEXT: x) appears in CTX if and only if course c can be taught by teacher t and uses text x as a reference. Observe that, for a given course, all possible combinations of teacher and text appear; that is, CTX satisfies the (relation) constraint:
If tuples (c, t1, x1), (c, t2, x2) both appear, then tuples (c, t1, x2), (c, t2, x1) both appear also.
Now, it should be apparent that relation CTX involves a good deal of redundancy, leading as usual to certain update anomalies. For example, to add the information that the physics course can be taught by a new teacher, it is necessary to insert two new tuples, one for each of the two texts. Can we avoid such problems? Well, it is easy to see that:
1. The problems in question are caused by the fact that teachers and texts are completely independent of one another;
2. Matters would be much improved if CTX were decomposed into its two projections (call them CT and CX) on {COURSE, TEACHER} and {COURSE, TEXT} respectively (see Fig.).
To add the information that the physics course can be taught by a new teacher, all we have to do now is insert a single tuple into relation CT. (Note that relation CTX can be recovered by joining CT and CX back together again, so the decomposition is non-loss.) Thus, it does seem reasonable to suggest that there should be a way of "further normalizing" a relation like CTX.
Note: At this point, you might object that the redundancy in CTX was unnecessary in the first place, and hence that the corresponding update anomalies were unnecessary too. More specifically, you might suggest that CTX need not include all possible TEACHER/TEXT combinations for a given course; for example, two tuples are obviously sufficient to show that the physics course has two teachers and two texts. The problem is, which two tuples? Any particular choice leads to a relation having a very unobvious interpretation and very strange update behavior (try stating the predicate for such a relation, i.e., try stating the criteria for deciding whether or not some given update is an acceptable operation on that relation).
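The decomposition of CTX into CT and CX, and the way it removes the update anomaly, can be sketched as follows. The sample tuples and the new teacher's name are assumptions for illustration only.

```python
# Assumed sample value of CTX for one course: (COURSE, TEACHER, TEXT)
CTX = {
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
}

CT = {(c, t) for c, t, x in CTX}   # projection on (COURSE, TEACHER)
CX = {(c, x) for c, t, x in CTX}   # projection on (COURSE, TEXT)


def join(ct, cx):
    """Natural join of CT and CX over COURSE."""
    return {(c, t, x) for c, t in ct for c2, x in cx if c == c2}


print(join(CT, CX) == CTX)  # True: the decomposition is non-loss

# Adding a teacher is now a single insert into CT ...
CT.add(("Physics", "Prof. White"))
# ... which corresponds to two new tuples in the joined relation.
print(len(join(CT, CX)) - len(CTX))  # 2
```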
Figure: Values for relations CT and CX corresponding to the CTX value in Fig.
Informally, therefore, it is obvious that the design of CTX is bad and the decomposition into CT and CX is better. The trouble is, however, that these facts are not formally obvious. Note in particular that CTX satisfies no functional dependencies at all (apart from trivial ones such as COURSE -> COURSE); in fact, CTX is in BCNF, since as already noted it is all key, and any "all key" relation must necessarily be in BCNF. (Note that the two projections CT and CX are also all key and hence in BCNF.) The ideas of the previous chapter are therefore of no help with the problem at hand.
The existence of "problem" BCNF relations like CTX was recognized very early on, and the way to deal with them was also understood, at least intuitively. However, it was not until 1977 that these intuitive ideas were put on a sound theoretical footing by Fagin's introduction of the notion of multi-valued dependencies (MVDs). Multi-valued dependencies are a generalization of functional dependencies, in the sense that every FD is an MVD, but the converse is not true (i.e., there exist MVDs that are not FDs). In the case of relation CTX there are two MVDs that hold:
COURSE ->> TEACHER
COURSE ->> TEXT
Note the double arrows; the MVD A ->> B is read as "B is multi-dependent on A," or, equivalently, "A multi-determines B." Let us concentrate on the first MVD, COURSE ->> TEACHER. Intuitively, what this MVD means is that, although a course does not have a single corresponding teacher (i.e., the functional dependence COURSE -> TEACHER does not hold), nevertheless each course does have a well-defined set of corresponding teachers. By "well-defined" we mean, more precisely, that for a given course c and a given text x, the set of teachers t matching the pair (c, x) in CTX depends on the value c alone; it makes no difference which particular value of x we choose. The second MVD, COURSE ->> TEXT, is interpreted analogously.
Here then is the formal definition of multi-valued dependence: Let R be a relation, and let A, B, and C be subsets of the attributes of R. Then we say that B is multi-dependent on A, in symbols
A ->> B
(read "A multi-determines B," or simply "A double arrow B"), if and only if, in every possible legal value of R, the set of B values matching a given (A value, C value) pair depends only on the A value and is independent of the C value.
It is easy to show that, given the relation R{A, B, C}, the MVD A ->> B holds if and only if the MVD A ->> C also holds. MVDs always go together in pairs in this way. For this reason it is common to represent them both in one statement, thus:
A ->> B | C
For example:
COURSE ->> TEACHER | TEXT
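The formal definition above translates directly into a check on a single relation value. This is a minimal sketch under stated assumptions: the function name mvd_holds is hypothetical, rows are represented as dictionaries, and C is taken to be every attribute not in A or B. Note that it tests one relation value only, whereas the definition quantifies over every legal value.

```python
from collections import defaultdict


def mvd_holds(rows, a, b):
    """Check A ->> B on one relation value (a list of dict rows)."""
    if not rows:
        return True
    c = tuple(k for k in rows[0] if k not in a and k not in b)
    b_by_ac = defaultdict(set)   # B values seen for each (A, C) pair
    b_by_a = defaultdict(set)    # B values seen for each A value
    for row in rows:
        av = tuple(row[k] for k in a)
        bv = tuple(row[k] for k in b)
        cv = tuple(row[k] for k in c)
        b_by_ac[(av, cv)].add(bv)
        b_by_a[av].add(bv)
    # The B set for (A, C) must not depend on the C value.
    return all(bs == b_by_a[av] for (av, _), bs in b_by_ac.items())


# Assumed sample value of CTX.
CTX = [
    {"COURSE": "Physics", "TEACHER": "Green", "TEXT": "Mechanics"},
    {"COURSE": "Physics", "TEACHER": "Green", "TEXT": "Optics"},
    {"COURSE": "Physics", "TEACHER": "Brown", "TEXT": "Mechanics"},
    {"COURSE": "Physics", "TEACHER": "Brown", "TEXT": "Optics"},
]

print(mvd_holds(CTX, ("COURSE",), ("TEACHER",)))      # True
print(mvd_holds(CTX, ("COURSE",), ("TEXT",)))         # True
print(mvd_holds(CTX[:3], ("COURSE",), ("TEACHER",)))  # False
```

Dropping one of the four tuples breaks the "all combinations" property, so the MVD no longer holds for that value.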
We stated above that multi-valued dependencies are a generalization of functional dependencies, in the sense that every FD is an MVD. More precisely, an FD is an MVD in which the set of dependent (right-hand side) values matching a given determinant (left-hand side) value is always a singleton set. Thus, if A -> B, then certainly A ->> B.
Returning to our original CTX problem, we can now see that the trouble with relations such as CTX is that they involve MVDs that are not also FDs. (In case it is not obvious, we point out that it is precisely the existence of those MVDs that leads to the necessity of, for example, inserting two tuples to add another physics teacher. Those two tuples are needed in order to maintain the integrity constraint that is represented by the MVD.) The two projections CT and CX do not involve any such MVDs, which is why they represent an improvement over the original design. We would therefore like to replace CTX by those two projections, and an important theorem proved by Fagin allows us to make exactly that replacement:
Theorem (Fagin): Let R{A, B, C} be a relation, where A, B and C are sets of attributes. Then R is equal to the join of its projections on {A, B} and {A, C} if and only if R satisfies the MVDs A ->> B | C.
Fourth normal form: Relation R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD A ->> B is satisfied, then all attributes of R are also functionally dependent on A. (An MVD A ->> B is trivial if either A is a superset of B or the union of A and B is the entire heading.)
In other words, the only nontrivial dependencies (FDs or MVDs) are of the form K -> X (i.e., a functional dependency from a superkey K to some other attribute X). Equivalently, R is in 4NF if it is in BCNF and all MVDs in R are in fact "FDs out of keys." Note in particular, therefore, that 4NF implies BCNF.
Relation CTX is not in 4NF, since it involves an MVD that is not an FD at all, let alone an FD "out of a key." The two projections CT and CX are both in 4NF, however. Thus 4NF is an improvement over BCNF in that it eliminates another form of undesirable dependency. What is more, 4NF is always achievable; that is, any relation can be non-loss-decomposed into an equivalent collection of 4NF relations.
Join Dependencies and Fifth Normal Form
So far in this chapter we have assumed that the sole operation necessary or available in the further normalization process is the replacement of a relation in a non-loss way by exactly two of its projections. This assumption has successfully carried us as far as 4NF. It comes perhaps as a surprise, therefore, to discover that there exist relations that cannot be non-loss-decomposed into two projections but can be non-loss-decomposed into three (or more). To coin an ugly but convenient term, we will describe such a relation as "n-decomposable" (for some n > 2), meaning that the relation in question can be non-loss-decomposed into n projections but not into m for any m < n. A relation that can be non-loss-decomposed into two projections we will call "2-decomposable."
Consider relation SPJ from the suppliers-parts-projects database (but ignore QTY for simplicity); a sample value is shown at the top of Fig. 3.10. Note that relation SPJ is all key and involves no nontrivial FDs or MVDs at all, and is therefore in 4NF. Note too that the figure also shows:
(a) The three binary projections SP, PJ, and JS corresponding to the SPJ relation value shown at the top of the figure;
(b) The effect of joining the SP and PJ projections (over P#);
(c) The effect of joining that result and the JS projection (over J# and S#).
Figure: Relation SPJ is the join of all three of its binary projections but not of any two
Observe that the result of the first join is to produce a copy of the original plus one additional (spurious) tuple, and the effect of the second join is then to eliminate that spurious tuple, thereby bringing us back to the original SPJ relation. In other words, the original SPJ relation is 3-decomposable.
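The two joins just described can be reproduced with a small sketch. The sample value of SPJ below is an assumption (chosen so that it is consistent with the constraint discussed in this section); it is not necessarily the value shown in the original figure.

```python
# Assumed sample value of SPJ: (S#, P#, J#)
SPJ = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections.
SP = {(s, p) for s, p, j in SPJ}
PJ = {(p, j) for s, p, j in SPJ}
JS = {(j, s) for s, p, j in SPJ}

# First join: SP and PJ over P#.
sp_pj = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}
print(sp_pj - SPJ)  # the spurious tuple: {('S2', 'P1', 'J2')}

# Second join: that result with JS over J# and S#.
all_three = {(s, p, j) for s, p, j in sp_pj if (j, s) in JS}
print(all_three == SPJ)  # True: the spurious tuple is eliminated
```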
Note: The net result is the same whatever pair of projections we choose for the first join, though the intermediate result is different in each case. Exercise: Check this claim.
Now, the example of Fig. 3.12 is of course expressed in terms of specific relation values. However, the 3-decomposability of SPJ could be a more fundamental, time-independent property (i.e., a property satisfied by all legal values of the relation) if the relation satisfies a certain time-independent integrity constraint. To understand what that constraint must be, observe first that the statement "SPJ is equal to the join of its three projections SP, PJ and JS" is precisely equivalent to the following statement:
if the pair (S1, P1) appears in SP
and the pair (P1, J1) appears in PJ
and the pair (J1, S1) appears in JS
then the triple (S1, P1, J1) appears in SPJ
because the triple (S1, P1, J1) obviously appears in the join of SP, PJ, and JS. (The converse of this statement, that if (S1, P1, J1) appears in SPJ then (S1, P1) appears in projection SP, etc., is clearly true for any degree-3 relation SPJ.) Since (S1, P1) appears in SP if and only if (S1, P1, J2) appears in SPJ for some J2, and similarly for (P1, J1) and (J1, S1), we can rewrite the statement above as a constraint on SPJ:
If (S1, P1, J2), (S2, P1, J1), (S1, P2, J1) appear in SPJ, then (S1, P1, J1) also appears in SPJ.
And if this statement is true for all time (i.e., for all possible legal values of relation SPJ), then we do have a time-independent constraint on the relation (albeit a rather bizarre one). Notice the cyclic nature of that constraint ("if S1 is linked to P1 and P1 is linked to J1 and J1 is linked back to S1 again, then S1 and P1 and J1 must all coexist in the same tuple"). A relation will be n-decomposable for some n > 2 if and only if it satisfies some such (n-way) cyclic constraint.
Suppose then that relation SPJ does in fact satisfy that time-independent constraint (the sample values in Fig. 3.11 are consistent with this hypothesis). For brevity, let us agree to refer to that constraint as Constraint 3D (3D for 3-decomposable). What does Constraint 3D mean in real-world terms? Let us try to make it a little more concrete by giving an example. The constraint says that, in the portion of the real world that relation SPJ is supposed to represent, it is a fact that, if (for example)
(a) Smith supplies monkey wrenches, and
(b) Monkey wrenches are used in the Manhattan project, and
(c) Smith supplies the Manhattan project, then
(d) Smith supplies monkey wrenches to the Manhattan project.
Note that (a), (b), and (c) together normally do not imply (d). We are saying there is no trap here, because there is an additional real-world constraint in effect, namely Constraint 3D, that makes the inference of (d) from (a), (b), and (c) valid in this particular case.
To return to the main topic of discussion: Because Constraint 3D is satisfied if and only if the relation concerned is equal to the join of certain of its projections, we refer to that constraint as a join dependency (JD). A JD is a constraint on the relation concerned, just as an MVD or an FD is a constraint on the relation concerned.
Join dependency: Let R be a relation, and let A, B, ..., Z be subsets of the attributes of R. Then we say that R satisfies the JD
* {A, B, ..., Z}
(read "star A, B, ..., Z") if and only if every possible legal value of R is equal to the join of its projections on A, B, ..., Z.
For example, if we agree to use SP to mean the subset {S#, P#} of the set of attributes of SPJ, and similarly for PJ and JS, then relation SPJ satisfies the JD * {SP, PJ, JS}.
We have seen, then, that relation SPJ, with its JD * {SP, PJ, JS}, can be 3-decomposed. The question is, should it be? And the answer is "Probably yes." Relation SPJ (with its JD) suffers from a number of problems over update operations, problems that are removed when it is 3-decomposed.
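Constraint 3D, as stated above, can be tested on a given relation value with a few lines of Python. The function name satisfies_3d and the sample tuples are assumptions for illustration; as with any check on sample data, a True result only shows that the value is consistent with the JD, not that the JD holds for all legal values.

```python
def satisfies_3d(spj):
    """True if every cyclically linked (s, p, j) is itself a tuple of spj.

    Equivalent to spj being equal to the join of its projections
    SP, PJ and JS, since spj is always contained in that join.
    """
    sp = {(s, p) for s, p, j in spj}
    pj = {(p, j) for s, p, j in spj}
    js = {(j, s) for s, p, j in spj}
    return all((s, p, j) in spj
               for s, p in sp
               for p2, j in pj
               if p == p2 and (j, s) in js)


# Assumed sample values: (S#, P#, J#)
ok = {("S1", "P1", "J2"), ("S2", "P1", "J1"),
      ("S1", "P2", "J1"), ("S1", "P1", "J1")}
broken = ok - {("S1", "P1", "J1")}

print(satisfies_3d(ok))      # True
print(satisfies_3d(broken))  # False: (S1, P1, J1) is required but missing
```

The second value shows one of the update problems: once the first three tuples are present, the fourth cannot be deleted (or left out on insert) without violating the constraint.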
Figure: Sample update problems in SPJ
Fagin's theorem, to the effect that R{A, B, C} can be non-loss-decomposed into its projections on {A, B} and {A, C} if and only if the MVDs A ->> B and A ->> C hold in R, can now be restated as follows:
R{A, B, C} satisfies the JD * {AB, AC} if and only if it satisfies the MVDs A ->> B | C.
Since this theorem can be taken as a definition of multi-valued dependency, it follows that an MVD is just a special case of a JD, or (equivalently) that JDs are a generalization of MVDs. Formally, we have
A ->> B | C  is equivalent to  * {AB, AC}
Note: It follows from the definition that join dependencies are the most general form of dependency possible (using, of course, the term "dependency" in a very special sense). That is, there does not exist a still higher form of dependency such that JDs are merely a special case of that higher form, so long as we restrict our attention to dependencies that deal with a relation being decomposed via projection and recomposed via join. (However, if we permit other decomposition and recomposition operators, then other types of dependencies might come into play.)
Returning now to our example, we can see that the problem with relation SPJ is that it involves a JD that is not an MVD, and hence not an FD either. We have also seen that it is possible, and probably desirable, to decompose such a relation into smaller components, namely, into the projections specified by the join dependency. That decomposition process can be repeated until all resulting relations are in fifth normal form, which we now define:
Fifth normal form: A relation R is in 5NF, also called projection-join normal form (PJ/NF), if and only if every nontrivial join dependency that holds for R is implied by the candidate keys of R.
Note: We explain below what it means for a JD to be "implied by candidate keys."
Relation SPJ is not in 5NF. It satisfies a certain join dependency, namely Constraint 3D, that is certainly not implied by its sole candidate key (that key being the combination of all of its attributes). To state this differently, relation SPJ is not in 5NF because (a) it can be 3-decomposed and (b) that 3-decomposability is not implied by the fact that the combination {S#, P#, J#} is a candidate key. By contrast, after 3-decomposition, the three projections SP, PJ, and JS are each in 5NF, since they do not involve any (nontrivial) JDs at all.
Although it might not yet be obvious,because we have not yet explained what it means for a JD to be implied by candidate .eysGit is a fact that any relation in <1' is automatically in 91' also, because (as we have seen) an %-D is a special case of a JD. &n fact any %-D that is implied by a candidate .ey must be in fact an 'D in which that
candidate .ey is the determinant, that any given relation can be non,less,decomposed into an e uivalent of location of <1' relations: that is, <1' is always achievable. ?e now explain what it means for a JD to be implied by candidate .eys. 'irst we consider a simple example. "uppose once again that the familiar "A22D&#+" relation " has two candidate .eys, ("V) and T"1A%#). Then that relation satisfies several *oin dependenciesGfor example, it satisfies the JD M (A,!,. . ._) is trivial if and only if one of the pro*ections A, !,.. ._ is the identity pro*ection + (i.e., the pro*ection over all attributes of +). TT"V, "1A%# , "TATA"^, T"V, (&T$^^ That is, relation " is e ual to the *oin of its pro*ections on ("V, "1A%#", "TATA") and ("V, (&T$), and hence can be non,loss,decomposed into those pro*ections. (This fact does not mean that it should be so decomposed, of course, only that it could be.) This JD is implied by the fact that ("V) is a candidate .ey. Di.ewise, relation " also satisfies the JD. TT "V. "1A%#), T"V, "TATA"^ T"1A%#, (&T$)) This JD is implied by the fact that ("V) and T"1A%#^ are both candidate .eys. As the foregoing example suggests, a given JD M (A, ! _) is implied by candidate .eys if and only if each of A, !,... _ is in fact a super.ey for the relation in uestion. The given, relation +, we can decompose it in <1' so long as we .now all candidate .eys and all JDs in +. @owever, discovering all the JDs might itself be a nontrivial operation. That is, whereas it is relatively easy to identify 'Ds and %-Ds (because they have a fairly straight forward real,world interpretation), the same cannot be said for JDs rmal that is, they are not %Ds and not 'Ds,because the intuitive meaning of JDs might not be obvious. @ence the process of determining when a given relation is in 91' but not in <1', and so could probably be decomposed to advantage, is still unclear. #xperience suggests that such relations are pathological cases and li.ely to be rare in practice. 
In conclusion, we note that it follows from the definition that 5NF is the ultimate normal form with respect to projection and join (which accounts for its alternative name, projection-join normal form). That is, a relation in 5NF is guaranteed to be free of anomalies that can be eliminated by taking projections. For if a relation is in 5NF, the only join dependencies are those that are implied by candidate keys, and so the only valid decompositions are ones that are based on those candidate keys. (Each projection in such a decomposition will consist of one or more of those candidate keys, plus zero or more additional attributes.) For example, the SUPPLIERS relation S is in 5NF. It can be further decomposed in several nonloss ways, as we saw earlier, but every projection in any such decomposition will still include one of the original candidate keys, and hence there does not seem to be any particular advantage in further reduction.
The Normalization Procedure Summarized
Up to this point in this chapter, we have been concerned with the technique of nonloss decomposition as an aid to database design. The basic idea is as follows: Given some 1NF relation R and some set of FDs, MVDs, and JDs that apply to R, we systematically reduce R to a collection of "smaller" (i.e., lower-degree) relations that are equivalent to R in a certain well-defined sense but are also in some way more desirable. (The original relation R might have been obtained by first eliminating certain relation-valued attributes.) Each step of the reduction process consists of taking projections of the relations resulting from the preceding step. The given constraints are used at each step to guide the choice of which projections to take next. The overall process can be stated informally as a set of rules, thus:
1. Take projections of the original 1NF relation to eliminate any FDs that are not irreducible. This step will produce a collection of 2NF relations.
2. Take projections of those 2NF relations to eliminate any transitive FDs. This step will produce a collection of 3NF relations.
3. Take projections of those 3NF relations to eliminate any remaining FDs in which the determinant is not a candidate key.
Chapter B System Implementation Techniques (Part 1)
Q. 1. What is database security? Explain the mechanism for maintaining database security.
Ans. Security in a database involves both policies and mechanisms to protect the data and ensure that it is not accessed, altered or deleted without proper authorization. As organizations rely more and more on information, more and more databases are created every day, and all of them should be secured against unauthorized access or manipulation by unknown persons. There are two dimensions to the protection of data in a database. First, a certain class of data is available only to those persons who are authorized to access it. This makes the data confidential; e.g., the medical records of patients in a hospital are accessible only to health care officers. Second, the data must be protected from accidental or intentional corruption or destruction; e.g., data on national defense is vital to the security of a state, and the data processed in a chemical plant is vital to its safety. In addition to the economic or strategic reasons for protecting data from unauthorized access, corruption or destruction, there is a privacy dimension to data security and integrity.
Security and Integrity Threats:
Threats to security and integrity are either accidental or intentional (malicious), so there are two types of threat to consider. Some threats of damage to the data can only be addressed using social, behavioral and control mechanisms.
Accidental Security and Integrity Threats:
Some accidental security and integrity threats are:
1. A user can get access to a portion of the database which other users cannot access. If that user accidentally damages part of the data, the whole of the data may be corrupted; e.g., if an application programmer accidentally deletes some function or subroutine, the whole of the program in the database will be affected.
2. Sometimes the failure of one part affects the whole of the data. For example, if the power supply fails during transaction processing, the computed data will not be transferred to the storage device and so will be lost. Proper recovery procedures are normally used to recover from failures occurring during transaction processing.
3. Sometimes concurrent processing or concurrent usage of data causes problems, and data is lost or damaged.
4. Sometimes a system error occurs. A dial-in user may be assigned the identity of another dial-in user who was disconnected accidentally or who hung up without going through a log-off procedure.
5. Sometimes improper authorization causes problems, which could lead to database security and/or integrity violations.
6. Hardware failure also causes data destruction. Security and integrity mechanisms are needed to guard against such failures.
Malicious or Intentional Security and Integrity Threats:
Some intentional security and integrity threats are:
1. A computer system operator or system programmer can intentionally bypass the normal security and integrity mechanisms, alter or destroy the data in the database, or make unauthorized copies of sensitive data.
2. An unauthorized user can get access to a secure terminal or to the password of an authorized user and compromise the database. Such a user could also destroy the database files.
3. Authorized users could pass on sensitive information under pressure or for personal gain.
4. System and application programmers could bypass normal security in their programs by directly accessing database files and making changes and copies for illegal use.
5. An unauthorized person can get access to the computer system, physically or through a communication channel, and compromise the database.
Protection:
Four levels of defense (protection) are generally recognized for database security. These are:
(a) Human Factors: these encompass the ethical, legal and social environments. An organization depends on these to provide a certain degree of protection.
(b) Physical Security: mechanisms include appropriate locks and keys and an entry log for the computing facility and terminals. The security of the physical storage devices (magnetic tapes, disk packs, etc.) must be maintained within the organization and while they are being transported from one location to another. User identification and passwords have to be kept confidential; otherwise an unauthorized user can compromise the database.
(c) Administrative Controls: these are the security and access control policies that determine what information will be accessible to what class of user and the type of access that will be allowed to this class.
(d) OS and DBMS Mechanisms: The operating system protects data and programs in both primary and secondary memory, and it also establishes (identifies) the users. The DBMS provides transaction management, auditing, and recovery based on logging. The DBMS also has integrity constraints and validation procedures for checking users and their operations.
Protection and Version Methods of Protection:
Protection is the branch of security concerned with keeping data safe from unauthorized access by means of various mechanisms. Some protection methods are:
(a) Identification and Authentication: The authorization mechanism prepares a user profile for each user, indicating the portion of the database accessible to that user and the mode of access allowed. Enforcing the security policies in a database system requires that the system know the identity of the user making a request. So, before making any request, the user has to identify himself or herself to the system and authenticate the identification to confirm that he or she is in fact the correct person. The simplest and most common authentication scheme is a password. The user enters a user name or number and then authenticates himself or herself with the password. These are normally used once, for the initial sign-on to the system, but for sensitive and important data the identification/authentication procedure can be repeated at every step. Sometimes a badge, card or key is used for access.
(b) Distributed System Protection: Security enforcement in a distributed system can be enhanced by distribution. Sensitive information can be fragmented and stored at dispersed sites. The leakage of some portion of the fragmented data may not be as disastrous as the leakage of unfragmented data. Also, with distribution, different sites can have different levels of security and protection of data.
(c) Cryptography and Encryption: Suppose the defence forces want to transmit a message in a protected way. The message is:
"Tanks are coming towards AMRITSAR"
One method of transmitting this message is to substitute a different character of the alphabet for each character in the message. If we ignore the spaces between words and the punctuation, and make the substitution by shifting each character one position along the alphabet, the above message is transformed into: "uboltbsfdpnjohupxbsetbnsjutbs"
The above process is cryptography; it is also called encryption of the data. Data should be encrypted before transmission. This is one of the best ways to protect it.
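The substitution just described can be sketched in a few lines of Python. This is an illustration only: the function names are made up here, and the fixed shift of one position is an assumption based on the sample ciphertext. A fixed-shift substitution is easy to break and is not suitable for real protection.

```python
import string

LETTERS = string.ascii_lowercase

def shift_encrypt(message, shift=1):
    """Drop spaces and punctuation, lowercase, then shift each letter (z wraps to a)."""
    out = []
    for ch in message.lower():
        if ch in LETTERS:
            out.append(LETTERS[(LETTERS.index(ch) + shift) % 26])
    return "".join(out)

def shift_decrypt(ciphertext, shift=1):
    """Undo shift_encrypt by shifting in the opposite direction."""
    return shift_encrypt(ciphertext, -shift)

cipher = shift_encrypt("Tanks are coming towards AMRITSAR")
plain = shift_decrypt(cipher)
print(cipher)  # uboltbsfdpnjohupxbsetbnsjutbs
print(plain)   # tanksarecomingtowardsamritsar
```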
"ystem &ntegrity.
&ntegrity implies that any properly authorized access, alteration or deletion of data in the database does not change the validity of the data security and integrity concepts are distinct but are related with each other. Actually integrity is obtained from security. The mechanism that is applied to ensure that the data in the database is correct and consistent is called Data &ntegrity. The integrity is also the maintenance of data, which is damaged by unauthorized person.
Data Integrity
Data integrity requires guarding against invalid database operations. An operation here means any action performed on behalf of a user or application program that modifies the state of the database, such as an update, insert or delete. Database integrity involves the correctness of data. This correctness has to be preserved in the presence of concurrent operations, errors in users' operations and application programs, and failures in hardware and software. Integrity support includes a recovery system for lost and damaged data as well as checks on the stored data. To maintain the consistency and validity of the data, the database has to enforce certain types of constraints. Integrity constraints are hard to understand and maintain when they are coded as rules in application programs. Centralizing integrity checking directly under the DBMS reduces duplication and ensures the consistency and validity of the database. The centralized integrity constraints can be maintained in a system catalog (data dictionary) and made accessible to database users via the query language.
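As an illustration of centralizing an integrity check in the DBMS rather than in each application program, here is a minimal sketch using Python's built-in sqlite3 module. The table, the column names and the rule "a balance may not be negative" are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule is declared once, in the schema, and is enforced for every program.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
conn.commit()

try:
    # This update would leave a negative balance, so the DBMS refuses it.
    conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(rejected, balance)
```

The application contains no validation code of its own; the invalid operation is refused by the database and the stored value is unchanged.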
Q. 2. What is database security?
Ans. Security in a database involves both policies and mechanisms to protect the data and ensure that it is not accessed, altered or deleted without proper authorization. As organizations rely more and more on information, more and more databases are created every day, and all of them should be secured against unauthorized access or manipulation by unknown persons. There are two dimensions to the protection of data in a database. First, a certain class of data is available only to those persons who are authorized to access it. This makes the data confidential; e.g., the medical records of patients in a hospital are accessible only to health care officers. Second, the data must be protected from accidental or intentional corruption or destruction; e.g., data on national defense is vital to the security of a state, and the data processed in a chemical plant is vital to its safety. In addition to the economic or strategic reasons for protecting data from unauthorized access, corruption or destruction, there is a privacy dimension to data security and integrity.
Q. 3. Discuss the concept of transaction in detail. Or What are the desirable properties of transaction?
Ans. A transaction is a logical unit of work that must be either entirely completed or aborted; no intermediate states are acceptable. Most real-world database transactions are formed by two or more database requests. A database request is the equivalent of a single SQL statement in an application program or transaction. A transaction that changes the contents of the database must alter the database from one consistent database state to another. To ensure consistency of the database, every transaction must begin with the database in a known consistent state.
A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic unit. The effects of all the SQL statements in a transaction can be either all committed (applied to the database) or all rolled back (undone from the database). A transaction begins with the first executable SQL statement. A transaction ends when it is committed or rolled back, either explicitly with a COMMIT or ROLLBACK statement or implicitly when a DDL statement is issued.
Consider a banking database. When a bank customer transfers money from a savings account to a checking account, the transaction can consist of three separate operations:
• Decrement the savings account
• Increment the checking account
• Record the transaction in the transaction journal
If all three SQL statements can be performed to maintain the accounts in proper balance, the effects of the transaction can be applied to the database. However, if a problem such as insufficient funds, an invalid account number, or a hardware failure prevents one or two of the statements in the transaction from completing, the entire transaction must be rolled back so that the balance of all accounts is correct. Figure 2 illustrates the banking transaction example.
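The banking example can be sketched with Python's built-in sqlite3 module. The schema, the account names and the transfer function below are assumptions made for illustration. The point is only that the three statements are committed together or rolled back together.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Debit src, credit dst and journal the transfer as one all-or-nothing unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            cur = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            if cur.rowcount == 0:
                raise ValueError("invalid account: " + dst)
            conn.execute(
                "INSERT INTO journal (src, dst, amount) VALUES (?, ?, ?)",
                (src, dst, amount))
        return True
    except (ValueError, sqlite3.IntegrityError):
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY,"
             " balance INTEGER CHECK (balance >= 0))")
conn.execute("CREATE TABLE journal (src TEXT, dst TEXT, amount INTEGER)")
conn.execute("INSERT INTO account VALUES ('savings', 100), ('checking', 0)")
conn.commit()

ok = transfer(conn, "savings", "checking", 60)   # all three statements succeed
bad = transfer(conn, "savings", "chequing", 30)  # credit fails, so the debit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
journal_rows = conn.execute("SELECT COUNT(*) FROM journal").fetchone()[0]
print(ok, bad, balances, journal_rows)
```

In the second call the savings account has already been debited when the invalid account number is detected; the rollback restores it, so no money disappears.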
"tatement #xecution and Transaction (ontrol)
A SQL statement that runs successfully is different from a committed transaction. Executing successfully means that a single statement was:
• Parsed
• Found to be a valid SQL construction
• Run without error as an atomic unit. For example, all rows of a multirow update are changed.
However, until the transaction that contains the statement is committed, the transaction can be rolled back, and all of the changes of the statement can be undone. A statement, rather than a transaction, runs successfully.
Committing means that a user has explicitly or implicitly requested that the changes in the transaction be made permanent. An explicit request means that the user issued a COMMIT statement. An implicit request is made, for example, on normal termination of an application or when a data definition language (DDL) statement is issued. The changes made by the SQL statements of your transaction become permanent and visible to other users only after your transaction has been committed. Only other users' transactions that started after yours will see the committed changes.
You can name a transaction using the SET TRANSACTION ... NAME statement before you start the transaction. This makes it easier to monitor long-running transactions and to resolve in-doubt distributed transactions.
If at any time during execution a SQL statement causes an error, all effects of the statement are rolled back. The effect of the rollback is as if that statement had never been run. This operation is a statement-level rollback. Errors discovered during SQL statement execution cause statement-level rollbacks. An example of such an error is attempting to insert a duplicate value in a primary key. Single SQL statements involved in a deadlock (competition for the same data) can also cause a statement-level rollback. Errors discovered during SQL statement parsing, such as a syntax error, have not yet been run, so they do not cause a statement-level rollback.
A SQL statement that fails causes the loss only of any work it would have performed itself. It does not cause the loss of any work that preceded it in the current transaction. If the statement is a DDL statement, then the implicit commit that immediately preceded it is not undone. The user can also request a statement-level rollback by issuing a ROLLBACK statement. Note that users cannot directly refer to implicit savepoints in rollback statements.
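Statement-level rollback can be observed in a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in; SQLite is not the DBMS the text describes, but its default handling of a constraint error behaves the same way for this purpose. The table and values are made up for illustration.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN or COMMIT,
# so the transaction boundaries are exactly the ones written out below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, note TEXT)")

error = None
conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1, 'first')")
try:
    # The second row duplicates primary key 1, so the whole statement fails
    # and the row with id 2 that it had already inserted is backed out.
    conn.execute("INSERT INTO t VALUES (2, 'second'), (1, 'duplicate')")
except sqlite3.IntegrityError as exc:
    error = str(exc)
# The transaction is still open; earlier work is intact and later work proceeds.
conn.execute("INSERT INTO t VALUES (3, 'third')")
conn.execute("COMMIT")

ids = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(ids)  # [1, 3]
```

Only the failing statement's own work is lost: row 1, inserted before it, and row 3, inserted after it, are both committed.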
Resumable Space Allocation.
Some DBMSs provide a means for suspending, and later resuming, the execution of large database operations in the event of space allocation failures. This enables an administrator to take corrective action, instead of the database server returning an error to the user. After the error condition is corrected, the suspended operation automatically resumes. This feature is called resumable space allocation, and the statements that are affected are called resumable statements.
A statement runs in resumable mode only when the client explicitly enables resumable semantics for the session using the ALTER SESSION statement. A resumable statement is suspended when one of the following conditions occurs:
• Out-of-space condition
• Maximum extents reached condition
• Space quota exceeded condition
For nonresumable space allocation, these conditions result in errors and the statement is rolled back. Suspending a statement automatically results in suspending the transaction. Thus all transactional resources are held through a statement suspend and resume.
When the error condition disappears (for example, as a result of user intervention or perhaps sort space released by other queries), the suspended statement automatically resumes execution.
Process of Transaction.
A DBMS must provide a transaction processing system (TP system) to guarantee that if a transaction executes some updates and a failure then occurs before the transaction reaches its termination, those updates will be undone. Therefore the transaction either executes in its entirety or is totally canceled.
Transaction processing systems provide tools to help software development for applications that involve querying and updating databases. The term "TP system" is generally taken to mean a complete system, including application generators, one or more database systems, utilities and networking software. Within a TP system, there is a core collection of services, called the TP monitor, that coordinates the flow of transactions through the system. (Fig.)
In order to work properly, a transaction processing system needs to meet the following requirements:
• High Availability: the system must be on-line and operational while the enterprise is functioning.
• High Reliability: correctly tracks state, does not lose data, controls concurrency.
• High Throughput: many users => many transactions/sec.
• Low Response Time: on-line => users are waiting.
• Long Lifetime: complex systems are not easily replaced.
• Must be designed so that it can be easily extended as the needs of the enterprise change.
• Security: sensitive information must be carefully protected, since the system is accessible to many users.
Roles in Design, Implementation, and Maintenance of a TPS
• System Analyst: specifies the system using input from the customer, and provides a complete description of its functionality from the customer's and users' point of view.
• Database Designer: specifies the structure of the data that will be stored in the database.
• Application Programmer: implements the application programs (transactions) that access the data and support the enterprise rules.
• Database Administrator: maintains the database once the system is operational (space allocation, performance optimization, database security).
• System Administrator: maintains the transaction processing system, monitors the interconnection of hardware and software modules, and deals with failures and congestion.
A transaction begins when the first executable SQL statement is encountered. An executable SQL statement is a SQL statement that generates calls to an instance, including DML and DDL statements.
When a transaction begins, the DBMS assigns the transaction to an available undo tablespace or rollback segment to record the rollback entries for the new transaction. A transaction ends when any of the following occurs:
• A user issues a COMMIT or ROLLBACK statement without a SAVEPOINT clause.
• A user runs a DDL statement such as CREATE, DROP, RENAME, or ALTER. If the current transaction contains any DML statements, the DBMS first commits the transaction, and then runs and commits the DDL statement as a new, single-statement transaction.
• A user disconnects from the DBMS. The current transaction is committed.
• A user process terminates abnormally. The current transaction is rolled back.
After one transaction ends, the next executable SQL statement automatically starts the following transaction. Note that applications should always explicitly commit or roll back transactions before program termination.
Commit Transactions
Committing a transaction means making permanent the changes performed by the SQL statements within the transaction. Before a transaction that modifies data is committed, the following has occurred:
• The DBMS has generated rollback segment records in buffers in the SGA (system global area) that store rollback segment data. The rollback information contains the old data values changed by the SQL statements of the transaction.
• The DBMS has generated redo log entries in the redo log buffer of the SGA. The redo log record contains the change to the data block and the change to the rollback block. These changes may go to disk before a transaction is committed.
• The changes have been made to the database buffers of the SGA. These changes may go to disk before a transaction is committed.
Note that the data changes for a committed transaction, stored in the database buffers of the SGA, are not necessarily written immediately to the data files by the database writer (DBWn) background process. This writing takes place when it is most efficient for the database to do so. It can happen before the transaction commits or, alternatively, some time after the transaction commits.
When a transaction is committed, the following occurs:
• The internal transaction table for the associated rollback segment records that the transaction has committed, and the corresponding unique system change number (SCN) of the transaction is assigned and recorded in the table.
• The log writer process (LGWR) writes redo log entries in the SGA's redo log buffers to the online redo log file. It also writes the transaction's SCN to the online redo log file. This atomic event constitutes the commit of the transaction.
• The DBMS releases locks held on rows and tables.
• The DBMS marks the transaction complete.
Rollback of Transactions:
Rolling back means undoing any changes to data that have been performed by SQL statements within an uncommitted transaction. The DBMS uses undo tablespaces or rollback segments to store old values. The redo log contains a record of changes. The DBMS lets you roll back an entire uncommitted transaction. Alternatively, you can roll back the trailing portion of an uncommitted transaction to a marker called a savepoint. All types of rollbacks use the same procedures:
• Statement-level rollback (due to statement or deadlock execution error)
• Rollback to a savepoint
• Rollback of a transaction due to user request
• Rollback of a transaction due to abnormal process termination
• Rollback of all outstanding transactions when an instance terminates abnormally
• Rollback of incomplete transactions during recovery
In rolling back an entire transaction, without referencing any savepoints, the following occurs:
1. The DBMS undoes all changes made by all the SQL statements in the transaction by using the corresponding undo tablespace or rollback segment.
2. The DBMS releases all the transaction's locks on data.
3. The transaction ends.
Savepoints in Transactions
You can declare intermediate markers called savepoints within the context of a transaction. Savepoints divide a long transaction into smaller parts.
Using savepoints, you can arbitrarily mark your work at any point within a long transaction. You then have the option later of rolling back work performed before the current point in the transaction but after a declared savepoint within the transaction. For example, you can use savepoints throughout a long, complex series of updates, so that if you make an error, you do not need to resubmit every statement.
Savepoints are similarly useful in application programs. If a procedure contains several functions, then you can create a savepoint before each function begins. Then, if a function fails, it is easy to return the data to its state before the function began and re-run the function with revised parameters or perform a recovery action.
After a rollback to a savepoint, the DBMS releases the data locks obtained by rolled back statements. Other transactions that were waiting for the previously locked resources can proceed. Other transactions that want to update previously locked rows can do so.
When a transaction is rolled back to a savepoint, the following occurs:
• The DBMS rolls back only the statements run after the savepoint.
• The DBMS preserves the specified savepoint, but all savepoints that were established after the specified one are lost.
• The DBMS releases all table and row locks acquired since that savepoint but retains all data locks acquired previous to the savepoint.
The transaction remains active and can be continued. Note that whenever a session is waiting on a transaction, a rollback to savepoint does not free row locks. To make sure a transaction doesn't hang if it cannot obtain a lock, use FOR UPDATE ... NOWAIT before issuing UPDATE or DELETE statements.
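A rollback to a savepoint can be sketched as follows, again using Python's built-in sqlite3 module as a stand-in DBMS. The table, the savepoint name and the step labels are hypothetical.

```python
import sqlite3

# isolation_level=None leaves transaction control entirely to the SQL below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (step TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('step 1')")
conn.execute("SAVEPOINT after_step_1")              # mark the work done so far
conn.execute("INSERT INTO log VALUES ('step 2 (mistake)')")
conn.execute("ROLLBACK TO SAVEPOINT after_step_1")  # undo only the work after the mark
conn.execute("INSERT INTO log VALUES ('step 2 (corrected)')")
conn.execute("COMMIT")                              # the transaction stayed active throughout

steps = [row[0] for row in conn.execute("SELECT step FROM log ORDER BY rowid")]
print(steps)  # ['step 1', 'step 2 (corrected)']
```

Step 1 did not have to be resubmitted: only the statement issued after the savepoint was undone, and the transaction continued and committed normally.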
Transaction Naming
You can name a transaction, using a simple and memorable text string. This name is a reminder of what the transaction is about. Transaction names replace commit comments for distributed transactions, with the following advantages:
• It is easier to monitor long-running transactions and to resolve in-doubt distributed transactions.
• You can view transaction names along with transaction IDs in applications. For example, a database administrator can view transaction names in Enterprise Manager when monitoring system activity.
• Transaction names are written to the transaction auditing redo record.
• LogMiner can use transaction names to search for a specific transaction from transaction auditing records in the redo log.
• You can use transaction names to find a specific transaction in data dictionary tables, such as V$TRANSACTION.
Name a transaction using the SET TRANSACTION ... NAME statement before you start the transaction. When you name a transaction, you associate the transaction's name with its ID. Transaction names do not have to be unique; different transactions can have the same transaction name at the same time by the same owner. You can use any name that enables you to distinguish the transaction.
The Two-phase Commit Mechanism
In a distributed database, the DBMS must coordinate transaction control over a network and maintain data consistency, even if a network or system failure occurs. A distributed transaction is a transaction that includes one or more statements that update data on two or more distinct nodes of a distributed database.
A two-phase commit mechanism guarantees that all database servers participating in a distributed transaction either all commit or all roll back the statements in the transaction. A two-phase commit mechanism also protects implicit DML operations performed by integrity constraints, remote procedure calls, and triggers.
The two-phase commit mechanism is completely transparent to users who issue distributed transactions. In fact, users need not even know the transaction is distributed. A COMMIT statement denoting the end of a transaction automatically triggers the two-phase commit mechanism to commit the transaction. No coding or complex statement syntax is required to include distributed transactions within the body of a database application.
The recoverer (RECO) background process automatically resolves the outcome of in-doubt distributed transactions, that is, distributed transactions in which the commit was interrupted by any type of system or network failure. After the failure is repaired and communication is reestablished, the RECO process of each local DBMS server automatically commits or rolls back any in-doubt distributed transactions consistently on all involved nodes.
In the event of a long-term failure, the DBMS allows each local administrator to manually commit or roll back any distributed transactions that are in doubt as a result of the failure. This option enables the local database administrator to free any locked resources that are held indefinitely as a result of the long-term failure. If a database must be recovered to a point in the past, the DBMS's recovery facilities enable database administrators at other sites to return their databases to the earlier point in time also. This operation ensures that the global database remains consistent.
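The all-or-nothing outcome of a two-phase commit can be sketched in plain Python. This is a teaching sketch under simplifying assumptions, not the mechanism of any particular DBMS: the Participant class and the two_phase_commit function are hypothetical, and network failures, logging and in-doubt recovery are left out.

```python
class Participant:
    """Hypothetical stand-in for one database node in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if this node can guarantee a later commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Commit on every node only if every node votes yes; otherwise roll back on all."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                   # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                       # phase 2: roll back everywhere
        p.rollback()
    return "rolled back"


nodes_ok = [Participant("node A"), Participant("node B")]
nodes_bad = [Participant("node A"), Participant("node B", can_commit=False)]
result_ok = two_phase_commit(nodes_ok)
result_bad = two_phase_commit(nodes_bad)
print(result_ok, result_bad)
```

A single "no" vote in the first phase is enough to roll back the transaction on every node, including the nodes that were ready to commit.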
Read/Write Operations
Since a transaction is a general program, there are an enormous number of potential operations that a transaction can perform. However, there are only two really important operations:
1. read(A,t) (or read(A) when t is not important): reads database element A into local variable t.
2. write(A,t) (or write(A) when t is not important): writes the value of local variable t to database element A.
We will assume that the buffer manager ensures that the database element is in memory. We could make the memory management more explicit by using the following operations:
3. input(A): reads database element A into a local memory buffer.
4. output(A): copies the block containing A to disk.
Let us consider an example to understand the use of the read and write operations. Suppose that we want to transfer $50 from account A to account B. The set of operations performed is:
1. read(A,t)
2. t = t - 50
3. write(A,t)
4. read(B,t)
5. t = t + 50
6. write(B,t)
The first step reads the amount in account A into the local variable t. In step 2, we reduce the value of t by 50. Step 3 writes the updated value back to account A. In step 4, the value of account B is read into local variable t, which is incremented by 50 in step 5. The value of t is written to account B in step 6.
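The six steps can be traced with a toy model in Python. The dictionary standing in for the database and the starting balances are assumptions made for illustration; read returns the value instead of assigning to t directly, since Python has no output parameters.

```python
database = {"A": 200, "B": 100}  # hypothetical starting balances

def read(element):
    """read(A, t): fetch the value of a database element for a local variable."""
    return database[element]

def write(element, t):
    """write(A, t): store the local value t in the database element."""
    database[element] = t

t = read("A")   # step 1
t = t - 50      # step 2
write("A", t)   # step 3
t = read("B")   # step 4
t = t + 50      # step 5
write("B", t)   # step 6

print(database)  # {'A': 150, 'B': 150}
```

Between step 3 and step 6 the total of the two balances is 50 short; this is the intermediate state that must never become visible or permanent on its own.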
Transaction Properties (ACID Properties)
Any change to system state within a transaction boundary has to ensure that the change leaves the system in a stable and consistent state. A transactional unit of work is one in which the following four fundamental transactional properties are satisfied: atomicity, consistency, isolation, and durability (ACID). We will examine each property in detail.
Atomicity
Results of a transaction's execution are either all committed or all rolled back. All changes take effect, or none do. It is common to refer to a transaction as a "unit of work." In describing a transaction as a unit of work, we are describing one fundamental property of a transaction: that the activities within it must be considered indivisible, that is, atomic. A Flute Bank customer may interact with Flute's ATM and transfer money from a checking account to a savings account.
Within the Flute Bank software system, a transfer transaction involves two actions: debit of the checking account and credit to the savings account. For the transfer transaction to be successful, both actions must complete successfully. If either one fails, the transaction fails. The atomic property of transactions dictates that all individual actions that constitute a transaction must succeed for the transaction to succeed, and, conversely, that if any individual action fails, the transaction as a whole must fail. As an example consider two transactions:
T1: BEGIN A = A + 100, B = B - 100 END
T2: BEGIN A = 1.06*A, B = 1.06*B END
Intuitively, the first transaction transfers $100 from B's account to A's account. The second credits both accounts with a 6% interest payment. There is no guarantee that T1 will execute before T2 or vice versa if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order. Consider the following interleaving, listed in time order:
T1: A = A + 100
T2: A = 1.06*A
T1: B = B - 100
T2: B = 1.06*B
This is OK. But what about this one, in which T2 updates B before T1 does?
T1: A = A + 100
T2: A = 1.06*A
T2: B = 1.06*B
T1: B = B - 100
The DBMS's view of the second schedule is:
T1: read(A), write(A)
T2: read(A), write(A), read(B), write(B)
T1: read(B), write(B)
Fig. A possible interleaving (schedule)
Consistency
The database is transformed from one valid state to another valid state. This defines a transaction as legal only if it obeys user-defined integrity constraints. Illegal transactions aren't allowed and, if an integrity constraint can't be satisfied, the transaction is rolled back. For example, suppose that you define a rule that, after a transfer of more than $10,000 out of the country, a row is added to an audit table so that you can prepare a legally required report for the IRS. Perhaps for performance reasons that audit table is stored on a separate disk from the rest of the database. If the audit table's disk is off-line and can't be written, the transaction is aborted.
A database or other persistent store usually defines referential and entity integrity rules to ensure that data in the store is consistent. A transaction that changes the data must ensure that the data remains in a consistent state, that is, that data integrity rules are not violated, regardless of whether the transaction succeeded or failed. The data in the store may not be consistent during the transaction, but the inconsistency is invisible to other transactions, and consistency must be restored when the transaction completes.
Isolation
When multiple transactions are in progress, one transaction may want to read the same data another transaction has changed but not committed. Until the transaction commits, the changes it has made should be treated as transient state, because the transaction could roll back the change. If other transactions read intermediate or transient states caused by a transaction in progress, additional application logic must be executed to handle the effects of some transactions having read potentially erroneous data. The isolation property of transactions dictates how concurrent transactions that act on the same subset of data behave. That is, the isolation property determines the degree to which effects of multiple transactions, acting on the same subset of application state, are isolated from each other.
At the lowest level of isolation, a transaction may read data that is in the process of being changed by another transaction but that has not yet been committed. If the first transaction is rolled back, the transaction that read the data would have read a value that was never committed. This level of isolation, read uncommitted or "dirty read", can cause erroneous results but ensures the highest concurrency. An isolation level of read committed ensures that a transaction can read only data that has been committed. This level of isolation is more restrictive (and consequently provides less concurrency) than the read uncommitted isolation level and helps avoid the problem associated with the latter.
An isolation level of repeatable read signifies that a transaction that read a piece of data is guaranteed that the data will not be changed by another transaction until the transaction completes. The name "repeatable read" for this level of isolation comes from the fact that a transaction with this isolation level can read the same data repeatedly and be guaranteed to see the same value. The most restrictive form of isolation is serializable. This level of isolation combines the properties of the repeatable-read and read-committed isolation levels, effectively ensuring that transactions that act on the same piece of data are serialized and will not execute concurrently.
The isolation portion of the ACID properties is needed when there are concurrent transactions. Concurrent transactions are transactions that occur at the same time, such as multiple users accessing shared objects. This situation is illustrated at the top of Figure 1.5 as activities occurring over time. The safeguards used by a DBMS to prevent conflicts between concurrent transactions are collectively referred to as isolation.
Fig. Concurrently executing transaction
As an example, if two people are updating the same catalog item, it's not acceptable for one person's changes to be "clobbered" when the second person saves a different set of changes. Both users should be able to work in isolation, each working as though he or she is the only user. Each set of changes must be isolated from those of the other users. An important concept for understanding isolation through transactions is serializability. Transactions are serializable when the effect on the database is the same whether the transactions are executed in serial order or in an interleaved fashion. As you can see at the top of Figure 1.5, Transaction 1 through Transaction 3 are executing concurrently over time. The effect on the DBMS is that the transactions may execute in serial order based on consistency and isolation requirements. If you look at the bottom of Figure 1.5, you can see several ways in which these transactions may execute. It is important to note that a serialized execution does not imply the first transactions will automatically be the ones that will terminate before other transactions in the serial order.
Degrees of Isolation
• Degree 0: A transaction does not overwrite data updated by another user or process ("dirty data") of other transactions.
• Degree 1: Degree 0 plus a transaction does not commit any writes until it completes all its writes (until the end of transaction).
• Degree 2: Degree 1 plus a transaction does not read dirty data from other transactions.
• Degree 3: Degree 2 plus other transactions do not dirty data read by a transaction before the transaction commits.
These were originally described as degrees of consistency by Jim Gray. For example, let us consider two transactions:
• The first transaction transfers $100 from B's account to A's.
• The second transaction credits both accounts with 6% interest.
Let us assume that at first A and B each have $1000. Then what are the legal outcomes of running T1 and T2? There is no guarantee that T1 will execute before T2 or vice versa if both are submitted together. Consider a possible interleaved schedule.
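The question about legal outcomes can be checked by direct calculation. The sketch below is only an illustration: the function names and the use of whole-dollar integer arithmetic (`* 106 // 100` for 6% interest) are assumptions made here, not part of the original notes. It runs both serial orders to obtain the legal outcomes and then compares two interleavings against them.

```python
# Each function is one step of T1 or T2 acting on a shared state dictionary.
def t1_a(s): s["A"] += 100                    # T1: A = A + 100
def t1_b(s): s["B"] -= 100                    # T1: B = B - 100
def t2_a(s): s["A"] = s["A"] * 106 // 100     # T2: A = 1.06 * A
def t2_b(s): s["B"] = s["B"] * 106 // 100     # T2: B = 1.06 * B

def run(schedule):
    """Apply the steps in the given order, starting from A = B = 1000."""
    state = {"A": 1000, "B": 1000}
    for step in schedule:
        step(state)
    return state["A"], state["B"]

# The two serial orders define the legal outcomes.
serial_t1_t2 = run([t1_a, t1_b, t2_a, t2_b])   # (1166, 954)
serial_t2_t1 = run([t2_a, t2_b, t1_a, t1_b])   # (1160, 960)
legal = {serial_t1_t2, serial_t2_t1}

# An interleaving that matches the serial order T1, T2.
interleaved_ok = run([t1_a, t2_a, t1_b, t2_b])   # (1166, 954)

# An interleaving that matches neither serial order:
# A gets interest on the transferred $100, but B's $100 debit avoids it.
interleaved_bad = run([t1_a, t2_a, t2_b, t1_b])  # (1166, 960)

print(interleaved_ok in legal, interleaved_bad in legal)
```

Both serial orders leave a total of $2120 in the two accounts, while the second interleaving leaves $2126, so the bank pays $6 of interest it should not have paid.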
Durability
Once committed (completed), the results of a transaction are permanent and survive future system and media failures. If the airline reservation system computer gives you seat 22A and crashes a millisecond later, it won't have forgotten that you are sitting in 22A and also give it to someone else. Furthermore, if a programmer spills coffee into a disk drive, it will be possible to install a new disk and recover the transactions up to the coffee spill, showing that you had seat 22A.
The durability property of transactions refers to the fact that the effect of a transaction must endure beyond the life of a transaction and application. That is, state changes made within a transactional boundary must be persisted onto permanent storage media, such as disks, databases, or file systems. If the application fails after the transaction has committed, the system should guarantee that the effects of the transaction will be visible when the application restarts. Transactional resources are also recoverable: should the persisted data be destroyed, recovery procedures can be executed to recover the data to a point in time (provided the necessary administrative tasks were properly executed). Any change committed by one transaction must be durable until another valid transaction changes the data.
States of Transaction
During its execution, a transaction can be in many states. These states indicate the status of a transaction. The various states in which a transaction can be are:
1. Active: This is the initial state of a transaction. The transaction stays in this state while it is executing. A transaction enters the active state when the first query or update is encountered. Data is processed in the buffer or on disk.
2. Partially committed: A transaction is partially committed after its final statement has been executed. A transaction may change its state from active to partially committed. A transaction enters this state immediately before the "commit work". All operations are completed (in the memory buffer or on disk) and wait to be finalized.
3. Failed: A transaction enters the failed state after the discovery that normal execution can no longer proceed. A transaction may change its state from active to failed. A transaction enters this state when it is interrupted by an event such as a program exception or a system failure.
4. Aborted: A transaction is aborted after it has been rolled back and the database has been restored to its state prior to the start of the transaction. A transaction enters this state after a "rollback work" or at system recovery. All updates made by the transaction are rolled back. There are two options after abort:
• Restart the transaction: This option is selected only if there is no internal logical error.
• Kill the transaction: This option is selected if there is a problem with the transaction itself.
5. Committed: The committed state occurs after successful completion. Terminated may also be considered a transaction state. A transaction enters this state after "commit work". Updates are guaranteed to be permanent.
Fig. Transaction State Diagram
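The state descriptions above can be sketched as a small state machine. This is a minimal illustration, not a DBMS implementation: the names `TxState`, `ALLOWED` and `Transaction` are hypothetical, and the transition from partially committed to failed is an assumption taken from the usual textbook diagram rather than from the text above.

```python
from enum import Enum

class TxState(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"

# Which states may follow each state.
ALLOWED = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    # Assumption: a failure can still occur before the commit is finalized.
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.FAILED: {TxState.ABORTED},   # rollback restores the prior state
    TxState.ABORTED: set(),              # terminal: restart or kill happens outside
    TxState.COMMITTED: set(),            # terminal: updates are permanent
}

class Transaction:
    def __init__(self):
        self.state = TxState.ACTIVE      # every transaction starts as active

    def move_to(self, new_state):
        """Change state, rejecting transitions the diagram does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state

# Usage: a transaction that executes its last statement and then commits.
t = Transaction()
t.move_to(TxState.PARTIALLY_COMMITTED)
t.move_to(TxState.COMMITTED)
```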
Advantages of Concurrent Execution of Transaction
Concurrent execution of transactions means executing more than one transaction at the same time. The schedule shown in the figure represents an interleaved execution of two transactions. Ensuring transaction isolation while permitting such concurrent execution is difficult.
Thus, multiple transactions are allowed to run concurrently in the system. The advantages of concurrent execution of transactions are:
• Increased processor utilization: If the system is executing only one transaction, the processor might not always be busy. For example, if the only transaction in the system is waiting for the completion of some I/O operation, the processor is also waiting and doing no work. On the other hand, if the system is executing more than one transaction at the same time, the processor can stay busy executing one or the other transaction.
• Increased disk utilization.
• Better transaction throughput: One transaction can be using the CPU while another is reading from or writing to the disk.
• Reduced average response time for transactions: Short transactions need not wait behind long ones.
• Reduced average turnaround time for transactions: Turnaround time is the time interval between transaction submission and transaction completion. As more than one transaction is executing at the same time, the average turnaround time is reduced.
• Reduced average wait time for transactions: More transactions are completed in less time.
Q. . What are the various Locking Techniques for Concurrency Control?
Ans. A lock is a variable associated with a data item in the database that describes the status of that item with respect to possible access operations on it. Locks enable a multi-user DBMS to maintain the integrity of transactions by isolating a transaction from others executing concurrently. Locks are particularly critical in write-intensive and mixed workload (read/write) environments, because they can prevent the inadvertent loss of data or consistency problems with reads.
Figure 10.1 depicts a lost update situation that could occur if a DBMS did not lock data. Two transactions read the same bank account balance, each intending to add money to it. However, because the second transaction bases its update on the original balance, the money deposited by the first is lost.
We could have avoided this scenario if the DBMS had appropriately locked the balance on behalf of the first transaction in preparation for its update. The second transaction would have waited, thereby using the updated balance as a basis for its work.
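The lost update and its fix can be illustrated with a small sketch. Everything here is hypothetical illustration code: the `Account` class, the function names and the amounts are assumptions, and Python's `threading.Lock` stands in for the exclusive lock a DBMS would take on the balance.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()   # stands in for an exclusive lock on the balance

def lost_update(account, first, second):
    """Replay the problematic interleaving step by step, with no locking."""
    t1_read = account.balance          # transaction 1 reads the balance
    t2_read = account.balance          # transaction 2 reads the same original balance
    account.balance = t1_read + first  # transaction 1 writes its deposit
    account.balance = t2_read + second # transaction 2 overwrites it: first deposit lost

def locked_deposit(account, amount):
    """Read and write under the lock, so a second depositor must wait."""
    with account.lock:
        current = account.balance
        account.balance = current + amount

# Without locking: 100 + 50 + 30 should be 180, but only the second deposit survives.
unlocked = Account(100)
lost_update(unlocked, 50, 30)

# With locking: two threads deposit concurrently and neither deposit is lost.
locked = Account(100)
threads = [threading.Thread(target=locked_deposit, args=(locked, amt)) for amt in (50, 30)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(unlocked.balance, locked.balance)
```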
As with locking data in preparation for writes, locking data for reads can be important in certain situations, preventing inconsistent analysis of the database. While DBMSs use exclusive locks for writes, share locks are commonly used for reads. Share locks enable other concurrently executing transactions to read the same data but prohibit any transaction from writing the chosen data.
Consider a situation that might occur without share locks, as shown in Fig. 10.2. The first transaction reads the balances of multiple accounts (perhaps through multiple queries) with the intention of calculating an accurate sum. Another transaction transfers money from one account to another during this process. The timing of this work is such that it causes the first transaction to read only part of the effect of this transfer, thus making its sum total inconsistent with what it should be. If share locks were held by the first transaction until transaction commit, this inconsistent analysis would not occur.
Serializability is an important concept associated with locking. It guarantees that the work of concurrently executing transactions will leave the database in the same consistent state as it would have been in if these transactions had executed serially. This requirement is the ultimate criterion for database consistency and is the motivation for the two-phase locking protocol, which dictates that no new locks can be acquired on behalf of a transaction after the DBMS releases a lock held by that transaction. In practice, this protocol generally means that locks are held until commit time.
Aside from their integrity implications, locks can have a significant impact on performance. While it may benefit a given application to lock a large amount of data (perhaps one or more tables) and hold these locks for a long period of time, doing so inhibits concurrency and increases the likelihood that other applications will have to wait for locked resources. Yet locking only small amounts of data and releasing these locks quickly may be inappropriate for some applications, increasing the overhead associated with transaction processing. In addition, certain integrity problems can arise if a single transaction acquires locks after some have already been released.
Need for Lock
Locking is required for the following reasons:
• Need isolation (the "I" of ACID).
• Give each transaction the illusion that there are no concurrent updates.
• Hide concurrency anomalies.
• Do it automatically (the system does not know transaction semantics).
• Goal of locking: to provide concurrency while keeping the system's execution equivalent to some serial execution of the system. The outcome is not deterministic, just a consistent transformation.
Locks are a popular approach to concurrency control. Transactions request and acquire locks on data items which they wish to access and which they do not want other transactions to update, i.e. a lock locks other transactions out. Transactions cannot access data items unless they hold the appropriate lock.
Most locking protocols are based on two types of locks:
• WRITE (or exclusive) locks: if a transaction holds a write lock on an item, no other transaction may acquire a read or write lock on that item.
• READ (or shared) locks: if a transaction holds a read lock on an item, no other transaction may acquire a write lock on that item.
Transactions go into a WAIT state until the required lock is available. Acquisition of locks is the responsibility of the transaction management subsystem. For strict schedules, i.e. simple recovery, transactions should hold all exclusive locks until COMMIT or ROLLBACK time. Thus no transaction can read or update an item until the last transaction that updated it has committed and released the exclusive lock.
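The two lock types above amount to a small compatibility table. The sketch below is an illustration only; the names `COMPATIBLE` and `can_grant` and the one-letter mode codes "S" (read/shared) and "X" (write/exclusive) are assumptions introduced here.

```python
# (mode already held by another transaction, mode being requested) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share an item
    ("S", "X"): False,   # a writer must wait for readers
    ("X", "S"): False,   # readers must wait for a writer
    ("X", "X"): False,   # only one writer at a time
}

def can_grant(requested, held_modes):
    """True if the requested mode is compatible with every lock held by others."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

# Usage: a read lock can join other read locks, a write lock cannot.
print(can_grant("S", ["S", "S"]), can_grant("X", ["S"]))
```

If `can_grant` returns False, the requesting transaction goes into the WAIT state described above.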
LOCK MANAGEMENT
The part of the DBMS that keeps track of the locks issued to transactions is called the lock manager. The lock manager maintains a lock table, which is a hash table with the data object identifier as the key. The DBMS also maintains a descriptive entry for each transaction in a transaction table, and among other things, the entry contains a pointer to a list of locks held by the transaction. A lock table entry for an object (which can be a page, a record, and so on, depending on the DBMS) contains the following information: the number of transactions currently holding a lock on the object (this can be more than one if the object is locked in shared mode), the nature of the lock (shared or exclusive), and a pointer to a queue of lock requests.
Implementing Lock and Unlock Requests
According to the Strict 2PL protocol, before a transaction T reads or writes a database object O, it must obtain a shared or exclusive lock on O and must hold on to the lock until it commits or aborts. When a transaction needs a lock on an object, it issues a lock request to the lock manager:
1. If a shared lock is requested, the queue of requests is empty, and the object is not currently locked in exclusive mode, the lock manager grants the lock and updates the lock table entry for the object (indicating that the object is locked in shared mode, and incrementing the number of transactions holding a lock by one).
2. If an exclusive lock is requested, and no transaction currently holds a lock on the object (which also implies the queue of requests is empty), the lock manager grants the lock and updates the lock table entry.
3. Otherwise, the requested lock cannot be immediately granted, and the lock request is added to the queue of lock requests for this object. The transaction requesting the lock is suspended.
When a transaction aborts or commits, it releases all its locks. When a lock on an object is released, the lock manager updates the lock table entry for the object and examines the lock request at the head of the queue for this object. If this request can now be granted, the transaction that made the request is woken up and given the lock. Indeed, if there are several requests for a shared lock on the object at the front of the queue, all of these requests can now be granted together.
Note that if T1 has a shared lock on O, and T2 requests an exclusive lock, T2's request is queued. Now, if T3 requests a shared lock, its request enters the queue behind that of T2, even though the requested lock is compatible with the lock held by T1. This rule ensures that T2 does not starve, that is, wait indefinitely while a stream of other transactions acquire shared locks and thereby prevent T2 from getting the exclusive lock that it is waiting for.
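The request and release rules above, including the T1, T2, T3 queueing example, can be sketched as follows. This is a simplified single-threaded illustration: the class and method names are hypothetical, a Python dictionary plays the role of the hash table, and a set of holders replaces the holder count. A real lock manager would also need to guard the table with a synchronization mechanism.

```python
from collections import deque

class LockEntry:
    def __init__(self):
        self.holders = set()    # transactions currently holding the lock
        self.mode = None        # "S" or "X" while held, None when free
        self.queue = deque()    # pending (transaction, mode) requests, FIFO

class LockManager:
    def __init__(self):
        self.table = {}         # object identifier -> LockEntry

    def request(self, txn, obj, mode):
        """Return True if the lock is granted now, False if the request is queued."""
        entry = self.table.setdefault(obj, LockEntry())
        # Rule 1: shared request, empty queue, object not locked exclusively.
        if mode == "S" and not entry.queue and entry.mode != "X":
            entry.holders.add(txn)
            entry.mode = "S"
            return True
        # Rule 2: exclusive request and nobody holds or waits for the object.
        if mode == "X" and not entry.holders and not entry.queue:
            entry.holders.add(txn)
            entry.mode = "X"
            return True
        # Rule 3: otherwise the request waits at the end of the queue.
        entry.queue.append((txn, mode))
        return False

    def release(self, txn, obj):
        """Release txn's lock on obj and return the transactions woken up."""
        entry = self.table[obj]
        entry.holders.discard(txn)
        if entry.holders:       # other shared holders remain, nothing changes
            return []
        entry.mode = None
        woken = []
        while entry.queue:
            next_txn, next_mode = entry.queue[0]
            if next_mode == "X":
                if not entry.holders:       # exclusive only if nobody holds it
                    entry.queue.popleft()
                    entry.holders.add(next_txn)
                    entry.mode = "X"
                    woken.append(next_txn)
                break
            # Shared request: grant it and keep going to batch adjacent ones.
            entry.queue.popleft()
            entry.holders.add(next_txn)
            entry.mode = "S"
            woken.append(next_txn)
        return woken

# Usage: the T1, T2, T3 scenario from the text.
lm = LockManager()
granted_t1 = lm.request("T1", "O", "S")   # granted
granted_t2 = lm.request("T2", "O", "X")   # queued behind T1's shared lock
granted_t3 = lm.request("T3", "O", "S")   # queued behind T2, so T2 cannot starve
after_t1 = lm.release("T1", "O")          # wakes T2
after_t2 = lm.release("T2", "O")          # wakes T3
```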
Atomicity of Locking and Unlocking
The implementation of lock and unlock commands must ensure that these are atomic operations. To ensure atomicity of these operations when several instances of the lock manager code can execute concurrently, access to the lock table has to be guarded by an operating system synchronization mechanism such as a semaphore. To understand why, suppose that a transaction requests an exclusive lock. The lock manager checks and finds that no other transaction holds a lock on the object and therefore decides to grant the request. But in the meantime, another transaction might have requested and received a conflicting lock. To prevent this, the entire sequence of actions in a lock request call (checking to see if the request can be granted, updating the lock table, etc.) must be implemented as an atomic operation.
The DBMS maintains a transaction table, which contains (among other things) a list of the locks currently held by a transaction. This list can be checked before requesting a lock, to ensure that the same transaction does not request the same lock twice. However, a transaction may need to acquire an exclusive lock on an object for which it already holds a shared lock. Such a lock upgrade request is handled specially by granting the write lock immediately if no other transaction holds a shared lock on the object and inserting the request at the front of the queue otherwise. The rationale for favoring the transaction in this way is that it already holds a shared lock on the object, and queuing it behind another transaction that wants an exclusive lock on the same object causes both transactions to wait for each other and therefore be blocked forever.
We have concentrated thus far on how the DBMS schedules transactions, based on their requests for locks. This interleaving interacts with the operating system's scheduling of processes' access to the CPU and can lead to a situation called a convoy, where most of the CPU cycles are spent on process switching. The problem is that a transaction T holding a heavily used lock may be suspended by the operating system. Until T is resumed, every other transaction that needs this lock is queued. Such queues, called convoys, can quickly become very long; a convoy, once formed, tends to be stable. Convoys are one of the drawbacks of building a DBMS on top of a general-purpose operating system with preemptive scheduling.
In addition to locks, which are held over a long duration, a DBMS also supports short-duration latches. Setting a latch before reading or writing a page ensures that the physical read or write operation is atomic; otherwise, two read/write operations might conflict if the objects being locked do not correspond to disk pages (the units of I/O). Latches are unset immediately after the physical read or write operation is completed.
TYPES OF LOCKS
Any data that are retrieved by a user for updating must be locked, or denied to other users, until the update is completed or aborted. Locking can be done at different levels. These levels include database, table, record, and field. Data items can be locked in two modes:
1. Exclusive (X) lock: The data item can be both read and written. An X-lock is requested using the lock-X instruction. It prevents another transaction from reading a record until it is unlocked.
2. Shared (S) lock: The data item can only be read. An S-lock is requested using the lock-S instruction. It allows other transactions to read a record or other resource.
Binary Locks
A binary lock has only two states: locked (1) or unlocked (0). If an object is locked by a transaction, no other transaction can use that object. If an object is unlocked, any transaction can lock the object for its use. A transaction must unlock the object after its termination. Every transaction requires a lock and unlock operation for each data item that is accessed. Binary locks are simple but restrictive. Checking is done before entry is made, waiting is done when the object is found locked, and unlocking is done after use.
Share/Exclusive Locks
An exclusive lock exists when access is specially reserved for the transaction that locked the object. The exclusive lock must be used when the potential for conflict exists.
An exclusive lock is issued when a transaction wants to write (update) a data item and no locks are currently held on that data item. There are two basic requirements of locking:
• READ operations (such as SELECT and FETCH) must acquire a SHARE lock before rows can be retrieved.
• WRITE operations (such as UPDATE, INSERT, and DELETE) must acquire an EXCLUSIVE lock before rows can be modified.
A SHARE (S) lock permits reading by other users. No other transaction may modify the data that is locked with an S lock.
• When an S lock is obtained at the table level, the transaction can read all rows in the table. No row or page level locks are acquired when the transaction reads a row (the S lock at the table level covers all of the rows in the table, so additional locks are not necessary).
• When an S lock is obtained at the page level, the transaction can read all rows on the page. No row level locks are acquired when the transaction reads a row (the S lock at the page level covers all of the rows on the page).
• When an S lock is obtained at the row level, the transaction can read the row.
An EXCLUSIVE (X) lock prevents access by any other user. An X lock is the strongest type of lock. No other transaction may read or modify the data that is locked with an X lock. An X lock must be obtained (either at the table, page, or row level) when user data is updated, inserted, or deleted.
• When an X lock is obtained at the table level, the transaction can read and modify all rows in the table. No row or page level locks are acquired when the transaction reads or modifies a row.
• When an X lock is obtained at the page level, the transaction can read and modify all rows on the page. No row level locks are acquired when the transaction reads or modifies a row.
• When an X lock is obtained at the row level, the transaction can read and modify the row.
Disadvantages of Locking
Pessimistic concurrency control has a number of key disadvantages, particularly in distributed systems:
• Overhead: Locks cost, and you pay even if no conflict occurs. Even read-only actions must acquire locks. High overhead forces careful choices about lock granularity.
• Low concurrency: If locks are too coarse, they reduce concurrency unnecessarily. The need for strict 2PL to avoid cascading aborts makes it even worse.
• Low availability: A client cannot make progress if the server or lock holder is temporarily unreachable.
• Deadlock.
TWO PHASE LOCKING PROTOCOL
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. The rules for the Two-Phase Locking Protocol are:
• Two transactions cannot have conflicting locks.
• No unlock operation can precede a lock operation in the same transaction.
• No data are affected until all locks are obtained, that is, until the transaction is at its lock point.
Two-phase locking is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase
• Transaction may obtain locks
• Transaction may not release locks
Phase 2: Shrinking Phase
• Transaction may release locks
• Transaction may not obtain locks
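The growing and shrinking phases can be checked mechanically for a single transaction. The sketch below is a minimal illustration; the function name and the ("lock" / "unlock", item) encoding of operations are assumptions made here.

```python
def follows_two_phase_locking(operations):
    """operations: list of ("lock" or "unlock", item) issued by one transaction.

    Returns True if no lock is acquired after the first unlock, i.e. the
    transaction has a growing phase followed by a shrinking phase.
    """
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True           # the first unlock starts the shrinking phase
        elif action == "lock" and shrinking:
            return False               # acquiring a lock while shrinking breaks 2PL
    return True

# All locks first, then all unlocks: two-phase.
two_phase = follows_two_phase_locking(
    [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
)

# Unlocking A before locking B: not two-phase.
not_two_phase = follows_two_phase_locking(
    [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
)
```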
The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock). Two-phase locking does not ensure freedom from deadlocks. Cascading rollback is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks until it commits or aborts. Rigorous two-phase locking is even stricter: here all locks are held until commit or abort. In this protocol transactions can be serialized in the order in which they commit. There can be conflict-serializable schedules that cannot be obtained if two-phase locking is used. However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
• Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
Example of a transaction performing locking:
Begin(T1)
Lock(T1,A,S)
Read(T1,A)
Unlock(T1,A)
Begin(T2)
Lock(T2,B,S)
Read(T2,B)
Unlock(T2,B)
Display(A+B)
Commit(T1)
Commit(T2)
Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the read of A and the read of B, the displayed sum would be wrong.
Two-phase locking with lock conversions:
First Phase:
• can acquire a lock-S on an item
• can acquire a lock-X on an item
• can convert a lock-S to a lock-X (upgrade)
Second Phase:
• can release a lock-S
• can release a lock-X
• can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability, but it still relies on the programmer to insert the various locking instructions.
IMPLEMENTATION OF LOCKING
A lock manager can be implemented as a separate process to which transactions send lock and unlock requests. The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock). The requesting transaction waits until its request is answered. The lock manager maintains a data structure called a lock table to record granted locks and pending requests. The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked. In the lock table, black rectangles indicate granted locks and white ones indicate waiting requests. The lock table also records the type of lock granted or requested. A new request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks. Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted. If a transaction aborts, all waiting or granted requests of the transaction are deleted. The lock manager may keep a list of locks held by each transaction to implement this efficiently.
Problems With Two Phase Locking Protocol
Consider the partial schedule
Fig. An example schedule
Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A. Such a situation is called a deadlock. To handle a deadlock, one of T3 or T4 must be rolled back and its locks released. The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil. Starvation is also possible if the concurrency control manager is badly designed. For example:
• A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.
• The same transaction is repeatedly rolled back due to deadlocks.
The concurrency control manager can be designed to prevent starvation.
PRECEDENCE GRAPH
A precedence graph is used for testing the serializability of a schedule. There is one node for each transaction in the schedule.
If the precedence graph has a cycle, the schedule is not serializable. If it has no cycle, any ordering of the transactions which obeys the arrows is an equivalent serial schedule, so the schedule is serializable.
Solution Of Inconsistency Problem
The problem of inconsistent analysis can be solved with the help of locks. Let us understand this with the help of the following example.
Initial values: A = $400, B = $500, and C = $300
Note that the above example leads to a deadlock. Still, it is an acceptable solution because the ACID properties are preserved.
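The precedence graph test described earlier in this answer can be sketched in a few lines. The notes do not spell out how the arrows are drawn, so this sketch assumes the standard rule: draw an arrow from Ti to Tj when an operation of Ti comes before a conflicting operation of Tj (same item, different transactions, at least one of the two a write). The function names and the (transaction, "r" or "w", item) encoding are hypothetical.

```python
def precedence_graph(schedule):
    """schedule: list of (transaction, "r" or "w", item) in execution order."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True                # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# T1 finishes with each item before T2 touches it: only the arrow T1 -> T2.
ok_schedule = [
    ("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
    ("T1", "r", "B"), ("T1", "w", "B"), ("T2", "r", "B"), ("T2", "w", "B"),
]
# T2 updates B before T1 does: arrows T1 -> T2 (on A) and T2 -> T1 (on B).
bad_schedule = [
    ("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
    ("T2", "r", "B"), ("T2", "w", "B"), ("T1", "r", "B"), ("T1", "w", "B"),
]
print(is_conflict_serializable(ok_schedule), is_conflict_serializable(bad_schedule))
```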
Q. !. What is Concurrency Control Based on Timestamp Ordering?
Ans. Concurrency control: Concurrency control is a method used to ensure that database transactions are executed in a safe manner. It is the process of managing operations against a database so that data operations do not interfere with each other in a multi-user environment. One example of a shared database used in an on-line manner is the database for an airline reservation system, which is used by many agents accessing the database from their terminals. A database could also be accessed in batch mode while it is concurrently used in online mode. Sharing the database for read-only access does not cause any problem, but if one of the transactions running concurrently tries to modify some data item, it could lead to inconsistencies.
Further, if more than one transaction is allowed to simultaneously modify a data item in a database, it could lead to an incorrect value for the data item, and an inconsistent database will be created. For example, suppose that two ticket agents access the online reservation system simultaneously to see whether a seat is available on a given flight. If both agents make a reservation against the last available seat on that flight, the flight is overbooked and the data becomes inconsistent. We can say that concurrent processing of programs, processes, or jobs is similar to multiprogramming, i.e. a number of jobs or programs are processed simultaneously to achieve their independent and different goals according to their own requirements.
Some concurrency problems: even when each transaction is correct on its own, the database can become inconsistent after the transactions complete if they are processed concurrently. In the case of concurrent operation, where a number of transactions are running and using the database, we cannot make any assumption about the order in which the statements belonging to different transactions will be executed. The order in which these statements are executed is called a schedule. A schedule that arises in concurrent operation, and whose order we cannot change, is called a concurrent schedule. Some problems that occur during scheduling and concurrent processing are:
(a) Lost update problem
Consider two transactions that access the same data item A. Each of these transactions modifies the data item and writes it back. Concurrent processing of these modifications to the value of A can cause one of the updates to be lost.
(b) Inconsistent Read Problem: The lost update problem was caused by concurrent modification of the same data item. However, concurrency can also cause problems when only one transaction modifies a given set of data while that set of data is being used by another transaction. For example, suppose two transactions T5 and T6 occur in a schedule, and A and B represent data items having integer values. If both transactions are processed concurrently, one will be reading the data while the other is modifying it. The reading transaction may therefore see a mixture of old and new values and cannot tell which data is correct.
(c) The phantom phenomenon: Consider an organization where parts are purchased and kept in stock. The parts are withdrawn from the stock and used by a number of projects. To check the extent of loss, we want to see whether the current quantity of some part purchased and received is equal to the current sum of the quantity of that part in stock plus the current quantity used by various projects. The phantom problem means that if additional items are added during concurrent processing, this additional information is reflected in the transaction and query. This problem can be prevented by locking: locking such records also prevents the addition of phantom records.
(d) Semantics of Concurrent Transactions: When two different transactions are executed in different orders, it is not necessary that the two transactions are commutative, so different orderings may give different results.
For example, even when the syntax of the operations is the same, the semantics of these transactions may differ during concurrent processing. An important part of concurrency is serial execution or serializability. When some independent transactions in a schedule are ordered so that they execute one after another, this type of execution is called serial execution.
Some problems of concurrent processing are removed by serial execution, that is, by ordering the operations in a particular sequence.
Serializability: A non-serial schedule that is equivalent to some serial execution of transactions is called a serializable schedule. For example, of the three schedules written below, schedule-3 is a serializable schedule and is equivalent to schedule-1 and schedule-2. The purpose of serializable scheduling is to find the non-serial schedules that allow the transactions to execute concurrently without interfering with one another and therefore produce a database state that could be produced by a serial execution.
Note that serializability also removes the problem of inconsistency.
Definition: The given interleaved execution of some transactions is said to be serializable if it produces the same results as some serial execution of the transactions.
In serializability, the ordering of read and write operations is important to avoid any type of confusion or inconsistency.
The serializability test can be explained and solved with the help of a precedence graph.
Solution to these problems: If all schedules in a concurrent environment are restricted to serializable schedules, the result obtained will be consistent with some serial execution of the transactions and will be considered correct. However, testing the serializability of a schedule is not only expensive but sometimes impractical. Thus one of the following concurrency control schemes is applied in a concurrent database environment to ensure that the schedules produced by concurrent transactions are serializable. Some concurrency control schemes used to solve the problems that occur during concurrent scheduling are discussed below:
(i) Locking scheme
(ii) Timestamp-based order
(iii) Optimistic scheduling
(iv) Multiversion technique
(i) Locking: From the point of view of locking, a database can be considered as being made up of a set of data items. A lock is a variable associated with each such data item. Manipulating the value of the lock is called LOCKING. The value of the lock variable is used in the locking scheme to control concurrent access to and manipulation of the associated data item. Locking is done by a subsystem of the DBMS called the Lock Manager. There are two types of lock:
(a) Exclusive Lock: An exclusive lock is also called an update or write lock. The intention of this mode of locking is to provide exclusive use of the data item to one transaction. If a transaction T locks a data item Q in exclusive mode, no other transaction can access Q, not even read Q, until the lock is released by transaction T.
(b) Shared Lock: A shared lock is also called a read lock. Any number of transactions can concurrently lock and access a data item in shared mode, but none of these transactions can modify the data item. A data item locked in shared mode cannot be locked in exclusive mode until the shared lock is released by all transactions holding the lock. A data item locked in exclusive mode cannot be locked in shared mode until the exclusive lock on the data item is released.
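The compatibility rules for shared and exclusive locks can be sketched in a few lines of Python. This is a minimal illustration only: the names `LockTable`, `request` and `release` are hypothetical, and the table simply grants or refuses a request instead of queueing it as a real lock manager would.

```python
from collections import defaultdict

SHARED, EXCLUSIVE = "S", "X"


class LockTable:
    """Toy lock table (hypothetical): grants or refuses, never blocks."""

    def __init__(self):
        # data item -> {transaction name: lock mode held}
        self.held = defaultdict(dict)

    def request(self, txn, item, mode):
        """Return True if txn may hold `mode` on `item`, else False."""
        others = {t: m for t, m in self.held[item].items() if t != txn}
        if mode == SHARED:
            # A shared lock is compatible only with other shared locks.
            granted = all(m == SHARED for m in others.values())
        else:
            # An exclusive lock is compatible with no other lock.
            granted = not others
        if granted:
            already_exclusive = self.held[item].get(txn) == EXCLUSIVE
            if not (already_exclusive and mode == SHARED):
                self.held[item][txn] = mode
        return granted

    def release(self, txn, item):
        self.held[item].pop(txn, None)


locks = LockTable()
locks.request("T1", "Q", SHARED)     # granted
locks.request("T2", "Q", SHARED)     # granted: readers can share Q
locks.request("T3", "Q", EXCLUSIVE)  # refused until T1 and T2 release
```

Refusing instead of waiting keeps the sketch short; the waiting behaviour is what makes deadlock possible in a real system.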
Two-Phase Locking: Another method of locking is called two-phase locking. In this method, once a lock is released, no additional locks are requested. In other words, the release of locks is delayed until all the locks on all data items required by the transaction have been acquired. It has two phases: a growing phase, in which the number of locks increases from 0 to the maximum for the transaction, and a contracting phase, in which the number of locks held decreases from the maximum to zero. Both of these phases are monotonic, i.e. the number of locks only increases in the first phase and only decreases in the second phase. Once a transaction starts releasing locks, it is not allowed to request any further locks. In this way a transaction is obliged to request all the locks it may need during its life before it releases any. This controls concurrency but lowers its degree.
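The two-phase rule can be sketched as a small Python class that refuses any lock request made after the first release. The class and exception names are hypothetical, and conflicts with other transactions are deliberately left out so that only the growing and contracting phases are shown.

```python
class TwoPhaseViolation(Exception):
    """Raised when a lock is requested after the contracting phase began."""


class TwoPhaseTxn:
    """Hypothetical transaction that tracks only its own locking phase."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.contracting = False  # False: growing phase, True: contracting

    def lock(self, item):
        if self.contracting:
            raise TwoPhaseViolation(
                f"{self.name} requested {item!r} after releasing a lock"
            )
        self.locks.add(item)

    def unlock(self, item):
        # The first release ends the growing phase for good.
        self.contracting = True
        self.locks.discard(item)


t1 = TwoPhaseTxn("T1")
t1.lock("A")
t1.lock("B")    # growing phase: lock count goes 0 -> 2
t1.unlock("A")  # contracting phase begins
# t1.lock("C") would now raise TwoPhaseViolation
```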
(ii) Timestamp-based order: In the timestamp-based method, a serial order is created among the concurrent transactions by assigning to each transaction a unique non-decreasing number. The usual value assigned to each transaction is the system clock value at the start of the transaction. This is called timestamp ordering. There are two types of timestamp: (a) write timestamp and (b) read timestamp. A variation of this scheme used in a distributed environment appends the site of a transaction to the system-wide clock value. This value can then be used in deciding the order in which the conflict between two transactions is resolved. A transaction with a smaller timestamp value is considered to be an "older" transaction than another transaction with a larger timestamp value. A data item X is thus represented by a triple {X, Wx, Rx}, where:
X: The value of the data item.
Wx: The write timestamp value, the largest timestamp value of any transaction that was allowed to write a value of X.
Rx: The read timestamp value, the largest timestamp value of any transaction that was allowed to read the current value of X.
(iii) Optimistic Scheduling: In the optimistic scheduling scheme, the philosophy is to assume that all data items can be successfully updated at the end of a transaction and to read in the values for data items without any locking. Reading is done when required, and if any data item is found to be inconsistent at the end of a transaction, then the transaction is rolled back (using the recovery procedure of the DBMS). In optimistic scheduling each transaction has three phases:
(a) The read phase: This phase starts with the activation of a transaction. All data items are read into local variables, and any modifications are made only to those local copies. This phase ends with commitment.
(b) Validation phase: In this phase the modified data items are checked for consistency; if the check fails, the transaction is rolled back.
(c) Write phase: When the transaction passes the validation phase, all of its changes are written to secondary storage.
An optimistic scheme does not use locks, so it is deadlock free, even though starvation can still occur.
(iv) Multiversion technique: It is also called the time domain addressing scheme, which follows the accounting principle of never overwriting a transaction. Any changes are achieved by entering compensating transactions; e.g. a change to X is achieved by making a new copy or version of data item X, hence the name multiversion. In this way a history of the evolution of the value of the data item is recorded in the database. With the multiversion technique, write operations can occur concurrently, since they do not overwrite each other. Also, a read operation can read any version.
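The triple {X, Wx, Rx} described under timestamp-based order can be made concrete with a short Python sketch. The text above does not spell out the accept and reject rules, so the checks below follow the usual basic timestamp-ordering protocol as an assumption; the names `Item`, `Rollback`, `read` and `write` are hypothetical.

```python
from dataclasses import dataclass


class Rollback(Exception):
    """The requesting transaction arrived too late and must restart."""


@dataclass
class Item:
    value: int
    w_ts: int = 0  # Wx: largest timestamp allowed to write X
    r_ts: int = 0  # Rx: largest timestamp allowed to read X


def read(item, ts):
    # Assumed rule: a read is rejected if a younger transaction
    # (larger timestamp) has already written the item.
    if ts < item.w_ts:
        raise Rollback(f"read by ts={ts} after write by ts={item.w_ts}")
    item.r_ts = max(item.r_ts, ts)
    return item.value


def write(item, ts, value):
    # Assumed rule: a write is rejected if a younger transaction
    # has already read or written the item.
    if ts < item.r_ts or ts < item.w_ts:
        raise Rollback(f"write by ts={ts} is out of timestamp order")
    item.value = value
    item.w_ts = ts


x = Item(value=100)
read(x, ts=5)        # allowed, Rx becomes 5
write(x, ts=7, value=200)  # allowed, Wx becomes 7
# write(x, ts=3, value=1) would raise Rollback: ts 3 is older than Rx = 5
```

A rolled-back transaction would normally be restarted with a new, larger timestamp; that step is omitted here.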
Q. ". What do you mean by Database Recovery Techniques?
Ans. Recovery: A computer system is an electromechanical device subject to failures of various types. Recovery is the procedure through which data that has been lost during processing, or due to a failure of any type, can be collected, recalled or accessed again using different mechanisms. The types of failures that the computer system is likely to be subjected to include failures of components or subsystems, software failures, power failures, accidents, and natural or man-made disasters. Database recovery techniques are methods of restoring a database that has been damaged by a failure to a valid, original form. The aim of the recovery scheme is to allow database operations to be resumed after a failure with a minimum loss of information at an economically justifiable cost. Recovery schemes can be classified as forward or backward recovery. Database systems use the latter scheme to recover from errors.
(a) Forward Error Recovery: In this scheme, when a particular error in the system is detected, the recovery system makes an accurate assessment of the state of the system and then makes appropriate adjustments, based on the anticipated result, to make the system error free. The aim of the adjustment is to restore the system so that the effects of the error are cancelled and the system can continue to operate. This scheme is not applicable to unanticipated errors.
(b) Backward Error Recovery: In this scheme no attempt is made to extrapolate what an error-free state would have been. Instead, the system is reset to some previous correct state that is known to be free of any errors. A backup copy, for example on floppy disks, can be used for backward recovery.
Recovery in a centralized DBMS
In a centralized DBMS, if there is a failure, some methods are used to recover the data:
(i) Setting a transaction marker for transaction identification, through which we can access the data.
(ii) Applying operations on records. These operations are insert, delete and modify.
(iii) Keeping a log of the system's transactions.
In an online database system, for example an airline reservation system, there could be hundreds of transactions handled per minute. The log for this type of database contains a very large volume of information. A scheme called checkpointing is used to limit the volume of log information that has to be handled and processed in the event of a system failure involving the loss of volatile information. The checkpoint scheme is an additional component of the logging scheme described above. A checkpoint operation, performed periodically, copies log information onto stable storage.
Q # What are the various Database Security Issues?
Ans.
1. A computer system operator or system programmer can intentionally bypass the normal security and integrity mechanisms, alter or destroy the data in the database, or make unauthorized copies of sensitive data.
2. An unauthorized user can get access to a secure terminal or the password of an authorized user and compromise the database. Such a user could also destroy the database files.
3. Authorized users could pass on sensitive information under pressure or for personal gain.
4. System and application programmers could bypass normal security in their programs by directly accessing database files and making changes and copies for illegal use.
5. An unauthorized person can get access to the computer system, physically or by using a communication channel, and compromise the database.
Q. %. What do you mean by the term deadlock?
Ans. Consider the following example: transaction T1 gets an exclusive lock on object A, T2 gets an exclusive lock on B, T1 requests an exclusive lock on B and is queued, and T2 requests an exclusive lock on A and is queued. Now, T1 is waiting for T2 to release its lock and T2 is waiting for T1 to release its lock. Such a cycle of transactions waiting for locks to be released is called a deadlock. Clearly, these two transactions will make no further progress. Worse, they hold locks that may be required by other transactions. The DBMS must either prevent or detect (and resolve) such deadlock situations.
Deadlock Prevention
We can prevent deadlocks by giving each transaction a priority and ensuring that lower priority transactions are not allowed to wait for higher priority transactions (or vice versa). One way to assign priorities is to give each transaction a timestamp when it starts up. The lower the timestamp, the higher the transaction's priority; that is, the oldest transaction has the highest priority.
If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock manager can use one of the following two policies:
Wait-die: If Ti has higher priority, it is allowed to wait; otherwise it is aborted.
Wound-wait: If Ti has higher priority, abort Tj; otherwise Ti waits.
In the wait-die scheme, lower priority transactions can never wait for higher priority transactions. In the wound-wait scheme, higher priority transactions never wait for lower priority transactions. In either case no deadlock cycle can develop. A subtle point is that we must also ensure that no transaction is perennially aborted because it never has a sufficiently high priority. (Note that in both schemes, the higher priority transaction is never aborted.) When a transaction is aborted and restarted, it should be given the same timestamp that it had originally. Reissuing timestamps in this way ensures that each transaction will eventually become the oldest transaction, and thus the one with the highest priority, and will get all the locks that it requires.
The wait-die scheme is nonpreemptive; only a transaction requesting a lock can be aborted. As a transaction grows older (and its priority increases), it tends to wait for more and more younger transactions. A younger transaction that conflicts with an older transaction may be repeatedly aborted (a disadvantage with respect to wound-wait), but on the other hand, a transaction that has all the locks it needs will never be aborted for deadlock reasons (an advantage with respect to wound-wait, which is preemptive).
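The two policies can be written as a pair of small Python functions that compare timestamps, with a lower timestamp meaning an older transaction and therefore a higher priority. The function names and the returned strings are hypothetical labels chosen for this sketch.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits, a younger requester is aborted."""
    if requester_ts < holder_ts:   # requester is older: higher priority
        return "requester waits"
    return "abort requester"


def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester aborts the holder, a younger one waits."""
    if requester_ts < holder_ts:   # requester is older: higher priority
        return "abort holder"
    return "requester waits"


# Ti has timestamp 10 (older), Tj has timestamp 20 (younger).
wait_die(10, 20)    # Ti requests a lock held by Tj: Ti waits
wait_die(20, 10)    # Tj requests a lock held by Ti: Tj is aborted
wound_wait(10, 20)  # Ti requests a lock held by Tj: Tj is aborted
wound_wait(20, 10)  # Tj requests a lock held by Ti: Tj waits
```

In both functions the older transaction is never the one aborted, which matches the note above that the higher priority transaction always survives.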
Deadlock Detection
Deadlocks tend to be rare and typically involve very few transactions. This observation suggests that rather than taking measures to prevent deadlocks, it may be better to detect and resolve deadlocks as they arise. In the detection approach, the DBMS must periodically check for deadlocks. When a transaction Ti is suspended because a lock that it requests cannot be granted, it must wait until all transactions Tj that currently hold conflicting locks release them.
The lock manager maintains a structure called a waits-for graph to detect deadlock cycles. The nodes correspond to active transactions, and there is an arc from Ti to Tj if (and only if) Ti is waiting for Tj to release a lock. The lock manager adds edges to this graph when it queues lock requests and removes edges when it grants lock requests. Observe that the waits-for graph describes all active transactions, some of which will eventually abort. If there is an edge from Ti to Tj in the waits-for graph, and both Ti and Tj eventually commit, there will be an edge in the opposite direction (from Tj to Ti) in the precedence graph (which involves only committed transactions). The waits-for graph is periodically checked for cycles, which indicate deadlock. A deadlock is resolved by aborting a transaction that is on a cycle and releasing its locks; this action allows some of the waiting transactions to proceed.
Fig. Waits-for Graph before and after Deadlock
As an alternative to maintaining a waits-for graph, a simplistic way to identify deadlocks is to use a timeout mechanism: if a transaction has been waiting too long for a lock, we can assume (pessimistically) that it is in a deadlock cycle and abort it.
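Checking a waits-for graph for cycles can be sketched with a depth-first search. The graph representation (a dictionary mapping each waiting transaction to the transactions it waits for) and the function name `find_cycle` are assumptions made for this illustration.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: a cycle is closed.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(waits_for):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# T1 waits for T2 and T2 waits for T1, as in the deadlock example.
deadlocked = find_cycle({"T1": ["T2"], "T2": ["T1"]})
victim = deadlocked[0] if deadlocked else None  # one transaction to abort
```

Choosing the first transaction on the cycle as the victim is an arbitrary choice for the sketch; a DBMS may instead pick, for example, the youngest transaction.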
Q. *. What is serializability of schedules?
Ans. Serializability: A non-serial schedule that is equivalent to some serial execution of transactions is called a serializable schedule. For example, of the three schedules written below, schedule-3 is a serializable schedule and is equivalent to schedule-1 and schedule-2. The purpose of serializable scheduling is to find the non-serial schedules that allow the transactions to execute concurrently without interfering with one another and therefore produce a database state that could be produced by a serial execution. Note that serializability also removes the problem of inconsistency.
Definition: The given interleaved execution of some transactions is said to be serializable if it produces the same results as some serial execution of the transactions. In serializability, the ordering of read and write operations is important to avoid any type of confusion or inconsistency. The serializability test can be explained and solved with the help of a precedence graph.
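The precedence graph test can be sketched in Python as follows. The sketch assumes a schedule is given as a list of (transaction, operation, item) triples with operations "R" and "W", and it tests conflict serializability, which is the property a precedence graph checks. The function names are hypothetical; `graphlib` is part of the standard library from Python 3.9.

```python
from graphlib import CycleError, TopologicalSorter


def precedence_graph(schedule):
    """Map each transaction to the set of transactions that must precede it."""
    preds = {txn: set() for txn, _, _ in schedule}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict if they come from different
            # transactions, touch the same item, and one is a write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                preds[tj].add(ti)   # arrow Ti -> Tj
    return preds


def equivalent_serial_order(schedule):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(precedence_graph(schedule)).static_order())
    except CycleError:
        return None


ok = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
equivalent_serial_order(ok)   # T1 before T2
equivalent_serial_order(bad)  # None: T1 -> T2 and T2 -> T1 form a cycle
```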
Q 1. Define the concept of aggregation. Give two examples of where this concept is useful?
Ans. Selecting the data in a group of records is called aggregation. Data aggregation is a process in which information is gathered and expressed in a summary form, for purposes such as statistical analysis. A common aggregation purpose is to get more information about particular groups based on specific variables such as age, profession or income.
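As a small illustration of aggregation, the Python sketch below groups records by one variable and summarizes another. The records and the helper name `average_by` are invented purely for the example and do not come from any real data set.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample records, for illustration only.
people = [
    {"name": "A", "profession": "teacher", "age": 30, "income": 40000},
    {"name": "B", "profession": "teacher", "age": 50, "income": 50000},
    {"name": "C", "profession": "engineer", "age": 40, "income": 65000},
]


def average_by(records, group_key, value_key):
    """Group records by `group_key` and average `value_key` in each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in groups.items()}


income_by_profession = average_by(people, "profession", "income")
age_by_profession = average_by(people, "profession", "age")
```

The same grouping is what a SQL query with GROUP BY and an aggregate function such as AVG would express.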
Q 11 Compare the shadow-paging recovery scheme with log-based recovery schemes?
Ans. Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state. Consider transaction Ti that transfers $50 from account A to account B; the goal is either to perform all database modifications made by Ti or none at all. Several output operations may be required for Ti (to output A and B). A failure may occur after one of these modifications has been made but before all of them are made. To ensure atomicity despite failures, first output information describing the modifications to stable storage without modifying the database itself; only then start modifying the database. We study two approaches:
1. Log-based recovery
2. Shadow paging
We assume (initially) that transactions run serially, that is, one after the other.
A log:
• Is a sequence of log records
• Maintains a record of update activities on the database
• Is kept on stable storage
• When transaction Ti starts, it writes log record <Ti start>
• Before Ti executes write(X), it writes log record <Ti, X, V1, V2>:
• V1 is the value of X before the write (for undo)
• V2 is the value to be written to X
• When Ti commits, it writes log record <Ti commit>
• When Ti aborts, it writes log record <Ti abort>
We assume for now that log records are written directly to stable storage (that is, they are not buffered). The two approaches using logs are:
1. Deferred database modification
2. Immediate database modification
Deferred Database Modification
• Records all modifications to the log
• Defers all the writes to after partial commit
• Transaction starts by writing <Ti start> record to log.
• A write(X) operation results in writing a log record <Ti, X, V>
• V is the new value for X
Note: The old value is not needed for this scheme. The write is not performed on X at this time, but is deferred. When Ti partially commits, <Ti commit> is written to the log. Finally, the log records are read and used to actually execute the previously deferred writes. During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are in the log. Redoing a transaction Ti (redo Ti) sets the value of all data items updated by the transaction to the new values. Crashes can occur while the transaction is executing the original updates, or while recovery action is being taken.
Example transactions T0 and T1 (T0 executes before T1):
T0: read(A); A := A - 50; write(A); read(B); B := B + 50; write(B)
T1: read(C); C := C - 100; write(C)
Let the original value of A be 1000, that of B be 2000 and that of C be 700.
Let us handle the cases when crashes occur at three different instances, as shown in (a), (b), (c).
Log at three instances of time (a), (b), (c)
Crash at:
(a): No redo actions need to be taken.
(b): redo(T0) must be performed since <T0 commit> is present.
(c): redo(T0) must be performed, followed by redo(T1), since <T0 commit> and <T1 commit> are present.
Immediate Database Modification
• Allows database updates of an uncommitted transaction.
• Undoing may be needed.
• Update logs must have both old value and new value.
• Update log record must be written before the database item is written.
We assume that the log record is output directly to stable storage. Output of updated blocks can take place at any time before or after transaction commit. The order in which blocks are output can be different from the order in which they are written.
Immediate Database Modification Example
The recovery procedure has two operations instead of one:
• undo(Ti) restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti
• redo(Ti) sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti
Both operations must be idempotent; that is, even if the operation is executed multiple times the effect is the same as if it is executed once. This is needed since operations may get re-executed during recovery. When recovering after failure, transaction Ti needs to be undone if the log contains the record <Ti start> but does not contain the record <Ti commit>. Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti commit>. Undo operations are performed first, then redo operations.
Immediate Database Modification Recovery
Log at three instances of time (a), (b), (c)
Recovery actions in each case above are:
(a) undo(T0): B is restored to 2000 and A to 1000.
(b) undo(T1) and redo(T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively.
(c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively. Then C is set to 600.
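The undo and redo rules for immediate modification can be sketched in Python and checked against the three cases above. The log contents are an assumption: the original log figures are not reproduced here, so the sketch uses the log that T0 and T1 would write, cut at three plausible crash points. The tuple layout and the function name `recover` are hypothetical.

```python
# Assumed full log for T0 and T1: (kind, transaction, item, old, new).
FULL_LOG = [
    ("start", "T0"),
    ("update", "T0", "A", 1000, 950),
    ("update", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("update", "T1", "C", 700, 600),
    ("commit", "T1"),
]


def recover(log, db):
    """Undo uncommitted transactions, then redo committed ones."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Undo pass: scan backwards and restore old values.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, item, old, _ = rec
            db[item] = old
    # Redo pass: scan forwards and install new values.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, item, _, new = rec
            db[item] = new
    return db


# Case (b): crash after T1 updated C but before <T1 commit> was logged.
# The on-disk state below is one possible state at crash time.
state = recover(FULL_LOG[:6], {"A": 950, "B": 2000, "C": 600})
```

Because both passes simply assign stored values, running `recover` twice gives the same state as running it once, which is the idempotence requirement stated above.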
Checkpoints
Problems in the recovery procedure:
• Searching the entire log is time consuming.
• We might unnecessarily redo transactions that have already output their updates to the database.
Streamline the recovery procedure by periodically performing checkpointing:
• Output all log records currently residing in main memory onto stable storage.
• Output all modified buffer blocks to the disk.
• Write a log record <checkpoint> onto stable storage.
Checkpoint system failure
During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti.
Log-based recovery
The log, sometimes called the trail or journal, is a history of actions executed by the DBMS. Physically, the log is a file of records stored in stable storage, which is assumed to survive crashes; this durability can be achieved by maintaining two or more copies of the log on different disks (perhaps in different locations), so that the chance of all copies of the log being simultaneously lost is negligibly small.
The most recent portion of the log, called the log tail, is kept in main memory and is periodically forced to stable storage. This way, log records and data records are written to disk at the same granularity (pages or sets of pages).
Every log record is given a unique id called the log sequence number (LSN). As with any record id, we can fetch a log record with one disk access given the LSN. Further, LSNs should be assigned in monotonically increasing order; this property is required for the ARIES recovery algorithm. If the log is a sequential file, in principle growing indefinitely, the LSN can simply be the address of the first byte of the log record. For recovery purposes, every page in the database contains the LSN of the most recent log record that describes a change to this page. This LSN is called the pageLSN.
A log record is written for each of the following actions:
1. Updating a page: After modifying the page, an update type record (described later in this section) is appended to the log tail. The pageLSN of the page is then set to the LSN of the update log record. (The page must be pinned in the buffer pool while these actions are carried out.)
2. Commit: When a transaction decides to commit, it force-writes a commit type log record containing the transaction id. That is, the log record is appended to the log, and the log tail is written to stable storage, up to and including the commit record. The transaction is considered to have committed at the instant that its commit log record is written to stable storage. (Some additional steps must be taken, e.g., removing the transaction's entry in the transaction table; these follow the writing of the commit log record.)
3. Abort: When a transaction is aborted, an abort type log record containing the transaction id is appended to the log, and Undo is initiated for this transaction.
4. End: As noted above, when a transaction is aborted or committed, some additional actions must be taken beyond writing the abort or commit log record. After all these additional steps are completed, an end type log record containing the transaction id is appended to the log.
5. Undoing an update: When a transaction is rolled back (because the transaction is aborted, or during recovery from a crash), its updates are undone. When the action described by an update log record is undone, a compensation log record, or CLR, is written.
Every log record has certain fields: prevLSN, transID, and type. The set of all log records for a given transaction is maintained as a linked list going back in time, using the prevLSN field; this list must be updated whenever a log record is added. The transID field is the id of the transaction generating the log record, and the type field obviously indicates the type of the log record.
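The common fields and the per-transaction linked list can be sketched as a small in-memory log in Python. This is only an illustration: the class names are hypothetical, and using the record's position in a list as its LSN is an assumption that happens to give monotonically increasing values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int                  # unique, monotonically increasing id
    prev_lsn: Optional[int]   # previous record of the same transaction
    trans_id: str
    type: str                 # e.g. "update", "commit", "abort", "end"


class Log:
    """Hypothetical in-memory log; a real log lives on stable storage."""

    def __init__(self):
        self.records = []
        self.last_lsn = {}    # trans_id -> LSN of its most recent record

    def append(self, trans_id, rec_type):
        lsn = len(self.records)
        record = LogRecord(lsn, self.last_lsn.get(trans_id), trans_id, rec_type)
        self.records.append(record)
        self.last_lsn[trans_id] = lsn   # keep the linked list up to date
        return lsn

    def history(self, trans_id):
        """Yield one transaction's records, most recent first, via prevLSN."""
        lsn = self.last_lsn.get(trans_id)
        while lsn is not None:
            record = self.records[lsn]
            yield record
            lsn = record.prev_lsn


log = Log()
log.append("T1", "update")
log.append("T2", "update")
log.append("T1", "commit")
t1_types = [rec.type for rec in log.history("T1")]  # walks back through T1 only
```

Following prevLSN backwards is how a rollback can find a transaction's updates without scanning the records of other transactions.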
Additional fields depend on the type of the log record. We have already mentioned the additional contents of the various log record types, with the exception of the update and compensation log record types, which we describe next.
Update Log Records
The pageID field is the page id of the modified page; the length in bytes and the offset of the change are also included. The before-image is the value of the changed bytes before the change; the after-image is the value after the change. An update log record that contains both before- and after-images can be used to redo the change and to undo it. In certain contexts, which we will not discuss further, we can recognize that the change will never be undone (or, perhaps, redone). A redo-only update log record will contain just the after-image; similarly an undo-only update record will contain just the before-image.
Log File
Contains information about all updates to the database:
• Transaction records.
• Checkpoint records.
Transaction records contain:
• Transaction identifier.
• Type of log record (transaction start, insert, update, delete, abort, commit).
• Identifier of data item affected by database action (insert, delete, and update operations).
• Before-image of data item.
• After-image of data item.
• Log management information.
A technique often used to perform recovery is the transaction log or journal:
1. It records information about the progress of transactions in a log since the last consistent state.
2. The database therefore knows the state of the database before and after each transaction.
3. Every so often the database is returned to a consistent state and the log may be truncated to remove committed transactions.
4. When the database is returned to a consistent state, the process is often referred to as checkpointing.
Q. 12. What do you understand by a distributed database?
Ans. Distributed database technology is a recent development within the overall database field. A distributed database can be defined as: "It is a system consisting of data with different parts under the control of separate DBMSs running on interconnected computers. Each system has autonomous processing capability and serves local applications." Each system also participates in more global applications. Distributed databases are capable of handling both local and global transactions. Distributed databases are handled or controlled by a DDBMS (Distributed DBMS).
A distributed database system can also be defined as one that is not stored at a single physical location; it is spread across a network of computers that are geographically dispersed and connected by communication links. A distributed database allows data to be shared. It offers improved availability and reliability, and it supports incremental growth (addition of data).
A query in a distributed database is divided into sub-queries, and all the sub-queries are evaluated in parallel. For example, consider a banking system in which the customer account database is distributed across the bank branch offices, such that each individual customer can process his data or record at the local branch. In other words, data is stored at all the locations and any customer can access his data from any location via the communication network. It means customer data is distributed to all the locations, and so we call it a distributed database. One more advantage of a distributed database system is that it looks like a centralized system to the user. For example, the Indian Railway reservation system has a distributed database system, which can be accessed at any location by any station.
Challenges to Distributed Systems
• Monotonicity: Once something is published in an open distributed system, it cannot be taken back.
• Pluralism: Different subsystems of an open distributed system include heterogeneous, overlapping and possibly conflicting information. There is no central arbiter of truth in open distributed systems.
• Unbounded nondeterminism: Asynchronously, different subsystems can come up and go down, and communication links can come in and go out between subsystems of an open distributed system. Therefore the time that it will take to complete an operation cannot be bounded in advance.
A scalable system is one that can easily be altered to accommodate changes in the number of users, resources and computing entities affected to it. Scalability can be measured in three different dimensions:
• Load scalability: A distributed system should make it easy for us to expand and contract its resource pool to accommodate heavier or lighter loads.
• Geographic scalability: A geographically scalable system is one that maintains its usefulness and usability, regardless of how far apart its users or resources are.
• Administrative scalability: No matter how many different organizations need to share a single distributed system, it should still be easy to use and manage.
Some loss of performance may occur in a system that allows itself to scale in one or more of these dimensions.
A multiprocessor system is simply a computer that has more than one CPU on its motherboard. If the operating system is built to take advantage of this, it can run different processes on different CPUs, or different threads belonging to the same process. Over the years, many different multiprocessing options have been explored for use in distributed computing. Intel CPUs employ a technology called Hyper-Threading that allows more than one thread (usually two) to run on the same CPU. Processors such as the Sun UltraSPARC T1, Athlon 64 X2 and Intel Pentium D feature multiple processor cores, which also increases the number of concurrent threads they can run.
A multicomputer system is a system made up of several independent computers interconnected by a telecommunications network. Multicomputer systems can be homogeneous or heterogeneous. A homogeneous distributed system is one where all CPUs are similar and are connected by a single type of network. Such systems are often used for parallel computing, which is a kind of distributed computing where every computer works on a different part of a single problem.
In contrast, a heterogeneous distributed system is one that can be made up of all sorts of different computers, possibly with vastly differing memory sizes, processing power and even basic underlying architecture. They are in widespread use today, with many companies adopting this architecture because of the speed with which hardware becomes obsolete and the cost of upgrading a whole system simultaneously.
Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of several loosely coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.
• Client-server: Smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
• 3-tier architecture: Three-tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-tier.
• N-tier architecture: N-tier typically refers to web applications that forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
• Tightly coupled (clustered): Typically refers to a set of highly integrated machines that run the same process in parallel, subdividing the task into parts that are handled individually by each machine and then put back together to make the final result.
• Peer-to-peer: An architecture where there is no special machine or machines that provide a service or manage the network resources. Instead, all responsibilities are uniformly divided among all machines, known as peers.
• Service oriented: The system is organized as a set of highly reusable services that are offered through standardized interfaces.
• Mobile code: Based on the architectural principle of moving processing closest to the source of the data.
• Replicated repository: The repository is replicated across the distributed system to support online/offline processing, provided the lag in data updates is acceptable.
Distributed computing implements a kind of concurrency. The types of distributed computers are based on Flynn's taxonomy of systems: single instruction, single data (SISD); multiple instruction, single data (MISD); single instruction, multiple data (SIMD); and multiple instruction, multiple data (MIMD).
Q. 13. Write short notes on the following: (a) Multiple Granularity (b) Transaction Processing Systems
Ans. (a) Multiple Granularity
Another specialized locking strategy is called multiple-granularity locking, and it allows us to efficiently set locks on objects that contain other objects. For instance, a database contains several files, a file is a collection of pages, and a page is a collection of records. A transaction that expects to access most of the pages in a file should probably set a lock on the entire file, rather than locking individual pages (or records) as and when it needs them. Doing so reduces the locking overhead considerably. On the other hand, other transactions that require access to parts of the file, even parts that are not needed by this transaction, are blocked. If a transaction accesses relatively few pages of the file, it is better to lock only those pages. Similarly, if a transaction accesses several records on a page, it should lock the entire page, and if it accesses just a few records, it should lock just those records.
The question to be addressed is how a lock manager can efficiently ensure that a page, for example, is not locked by a transaction while another transaction holds a conflicting lock on the file containing the page (and therefore, implicitly, on the page). The idea is to exploit the hierarchical nature of the 'contains' relationship.
A database contains a set of files, each file contains a set of pages, and each page contains a set of records. This containment hierarchy can be thought of as a tree of objects, where each node contains all its children. (The approach can easily be extended to cover hierarchies that are not trees, but we will not discuss this extension.) A lock on a node locks that node and, implicitly, all its descendants. (Note that this interpretation of a lock is very different from B+ tree locking, where locking a node does not lock any descendants implicitly!)
In addition to shared (S) and exclusive (X) locks, multiple-granularity locking protocols also use two new kinds of locks, called intention shared (IS) and intention exclusive (IX) locks. IS locks conflict only with X locks. IX locks conflict with S and X locks. To lock a node in S (respectively X) mode, a transaction must first lock all its ancestors in IS (respectively IX) mode. Thus, if a transaction locks a node in S mode, no other transaction can have locked any ancestor in X mode; similarly, if a transaction locks a node in X mode, no other transaction can have locked any ancestor in S or X mode. This ensures that no other transaction holds a lock on an ancestor that conflicts with the requested S or X lock on the node.
A common situation is that a transaction needs to read an entire file and modify a few of the records in it; that is, it needs an S lock on the file and an IX lock so that it can subsequently lock some of the contained objects in X mode. It is useful to define a new kind of lock called an SIX lock that is logically equivalent to holding an S lock and an IX lock. A transaction can obtain a single SIX lock (which conflicts with any lock that conflicts with either S or IX) instead of an S lock and an IX lock.
A subtle point is that locks must be released in leaf-to-root order for this protocol to work correctly. To see this, consider what happens when a transaction Ti locks all nodes on a path from the root (corresponding to the entire database) to the node corresponding to some page p in IS mode, locks p in S mode, and then releases the lock on the root node. Another transaction Tj could now obtain an X lock on the root. This lock implicitly gives Tj an X lock on page p, which conflicts with the S lock currently held by Ti.
Multiple-granularity locking must be used with 2PL in order to ensure serializability. 2PL dictates when locks can be released. At that time, locks obtained using multiple-granularity locking can be released, and they must be released in leaf-to-root order.
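The conflict rules and the root-to-leaf locking order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class LockTable and its methods are hypothetical names, the lock table simply refuses a request instead of making the transaction wait, and 2PL itself is not enforced here.

```python
# Lock modes used in multiple-granularity locking.
IS, IX, S, SIX, X = "IS", "IX", "S", "SIX", "X"

# COMPATIBLE[held] = modes that another transaction may be granted
# on the same node while `held` is held there.
COMPATIBLE = {
    IS:  {IS, IX, S, SIX},
    IX:  {IS, IX},
    S:   {IS, S},
    SIX: {IS},          # SIX conflicts with whatever conflicts with S or IX
    X:   set(),
}

class LockTable:
    """Hypothetical in-memory lock table (no waiting, no deadlock handling)."""

    def __init__(self):
        self.held = {}  # node -> list of (transaction, mode)

    def can_grant(self, txn, node, mode):
        return all(other == txn or mode in COMPATIBLE[held_mode]
                   for other, held_mode in self.held.get(node, []))

    def lock_path(self, txn, path, mode):
        """Lock path[-1] in S or X mode; ancestors get IS or IX, root first."""
        intent = IS if mode == S else IX
        requests = [(n, intent) for n in path[:-1]] + [(path[-1], mode)]
        if not all(self.can_grant(txn, n, m) for n, m in requests):
            return False
        for n, m in requests:
            self.held.setdefault(n, []).append((txn, m))
        return True

    def release_path(self, txn, path):
        """Release in leaf-to-root order, as the protocol requires."""
        for n in reversed(path):
            self.held[n] = [(t, m) for t, m in self.held.get(n, []) if t != txn]

table = LockTable()
t1_ok = table.lock_path("T1", ["db", "file1", "page1"], S)   # IS, IS, S
t2_ok = table.lock_path("T2", ["db", "file1", "page2"], X)   # IX, IX, X
t3_ok = table.lock_path("T3", ["db", "file1"], X)            # blocked by T1's IS on file1
```

Here T1 and T2 work on different pages of the same file and do not disturb each other, while T3's request for an X lock on the whole file is refused because T1 and T2 hold intention locks on it.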
Finally, there is the question of how to decide what granularity of locking is appropriate for a given transaction. One approach is to begin by obtaining fine-granularity locks (e.g., at the record level) and, after the transaction requests a certain number of locks at that granularity, to start obtaining locks at the next higher granularity (e.g., at the page level). This procedure is called lock escalation.
(b) Transaction Processing Systems
Same as the answer to Question No. 3.
Q. 14. What are the desirable properties of transactions in a database?
Ans. Same as the answer to Question No. 3.
Q. 15. What are locking techniques for concurrency control? Explain.
Ans. Concurrency control is a method used to ensure that database transactions are executed in a safe manner. It is the process of managing simultaneous operations against a database so that data operations do not interfere with each other in a multi-user environment.
One example of a shared database used in an online manner is an airline reservation database, which is used by many agents accessing it from their terminals. A database could also be accessed in batch mode concurrently with the online mode. Sharing the database for read-only access does not cause any problem, but if one of the transactions running concurrently tries to modify a data item, it could lead to inconsistencies.
Further, if more than one transaction is allowed to modify a data item simultaneously, it could lead to incorrect values for the data item, and an inconsistent database will result. For example, suppose that two ticket agents access the online reservation system simultaneously to see whether a seat is available on a given flight. If both agents make a reservation against the last available seat on that flight, the flight is overbooked and the data becomes inconsistent. Concurrent processing of programs, processes or jobs is similar to multiprogramming, i.e. a number of jobs or programs are processed simultaneously, each achieving its own independent goal according to its own requirements.
Some concurrency problems
Even when each transaction is correct on its own, the database can become inconsistent after the transactions complete if they are processed concurrently. In concurrent operation, where a number of transactions are running and using the database, we cannot make any assumption about the order in which the statements belonging to different transactions will be executed. The order in which these statements are executed is called a schedule. A schedule in which the statements of different transactions are interleaved is called a concurrent schedule. Some problems that occur during scheduling and concurrent processing are:
(a) Lost update problem: Consider two transactions that access the same data item A. Each of these transactions modifies the data item and writes it back. Concurrent processing of these modifications can create a problem: the value written by one transaction is overwritten by the other, so one of the updates is lost.
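The lost update problem can be reproduced with a small simulation. The sketch below is illustrative only: the transaction names T1 and T2, the amounts added, and the helper run_schedule are assumptions, not part of the original answer. Each transaction reads A into a private local copy, changes the copy, and writes it back.

```python
def run_schedule(schedule, initial):
    """Execute (transaction, operation, argument) steps against data item A."""
    db = {"A": initial}
    local = {}                      # private working copy per transaction
    for txn, op, arg in schedule:
        if op == "read":
            local[txn] = db["A"]
        elif op == "add":
            local[txn] += arg
        elif op == "write":
            db["A"] = local[txn]
    return db["A"]

# Serial schedule: T1 runs completely, then T2.
serial = [
    ("T1", "read", None), ("T1", "add", 10), ("T1", "write", None),
    ("T2", "read", None), ("T2", "add", 20), ("T2", "write", None),
]

# Interleaved schedule: both read A before either writes it back.
interleaved = [
    ("T1", "read", None), ("T2", "read", None),
    ("T1", "add", 10), ("T2", "add", 20),
    ("T1", "write", None), ("T2", "write", None),
]

serial_result = run_schedule(serial, 100)            # 130
interleaved_result = run_schedule(interleaved, 100)  # 120: T1's update is lost
```

In the interleaved schedule T2 overwrites the value written by T1, so the final value of A reflects only T2's change.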
(b) Inconsistent read problem: The lost update problem is caused by concurrent modification of the same data item. However, concurrency can also cause problems when only one transaction modifies a given set of data while that set of data is being used by another transaction. For example, suppose two transactions T5 and T6 occur in a schedule, and A and B are data items with integer values. If the two transactions are processed concurrently, one reading the data while the other modifies it, the reading transaction may see some values from before the modification and some from after it, so what it reads is inconsistent.
(c) The phantom phenomenon: Consider an organization where parts are purchased and kept in stock. The parts are withdrawn from stock and used by a number of projects. To check the extent of loss, we want to see whether the current quantity of some part purchased and received is equal to the current quantity of that part in stock plus the current quantity used by the various projects. The phantom problem arises when additional records are added while this query is running: the new records, which did not exist when the query started, affect its result. This problem can be prevented by locking: locking records of this type also prevents the addition of such phantom records.
(d) Semantics of concurrent transactions: When two transactions are executed in different orders, the result is not necessarily the same, because the two transactions need not be commutative. The two orderings
A = (A + 10) + 20
A = (A + 20) + 10
give the same result. But some operations are not commutative:
Salary = (Salary + 1000) * 1.1
Salary = (Salary * 1.1) + 1000
With Salary = 10000, for example, the first gives 12100 and the second gives 12000. So although the syntax of the operations is the same, the semantics of these transactions differ under concurrent processing.
An important part of concurrency control is serial execution and serializability. When independent transactions in a schedule are ordered so that they execute one after another, the execution is called a serial execution. Some problems of concurrent processing are removed by serial execution, that is, by ordering the operations in a particular sequence.
Serializability: A non-serial schedule that is equivalent to some serial execution of the transactions is called a serializable schedule. For example, in the three schedules written below, schedule-3 is a serializable schedule and is equivalent to schedule-1 and schedule-2. The purpose of serializable scheduling is to find the non-serial schedules that allow the transactions to execute concurrently without interfering with one another, and that therefore produce a database state that could be produced by a serial execution. Note that serializability also removes the problem of inconsistency.
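One common way to test a schedule is to build a precedence graph and look for a cycle. The sketch below assumes the usual conflict-based test (two operations conflict when they come from different transactions, touch the same item, and at least one is a write); this is a stricter notion than "same result as some serial execution", and the function names and sample schedules are hypothetical.

```python
def precedence_edges(schedule):
    """schedule: list of (transaction, op, item), op being 'r' or 'w'."""
    edges = set()
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (oi, oj):
                edges.add((ti, tj))   # ti's operation must precede tj's
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    # Repeatedly remove transactions with no incoming edge (topological sort).
    while nodes:
        free = [n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)]
        if not free:
            return False              # every remaining node lies on a cycle
        nodes -= set(free)
    return True

interleaved = [("T1", "r", "A"), ("T2", "r", "A"),
               ("T1", "w", "A"), ("T2", "w", "A")]
serial = [("T1", "r", "A"), ("T1", "w", "A"),
          ("T2", "r", "A"), ("T2", "w", "A")]

interleaved_ok = is_conflict_serializable(interleaved)  # False: T1 -> T2 and T2 -> T1
serial_ok = is_conflict_serializable(serial)            # True: only T1 -> T2
```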
Definition: A given interleaved execution of some transactions is said to be serializable if it produces the same results as some serial execution of the transactions. In serializability, the ordering of read and write operations is important in avoiding any confusion or inconsistency. A serializability test can be explained and solved with the help of a precedence graph.
Solution to these problems
If all schedules in a concurrent environment are restricted to serializable schedules, the result obtained will be consistent with some serial execution of the transactions and will be considered correct. However, testing a schedule for serializability is expensive and sometimes impractical. Thus one of the following concurrency control schemes is applied in a concurrent database environment to ensure that the schedules produced by concurrent transactions are serializable:
(i) Locking scheme
(ii) Timestamp-based ordering
(iii) Optimistic scheduling
(iv) Multi-version technique
(i) Locking
From the point of view of locking, a database can be considered as being made up of a set of data items. A lock is a variable associated with each such data item. Manipulating the value of the lock is called locking. The value of the lock variable is used in the locking scheme to control concurrent access to and manipulation of the associated data item. Locking is done by a subsystem of the DBMS called the lock manager. There are two types of lock:
(a) Exclusive lock: An exclusive lock is also called an update or write lock. The intention of this mode of locking is to provide exclusive use of the data item to one transaction. If a transaction T locks a data item Q in exclusive mode, no other transaction can access Q, not even to read it, until the lock is released by T.
(b) Shared lock: A shared lock is also called a read lock. Any number of transactions can concurrently lock and access a data item in shared mode, but none of these transactions can modify the data item. A data item locked in shared mode cannot be locked in exclusive mode until the shared lock is released by all transactions holding it. A data item locked in exclusive mode cannot be locked in shared mode until the exclusive lock on the data item is released.
Two-phase locking: Another method of locking is called two-phase locking. In this scheme, once a lock is released, no additional locks are requested. In other words, the release of locks is delayed until all the locks on all data items required by the transaction have been acquired. It has two phases: a growing phase, in which the number of locks increases from zero to the maximum for the transaction, and a contracting phase, in which the number of locks held decreases from the maximum to zero. Both phases are monotonic, i.e. the number of locks only increases in the first phase and only decreases in the second. Once a transaction starts releasing locks, it is not allowed to request any further locks. In this way a transaction is obliged to request all the locks it may need during its life before it releases any. This restriction lowers the degree of concurrency.
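The two-phase rule can be expressed as a simple check inside a transaction object. The following is a minimal sketch, assuming a hypothetical class TwoPhaseTransaction that only tracks which items it has locked; conflicts with other transactions are deliberately left out so that the growing and contracting phases stand out.

```python
class TwoPhaseTransaction:
    """Tracks the locks of one transaction and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False      # set at the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock {item!r} after releasing a lock")
        self.locks.add(item)        # growing phase

    def unlock(self, item):
        self.shrinking = True       # contracting phase begins
        self.locks.discard(item)

t = TwoPhaseTransaction("T1")
t.lock("A")
t.lock("B")          # growing phase: 0 -> 2 locks
t.unlock("A")        # contracting phase starts
try:
    t.lock("C")      # violates two-phase locking
    violation_detected = False
except RuntimeError:
    violation_detected = True
```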
(ii) Timestamp-based ordering
In the timestamp-based method, a serial order is created among the concurrent transactions by assigning to each transaction a unique non-decreasing number. The usual value assigned to each transaction is the system clock value at the start of the transaction. This is called timestamp ordering. There are two types of timestamp: (a) write timestamp and (b) read timestamp. A variation of this scheme used in a distributed environment appends the site of a transaction to the system-wide clock value. This value can then be used in deciding the order in which a conflict between two transactions is resolved. A transaction with a smaller timestamp value is considered to be an 'older' transaction than another transaction with a larger timestamp value. A data item X is thus represented by the triple X: {X, Wx, Rx}, where
X: the value of the data item.
Wx: the write timestamp value, the largest timestamp value of any transaction that was allowed to write a value of X.
Rx: the read timestamp value, the largest timestamp value of any transaction that was allowed to read the current value of X.
(iii) Optimistic scheduling
In the optimistic scheduling scheme, the philosophy is to assume that all data items can be successfully updated at the end of a transaction and to read in the values of data items without any locking. Reading is done when required, and if any data item is found to be inconsistent at the end of a transaction, the transaction is rolled back (using the recovery procedure of the DBMS). In optimistic scheduling each transaction has three phases:
(a) Read phase: This phase starts with the activation of a transaction. All data items are read into local variables, and any modifications are made only to those local copies. The phase ends when the transaction is ready to commit.
(b) Validation phase: The system checks whether the data items used by the transaction are still consistent; if they are not, the transaction is rolled back.
(c) Write phase: When the transaction passes the validation phase, its modifications are written to the database in secondary storage.
An optimistic scheme does not use locks and so it is deadlock free, even though starvation can still occur.
(iv) Multi-version technique
It is also called the time domain addressing scheme, and it follows the accounting principle of never overwriting a transaction. Any changes are achieved by entering compensating transactions, e.g. a change to X is achieved by making a new copy or version of data item X. Hence the name multi-version. In this way a history of the evolution of the value of a data item is recorded in the database. With the multi-version technique, write operations can occur concurrently, since they do not overwrite each other. Also, a read operation can read any version.
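The multi-version idea of never overwriting a value can be illustrated with a small versioned data item. This is a sketch under assumptions: the class VersionedItem is hypothetical, timestamps are plain non-negative integers, and a reader is given the newest version written at or before its own timestamp, which is one common choice among the versions it could read.

```python
import bisect

class VersionedItem:
    """Keeps every version of a data item, ordered by write timestamp."""

    def __init__(self, initial):
        self.timestamps = [0]       # version 0 is the initial value
        self.values = [initial]

    def write(self, ts, value):
        # A write never overwrites: it inserts a new version in timestamp order.
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def read(self, ts):
        # Return the latest version whose write timestamp is <= ts.
        i = bisect.bisect_right(self.timestamps, ts) - 1
        return self.values[i]

x = VersionedItem(100)
x.write(5, 150)
x.write(9, 175)

old_view = x.read(3)      # 100: a reader older than both writes
mid_view = x.read(7)      # 150: sees the write at timestamp 5 only
new_view = x.read(20)     # 175: sees the latest version
```

Because all three versions remain stored, the history of X is preserved and an older reader is not disturbed by later writes.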
Q. 16. Describe the usefulness of granting privileges to the users.
Ans. You use GRANT to assign roles or system privileges to roles or users. The same command works whether you are assigning these roles or system privileges to an individual user or to a role that in turn can be assigned to many users.
The Syntax for the GRANT Command
The GRANT command takes the following syntax:
SYNTAX:
GRANT role or system privilege [, role or system privilege]
TO user or role or PUBLIC [, user or role]
[WITH ADMIN OPTION]
The GRANT command can take any number of system privileges and roles and assign them to any number of users or roles. By specifying that you want to grant a role or system privilege to PUBLIC, you are specifying that you want that role or privilege to be granted to all users in the system. The REVOKE command is just the opposite of the GRANT command; it will take a role or system privilege away from a user or role:
REVOKE role or system privilege [, role or system privilege]
FROM user or role or PUBLIC [, user or role]
Q. 18. Define the term Generalization.
Ans. Generalization is a simplification of the data model in which the common features of several entity sets are gathered into one higher-level entity set.
• It is a bottom-up design process: combine a number of entity sets that share the same features into a higher-level entity set.
• Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
• The terms specialization and generalization are used interchangeably.
Q. 19. What is cascading roll back?
Ans. A cascading rollback occurs in database systems when a transaction (T1) causes a failure and a rollback must be performed. Other transactions dependent on T1's actions must also be rolled back due to T1's failure, thus causing a cascading effect. That is, one transaction's failure causes many to fail.
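The cascading effect can be traced by following dependencies between transactions. In the sketch below, the function cascading_rollback and the reads_from mapping are hypothetical; the mapping is assumed to record, for each transaction, the transactions whose uncommitted writes it has read.

```python
def cascading_rollback(failed, reads_from):
    """Return every transaction that must be rolled back when `failed` fails.

    reads_from maps a transaction to the set of transactions whose
    uncommitted data it has read (its dependencies).
    """
    rolled_back = {failed}
    changed = True
    while changed:                  # repeat until no new dependents are found
        changed = False
        for txn, sources in reads_from.items():
            if txn not in rolled_back and sources & rolled_back:
                rolled_back.add(txn)
                changed = True
    return rolled_back

# T2 read data written by T1, T3 read data written by T2, T4 is independent.
reads_from = {"T2": {"T1"}, "T3": {"T2"}, "T4": set()}
victims = cascading_rollback("T1", reads_from)   # {"T1", "T2", "T3"}
```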
Q. 20. Explain various recovery techniques based on deferred update.
Ans. Refer to Q. No. 11.
Q. 21. What is a log record? What fields does it contain? How is it used for database recovery?
Ans. Refer to Q. No. 11.
Q. 22. What does role name signify? Explain with example.
Ans. Oracle provides for easy and controlled privilege management through roles. Roles are named groups of related privileges that you grant to users or other roles. Roles are designed to ease the administration of end-user system and object privileges. However, roles are not meant to be used by application developers, because the privileges to access objects within stored programmatic constructs need to be granted directly. These properties of roles allow for easier privilege management within a database:
• Reduced privilege administration: Rather than explicitly granting the same set of privileges to several users, you can grant the privileges for a group of related users to a role, and then only the role needs to be granted to each member of the group.
• Dynamic privilege management: If the privileges of a group must change, only the privileges of the role need to be modified. The security domains of all users granted the group's role automatically reflect the changes made to the role.
• Selective availability of privileges: You can selectively enable or disable the roles granted to a user. This allows specific control of a user's privileges in any given situation.
• Application awareness: Because the data dictionary records which roles exist, you can design database applications to query the dictionary and automatically enable (and disable) selective roles when a user attempts to execute the application via a given username.
• Application-specific security: You can protect role use with a password. Applications can be created specifically to enable a role when supplied the correct password. Users cannot enable the role if they do not know the password.
Q.1 What is data masking?
Sol:
Data masking is the process of de-identifying (masking) specific data elements within data stores. It ensures that sensitive data is replaced with realistic but not real data. The result is that sensitive information is not available to users outside of authorized environments. Data masking is done while provisioning non-production environments so that copies created to support test, analysis, and development processes do not expose sensitive information. Without masking, these copies risk exposing sensitive data.
De-identification or masking is the process of replacing sensitive data with randomly generated data that is valid and functional for application processing, but is not associated with the original record. A masking tool such as dgmasker preserves the application and relational integrity of the data set, and transforms it to meet application business rules. In data masking, the format of the data remains the same; only the values are changed. The data may be altered in a number of ways, including encryption, character shuffling, and character or word substitution. Whatever method is chosen, the values must be changed in a way that makes detection or reverse engineering impossible.
Data masking is not the same thing as restricting the visibility of information in production databases from people who are not authorized to see it. In that situation, the data is actually present in the database and is simply not visible to the unauthorized. There are many good and justifiable reasons for taking this approach in a production system, but adopting a "data is present but hidden" approach to the protection of data in test and development databases is a recipe for trouble. The reason is that strict controls are in place in production databases and these can present a carefully managed view. Test and development systems are different. Typically, they are environments in which access is much wider. Information is visible to more people, and those people often have greater privileges and low-level access. From a data visibility standpoint, a test or dev system in which the data is present but hidden is a system which sooner or later will expose its data.
In general, a reasonable security assumption is that the more people who have access to the information, the greater the inherent risk of the data being compromised. Modifying the existing data so as to remove all identifiable distinguishing characteristics, while keeping it usable as a test system, can provide a valuable layer of security for test and development databases.
Q.2 Why Mask Data?
Sol:
Legal Requirements
The regulatory environment surrounding the duties and obligations of a data holder to protect the information they maintain is becoming increasingly rigorous in just about every legal jurisdiction. It is a pretty safe assumption that the standards for the security and maintenance of data will become increasingly strict in the future.
Loss of Confidence And Public Relations Disasters
It can reasonably be said in most locations that if a data escape happens at your organization, the formal legal sanctions applied by governmental bodies are not the only problem you will be facing. Possibly they may not even be the biggest of your immediate worries. Inappropriate data exposure, whether accidental or malicious, can have devastating consequences. Often the costs of such an event, both actual and unquantifiable, can far exceed any fines levied for the violation of the rules. For example, what will it cost the organization if potential customers are not willing to provide sensitive information to your company because they read an article about a data escape in the newspaper? Dealing with the public relations aftermath of seeing the company's name in the press will not be cheap. It also does not take much imagination to realize that senior management are not going to be happy about having to give a press conference to reassure the public. The public relations costs of a data escape usually far exceed the sanctions levied by governmental organizations.
Malicious Exposure
Most people think the major risk to the information they hold is external entities (and organized syndicates) out to break in and steal the data. The assumption then follows that protecting the network and firewalls is the appropriate and sufficient response. There is no denying that such protection is necessary; however, it has been shown that in many cases the data is stolen by malicious insiders who have been granted access to the data. No firewall can keep an insider from acquiring data under such circumstances. However, by reducing the number of databases with unmasked information, the overall risk of exposure is mitigated. External hackers, if they get through the network security, will have far fewer usable targets, and a far greater proportion of the inside personnel will have no access to the real data.
Accidental Exposure
The risk of accidental exposure of information is often neglected when considering the security risks associated with real test data. Often it is thought that "there is no point in masking the test data because everybody has access to production anyway". Not so; the risks associated with an accidental exposure of the data remain. Often just masking the most sensitive information (credit card numbers, customer email addresses, etc.) is enough to mitigate somewhat the damage associated with accidental exposure, and the masked databases remain just as functional.
Data Masking Architectures
Fundamentally, there are two basic types of architectures which are used in the design of data masking software.
On the Fly, Server-To-Server, Data Masking Architectures
In this architecture the data does not exist in the target database prior to masking. The anonymization rules are applied as part of the process of moving the data from the source to the target. Often this type of masking is integrated into the cloning process which creates the target database.
Q.3: What are the Advantages & Disadvantages of Data Masking?
Sol:
Advantages
• The data is never present in an unmasked form in the target database.
Disadvantages
• Any errors in the process necessarily interrupt the transfer of the data.
• The ability to mask data after the transfer has completed can be troublesome. This might happen in cases where the masked target database has been built and it is subsequently decided that a specific column of information really needs to be masked. In this case, the masking software needs to have in-situ masking capabilities (see below) or the entire clone and masking operation will need to be repeated.
• The ability to use alternative, perhaps preferred, tools to perform the cloning operation is impacted.
Q.4: Explain "In-Situ Data Masking Architectures"
Sol:
In this style, the clone of the database to be masked is created by other means and the software simply operates on the cloned database. There are two types of in-situ masking: masking rules which are executed and controlled as a standalone entity on the target, and data masking rules which are controlled by a different system which then connects to the target and controls the execution of the rules.
Advantages
• It is possible to apply additional masking operations at any time.
• The masking operations are separate from the copy process, so existing cloning solutions can be used and the data masking rules are possibly simpler to maintain.
Disadvantages
• The data is present in an unmasked state in the target database, and hence additional security measures will be required during that time.
Q.!B 12p(ain Data Mas3in$ 8echni>ue. So(B
Substitution
This technique consists of randomly replacing the contents of a column of data with information that looks similar but is completely unrelated to the real details. For example, the surnames in a customer database could be sanitized by replacing the real last names with surnames drawn from a large random list.
Substitution is very effective in terms of preserving the look and feel of the existing data. The downside is that a large store of substitutable information must be available for each column to be substituted. For example, to sanitize surnames by substitution, a list of random last names must be available. Then, to sanitize telephone numbers, a list of phone numbers must be available. Frequently, the ability to generate known invalid data (credit card numbers that will pass the checksum tests but never work) is a nice-to-have feature.
Substitution data can sometimes be very hard to find in large quantities; however, any data masking software should contain datasets of commonly required items. When evaluating data masking software, the size, scope and variety of the datasets should be considered. Another useful feature to look for is the ability to build your own custom datasets and add them for use in the masking rules.
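The substitution idea can be sketched in a few lines of Python. This is only a minimal illustration: the replacement list SURNAME_POOL and the function name substitute_column are assumptions made for the example, and a real masking tool would draw from a far larger dataset.

```python
import random

# Hypothetical replacement dataset (assumption for this sketch only).
SURNAME_POOL = ["Archer", "Bennett", "Castillo", "Dalton",
                "Ellis", "Foster", "Grant", "Hughes"]

def substitute_column(values, pool, seed=None):
    """Replace every value with a random draw from pool.

    The draw ignores the original value, so the result looks like
    real data but carries no information about it.
    """
    rng = random.Random(seed)
    return [rng.choice(pool) for _ in values]

customers = ["Smith", "Jones", "Nguyen", "Okafor"]
masked = substitute_column(customers, SURNAME_POOL, seed=42)
print(masked)
```

The masked column has the same length and the same look and feel as the original, but every entry comes from the replacement list.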
Shuffling
Shuffling is similar to substitution except that the substitution data is derived from the column itself. Essentially, the data in a column is randomly moved between rows until there is no longer any reasonable correlation with the remaining information in the row.
There is a certain danger in the shuffling technique. It does not prevent people from asking questions like "I wonder if so-and-so is on the supplier list?" In other words, the original data is still present and sometimes meaningful questions can still be asked of it. Another consideration is the algorithm used to shuffle the data. If the shuffling method can be determined, then the data can be easily "un-shuffled". For example, if the shuffle algorithm simply ran down the table swapping the column data between every group of two rows, it would not take much work for an interested party to revert things to their un-shuffled state.
Shuffling is rarely effective when used on small amounts of data. For example, if there are only 5 rows in a table, it probably will not be too difficult to figure out which of the shuffled data really belongs to which row. On the other hand, if a column of numeric data is shuffled, the sum and average of the column still work out to the same amount. This can sometimes be useful.
Shuffle rules are best used on large tables and leave the look and feel of the data intact. They are fast, but great care must be taken to use a sophisticated algorithm to randomize the shuffling of the rows.
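A minimal Python sketch of column shuffling follows. The row layout (a list of dictionaries) and the function name shuffle_column are assumptions for illustration. It uses the operating system's random source rather than a predictable pattern such as swapping neighbouring rows.

```python
import random

def shuffle_column(rows, column, rng=None):
    """Return new rows with the values of one column randomly reassigned.

    All other columns stay where they are, so the shuffled column loses
    its link to the rest of each row.
    """
    rng = rng or random.SystemRandom()  # OS-provided randomness
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

staff = [
    {"name": "A", "salary": 30000},
    {"name": "B", "salary": 45000},
    {"name": "C", "salary": 52000},
    {"name": "D", "salary": 61000},
]
shuffled = shuffle_column(staff, "salary")

# The column total (and therefore the average) is unchanged by the shuffle.
print(sum(r["salary"] for r in staff), sum(r["salary"] for r in shuffled))
```

With only four rows, as here, the original pairing is easy to guess, which is the small-table weakness described above.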
Number and Date Variance
The Number Variance technique is useful on numeric or date data. Simply put, the algorithm involves modifying each number or date value in a column by some random percentage of its real value. This technique has the nice advantage of providing a reasonable disguise for the data while still keeping the range and distribution of values in the column within existing limits. For example, a column of salary details might have a random variance of ±10% placed on it. Some values would be higher, some lower, but all would be not too far from their original range. Date fields are also a good candidate for variance techniques. Birth dates, for example, could be varied within an arbitrary range of ±120 days, which effectively disguises the personally identifiable information while still preserving the distribution.
The variance technique can prevent attempts to discover true records using known date data or the exposure of sensitive numeric or date data.
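The two examples above (±10% on a salary, ±120 days on a birth date) can be sketched in Python as follows. The function names vary_number and vary_date are assumptions for illustration, not part of any particular masking product.

```python
import random
from datetime import date, timedelta

def vary_number(value, pct=0.10, rng=random):
    """Shift value by a random fraction between -pct and +pct of itself."""
    return value * (1 + rng.uniform(-pct, pct))

def vary_date(day, max_days=120, rng=random):
    """Shift a date by a random whole number of days within +/- max_days."""
    return day + timedelta(days=rng.randint(-max_days, max_days))

salary = 50000
birth = date(1980, 5, 17)
print(round(vary_number(salary), 2))  # somewhere between 45000 and 55000
print(vary_date(birth))               # within 120 days of 1980-05-17
```

Each value moves independently, so individual records are disguised while the overall range of the column stays roughly where it was.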
Encryption
This technique offers the option of leaving the data in place and visible to those with the appropriate key while remaining effectively useless to anybody without the key. This would seem to be a very good option, yet for anonymous test databases it is one of the least useful techniques.
The advantage of having the real data available to anybody with the key is actually a major disadvantage in a test or development database. The "optional" visibility provides no major advantage in a test system, and the encryption password only needs to escape once for all of the data to be compromised. Of course, you can change the key and regenerate the test instances, but outsourced, stored or saved copies of the data are all still available under the old password.
Encryption also destroys the formatting and the look and feel of the data. Encrypted data rarely looks meaningful; in fact, it usually looks like binary data. This sometimes leads to character set issues when manipulating encrypted varchar fields. Certain types of encryption impose constraints on the data format as well. In effect, this means that the fields must be extended with a suitable padding character, which must then be stripped off at decryption time.
The strength of the encryption is also an issue. Some encryption is more secure than others. According to the experts, most encryption systems can be broken; it is just a matter of time and effort. In other words, not very much will keep the national security agencies of large countries from reading your files should they choose to do so. This may not be a big worry if the requirement is to protect proprietary business information. Never, ever, use a simplistic encryption scheme designed by amateurs. For example, one in which the letter 'A' is replaced by 'X' and the letter 'B' by 'M' etc. is trivially easy to decrypt based on letter frequency probabilities. In fact, first year computer science students are often asked to write such programs as assignments.
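To see why a letter-for-letter replacement scheme is weak, the short Python sketch below encrypts a message with a fixed substitution alphabet and compares letter counts before and after. The alphabet in CIPHER is a made-up example (it maps 'A' to 'X' and 'B' to 'M'), and the sample message is invented for illustration.

```python
from collections import Counter
import string

PLAIN = string.ascii_uppercase
# Hypothetical substitution alphabet: 'A' -> 'X', 'B' -> 'M', and so on.
CIPHER = "XMQWZRTYUIOPASDFGHJKLECVBN"
TABLE = str.maketrans(PLAIN, CIPHER)

def encrypt(text):
    """Replace each letter with its fixed partner; other characters pass through."""
    return text.upper().translate(TABLE)

def frequency_profile(text):
    """Letter counts from most to least common, ignoring which letter is which."""
    counts = Counter(ch for ch in text if ch.isalpha())
    return sorted(counts.values(), reverse=True)

message = "MEET THE NEW SUPPLIER AT THE WAREHOUSE"
secret = encrypt(message)

print(secret)
print(frequency_profile(message) == frequency_profile(secret))
```

The counts are identical because each letter always maps to the same replacement. An attacker who knows that 'E' is the most common letter in English text can therefore guess that the most common ciphertext letter stands for 'E', and continue from there.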
Nulling Out/Truncating
Simply deleting a column of data by replacing it with NULL values is an effective way of ensuring that it is not inappropriately visible in test environments. Unfortunately, it is also one of the least desirable options from a test database standpoint. Usually the test teams need to work on the data or at least a realistic approximation of it. For example, it is very hard to write and test customer account maintenance forms if the customer name, address and contact details are all NULL values. NULL'ing or truncating data is useful in circumstances where the data is simply not required, but is rarely useful as the entire data sanitization strategy.
Masking Out Data
Masking data, besides being the generic term for the process of data anonymization, means replacing certain fields with a mask character (such as an X). This effectively disguises the data content while preserving the same formatting on front end screens and reports. For example, a column of credit card numbers might look like:
4346 6454 0020 5379
4493 9238 7315 5787
4297 8296 7496 8724
and after the masking operation the information would appear as:
4346 XXXX XXXX 5379
4493 XXXX XXXX 5787
4297 XXXX XXXX 8724
The masking characters effectively remove much of the sensitive content from the record while still preserving the look and feel. Take care to ensure that enough of the data is masked to preserve security. It would not be hard to regenerate the original credit card number from a masking operation such as 4297 8296 7496 87XX, since the numbers are generated with a specific and well known checksum algorithm. Care must also be taken not to mask out potentially required information. A masking operation such as XXXX XXXX XXXX 5379 would strip the card issuer details from the credit card number. This may, or may not, be desirable.
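The checksum point can be made concrete with a short Python sketch. It assumes the checksum in question is the Luhn algorithm, which is the check commonly used for payment card numbers. The function names are illustrative. The sample numbers in this text are made up and need not pass the check themselves, so the sketch only counts how many completions of the masked number are possible.

```python
def luhn_valid(number):
    """Return True if the digits of number pass the Luhn check."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def candidates(masked, mask_char="X"):
    """List every completion of the masked digits that passes the Luhn check."""
    compact = masked.replace(" ", "")
    hidden = compact.count(mask_char)
    found = []
    for guess in range(10 ** hidden):
        fill = iter(str(guess).zfill(hidden))
        trial = "".join(next(fill) if ch == mask_char else ch for ch in compact)
        if luhn_valid(trial):
            found.append(trial)
    return found

possible = candidates("4297 8296 7496 87XX")
print(len(possible))  # far fewer than the 100 raw combinations
```

Masking only the last two digits leaves very few numbers for an attacker to try, which is why more of the number should be masked.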
If the data is in a specific, invariable format, then Masking Out is a powerful and fast option. If numerous special cases must be dealt with, then masking can be slow, extremely complex to administer and can potentially leave some data items inappropriately masked.
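For data in a fixed format, masking out can be sketched in Python as below. The function name mask_card and its parameters are assumptions for illustration. It keeps a chosen number of leading and trailing digits and replaces the rest with the mask character, leaving spaces untouched.

```python
def mask_card(number, keep_first=4, keep_last=4, mask_char="X"):
    """Mask the middle digits of a card number, preserving its layout."""
    total_digits = sum(ch.isdigit() for ch in number)
    seen = 0
    out = []
    for ch in number:
        if ch.isdigit():
            seen += 1
            hidden = keep_first < seen <= total_digits - keep_last
            out.append(mask_char if hidden else ch)
        else:
            out.append(ch)  # keep spaces and other separators as they are
    return "".join(out)

print(mask_card("4346 6454 0020 5379"))                # 4346 XXXX XXXX 5379
print(mask_card("4346 6454 0020 5379", keep_first=0))  # XXXX XXXX XXXX 5379
```

Setting keep_first to 0 reproduces the case that strips the card issuer details, so the choice of parameters decides how much information survives.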
