

Definition of DBMS :: A database management system (DBMS) is a collection of interrelated data and a set of programs to access those data. The collection of data, usually referred to as the database, contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient. Example:- Oracle, Microsoft SQL Server, IBM DB2 etc.

Database Applications ::
o Banking: all transactions
o Airlines: reservations, schedules
o Universities: registration, grades
o Sales: customers, products, purchases
o Online retailers: order tracking, customized recommendations
o Manufacturing: production, inventory, orders, supply chain
o Human resources: employee records, salaries, tax deductions

Disadvantages of File system ::
o Data redundancy and inconsistency: Different programmers create the files and application programs, so the various files are likely to have different formats. Moreover, the same information may be duplicated in several files. For example, the address and telephone number of a particular customer may appear both in a file of savings account records and in a file of checking account records. This redundancy leads to higher storage and access costs. Also, if a change is made to a record in one file and is not reflected in the related records in the other files, the data become inconsistent.
o Difficulty in accessing data: In file systems, a new program has to be written for each new task. For example, suppose a bank officer needs the names of all customers who live within a particular postal code area. The data processing department must either write an application program for this query or pick the names out manually from a list of all customers. If the officer returns another day with a different query, yet another application program has to be written. In short, conventional file systems do not allow needed data to be retrieved in a convenient and efficient manner.
o Data isolation: Data are scattered in various files, and the files may be in different formats. This makes it difficult to write new application programs to retrieve the appropriate data.
o Integrity problems: The data values stored in the database must satisfy certain consistency constraints. For example, the balance of a bank account may never fall below a certain amount (say Rs 1000). In file systems, these constraints are enforced by writing appropriate code in each application program that updates the data. When new constraints are added, it is difficult to change all the programs to enforce them.
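A DBMS lets a rule like this be declared once in the schema, instead of being re-coded in every application program. The following is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the Rs 1000 minimum come from the example above. The function name `try_withdraw` and the sample account are hypothetical.

```python
import sqlite3

# In-memory database; the minimum-balance rule lives in the schema, not in application code.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "  account_number TEXT PRIMARY KEY,"
    "  balance INTEGER NOT NULL CHECK (balance >= 1000)"  # consistency constraint
    ")"
)
conn.execute("INSERT INTO account VALUES ('A-101', 5000)")
conn.commit()

def try_withdraw(conn, account_number, amount):
    """Hypothetical helper: attempt a withdrawal and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, account_number),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the update; the program needed no check of its own.
        return False

ok_small = try_withdraw(conn, "A-101", 3000)  # 5000 -> 2000: allowed
ok_large = try_withdraw(conn, "A-101", 1500)  # 2000 -> 500: rejected
balance = conn.execute(
    "SELECT balance FROM account WHERE account_number = 'A-101'"
).fetchone()[0]
print(ok_small, ok_large, balance)  # True False 2000
```

Because the DBMS enforces the constraint for every program that updates `account`, adding a new rule means changing one schema definition rather than every application.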



o Atomicity of updates: A failure during a transaction may leave the database in an inconsistent state, with only some of the updates carried out. Example: a transfer of funds from one account to another should either complete entirely or not happen at all. It is difficult to ensure atomicity in a conventional file-processing system.
o Concurrent access anomalies: Many systems let several users access and update the data at the same time, to improve overall performance. However, uncontrolled concurrent access can lead to inconsistencies in the database. Example: two people may read the same account balance and each write back an updated value. One of the updates can then be lost, leaving an incorrect balance. It is difficult to keep data consistent under concurrent transactions in a conventional file-processing system.
o Security problems: Not every user of the system should be able to access all the data. In file systems, however, it is hard to give users access to some, but not all, of the data.
Database systems offer solutions to all the above problems.

1.4 Data abstraction :: To be user friendly, a database system hides its complexity (i.e., the details of how the data are stored and maintained) from users. It does this through several levels of abstraction, which simplify users' interactions with the system.

Levels of Data Abstraction ::

Fig. 1.1 Levels of Data Abstraction

Physical level: This is the lowest level of abstraction. It describes how a record is actually stored, including the complex low-level data structures, in detail.
Logical level: This level describes what data are stored in the database and the relationships among those data. Database administrators (DBAs) work at this level.
View level: This level describes only part of the entire database. Users of the system do not need all the information in it, only the part relevant to their work. The system may provide many views of the same database. Views can also hide information (such as an employee's salary) for security purposes.
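A view that hides the salary column can be sketched with Python's built-in `sqlite3` module. The table, the view and the employee rows are hypothetical. SQLite itself has no per-user privileges, so here the view only demonstrates the idea. In a multi-user DBMS, users would also be granted access to the view and not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full employee relation, including the sensitive salary column.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 52000), (2, "Ravi", "HR", 48000)],
)
# View level: expose only the columns a clerk needs; salary is not part of the view.
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name, dept FROM employee")

cur = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
columns = [d[0] for d in cur.description]  # column names visible through the view
rows = cur.fetchall()
print(columns)  # ['emp_id', 'name', 'dept']
print(rows)     # [(1, 'Asha', 'Sales'), (2, 'Ravi', 'HR')]
```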


Instances and Schemas ::
Schema: The overall design of the database is called the database schema. Schemas are changed infrequently, if at all. Types of Schema:
o Physical schema: It describes the database design at the physical level.
o Logical schema: It describes the database design at the logical level.
o Subschema: It describes the different views of the database.

Instance: The actual content of the database at a particular point in time is called an instance of the database.

Data Independence :: Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. There are two types of data independence:


Logical data independence is the capacity to change the conceptual (logical) schema without having to change the external schemas (views) or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), or to reduce it (by removing a record type or data item). In the latter case, external schemas that refer only to the remaining data should not be affected. In a DBMS that supports logical data independence, only the view definitions and the mappings need to be changed. Application programs that reference the external schema constructs must work as before after the conceptual schema undergoes a logical reorganization. Changes to constraints can also be applied to the conceptual schema without affecting the external schemas or application programs.
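Here is a small illustration, assuming the application program is written only against a view that acts as its external schema. The conceptual schema is then expanded with a new data item using `ALTER TABLE ... ADD COLUMN`, and the application's result does not change. The table, view and function names are hypothetical. Python's built-in `sqlite3` module is used so that the sketch runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Meera', 'Pune')")
# External schema: the application only ever queries this view.
conn.execute("CREATE VIEW customer_contact AS SELECT cust_id, name FROM customer")

def app_report(conn):
    """Hypothetical application program that depends only on the external schema."""
    return conn.execute(
        "SELECT cust_id, name FROM customer_contact ORDER BY cust_id"
    ).fetchall()

before = app_report(conn)

# Logical reorganization: expand the conceptual schema with a new data item.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute("UPDATE customer SET email = 'meera@example.com' WHERE cust_id = 1")

after = app_report(conn)
print(before == after)  # True: the application program is unaffected
```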

Physical data independence is the capacity to change the internal schema without having to change the conceptual (or external) schemas. Changes to the internal schema may be needed because some physical files had to be reorganized (for example, by creating additional access structures) to improve the performance of retrieval or update. If the same data as before remain in the database, we should not have to change the conceptual schema.
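Adding an index is a typical internal-schema change. The sketch below uses Python's `sqlite3` module with a hypothetical table and generated data. The application's query text is identical before and after the index is created, and it returns the same rows. Only the access path that the DBMS may choose changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, branch TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    # 200 hypothetical accounts, alternating between two branches.
    [(f"A-{i:03d}", "Downtown" if i % 2 else "Uptown", 1000 + i) for i in range(200)],
)

# The application's query; it never mentions how the data are stored.
QUERY = "SELECT account_number FROM account WHERE branch = ? ORDER BY account_number"

before = conn.execute(QUERY, ("Uptown",)).fetchall()

# Internal-schema change: add an access structure to speed up retrieval.
conn.execute("CREATE INDEX idx_account_branch ON account(branch)")

after = conn.execute(QUERY, ("Uptown",)).fetchall()
print(len(before), before == after)  # 100 True
```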


Database Languages ::
DDL (Data Definition Language):: It is the set of SQL commands used to create, modify and delete database structures, but not data. The DDL compiler generates a set of tables that are stored in a data dictionary. Example: create table account (account_number char(10), balance integer);

DML (Data Manipulation Language):: It is the part of SQL that lets users access and change the data in the database. It is of two types:
o Procedural: the user specifies what data are required and how to get those data.
o Declarative (nonprocedural): the user specifies what data are required without specifying how to get those data.
Example: Insert, Update, and Delete statements.
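The statements above can be run against SQLite from Python's built-in `sqlite3` module. The DDL statement is the one given in the text. The account rows and the 550 threshold are made-up values. The last part contrasts a declarative SELECT with a procedural-style Python loop that spells out how to scan, filter and sort the rows. The loop only imitates a procedural DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (the statement from the text).
conn.execute("create table account (account_number char(10), balance integer)")

# DML: insert, update and delete change the data, not the structure.
conn.executemany(
    "insert into account values (?, ?)",
    [("A-101", 500), ("A-102", 700), ("A-103", 900)],
)
conn.execute("update account set balance = balance + 100 where account_number = 'A-101'")
conn.execute("delete from account where account_number = 'A-103'")

# Declarative: state WHAT is wanted; the DBMS decides how to find it.
declarative = conn.execute(
    "select account_number from account where balance > 550 order by account_number"
).fetchall()

# Procedural style: the program itself says HOW (scan every row, test it, collect, sort).
procedural = sorted(
    (num,) for num, bal in conn.execute("select account_number, balance from account") if bal > 550
)

print(declarative)                # [('A-101',), ('A-102',)]
print(declarative == procedural)  # True
```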

DCL (Data Control Language):: It is the component of SQL that controls access to data and to the database. DCL statements are sometimes grouped with DML statements. Example:
o GRANT - gives users access privileges to the database
o REVOKE - withdraws access privileges given with the GRANT command

TCL (Transaction Control Language):: Transaction control statements are used to manage the changes made by DML statements. They allow statements to be grouped together into logical transactions. Example:
o COMMIT - saves the work done
o SAVEPOINT - identifies a point in a transaction to which you can later roll back
o ROLLBACK - restores the database to its state as of the last COMMIT (or to a named savepoint)
o SET TRANSACTION - changes transaction options such as the isolation level and the rollback segment to use
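SQLite, used here through Python's `sqlite3` module, has no user accounts. This means the DCL statements GRANT and REVOKE, and also SET TRANSACTION, cannot be shown with it, but the other TCL statements can. The sketch performs a funds transfer inside a transaction and uses a savepoint to undo only a mistaken update. It then rolls back a second, incomplete transfer entirely, which demonstrates the atomicity property described earlier. The account names and amounts are hypothetical.

```python
import sqlite3

# isolation_level=None: the driver issues no implicit BEGIN/COMMIT, so every TCL statement is explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 500)])

def balances():
    return dict(conn.execute("SELECT account_number, balance FROM account"))

# Transaction 1: a transfer, a mistake undone via a savepoint, then COMMIT.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 200 WHERE account_number = 'A'")
conn.execute("UPDATE account SET balance = balance + 200 WHERE account_number = 'B'")
conn.execute("SAVEPOINT after_transfer")                      # mark a point to return to
conn.execute("UPDATE account SET balance = 0 WHERE account_number = 'B'")  # a mistake
conn.execute("ROLLBACK TO SAVEPOINT after_transfer")          # undo only the mistake
conn.execute("COMMIT")                                        # make the transfer permanent
committed = balances()

# Transaction 2: the debit happens, but a (simulated) failure occurs before the credit.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 300 WHERE account_number = 'A'")
conn.execute("ROLLBACK")                                      # undo the whole transaction
after_rollback = balances()

print(committed)       # {'A': 800, 'B': 700}
print(after_rollback)  # {'A': 800, 'B': 700}
```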

DQL (Data Query Language):: It is the component of SQL that retrieves data from the database and imposes an ordering on them. It is also sometimes grouped with DML statements. Example: Select statements

1.8 Data Models :: A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. The different types of data models are:
o Hierarchical model
o Network model
o Relational model
o Object-oriented data model

There are some other models also:
o Object-relational data model
o Deductive/Inference model

Hierarchical model :- The data are organized hierarchically, as a downward (top-down) tree. This model uses pointers to navigate between stored data. It was the first DBMS model. There is a hierarchy of parent and child data segments, and each child has exactly one parent. Therefore only one-to-one and one-to-many relationships can be implemented.
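The tree structure can be sketched in Python. Each segment keeps a single parent pointer and a list of child pointers, so a record cannot be attached under a second parent. This is only an in-memory illustration with hypothetical class and record names; it is not how IMS actually stores data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Segment:
    """A record (segment) in a hypothetical hierarchical database."""
    name: str
    parent: Optional["Segment"] = None                      # at most ONE parent pointer
    children: List["Segment"] = field(default_factory=list)  # pointers down the tree

    def add_child(self, child: "Segment") -> "Segment":
        if child.parent is not None:
            # A second parent would break the tree, which the hierarchical model forbids.
            raise ValueError(f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child

def path(seg: Optional[Segment]) -> str:
    """Navigate upward by following parent pointers until the root is reached."""
    parts = []
    while seg is not None:
        parts.append(seg.name)
        seg = seg.parent
    return "/".join(reversed(parts))

root = Segment("Bank")
branch = root.add_child(Segment("Branch:Downtown"))
acct = branch.add_child(Segment("Account:A-101"))
print(path(acct))  # Bank/Branch:Downtown/Account:A-101

other = root.add_child(Segment("Branch:Uptown"))
try:
    other.add_child(acct)  # a many-to-many link would need a second parent
except ValueError as e:
    print("rejected:", e)
```

Allowing a record to have several owners is exactly what the network model adds.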

Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information Management System (IMS), through the 1970s.

Network model :- Some data are more naturally modeled with more than one parent per child. The network model therefore permits one-to-one, one-to-many, many-to-one and many-to-many relationships. Like the hierarchical model, this model uses pointers to stored data. However, it does not necessarily use a downward tree structure.

Example:- DBMSs based on the CODASYL (Conference on Data Systems Languages) specification.

Relational model :- The data are stored in two-dimensional tables (rows and columns). A table is a collection of records, and each record in a table contains the same fields. Certain fields may be designated as keys, which means that searches for specific values of those fields can use indexing to speed them up. A relational database allows the definition of data structures, storage and retrieval operations, and integrity constraints. A database system based on the relational model is called an RDBMS (Relational Database Management System). Example:- Oracle, MySQL, Microsoft SQL Server etc.

Object-oriented data model :- The desire to represent complex objects has led to the development of object-oriented (OO) systems. Object-oriented databases employ a data model that supports object-oriented features and abstract data types. OO databases provide unique object identifiers (OIDs) so that objects can be easily identified. This is similar to a primary key in the relational model. Object-oriented databases use the power of object-oriented programming languages to provide excellent database programming capability. The data in object-oriented database management systems (OODBMSs) are managed through two sets of relations. One set describes the interrelations of data items, and the other describes the abstract relationships (inheritance). These systems use both relation types to couple data items with procedural methods (encapsulation). As a result, a direct relationship is established between the application data model and the database data model. This strong connection between application and database results in less code, more natural data structures, and better maintainability and reusability of code. Example:- O2 (now called Ardent), developed by Ardent Software, and the ObjectStore system produced by Object Design Inc.
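The OO concepts named above can be illustrated with ordinary Python classes: system-assigned OIDs, inheritance, and encapsulation of data with methods. This sketch is a simplified toy. Nothing is persisted, and the `PersistentObject` base class and its OID counter are hypothetical; they are not the API of O2, ObjectStore or any other OODBMS.

```python
import itertools

class PersistentObject:
    """Hypothetical base class giving every object a system-assigned object identifier (OID)."""
    _next_oid = itertools.count(1)

    def __init__(self):
        # Identity comes from the OID, not from the object's attribute values.
        self.oid = next(PersistentObject._next_oid)

class Account(PersistentObject):
    def __init__(self, owner: str, balance: int):
        super().__init__()
        self.owner = owner
        self._balance = balance  # state is encapsulated behind methods

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self) -> int:
        return self._balance

class SavingsAccount(Account):  # inheritance: a savings account "is an" account
    def __init__(self, owner: str, balance: int, rate: float):
        super().__init__(owner, balance)
        self.rate = rate

    def add_interest(self) -> None:
        # Reuses the inherited method instead of touching the data directly.
        self.deposit(round(self._balance * self.rate))

a = Account("Meera", 500)
s = SavingsAccount("Meera", 500, 0.04)
s.add_interest()
print(a.oid != s.oid, s.balance, isinstance(s, Account))  # True 520 True
```

The two objects start with identical owner and balance values, yet they remain distinct because their OIDs differ.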
Object-relational data model :- The main objective of ORDBMS design was to achieve the benefits of both the relational and the object models, such as scalability and support for rich data types. ORDBMSs employ a data model that incorporates OO features into RDBMSs. All database information is stored in tables, but some tabular entries may have a richer data structure, termed abstract data types (ADTs). An ORDBMS supports an extended form of SQL called SQL3, which has since been standardized as SQL:1999. Examples:- Universal Server from Informix, Oracle8 from Oracle Corporation, and Universal DB (UDB) from IBM.

Deductive/Inference model :- This model stores as little data as possible. It compensates by maintaining rules that allow new data combinations to be derived when needed.
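Deductive databases typically express rules in a logic language such as Datalog. SQL's recursive common table expressions offer only a rough analogue: a rule can be stored as a view, and new facts are derived from it on demand. The sketch below stores only direct "reports to" pairs and derives all indirect managers. The table, view and employee names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stored facts: only the direct "reports to" pairs are kept.
conn.execute("CREATE TABLE reports_to (employee TEXT, manager TEXT)")
conn.executemany(
    "INSERT INTO reports_to VALUES (?, ?)",
    [("Ravi", "Asha"), ("Asha", "Kiran"), ("Kiran", "Dev")],
)

# Rule stored as a view: X works under Y if X reports to Y,
# or X reports to some Z who works under Y. Derived facts are computed only when queried.
conn.execute("""
CREATE VIEW works_under AS
WITH RECURSIVE chain(employee, boss) AS (
    SELECT employee, manager FROM reports_to
    UNION
    SELECT r.employee, c.boss
    FROM reports_to AS r JOIN chain AS c ON r.manager = c.employee
)
SELECT employee, boss FROM chain
""")

bosses_of_ravi = [row[0] for row in conn.execute(
    "SELECT boss FROM works_under WHERE employee = 'Ravi' ORDER BY boss"
)]
print(bosses_of_ravi)  # ['Asha', 'Dev', 'Kiran']
```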

For more details on the data models (advantages, disadvantages, and comparisons among them), refer to the book Database Management Systems by Leon & Leon (pp. 107-116).

Dr. E. F. Codd's Rules for RDBMS :: Edgar F. Codd, a pioneer of the relational model for databases, proposed a set of 12 rules. They define what a database management system must provide in order to be considered relational, i.e., an RDBMS.
Rule 1: The information rule: All information in the database is to be represented in table form.
Rule 2: The guaranteed access rule: All data must be accessible without ambiguity. This is achieved through a combination of the table name, the primary key value and the column name.
Rule 3: Systematic treatment of null values: The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that meets three conditions. It must be systematic, distinct from all regular values (for example, "distinct from zero or any other number" in the case of numeric values), and independent of data type. The DBMS must also manipulate such representations in a systematic way. It must also be possible to declare that some columns, such as the primary key, do not allow nulls.
Rule 4: Active online catalog based on the relational model: Users must be able to access the database's structure (catalog) using the same query language that they use to access the database's data.

Rule 5: The comprehensive data sublanguage rule: The system must support at least one relational language that
1. has a linear syntax,
2. can be used both interactively and within application programs, and
3. supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).
Rule 6: The view updating rule: All views that are theoretically updatable must be updatable by the system.
Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert, update, and delete operators. Data can be retrieved from a relational database in sets constructed from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should likewise be supported for any retrievable set, rather than just for a single row in a single table.
Rule 8: Physical data independence: Changes to the physical level (how the data are stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.
Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.
Rule 10: Integrity independence: Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.
Rule 11: Distribution independence: The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully
1. when a distributed version of the DBMS is first introduced, and
2. when existing distributed data are redistributed around the system.
Rule 12: The nonsubversion rule: If the system provides a low-level (record-at-a-time) interface, that interface must not be usable to bypass the integrity rules and constraints expressed in the high-level, multiple-row database language (like SQL).
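Three of these rules can be observed directly in a relational system. The sketch below uses SQLite through Python's `sqlite3` module. SQLite is chosen only for convenience and is not claimed to satisfy all 12 rules. The `account` table and its rows are hypothetical. The sketch shows three rules in action:
o Rule 3: NULL is distinct from 0.
o Rule 4: the catalog table `sqlite_master` is queried with ordinary SQL.
o Rule 7: a single UPDATE changes a whole set of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In SQLite a PRIMARY KEY on an ordinary table does not by itself forbid NULLs,
# so NOT NULL is stated explicitly (Rule 3: some columns must not allow nulls).
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY NOT NULL, branch TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-1", "Downtown", 100), ("A-2", "Downtown", 0), ("A-3", None, 50)],
)

# Rule 3: NULL ("missing") is not equal to 0; comparing with it yields NULL (unknown).
null_vs_zero = conn.execute("SELECT NULL = 0, NULL IS NULL").fetchone()
unknown_branch = conn.execute("SELECT COUNT(*) FROM account WHERE branch IS NULL").fetchone()[0]

# Rule 4: the catalog is queried with the same language as the data.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# Rule 7: one statement updates a whole set of rows, not one record at a time.
cur = conn.execute("UPDATE account SET balance = balance + 10 WHERE branch = 'Downtown'")

print(null_vs_zero, unknown_branch, tables, cur.rowcount)  # (None, 1) 1 ['account'] 2
```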


Database users :: Users are differentiated by the way they expect to interact with the system. There are four types of users:
Application programmers :: They are computer professionals who write application programs. RAD (Rapid Application Development) tools enable an application programmer to construct forms and reports without writing a program.
Sophisticated users :: They interact with the system without writing programs. Instead, they form their requests in a database query language. Analysts who submit queries to explore data in the database fall in this category.

Specialized users :: They are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework. Among these applications are computer-aided design (CAD) systems, knowledge-based and expert systems, systems that store data with complex data types, and environment-modeling systems.
Naive users :: They are unsophisticated users who interact with the system by invoking one of the permanent application programs that have been written previously. Examples:: people accessing a database over the web, bank tellers, clerical staff etc.

1.10 Database Administrator :: The database administrator (DBA) coordinates all the activities of the database system. The DBA has central control of both the data and the programs that access those data, and has a good understanding of the enterprise's information resources and needs. The DBA's duties include:
Schema definition:- The DBA creates the original database schema by executing a set of DDL statements.
Storage structure and access method definition.
Schema and physical organization modification:- The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization.
Granting user authority to access the database:- By granting different types of authorization, the DBA can regulate which parts of the database various users can access.
Specifying integrity constraints.
Routine maintenance:- The DBA's routine maintenance activities include:
o Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of data in case of disasters such as flooding.
o Ensuring that enough free disk space is available for normal operations, and upgrading disk space as required.
o Monitoring jobs running on the database and ensuring that performance is not degraded by very expensive tasks submitted by some users.
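The backup duty can be sketched with the `Connection.backup` method of Python's `sqlite3` module (available since Python 3.7), which copies one SQLite database into another. Both databases are in memory here so that the sketch is self-contained. A real DBA would write the copy to separate media or a remote server, and would normally use the backup tools of their own DBMS. The table and its contents are hypothetical.

```python
import sqlite3

# Stand-in for the live database (in memory so the sketch runs anywhere).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
live.execute("INSERT INTO account VALUES ('A-101', 5000)")
live.commit()  # back up only committed work

# Routine maintenance: copy the whole database into a separate backup target.
# In practice the target would be a file on other media or on a remote server.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate data loss on the live database, then check that the backup still has the data.
live.execute("DROP TABLE account")
restored = backup.execute("SELECT * FROM account").fetchall()
print(restored)  # [('A-101', 5000)]
```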


Components of DBMS :: A database system, as shown in Fig. 1.2, is partitioned into modules that deal with each of the responsibilities of the overall system. They are:
Storage Manager :: A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. It includes::


Buffer Manager :: It is responsible for fetching data from disk storage into main memory and deciding what data to cache in main memory.
File Manager :: It manages the allocation of space on disk storage and the data structures used to represent data stored on disk.
Authorization and Integrity Manager :: It tests for satisfaction of integrity constraints and checks the authority of users to access data.
Transaction Manager :: It ensures that the database remains in a consistent state despite system failures, and that concurrent transaction executions proceed without conflicting.
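The buffer manager's caching decision can be sketched as a small buffer pool with least-recently-used (LRU) replacement. LRU is an assumed policy chosen for illustration. Real buffer managers also pin pages that are in use and write modified (dirty) pages back to disk, and they may use other replacement policies. The class and the fake in-memory "disk" are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict

class BufferManager:
    """Toy buffer pool with LRU replacement (hypothetical, not a real DBMS API)."""

    def __init__(self, capacity: int, read_page_from_disk: Callable[[int], bytes]):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk
        self.frames: "OrderedDict[int, bytes]" = OrderedDict()  # page_id -> cached page
        self.disk_reads = 0

    def get_page(self, page_id: int) -> bytes:
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # hit: mark as most recently used
            return self.frames[page_id]
        page = self.read_page_from_disk(page_id)   # miss: fetch from disk storage
        self.disk_reads += 1
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)        # evict the least recently used page
        self.frames[page_id] = page
        return page

# Fake "disk": page i holds a small byte string.
disk: Dict[int, bytes] = {i: f"page-{i}".encode() for i in range(10)}
bm = BufferManager(capacity=2, read_page_from_disk=disk.__getitem__)

for pid in [1, 2, 1, 3, 1, 2]:
    bm.get_page(pid)
print(bm.disk_reads, list(bm.frames))  # 4 [1, 2]
```

In this run, page 1 is referenced often enough that it never leaves main memory. Pages 2 and 3 take turns being evicted.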


Fig. 1.2 Overall Database System Structure

Disk Storage :: It consists of some data structures as part of the physical system implementation. It includes:

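Data Files :: They store the database itself.
Indices :: They provide fast access to data items that hold particular values.
Data Dictionary :: It stores metadata about the structure of the database, in particular its schema.
Statistical Data :: It stores statistical information about the data in the database, which the query processor uses to select efficient ways to execute queries.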

Query Processor :: The query processor components include the following:
DDL Interpreter :: It interprets DDL statements and records the definitions in the data dictionary.
DML Compiler :: It translates DML statements into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.
Query Evaluation Engine :: It executes the low-level instructions generated by the DML compiler.
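In SQLite, the evaluation plan produced by the DML compiler can be inspected with `EXPLAIN QUERY PLAN`, shown here through Python's `sqlite3` module. The table and index names are hypothetical. The exact wording of the plan text differs between SQLite versions, but it typically reports that the index `idx_account_branch` is used to search the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, branch TEXT, balance INTEGER)")
conn.execute("CREATE INDEX idx_account_branch ON account(branch)")

query = "SELECT account_number, balance FROM account WHERE branch = ?"

# Ask for the evaluation plan instead of running the query.
# The last column of each returned row is a human-readable description of one plan step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Downtown",))]
for step in plan:
    print(step)

# When the query itself runs, the evaluation engine carries out such a plan.
rows = conn.execute(query, ("Downtown",)).fetchall()  # empty table, so no rows
```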


Client Server Architecture :- This architecture is used in database applications where users connect to the database system through a network. The client machines are those on which remote database users work, and the server machines are those on which the database system runs. Two types of architecture are available with client-server technology:
o Two-tier architecture :- Here, the application is partitioned into a component that resides on the client machine. This component invokes database functionality at the server machine through query language statements. Application program interface standards such as ODBC and JDBC are used for interaction between the client and the server.


o Three-tier architecture :- Here the client machine acts merely as a front end and does not contain any direct database calls. Instead, the client communicates with an application server, usually through a forms interface. The application server in turn communicates with the database system to access data. The business logic, which decides what actions to carry out under what conditions, is embedded in the application server. Three-tier architectures are more appropriate for large applications and for applications that run on the World Wide Web.
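The difference between the two architectures can be sketched in a single Python process, with an in-memory SQLite database standing in for the database server. Python's DB-API plays the role that ODBC or JDBC plays in the text. In a real deployment each tier would run on its own machine and communicate over a network. The `ApplicationServer` class and its form format are hypothetical.

```python
import sqlite3

def make_server_db() -> sqlite3.Connection:
    """Stands in for the database server (in memory for this sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
    db.execute("INSERT INTO account VALUES ('A-101', 5000)")
    db.commit()
    return db

# Two-tier: the client program itself sends query language statements to the server.
def two_tier_client(db: sqlite3.Connection, account_number: str) -> int:
    row = db.execute(
        "SELECT balance FROM account WHERE account_number = ?", (account_number,)
    ).fetchone()
    return row[0]

# Three-tier: business logic lives in the application server; the client never sees SQL.
class ApplicationServer:
    def __init__(self, db: sqlite3.Connection):
        self.db = db

    def handle_form(self, form: dict) -> dict:
        """Hypothetical form handler that decides what to do under what conditions."""
        if form.get("action") != "balance":
            return {"error": "unsupported action"}
        row = self.db.execute(
            "SELECT balance FROM account WHERE account_number = ?", (form.get("account"),)
        ).fetchone()
        if row is None:
            return {"error": "no such account"}
        return {"balance": row[0]}

def three_tier_client(server: ApplicationServer, account_number: str) -> dict:
    # Front end only: fill in a form and display the reply.
    return server.handle_form({"action": "balance", "account": account_number})

db = make_server_db()
app = ApplicationServer(db)
print(two_tier_client(db, "A-101"))     # 5000
print(three_tier_client(app, "A-101"))  # {'balance': 5000}
print(three_tier_client(app, "Z-999"))  # {'error': 'no such account'}
```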