Unit 13 Distributed Database Structure

13.1 Introduction to Distributed DBMS Concepts
Objectives
Self Assessment Question(s) (SAQs)
13.2 Client-Server Model
Self Assessment Question(s) (SAQs)
13.3 Data Fragmentation, Replication, and Allocation Techniques for Distributed Database Design
Self Assessment Question(s) (SAQs)
13.4 Summary
13.5 Terminal Questions (TQs)
13.6 Multiple Choice Questions (MCQs)
13.7 Answers to SAQs, TQs, and MCQs
13.7.1 Answers to Self Assessment Questions (SAQs)
13.7.2 Answers to Terminal Questions (TQs)
13.7.3 Answers to Multiple Choice Questions (MCQs)

13.1 Introduction to Distributed DBMS Concepts
In a centralized database system, all system components, such as data, DBMS software, and storage devices, reside at a single computer or site, whereas in a distributed database system the data is spread over multiple computers connected by a network. A distributed database (DDB) is thus a set of databases stored on multiple computers that appears to the user as a single database. Data on several computers (both local and remote databases) can be accessed and modified simultaneously over the network. Each database server in the DDB is controlled by its local DBMS, and the servers cooperate to maintain the consistency of the global database.

Database Management Systems Unit 13 Sikkim Manipal University Page No.: 215

As a general goal, distributed computing systems divide a big, unmanageable problem into smaller pieces and solve it efficiently in a coordinated manner.

Fig. 13.1: Data distribution and replication among distributed databases

Objectives
To know about
o Client-Server Model
o Data fragmentation
o Replication
o Allocation Techniques for Distributed Database Design

Advantages of Distributed Databases
1. Increased reliability and availability: Reliability is broadly defined as the probability that a system is running at a certain time point, whereas availability is defined as the probability that the system is continuously available during a time interval.
When the data and DBMS software are distributed over several sites, one site may fail while the other sites continue to operate. Only the data and software that exist at the failed site cannot be accessed. In a centralized system, failure at a single site makes the whole system unavailable to all users.
2. Improved performance: A large database is divided into smaller databases, keeping the data closest to where it is needed most. Data localization reduces the contention for CPU and I/O services and also reduces the access delays involved in wide area networks. When a large database is distributed over multiple sites, a smaller database exists at each site. As a result, local queries and transactions that access data at a single site perform better because the local databases are smaller. To exploit parallelism, a single large transaction can also be divided into a number of smaller transactions that execute at different sites.
3. Data sharing: Data can be accessed by users at other, remote sites through the distributed database management system (DDBMS) software.
4. Transparency: Ideally, a distributed database should be distribution transparent, in the sense of hiding the details of where each file is physically stored within the system. It also provides network transparency; that is, the command used to perform a task is independent of both the location of the data and the location of the system where the command was issued.
5. Easier expansion: In a distributed environment, expanding the system by adding more data, increasing the database size, or adding more processors is much easier.

Additional Functions of Distributed Databases
A DDBMS performs the following basic functions in addition to those of a centralized DBMS:
1. Distributed query processing: The ability to access remote sites and transmit queries and data among the various sites via the communication network.
2. Data tracing: The ability to keep track of data distribution, fragmentation, and replication by maintaining the DDBMS catalog.
3. Distributed transaction management: The ability to execute transactions that access data from more than one site, to synchronize access to distributed data, and to maintain the integrity of the overall database.
4. Distributed database recovery: The ability to recover from individual site crashes and from new types of failures, such as the failure of a communication link.
5. Security: Distributed transactions must be executed with proper management of the security of the data and of the authorization/access privileges of the users.
6. Distributed directory (catalog) management: A directory contains information (metadata) about the data in the database. The directory may be global for the entire DDB or local to each site. The placement and distribution of the directory are design and policy issues.
These functions make a DDBMS more complex than a centralized DBMS.

Self Assessment Question(s) (SAQs) (For Section 13.1)
1. Define distributed database system.
2. What are the advantages of distributed database systems?

13.2 Client-Server Model
The client-server model is basic to distributed systems; it allows clients to make requests that are routed to the appropriate server in the form of transactions. The client-server model consists of three parts.
1. Client: The client is the machine (workstation or PC) running the front-end applications. It interacts with the user through the keyboard, display, and mouse. The client has no direct data access responsibilities. The client machine provides front-end application software for accessing the data on the server. The client initiates transactions; the server processes them.
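The division of labor described above, in which the client initiates a request and the appropriate server processes it, can be illustrated with a minimal Python sketch. The class names, the routing table, and the sample rows are all hypothetical and chosen only for illustration; a real DDBMS would communicate over a network rather than through direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Back-end machine: stores the data and processes transactions."""
    name: str
    tables: dict = field(default_factory=dict)

    def process(self, table, predicate):
        # Only the server touches the stored rows.
        return [row for row in self.tables[table] if predicate(row)]


@dataclass
class Client:
    """Front-end machine: initiates transactions but stores no data."""
    routes: dict  # hypothetical routing table: table name -> Server

    def request(self, table, predicate):
        server = self.routes[table]  # route the request to the right server
        return server.process(table, predicate)


# Two hypothetical server sites, each holding one table.
site_a = Server("site_a", {"employee": [{"name": "Asha", "dno": 10},
                                         {"name": "Ravi", "dno": 20}]})
site_b = Server("site_b", {"department": [{"dno": 10, "dname": "Research"}]})

client = Client(routes={"employee": site_a, "department": site_b})

# The client names the table and the condition, not the site.
result = client.request("employee", lambda row: row["dno"] == 10)
print(result)
```

Because the application asks for a table rather than a site, moving a table to another server in this sketch would only require changing the routing table, not the request itself.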
Interaction between client and server might proceed as follows during the processing of an SQL query:
   1. The client parses a user query and decomposes it into a number of independent site queries. Each site query is sent to the appropriate server site.
   2. Each server processes its local query and sends the resulting relation to the client site.
   3. The client site combines the results of the site queries to produce the result of the originally submitted query.
The server is therefore called the database processor or back-end machine, whereas the client is called the application processor or front-end machine. Another function controlled by the client is ensuring the consistency of replicated copies of a data item by using distributed concurrency control techniques. The client must also ensure the atomicity of global transactions by performing global recovery when certain sites fail. It provides distribution transparency; that is, the client hides the details of data distribution from the user.
2. Server: The server is the machine that runs the DBMS software. It is referred to as the back end. The server processes SQL and other query statements received from client applications. It can have large disk capacity and fast processors.
3. Network: The network enables remote data access through client-server and server-to-server communication. Each computer in a network is a node and acts as a client, a server, or both, depending on the situation.

Advantages:
Client applications do not depend on the physical location of the data. If the data is moved or distributed to other database servers, the application continues to function with little or no modification.
The server provides multitasking and shared-memory facilities; as a result, it can deliver a high degree of concurrency and data integrity.
In a networked environment, shared data is stored on the servers rather than on all computers in the system.
This makes it easier and more efficient to manage concurrent access.
Inexpensive, low-end client workstations can access the remote data of the server effectively.

Self Assessment Question(s) (SAQs) (For Section 13.2)
1. Explain the concept of the client-server model.

13.3 Data Fragmentation, Replication, and Allocation Techniques for Distributed Database Design
Data fragmentation: Techniques used to break up the database into logical units, called fragments, that may be assigned for storage at the various sites. In a DDBMS, decisions must be made about which site should store which portions of the database. There are three types of fragmentation:
1. Horizontal fragmentation: Horizontal fragmentation divides a relation "horizontally" by grouping rows to create subsets of tuples, where each subset has a certain logical meaning. These fragments can then be assigned to different sites in the distributed system. For example, we may divide the employee relation into three horizontal fragments with the conditions (DNO=10), (DNO=20), and (DNO=30); each fragment contains the employee tuples for a particular department.
2. Vertical fragmentation: A vertical fragment keeps only certain attributes of the relation; it divides the relation "vertically" by columns. For example, we may want to fragment the employee relation into two vertical fragments. The first fragment includes personal information (Name, Bdate, Address) and the second includes work-related information (SSN, Salary, Mgr no, etc.).
3. Mixed fragmentation: A combination of horizontal and vertical fragmentation is called mixed fragmentation.

Data Replication and Allocation: Replication is useful in improving the availability of data. Replicating the whole database at every site in the distributed system produces a fully replicated database.
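As a rough illustration of the fully replicated case just defined, the following Python sketch models each site as holding a complete copy of the data. The `Site` and `FullyReplicatedDB` classes are hypothetical, and the write path is a deliberate simplification: it assumes every site is up and ignores the concurrency control and recovery machinery a real DDBMS would need.

```python
class Site:
    """One site holding a complete copy of the database (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class FullyReplicatedDB:
    """Every site stores every data item."""

    def __init__(self, sites):
        self.sites = sites

    def read(self, key):
        # Any single available site can answer, since each has a full copy.
        for site in self.sites:
            if site.up:
                return site.data[key]
        raise RuntimeError("no site is available")

    def write(self, key, value):
        # Simplification: this naive write-all scheme refuses to run
        # unless every copy can be updated.
        if not all(site.up for site in self.sites):
            raise RuntimeError("write-all needs every site to be up")
        for site in self.sites:
            site.data[key] = value
        return len(self.sites)  # number of copies updated


sites = [Site("A"), Site("B"), Site("C")]
db = FullyReplicatedDB(sites)

copies_updated = db.write("emp:101", "Asha")  # touches all three copies

sites[0].up = False  # two of the three sites fail
sites[1].up = False
value = db.read("emp:101")  # still served by the remaining site
print(copies_updated, value)
```

In this sketch a single write touches all three copies, while a read succeeds as long as one site remains up.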
Full replication can improve availability because the system can continue to operate as long as at least one site is up. It also improves retrieval performance for global queries, because the result of such a query can be obtained locally from any one site. The disadvantage is that it can slow down update operations, since an update must be performed on every copy of the database to keep the copies consistent. Full replication also makes the concurrency control and recovery techniques more expensive. The other extreme from full replication is no replication; that is, each fragment is stored at only one site. In partial replication, some fragments of the database may be replicated and others may not. For example, some people carry partially replicated databases with them on laptops.

Allocation: Each copy of a fragment must be assigned to a particular site in the distributed system. This process is called data distribution or allocation.

Types of Distributed DB Systems: In a DDB, the software is distributed over multiple sites connected by a network. DDBMSs are categorized by several factors. The first factor is the degree of homogeneity of the DDBMS software. If all servers (or individual local DBMSs) use identical software and all users use identical software, the DDBMS is called homogeneous; otherwise, it is called heterogeneous. A related factor is the degree of local autonomy: at the high-autonomy extreme is the federated DDBMS (FDBS), or multidatabase system. In such a system, each server has an independent DBMS with its own local users, local programmers, and DBA. In a heterogeneous FDBS, one server may be a relational DBMS, another a network DBMS, and a third a hierarchical DBMS. In such a case, it is necessary to have a canonical system language and language translators to translate the canonical language into the language of each server.

Self Assessment Question(s) (SAQs) (For Section 13.3)
1. What do you mean by data fragmentation? Explain different types.
2. Explain the concept of data replication and allocation.

13.4 Summary
In this unit we have learnt concepts such as
o Client-Server Model
o Data fragmentation
o Replication
o Allocation Techniques for Distributed Database Design

13.5 Terminal Questions (TQs)
1. Discuss briefly the advantages of distributed databases.
2. Discuss Data Fragmentation, Replication, and Allocation Techniques for Distributed Database Design.

13.6 Multiple Choice Questions (MCQs)
1. In ________, all system components such as data, DBMS software, and storage devices reside at a single computer or site.
a) a centralized database system
b) a distributed database system
c) client and server architecture
d) None of the above
2. In ________, data is spread over multiple computers connected by a network.
a) a centralized database system
b) a distributed database system
c) client and server architecture
d) None of the above
3. ________ is the machine (workstation or PC) running the front-end applications.
a) Server
b) Client
c) Client and server
d) None of the above
4. ________ enables remote data access through client-server and server-to-server communication.
a) The network
b) Client
c) Server
d) None of the above

13.7 Answers to SAQs, TQs, and MCQs
13.7.1 Answers to Self Assessment Questions (SAQs)
For Section 13.1
1. In a distributed database system, data is spread over multiple computers connected by a network. (Refer section 13.1)
2. Increased reliability and availability, Improved performance, Data sharing, Transparency, Easier expansion (Refer section 13.1)
For Section 13.2
1. The client-server model is basic to distributed systems; it allows clients to make requests that are routed to the appropriate server in the form of transactions. (Refer section 13.2)
For Section 13.3
1. Data fragmentation: Techniques used to break up the database into logical units, called fragments, that may be assigned for storage at the various sites. (Refer section 13.3)
2. Data Replication and Allocation: Replication is useful in improving the availability of data. Each copy of a fragment must be assigned to a particular site in the distributed system. This process is called data distribution or allocation. (Refer section 13.3)

13.7.2 Answers to Terminal Questions (TQs)
1. Increased reliability and availability: Reliability is broadly defined as the probability that a system is running at a certain time point, whereas availability is defined as the probability that the system is continuously available during a time interval. (Refer section 13.1)
2. Data fragmentation: Techniques used to break up the database into logical units, called fragments, that may be assigned for storage at the various sites. (Refer section 13.3)

13.7.3 Answers to Multiple Choice Questions (MCQs)
1. A
2. B
3. B
4. A