MCS 043 Qns 4: Define Hash join and explain the process and cost calculation of Hash join with the help of an example.
Ans:
Hash join is applicable to both equi-joins and natural joins. A hash function h is used to partition the tuples of both relations, where h maps the joining attribute values (enrolment number in our example) to {0, 1, ..., n-1}. The join attribute is hashed to the join-hash partitions. In the example of Figure 4 we have used the mod 100 function for hashing, with n = 100.
Figure 4: A hash-join example
Once the partition tables of STUDENT and MARKS are made on the enrolment number, only the corresponding partitions participate in the join: a STUDENT tuple and a MARKS tuple that satisfy the join condition will have the same value for the join attribute. Therefore, they will be hashed to the same partition and thus can be joined easily.
Cost calculation for simple hash join:
(i) Cost of partitioning r and s: all the blocks of r and s are read once and, after partitioning, written back, so
cost 1 = 2 (blocks of r + blocks of s)
(ii) Cost of performing the hash join using build and probe: this requires at least one block transfer for reading each of the partitions, so
cost 2 = (blocks of r + blocks of s)
(iii) A few more blocks in main memory may be used during evaluation and may be read or written back. We ignore this cost, as it is very small in comparison to cost 1 and cost 2.
Thus, the total cost = cost 1 + cost 2 = 3 (blocks of r + blocks of s)
Cost of hash join requiring recursive partitioning:
(i) The cost of partitioning in this case increases with the number of recursive passes required, which may be calculated as:
Number of passes required = ⌈log M-1 (blocks of s)⌉ - 1, that is, the logarithm of the blocks of s to the base (M - 1), rounded up, minus one, where M is the number of main-memory buffer blocks available.
Thus, cost 1 is modified to:
cost 1 = 2 (blocks of r + blocks of s) × (⌈log M-1 (blocks of s)⌉ - 1)
The costs for steps (ii) and (iii) here are the same as those given in steps (ii) and (iii) above. Thus,
total cost = 2 (blocks of r + blocks of s) × (⌈log M-1 (blocks of s)⌉ - 1) + (blocks of r + blocks of s)
Because s appears in the inner term of this expression, it is advisable to choose the smaller relation as the build relation. If the entire build input can be kept in main memory, n can be set to 1 and the algorithm need not partition the relations but may still build an in-memory index; in such cases the cost estimate goes down to (blocks of r + blocks of s).
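To make these formulas concrete, the following small C sketch computes both cost estimates in block transfers. The relation sizes and the number of memory blocks used in main() are assumed purely for illustration and are not taken from the text.

/* Illustrative sketch: estimating hash-join cost in block transfers using
   the formulas above. b_r and b_s are the numbers of blocks of r and s,
   and m is the number of main-memory buffer blocks. All figures assumed. */
#include <math.h>
#include <stdio.h>

/* simple hash join: partition both relations once, then build and probe */
long simple_hash_join_cost(long b_r, long b_s)
{
    return 3 * (b_r + b_s);                 /* 2(b_r + b_s) + (b_r + b_s) */
}

/* hash join with recursive partitioning:
   passes = ceil(log base (m - 1) of b_s) - 1
   cost   = 2(b_r + b_s) * passes + (b_r + b_s)                           */
long recursive_hash_join_cost(long b_r, long b_s, long m)
{
    long passes = (long)ceil(log((double)b_s) / log((double)(m - 1))) - 1;
    if (passes < 1)
        passes = 1;
    return 2 * (b_r + b_s) * passes + (b_r + b_s);
}

int main(void)
{
    printf("%ld\n", simple_hash_join_cost(100, 400));          /* 1500  */
    printf("%ld\n", recursive_hash_join_cost(1000, 2000, 10)); /* 21000 */
    return 0;
}

For example, with 100 blocks in one relation, 400 in the other and enough memory to avoid recursive partitioning, the estimate is 3 × (100 + 400) = 1500 block transfers.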
Qns 5: i) List the features of semantic databases.
Ans:
Semantic modeling is one of the tools for representing knowledge, especially in Artificial Intelligence and object-oriented applications. Thus, it may be a good idea to model some knowledge databases using a semantic database system. Some of the features of semantic modeling and semantic databases are:
these models represent information using high-level modeling abstractions,
these models reduce the semantic overloading of data type constructors,
semantic models represent objects explicitly along with their attributes,
semantic models are very strong in representing relationships among objects, and
they can also model IS-A relationships, derived schema and complex objects.
ii) Explain clustering in data mining. Ans:
Given a database D = {t1, t2, ..., tn} of tuples and an integer value k, the clustering problem is to define a mapping in which each tuple ti is assigned to one cluster Kj, 1 <= j <= k. A cluster Kj contains precisely those tuples mapped to it. Unlike the classification problem, the clusters are not known in advance; the user has to enter the value of the number of clusters, k. In other words, a cluster can be defined as a collection of data objects that are similar in nature, as per some defining property, but dissimilar to the objects in other clusters. Some clustering examples are as follows: segmenting the customer database of a departmental store based on similar buying patterns; identifying similar Web usage patterns, etc. Clustering is a very useful exercise, especially for identifying similar groups in the given data. Such data can be about buying patterns, geographical locations, web information and much more. Some of the clustering issues are as follows:
Outlier handling: How will outliers be handled? (Outliers are objects that do not comply with the general behaviour or model of the data.) Should they be considered, or left aside while computing the clusters?
Dynamic data: How will dynamic data be handled?
Interpreting results: How will the results be interpreted?
Evaluating results: How will the results be evaluated?
Number of clusters: How many clusters should be considered for the given data?
Data to be used: Is the algorithm dealing with quality data or noisy data? If the data is noisy, how is it to be handled?
Scalability: Can the algorithm be scaled to both small and large data sets/databases?
There are many different kinds of algorithms for clustering; a minimal sketch of the assignment step shared by many of them is given below.
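To make the mapping described above concrete, the following C sketch shows the basic assignment step used by k-means, one common clustering algorithm: each tuple (here a point with numeric attributes) is assigned to the cluster whose centroid is nearest. The data, the dimension count and the centroids are assumptions made purely for illustration and are not taken from the text.

/* Illustrative sketch only: the cluster-assignment step of k-means.
   All data and names are assumed for illustration. */
#include <math.h>
#include <stdio.h>

#define D 2   /* number of numeric attributes per tuple     */
#define K 2   /* number of clusters k chosen by the user     */

/* squared Euclidean distance between a tuple and a centroid */
static double dist2(const double t[D], const double c[D])
{
    double s = 0.0;
    for (int i = 0; i < D; i++)
        s += (t[i] - c[i]) * (t[i] - c[i]);
    return s;
}

/* return the index j (0 <= j < K) of the nearest centroid */
static int assign_cluster(const double t[D], const double centroid[K][D])
{
    int best = 0;
    double bestd = dist2(t, centroid[0]);
    for (int j = 1; j < K; j++) {
        double dj = dist2(t, centroid[j]);
        if (dj < bestd) { bestd = dj; best = j; }
    }
    return best;
}

int main(void)
{
    double centroid[K][D] = { {1.0, 1.0}, {8.0, 8.0} };  /* assumed centroids */
    double tuples[4][D]   = { {1.2, 0.8}, {7.5, 8.1}, {0.9, 1.1}, {8.2, 7.9} };
    for (int i = 0; i < 4; i++)
        printf("tuple %d -> cluster K%d\n",
               i + 1, assign_cluster(tuples[i], centroid) + 1);
    return 0;
}

A full clustering algorithm would repeat this step, recomputing the centroids from the assigned tuples until the assignment stops changing.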
iii) Explain the characteristics of mobile database. Also give an application of mobile database. Ans:
Characteristics of Mobile Databases
The mobile environment has the following characteristics:
1) Communication latency: Communication latency results from wireless transmission between the source and the receiver. But why does this latency occur? It is primarily due to the following reasons: a) data conversion/coding into wireless formats, b) tracking and filtering of data at the receiver, and c) the transmission time.
2) Intermittent wireless connectivity: Mobile stations are not always connected to the base stations; sometimes they may be disconnected from the network. 3) Limited battery life: The size of the battery and its life are limited, and information communication is a major consumer of battery life. 4) Changing location of the client: The wireless client is expected to move from its present mobile support station to another mobile support station as the device moves. Thus, in general, the topology of such networks keeps changing, and the place from which data is requested also changes. This requires the implementation of dynamic routing protocols.
Because of the above characteristics the mobile database systems may have the following features:
Very often mobile databases are designed to work offline by caching replicas of the most recent state of the database that may be broadcast by the mobile support station. The advantages of this scheme are:
it allows uninterrupted work, and it reduces power consumption, since data communication is controlled.
iv) How are audit trails maintained in a database? How are they related to database security? Ans:
AUDIT TRAILS IN THE DATABASES One of the key issues to consider while procuring a database security solution is making sure you have a secure audit trail. An audit trail tracks and reports activities around confidential data. Many companies do not realise the amount of risk associated with the sensitive information within their databases until they run an internal audit that details who has access to sensitive data, and assess it. Consider the situation in which a DBA, who has complete control over database information, commits a security breach with respect to business details and financial information. This will cause tremendous loss to the company. In such a situation a database audit helps in locating the source of the problem. The database audit process involves a review of log files to find and examine all reads and writes to database items during a specific time period, to ascertain any mischief. A banking database is one example of a database that contains very critical data and should have the security feature of auditing. An audit trail is a log that is used for the purpose of security auditing.
Database auditing is one of the essential requirements for security, especially for companies in possession of critical data. Such companies should define their auditing strategy based on their knowledge of the application or database activity. Auditing need not be of the all-or-nothing type. One must do intelligent auditing to save time and reduce performance concerns. This also limits the volume of logs and helps highlight the more critical security events.
More often than not, it is insiders who make database intrusions, as they often have network authorisation, knowledge of database access codes and an idea of the value of the data they want to exploit. Sometimes, despite all the access rights and policies being in place, database files may be directly accessible (either on the server or from backup media) to such users. Most database applications store information as plain text that is completely unprotected and viewable.
As huge amounts are at stake, incidents of security breaches will increase and continue to be widespread. For example, a large global investment bank conducted an audit of its proprietary banking data. It was revealed that more than ten DBAs had unrestricted access to its key sensitive databases and over a hundred employees had administrative access to the operating systems. The security policy in place was that proprietary information in the database should be denied to employees who did not require access to such information to perform their duties. Further, the bank's internal database audit also reported that the backup data (taken once every day) was a cause for concern, as tapes could be stolen. Thus the risk to the database was high and real, and the bank needed to protect its data.
However, a word of caution: while considering ways to protect sensitive database information, please ensure that the privacy protection process does not prevent authorized personnel from obtaining the right data at the right time.
Credit card information is the single most common financially traded information desired by database attackers. The positive news is that database misuse or unauthorized access can be prevented with currently available database security products and audit procedures.
Qns 6: i) How does OLAP support query processing in a data warehouse?
Ans: Data warehouses are not designed for transaction processing; rather, they support increased efficiency in query processing. Therefore, a data warehouse is a very useful support for the analysis of data. But are there any tools that can utilise the data warehouse to extract useful analytical information? On-Line Analytical Processing (OLAP) is an approach for performing analytical queries and statistical analysis of multidimensional data. OLAP tools can be put in the category of business intelligence tools, along with data mining. Some typical applications of OLAP include reporting of sales projections, judging the performance of a business, budgeting and forecasting, etc.
OLAP tools require multidimensional data and distributed query-processing capabilities. Thus, OLAP has the data warehouse as its major source of information and query processing. But how do OLAP tools work? In an OLAP system a data analyst would like to see different cross tabulations by interactively selecting the required attributes. Thus, the queries in an OLAP system are expected to execute extremely quickly. The basic data model that may be supported by OLAP is the star schema, and the OLAP tool may be compatible with a data warehouse.
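As a small illustration of such a cross tabulation, the following C sketch aggregates sales facts over the two dimensions region and product. The dimension values, the sales data and all names are assumptions made purely for illustration and are not taken from the text.

/* Illustrative sketch only: the kind of cross tabulation an OLAP tool
   computes, here total sales aggregated by (region, product). */
#include <stdio.h>
#include <string.h>

#define NREGIONS 2
#define NPRODUCTS 2

static const char *regions[NREGIONS]   = { "North", "South" };
static const char *products[NPRODUCTS] = { "P1", "P2" };

struct sale { const char *region; const char *product; double amount; };

/* look up the position of a dimension value in its list of members */
static int index_of(const char *s, const char *names[], int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(s, names[i]) == 0)
            return i;
    return -1;   /* not expected for the assumed data below */
}

int main(void)
{
    struct sale sales[] = {               /* assumed sales facts */
        { "North", "P1", 100.0 }, { "North", "P2", 250.0 },
        { "South", "P1",  80.0 }, { "North", "P1",  60.0 },
    };
    double crosstab[NREGIONS][NPRODUCTS] = { {0} };

    /* aggregate each fact into its (region, product) cell */
    for (size_t i = 0; i < sizeof sales / sizeof sales[0]; i++)
        crosstab[index_of(sales[i].region, regions, NREGIONS)]
                [index_of(sales[i].product, products, NPRODUCTS)] += sales[i].amount;

    for (int r = 0; r < NREGIONS; r++)
        for (int p = 0; p < NPRODUCTS; p++)
            printf("%-6s %-3s %8.2f\n", regions[r], products[p], crosstab[r][p]);
    return 0;
}

In a data warehouse such aggregations are precomputed and stored along many dimensions, so the analyst's interactive queries do not have to recompute them each time.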
Let us try to give an example of why OLAP is more suited to a data warehouse than to a relational database. An OLAP tool creates aggregations of information; for example, the sales figures of a salesperson can be grouped (aggregated) for a product and a period. This data can also be grouped for sales projections of the salesperson over regions (North, South), states or cities, thus producing an enormous amount of aggregated data. If we used a relational database, we would be generating such data many times. However, this data has many dimensions, so it is an ideal candidate for representation through a data warehouse. The OLAP tool can thus be used directly on the data of the data warehouse to answer many analytical queries in a short time span.
The term OLAP is sometimes confused with OLTP. OLTP is online transaction processing. OLTP systems focus on highly concurrent transactions and better commit protocols that support a high rate of update transactions. On the other hand, OLAP focuses on good query-evaluation and query-optimization algorithms.
ii) Differentiate between embedded SQL and dynamic SQL. Give an example of embedded SQL.
Ans:
Embedded SQL: Embedded SQL statements can be put in an application program written in C, Java or any other host language. These statements are sometimes called static. Why are they called static? The term static indicates that the embedded SQL commands, which are written in the host program, do not change automatically during the lifetime of the program. Thus, such queries are determined at the time of database application design. For example, a query statement embedded in C to determine the status of train bookings for a train will not change; however, it may be executed for many different trains. Please note that only the input parameters to the query (the train number, date of boarding, etc.) change, and not the query itself.
Dynamic SQL: Dynamic SQL statements, unlike embedded SQL statements, are built at run time and placed in a string in a host variable. The created SQL statements are then sent to the DBMS for processing. Dynamic SQL is generally slower than statically embedded SQL, as it requires complete processing, including access plan generation, at run time.
However, they are more powerful than embedded SQL as they allow run time application logic. The basic advantage of using dynamic embedded SQL is that we need not compile and test a new program for a new query.
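A minimal sketch of how such a dynamic statement may be built and executed from an embedded C program is given below. It is a fragment only: the host variables, the statement name dynstmt and the use of the STUDENT table are assumptions for illustration, and the fragment relies on the standard embedded SQL PREPARE/EXECUTE (or EXECUTE IMMEDIATE) mechanism.

/* Illustrative fragment only: dynamic SQL built at run time in a host string. */
EXEC SQL BEGIN DECLARE SECTION;
char stmt_text[200];
char new_grade[2], s_enrolno[10];
EXEC SQL END DECLARE SECTION;

/* the statement text is assembled only at run time ... */
sprintf(stmt_text,
        "UPDATE STUDENT SET grade = '%s' WHERE enrolno = '%s'",
        new_grade, s_enrolno);

/* ... then prepared and executed by the DBMS */
EXEC SQL PREPARE dynstmt FROM :stmt_text;
EXEC SQL EXECUTE dynstmt;
/* a one-off statement may instead use: EXEC SQL EXECUTE IMMEDIATE :stmt_text; */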
Example of Embedded SQL
Cursors and Embedded SQL: On execution of an embedded SQL query, the resulting tuples are cached in the cursor. This operation is performed on the server. Sometimes the cursor is opened by the RDBMS itself; these are called implicit cursors. However, in embedded SQL you need to declare these cursors explicitly; these are called explicit cursors. Any cursor needs to have the following operations defined on it:
DECLARE - to declare the cursor
OPEN and CLOSE - to open and close the cursor
FETCH - to get the current records one by one until the end of the tuples.
In addition, cursors have some attributes through which we can determine the state of the cursor. These may be:
ISOPEN - true if the cursor is open, otherwise false.
FOUND/NOT FOUND - true if a row has been fetched successfully/not successfully.
ROWCOUNT - gives the number of tuples fetched so far by the cursor.
Let us explain the use of the cursor with the help of an example:
Example: Write a C program segment that inputs the final grades of the students of the MCA programme. Let us assume the relation: STUDENT (enrolno, name, phone, progcode, grade)
EXEC SQL BEGIN DECLARE SECTION;
char enrolno[10], name[26], p_code[4], grade;   /* grade is just one character */
int phone;
int SQLCODE;
char SQLSTATE[6];
EXEC SQL END DECLARE SECTION;

printf("Enter the programme code: ");
scanf("%s", p_code);
EXEC SQL DECLARE GUPDATE CURSOR FOR
    SELECT enrolno, name, phone, grade
    FROM STUDENT
    WHERE progcode = :p_code
    FOR UPDATE OF grade;
EXEC SQL OPEN GUPDATE;

EXEC SQL FETCH FROM GUPDATE INTO :enrolno, :name, :phone, :grade;
while (SQLCODE == 0)
{
    printf("Enter grade for enrolment number %s: ", enrolno);
    scanf(" %c", &grade);
    EXEC SQL UPDATE STUDENT SET grade = :grade
        WHERE CURRENT OF GUPDATE;
    EXEC SQL FETCH FROM GUPDATE INTO :enrolno, :name, :phone, :grade;
}
EXEC SQL CLOSE GUPDATE;
Qns 7: Explain the following with the help of an example:
i) Application of Datagrid
Ans:
ii) XML and HTML
Ans:
Most documents on the Web are currently stored and transmitted in HTML. One strength of HTML is its simplicity. However, that simplicity may also be one of its weaknesses, given the growing needs of users who want HTML documents to be more attractive and dynamic. XML is a restricted version of SGML, designed especially for Web documents. SGML defines the structure of the document (the DTD) and the text separately. By giving documents a separately defined structure, and by giving web page designers the ability to define custom structures, SGML provides an extremely powerful document management system; however, it has not been widely accepted because it is very complex. XML attempts to provide similar functionality to SGML, but is less complex. XML retains the key SGML advantages of extensibility, structure, and validation. XML cannot replace HTML.
iii) Data-marts
Ans: Data marts can be considered as databases, or collections of databases, designed to help managers make strategic decisions about their business and organization. Data marts are usually smaller than data warehouses, as they focus on a particular subject or department of an organization (a data warehouse combines databases across an entire enterprise). Some data marts, called dependent data marts, are subsets of larger data warehouses.
A data mart is like a data warehouse and contains operational data that helps in making strategic decisions in an organization. The main difference between the two is that a data mart is created for a certain limited, predefined application. Even in a data mart, the data is huge and comes from several operational systems; therefore, a data mart also needs a multidimensional data model. In fact, the star schema is one of the popular schema choices for a data mart.
iv) Security classes
Ans:
vi) Deductive database
Ans:
A deductive database is a database system that can be used to make deductions from the rules and facts stored in it. The following are the key characteristics of deductive databases:
the information in such systems is specified using a declarative language in the form of rules and facts,
an inference engine contained within the system is used to deduce new facts from the database of rules and facts,
these databases use concepts from the relational database domain (relational calculus) and the logic programming domain (the Prolog language), and
the variant of Prolog known as Datalog is used in deductive databases; Datalog has a different way of executing programs than Prolog, and the data in such databases is specified with the help of facts and rules. For example, from facts such as parent(ravi, mohan) and parent(mohan, anu) and a rule such as grandparent(X, Z) :- parent(X, Y), parent(Y, Z), the inference engine can deduce the new fact grandparent(ravi, anu).
Deductive databases normally operate in very narrow problem domains. They are quite close to expert systems, except that deductive databases use the database to store facts and rules, whereas expert systems store facts and rules in main memory. Expert systems also obtain their knowledge from experts, whereas deductive databases have their knowledge in the data. Deductive databases are applied to knowledge discovery and hypothesis testing.
ix) Query optimization
Ans:
Query Optimisation: Amongst all equivalent evaluation plans, choose the one with the lowest cost. Cost is estimated using statistical information from the database catalogue, for example, the number of tuples in each relation, the size of tuples, etc.
Thus, in query optimisation we find an evaluation plan with the lowest estimated cost. The cost estimate is based on such catalogue statistics, and heuristic rules (such as performing selections as early as possible) are used to limit the number of alternative plans considered.
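As a small illustration of choosing the lowest-cost plan, the following C sketch compares two equivalent join orders using a simple block-transfer cost model. The cost model (a block nested-loop join) and the catalogue figures are assumptions made for illustration and are not taken from the text.

/* Illustrative sketch only: picking the cheaper of two equivalent plans
   from estimated costs. All figures are assumed catalogue statistics. */
#include <stdio.h>

/* rough block-transfer estimate for a block nested-loop join:
   the outer relation is read once, and the inner relation is scanned
   once for every block of the outer relation */
long nested_loop_cost(long outer_blocks, long inner_blocks)
{
    return outer_blocks + outer_blocks * inner_blocks;
}

int main(void)
{
    long b_r = 100, b_s = 400;                 /* assumed statistics (blocks) */

    long plan1 = nested_loop_cost(b_r, b_s);   /* r as the outer relation */
    long plan2 = nested_loop_cost(b_s, b_r);   /* s as the outer relation */

    printf("plan 1: %ld, plan 2: %ld -> choose plan %d\n",
           plan1, plan2, plan1 <= plan2 ? 1 : 2);
    return 0;
}

Here the optimiser would pick the order with the smaller outer relation, since its estimated cost (40,100 block transfers) is lower than the alternative (40,400).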