Vous êtes sur la page 1sur 9

For 100% Result Oriented IGNOU Coaching

and Project Training


Call CPD
TM
: 011-65164822, 08860352748


MCS 043
Qns 4: Define Hash join and explain the process and cost calculation of Hash join with the help
of an example.

Ans:

This is applicable to both the equi-joins and natural joins. A hash function h is used to partition tuples of
both relations, where h maps joining attribute (enroll no in our example) values to {0, 1, ..., n-1}.
The join attribute is hashed to the join-hash partitions. In the example of Figure 4 we have used mod 100
function to hashing, and n = 100.


Figure 4: A hash-join example
Once the partition tables of STUDENT and MARKS are made on the enrolment number, then only the
corresponding partitions will participate in the join as:
A STUDENT tuple and a MARKS tuple that satisfy the join condition will have the same value for the
join attributes. Therefore, they will be hashed to equivalent partition and thus can be joined easily.

Cost calculation for Simple Hash-Join
(i) Cost of partitioning r and s: all the blocks of r and s are read once and after partitioning written back,
so cost 1 = 2 (blocks of r + blocks of s).
For 100% Result Oriented IGNOU Coaching
and Project Training
Call CPD
TM
: 011-65164822, 08860352748


(ii) Cost of performing the hash-join using build and probe will require at least one block transfer for
reading the partitions

Cost 2 = (blocks of r + blocks of s)
(iii) There are a few more blocks in the main memory that may be used for evaluation, they may be read
or written back. We ignore this cost as it will be too less in comparison to cost 1 and cost 2.

Thus, the total cost = cost 1 + cost 2
= 3 (blocks of r + blocks of s)
Cost of Hash-Join requiring recursive partitioning:
(i) The cost of partitioning in this case will increase to number of recursion required, it may be calculated
as:

Number of iterations required = ([log
M-1
(blocks of s] 1)
Thus, cost 1 will be modified as:
= 2 (blocks of r + blocks of s) ([log
M-1
(blocks of s)] 1)
The cost for step (ii) and (iii) here will be the same as that given in steps (ii) and (iii) above.
Thus, total cost = 2(blocks of r + blocks of s) ( [log
M1
(blocks of s) 1] ) + (blocks of r +
blocks of s).
Because s is in the inner term in this expression, it is advisable to choose the smaller relation as the build
relation. If the entire build input can be kept in the main memory, n can be set to 1 and the algorithm need
not partition the relations but may still build an in-memory index, in such cases the cost estimate goes
down to (Number of blocks r + Number of blocks of s).
Qns 5 i) List the feature of semantic database
Ans:

Semantic modeling is one of the tools for representing knowledge especially in Artificial Intelligence and
object-oriented applications. Thus, it may be a good idea to model some of the knowledge databases using
semantic database system.
Some of the features of semantic modeling and semantic databases are:
For 100% Result Oriented IGNOU Coaching
and Project Training
Call CPD
TM
: 011-65164822, 08860352748


these models represent information using high-level modeling abstractions,
these models reduce the semantic overloading of data type constructors,
semantic models represent objects explicitly along with their attributes,
semantic models are very strong in representing relationships among objects, and
they can also be modeled to represent IS A relationships, derived schema and also complex
objects.

ii) Explain clustering in data mining.
Ans:

database D={t1,t2,,tn} of tuples and an integer value k, the Clustering problem is to define a mapping
where each tuple ti is assigned to one cluster Kj, 1<=j<=k. A Cluster, Kj, contains precisely those tuples
mapped to it. Unlike the classification problem, clusters are not known in advance. The user has to the
enter the value of the number of clusters k. In other words a cluster can be defined as the collection of
data objects that are similar in nature, as per certain defining property, but these objects are dissimilar to
the objects in other clusters. Some of the clustering examples are as follows: To segment the customer
database of a departmental store based on similar buying patterns. To identify similar Web usage
patterns etc. Clustering is a very useful exercise specially for identifying similar groups from the given
data. Such data can be about buying patterns, geographical locations, web information and many more.
Some of the clustering Issues are as follows:

Outlier handling: How will the outlier be handled? (outliers are the objects that do not comply with the
general behaviour or model of the data) Whether it is to be considered or it is to be left aside while
calculating the clusters?
Dynamic data: How will you handle dynamic data?
Interpreting results: How will the result be interpreted?
Evaluating results: How will the result be calculated?
Number of clusters: How many clusters will you consider for the given data?
Data to be used: whether you are dealing with quality data or the noisy data? If, the data is noisy how is
it to be handled?
Scalability: Whether the algorithm that is used is to be scaled for small as well as large data
set/database. There are many different kinds of algorithms for clustering.

iii) Explain the characteristics of mobile database. Also give an application of mobile database.
Ans:

Characteristics of Mobile Databases The mobile environment has the following characteristics:
1) Communication Latency: Communication latency results due to wireless transmission between the
sources and the receiver. But why does this latency occur?
It is primarily due to the following reasons:
a) due to data conversion/coding into the wireless formats,
b) tracking and filtering of data on the receiver, and
c) the transmission time.
For 100% Result Oriented IGNOU Coaching
and Project Training
Call CPD
TM
: 011-65164822, 08860352748


2) Intermittent wireless connectivity: Mobile stations are not always connected to the base stations.
Sometimes they may be disconnected from the network.
3) Limited battery life: The size of the battery and its life is limited. Information communication is a
major consumer of the life of the battery.
4) Changing location of the client: The wireless client is expected to move from a present mobile
support station to an other mobile station where the device has been moved. Thus, in general, the
topology of such networks will keep on changing and the place where the data is requested also changes.
This would require implementation of dynamic routing protocols.

Because of the above characteristics the mobile database systems may have the following features:

Very often mobile databases are designed to work offline by caching replicas of the most recent state of
the database that may be broadcast by the mobile support station. The advantages of this scheme are:

it allows uninterrupted work, and
reduces power consumption as data communication is being controlled.


iv) How is audit trail done in database? How are they related to database security?
Ans:

AUDIT TRAILS IN THE DATABASES
One of the key issues to consider while procuring a database security solution is making sure you have a
secure audit-trail. An audit trail tracks and reports activities around confidential data. Many companies
have not realised the potential amount of risk associated with sensitive information within databases
unless they run an internal audit which details who has access to sensitive data and have assessed it.
Consider the situation that a DBA who has complete control of database information may conduct a
security breach, with respect to business details and financial information. This will cause tremendous
loss to the company. In such a situation database audit helps in locating the source of the problem. The
database audit process involves a review of log files to find and examine all reads and writes to database
items during a specific time period, to ascertain mischief if any, banking database is one such database
which contains very critical data and should have the security feature of auditing. An audit trail is a log
that is used for the purpose of security auditing

Database auditing is one of the essential requirements for security especially, for companies in possession
of critical data. Such companies should define their auditing strategy based on their knowledge of the
application or database activity. Auditing need not be of the type all or nothing. One must do intelligent
auditing to save time and reduce performance concerns. This also limits the volume of logs and also
causes more critical security events to be highlighted.

More often then not, it is the insiders who makes database intrusions as they often have network
authorisation, knowledge of database access codes and the idea about the value of data they want to
exploit. Sometimes despite having all the access rights and policies in place, database files may be
For 100% Result Oriented IGNOU Coaching
and Project Training
Call CPD
TM
: 011-65164822, 08860352748


directly accessible (either on the server or from backup media) to such users. Most of the database
applications, store information in form text that is completely unprotected and viewable.

As huge amounts are at stake, incidents of security breaches will increase and continue to be widespread.
For example, a large global investment bank conducted an audit of its proprietary banking data. It was
revealed that more than ten DBAs had unrestricted access to their key sensitive databases and over
hundred employees had administrative access to the operating systems. The security policy that was in
place was that proprietary information in the database should be denied to employees who did not require
access to such information to perform their duties. Further, the banks database internal audit also
reported that the backup data (which is taken once every day) was also cause for concern as tapes could
get stolen. Thus the risk to the database was high and real and that the bank needed to protect its data.

However, a word of caution, while considering ways to protect sensitive database information, please
ensure that the privacy protection process should not prevent authorized personnel from obtaining the
right data at the right time.

The credit card information is the single, most common financially traded information that is desired by
database attackers. The positive news is that database misuse or unauthorized access can be prevented
with currently available database security products and audit procedures.

Qns 6: i) How does OLAP support query processing in data warehouse.
Ans:
Data warehouses are not suitably designed for transaction processing, however, they support increased
efficiency in query processing. Therefore, a data warehouse is a very useful support for the analysis of
data. But are there any such tools that can utilise the data warehouse to extract useful analytical
information?
On Line Analytical Processing (OLAP) is an approach for performing analytical queries and statistical
analysis of multidimensional data. OLAP tools can be put in the category of business intelligence tools
along with data mining. Some of the typical applications of OLAP may include reporting of sales
projections, judging the performance of a business, budgeting and forecasting etc.
OLAP tools require multidimensional data and distributed query-processing capabilities. Thus, OLAP has
data warehouse as its major source of information and query processing. But how do OLAP tools work?
In an OLAP system a data analyst would like to see different cross tabulations by interactively selecting
the required attributes. Thus, the queries in an OLAP are expected to be executed extremely quickly.
The basic data model that may be supported by OLAP is the star schema, whereas, the OLAP tool may be
compatible to a data warehouse.
For 100% Result Oriented IGNOU Coaching
and Project Training
Call CPD
TM
: 011-65164822, 08860352748


Let us, try to give an example on how OLAP is more suitable to a data warehouse rather than to a
relational database. An OLAP creates an aggregation of information, for example, the sales figures of a
sales person can be grouped (aggregated) for a product and a period. This data can also be grouped for
sales projection of the sales person over the regions (North, South) or states or cities. Thus, producing
enormous amount of aggregated data. If we use a relational database, we would be generating such
data many times. However, this data has many dimensions so it is an ideal candidate for representation
through a data warehouse. The OLAP tool thus, can be used directly on the data of the data warehouse
to answer many analytical queries in a short time span. The term OLAP is sometimes confused with
OLTP. OLTP is online transaction processing. OLTP systems focus on highly concurrent transactions and
better commit protocols that support high rate of update transactions. On the other hand, OLAP focuses
on good query-evaluation and query-optimization algorithms.
ii) Differentiate between embedded SQL and dynamic SQL. Give an example of embedded
SQL.
Ans:
Embedded SQL: The embedded SQL statements can be put in the application program written in C, Java
or any other host language. These statements sometime may be called static. Why are they called static?
The term static is used to indicate that the embedded SQL commands, which are written in the host
program, do not change automatically during the lifetime of the program. Thus, such queries are
determined at the time of database application design. For example, a query statement embedded in C to
determine the status of train booking for a train will not change. However, this query may be executed for
many different trains. Please note that it will only change the input parameter to the query that is train-
number, date of boarding, etc., and not the query itself.
Dynamic SQL: Dynamic SQL, unlike embedded SQL statements, are built at the run time and placed in a
string in a host variable. The created SQL statements are then sent to the DBMS for processing. Dynamic
SQL is generally slower than statically embedded SQL as they require complete processing including
access plan generation during the run time.

However, they are more powerful than embedded SQL as they allow run time application logic. The basic
advantage of using dynamic embedded SQL is that we need not compile and test a new program for a
new query.

Example of Embedded SQL

Cursors and Embedded SQL: on execution of an embedded SQL query, the resulting tuples are
cached in the cursor. This operation is performed on the server. Sometimes the cursor is opened by
RDBMS itself these are called implicit cursors. However, in embedded SQL you need to declare these
cursors explicitly these are called explicit cursors. Any cursor needs to have the following operations
defined on them:
For 100% Result Oriented IGNOU Coaching
and Project Training
Call CPD
TM
: 011-65164822, 08860352748


DECLARE to declare the cursor
OPEN AND CLOSE - to open and close the cursor
FETCH get the current records one by one till end of tuples.
In addition, cursors have some attributes through which we can determine the state of the cursor. These
may be:

ISOPEN It is true if cursor is OPEN, otherwise false.
FOUND/NOT FOUND It is true if a row is fetched successfully/not successfully.
ROWCOUNT It determines the number of tuples in the cursor.

Let us explain the use of the cursor with the help of an example:

Example: Write a C program segment that inputs the final grade of the students of MCA programme.
Let us assume the relation:

STUDENT (enrolno:char(9), name:Char(25), phone:integer(12),
prog-code:char(3)); grade: char(1));

EXEC SQL BEGIN DECLARE SECTION;
Char enrolno[10], name[26], p-code[4], grade /* grade is just one character*/
int phone;
int SQLCODE;
char SQLSTATE[6]
EXEC SQL END DECLARE SECTION;

printf (enter the programme code);
scanf (%s, &p-code);
EXEC SQL DECLARE CURSOR GUPDATE
SELECT enrolno, name, phone, grade
FROM STUDENT
WHERE progcode =: p-code
FOR UPDATE OF grade;
EXEC SQL OPEN GUPDATE;

EXEC SQL FETCH FROM GUPDATE
INTO :enrolno, :name, :phone, :grade;
WHILE (SQLCODE==0)
{
printf (enter grade for enrolment number, %s, enrolno);
scanf (%c, grade);
EXEC SQL
UPDATE STUDENT
SET grade=:grade
WHERE CURRENT OF GUPDATE
EXEC SQL FETCH FROM GUPDATE;
For 100% Result Oriented IGNOU Coaching
and Project Training
Call CPD
TM
: 011-65164822, 08860352748


}
EXEC SQL CLOSE GUPDATE;

Qns 7: Explain the following with the help of an example:

i) Application of Datagrid

Ans:


ii) XML and HTML

Ans:

Most documents on Web are currently stored and transmitted in HTML. One strength of HTML is its
simplicity. However, it may be one of its weaknesses with the growing needs of users who want HTML
documents to be more attractive and dynamic. XML is a restricted version of SGML, designed especially
for Web documents. SGML defines the structure of the document (DTD), and text separately. By giving
documents a separately defined structure, and by giving web page designers ability to define custom
structures, SGML has and provides extremely powerful document management system but has not been
widely accepted as it is very complex. XML attempts to provide a similar function to SGML, but is less
complex. XML retains the key SGML advantages of extensibility, structure, and validation. XML cannot
replace HTML.

iii) Data-marts

Ans:
Data marts can be considered as the database or collection of databases that are designed to help
managers in making strategic decisions about business and the organization. Data marts are usually
smaller than data warehouse as they focus on some subject or a department of an organization (a data
warehouses combines databases across an entire enterprise). Some data marts are also called dependent
data marts and may be the subsets of larger data warehouses.

A data mart is like a data warehouse and contains operational data that helps in making strategic decisions
in an organization. The only difference between the two is that data marts are created for a certain limited
predefined application. Even in a data mart, the data is huge and from several operational systems,
therefore, they also need a multinational data model. In fact, the star schema is also one of the popular
schema choices for a data mart.

iv) Security classes

Ans:
vi) Deductive database

For 100% Result Oriented IGNOU Coaching
and Project Training
Call CPD
TM
: 011-65164822, 08860352748


Ans:

A deductive database is a database system that can be used to make deductions from the available rules
and facts that are stored in such databases. The following are the key characteristics of the deductive
databases:
the information in such systems is specified using a declarative language in the form of rules
and facts,
an inference engine that is contained within the system is used to deduce new facts from the
database of rules and facts,
these databases use concepts from the relational database domain (relational calculus) and
logic programming domain (Prolog Language),
the variant of Prolog known as Datalog is used in deductive databases. The Datalog has a
different way of executing programs than the Prolog and
the data in such databases is specified with the help of facts and rules.
Deductive databases normally operate in very narrow problem domains. These databases are quite close
to expert systems except that deductive databases use, the database to store facts and rules, whereas expert
systems store facts and rules in the main memory. Expert systems also find their knowledge through
experts whereas deductive database have their knowledge in the data. Deductive databases are applied to
knowledge discovery and hypothesis testing.

ix) Query optimization

Ans:

Query Optimisation: Amongst all equivalent plans choose the one with the lowest cost. Cost is
estimated using statistical information from the database catalogue, for example, number of tuples in each
relation, size of tuples, etc.

Thus, in query optimisation we find an evaluation plan with the lowest cost. The cost estimation is made
on the basis of heuristic rules.

Vous aimerez peut-être aussi