Question 1
Update Blake's and Clark's salaries and check that the total company salary does not exceed 20000. If the total salary > 20000, then roll back.
Answer 1
DECLARE
  total_sal NUMBER(9);
BEGIN
  SAVEPOINT blake_sal;
  UPDATE emp SET sal = sal + 1000 WHERE ename = 'BLAKE';  -- assumes the classic emp(ename, sal) table
  SAVEPOINT clark_sal;
  UPDATE emp SET sal = sal + 1000 WHERE ename = 'CLARK';
  SELECT SUM(sal) INTO total_sal FROM emp;
  IF total_sal > 20000 THEN
    ROLLBACK TO blake_sal;  -- undo both salary updates
  ELSE
    COMMIT;
  END IF;
END;
Question 2
Answer 2
No
Question 3
Answer 3
Question 4
Sailors(Sid,Sname,rating,age)
Boat(Bid,Bname,color)
Reserve(Sid,Bid,day)
Answer 4
Question 5
A company is having headquarter in delhi and major operations in bangalore and hyderabad.
Company wants to design a distributed relational database which consists of 3 tables table A, B
and C- table A consists of 1 lakh record and id frequently required in all cities. table B consists of
75000 records. record 1 to 30000 are most frequently used in delhi. record 30001 to 75000 are
most frequently used in bangalore. table C consists of 20000 records and is used exclusively in
Delhi.
Answer:-
In a distributed database system, databases are geographically separated across sites
that share no physical components, are separately administered, and are connected
to one another through various communication media such as high-speed networks or
telephone lines.
1. Sharing Data: The primary advantage of DDBMS is the ability to share and access data in an
efficient manner.
DDBMS provides an environment where users at a given site are able to access data
stored at other sites.
For example, consider an organization having many branches: each branch stores data
related to that branch; however, a manager of a particular branch can access information
from any branch.
2. Reliability and availability: In distributed database systems, since data are distributed
across several sites, the failure of a single site does not halt the entire system. The other
sites can continue to operate when one site fails.
Only the data that exist at the failed site cannot be accessed. This means some of the data
may be inaccessible, but other parts of the database may still be accessible to the users.
Furthermore, if the data are replicated at one or more sites, availability can be improved
further.
3. Autonomy: The possibility of local autonomy is a major advantage of distributed database
systems.
Local autonomy implies that all operations at a given site are controlled by that site and do
not depend on any other site. Further, the security, integrity and representation of
local data are under the control of the local site.
4. Easier expansion: Distributed systems are more modular and hence can be expanded more
easily than centralized systems, where upgrading a system with changes in hardware and
software affects the entire database.
In a distributed system, the size of the database can be increased, and more processors or
sites can be added as needed with little effort.
In distributed DBMS, the relations are stored across several sites using two methods,
namely, fragmentation and replication.
Horizontal Fragmentation: A relation is partitioned into subsets of tuples using a selection
condition. For example, a BOOK relation can be fragmented on its category attribute:
BOOK2 = σ category='Textbook' (BOOK)
BOOK3 = σ category='LanguageBook' (BOOK)
Here each fragment consists of the tuples of the BOOK relation that belong to a particular
category. Further, the fragments are disjoint.
Vertical Fragmentation: A relation R is divided into fragments Ri = π Li (R), where Li is the
subset of attributes kept in fragment Ri (each Li must include the primary key).
The relation should be fragmented in such a way that the original relation can be
reconstructed by applying the natural join on the fragments. That is
R = R1 ⋈ R2 ⋈ ... ⋈ Rn
Mixed Fragmentation: It is the combination of horizontal and vertical fragmentation.
The fragmentation obtained by horizontally fragmenting a relation can be further
partitioned vertically or vice versa. The original relation is obtained by the combination
of join and union operation.
c) Assess and justify which type of fragmentation you would choose for fragmenting table B's
data.
Answer:-
Fragmentation is a technique in which a relation is divided into several fragments and each
fragment can be stored at sites where they are more often accessed. Fragmentation can be of
three types namely Horizontal, Vertical and Mixed.
Based on the three fragmentation techniques and the scenario, we would use the
Horizontal Fragmentation technique: according to the scenario, the first 30000 records
of table B are used in Delhi and the rest are used in Bangalore, each with all the attributes,
so stress must be placed on the tuples instead of the attributes. We would therefore use
horizontal fragmentation to split the 75000 records of table B. To reconstruct the original
relation, the union of the two fragments is taken.
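The split above can be sketched in SQL; the key column rec_id is an assumption, since table B's schema is not given:

```sql
-- Horizontal fragmentation of table B (rec_id is a hypothetical key column)
CREATE TABLE B_delhi     AS SELECT * FROM B WHERE rec_id <= 30000;
CREATE TABLE B_bangalore AS SELECT * FROM B WHERE rec_id >  30000;

-- Reconstruction of the original relation by union of the fragments
SELECT * FROM B_delhi
UNION
SELECT * FROM B_bangalore;
```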
d) In case the company does not intend to implement full replication during the first phase,
which table should not be replicated and why?
Answer:-
Question 6
Briefly describe recursive triggers in the context of triggers, and using an appropriate example
show how triggers can cause recursion.
Answer:-
Recursion occurs when the same code is executed again and again. It can lead to an infinite
loop, which can sometimes hit resource (governor) limits, and it can also produce unexpected
output. It is very common to have recursion in triggers, resulting in unexpected output or
errors, so we should write trigger code in such a way that it does not cause recursion.
Example:-
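The example discussed below (a trigger on tbAirport) appears to be missing from the document; a minimal sketch, assuming SQL Server T-SQL syntax and a hypothetical last_modified column, could look like:

```sql
-- A recursive trigger: the UPDATE inside the trigger body touches
-- tbAirport again, so the trigger fires itself.
CREATE TRIGGER trg_airport_update
ON tbAirport
AFTER UPDATE
AS
BEGIN
    UPDATE tbAirport
    SET last_modified = GETDATE();   -- fires trg_airport_update again
END;
```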
In the above example, the trigger fires after an update on any row of table tbAirport. But at the
same time, the trigger updates rows of the same table, which in turn results in an infinite
loop.
In short, creating an AFTER UPDATE statement trigger on a table that itself issues an UPDATE
statement on the same table causes the trigger to fire recursively until it runs out of memory
(or hits the server's nested-trigger limit).
Question 7
book table(isbn, book_title, category, price, copyright_date, year, page_count, pid)
a) Retrieve the city, phone and url of the author whose name is Lewis.
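No answer is given; a possible SQL sketch, assuming an author table with the columns the question refers to (this table is not shown in the schema above):

```sql
-- Assumes a table author(name, city, phone, url) not listed in the schema
SELECT city, phone, url
FROM author
WHERE name = 'lewis';
```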
Question 8
Employee(fname,minit,lname,essn,bdate,address,salary,superssn,dno,sex)
Department(dname,dno,mgrssn,mgrstart_date)
Department_location(dno,dlocation)
Works_on(essn,pno,hours)
Project(pname,pno,plocation,dno)
Dependent(essn,dependent_name,bdate,sex,relationship)
a) Whenever an employee's project assignments are changed, check if the total hours per week
spent on the employee's projects are less than 30 or greater than 40. If so, notify the
employee's supervisor.
Answer:-
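One possible sketch in Oracle PL/SQL. Note that notify_supervisor is a hypothetical procedure, and that querying Works_on from its own row trigger would hit Oracle's mutating-table restriction in practice (a compound trigger would be needed), so treat this as a sketch only:

```sql
-- Sketch: check total weekly hours whenever a project assignment changes
CREATE OR REPLACE TRIGGER check_project_hours
AFTER INSERT OR UPDATE OR DELETE ON Works_on
FOR EACH ROW
DECLARE
    total_hours NUMBER;
BEGIN
    SELECT SUM(hours) INTO total_hours
    FROM Works_on
    WHERE essn = NVL(:NEW.essn, :OLD.essn);  -- :NEW is null on DELETE
    IF total_hours < 30 OR total_hours > 40 THEN
        notify_supervisor(NVL(:NEW.essn, :OLD.essn));  -- hypothetical procedure
    END IF;
END;
```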
b) Whenever an employee is deleted, delete the project tuples and dependent tuples related to
that employee, and if the employee is managing a department or supervising any employees,
set the mgrssn for that department to null and set the superssn for those employees to null.
Answer:-
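A possible sketch in Oracle PL/SQL. The self-referencing UPDATE on Employee would hit Oracle's mutating-table restriction in practice, so this is a sketch of the intended logic rather than production code:

```sql
-- Sketch: clean up related tuples before an employee is deleted
CREATE OR REPLACE TRIGGER employee_cleanup
BEFORE DELETE ON Employee
FOR EACH ROW
BEGIN
    DELETE FROM Works_on  WHERE essn = :OLD.essn;   -- project assignments
    DELETE FROM Dependent WHERE essn = :OLD.essn;   -- dependents
    UPDATE Department SET mgrssn   = NULL WHERE mgrssn   = :OLD.essn;
    UPDATE Employee   SET superssn = NULL WHERE superssn = :OLD.essn;
END;
```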
Question 9
customer(custno,cname,city)
order(orderno,orderdate,custno,amount)
order_item(orderno,itemno,qty)
item(itemno,unitprice)
--- On the basis of the relational schema, write the relational algebra queries.
a) Retrieve the number and date of orders placed by customers residing in the city Delhi.
Answer:-
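A possible relational-algebra expression, joining customer and order on their common attribute custno:

```
π orderno, orderdate (σ city='delhi' (customer ⋈ order))
```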
b) retrieve the number and unit price of items for which an order of quantity greater than 50
is placed.
Answer:-
Select i.itemno, i.unitprice from item i, order_item o where i.itemno=o.itemno and o.qty
>50;
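Since the question asks for relational algebra, the equivalent expression is:

```
π itemno, unitprice (σ qty>50 (item ⋈ order_item))
```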
Question 10
Question 11
Serial Schedule :- A serial schedule is a schedule in which the transactions execute one after
another, with no interleaving of operations from different transactions.
Question 12
Answer:-
Question 13
Consultants offer different types of services to customers. The consultants are looking into the
possibility of investing in data mining applications but are unsure of the potential benefits of
such applications. So, the consultants have requested your expertise as an IT consultant to shed
some light on this area and address the following aspects:
Simply storing information in a data warehouse does not provide the benefits the
health care consultant is seeking.
It is difficult for business analysts to identify business trends and relationships in the
data using simple SQL queries and report-generating tools.
Data mining is one of the best ways to discover information within a data warehouse
that queries and reports cannot effectively reveal.
b) Discuss any 3 applications of data mining which would benefit the consultant.
Answer:-
c) Describe, using appropriate examples, 4 problems that consultants face with data mining.
Answer:-
d) Describe using appropriate examples how OLAP operations would benefit the agency.
Answer:-
Question 14
Why is the relational database management system still widely used despite the emergence of
object-oriented databases? Provide 4 reasons.
Answer:-
There are several reasons why the relational model has remained popular until now.
1. The model is well supported by mathematical concepts, which makes it simple and
easy to understand.
2. The ability to perform complex queries using the SQL query language, which fits well
with the relational model, shows that it is still relevant for organizations that
incorporate IT solutions in their business operations to maximize ROI.
3. Relational DBMSs are currently the dominant database technology, and businesses have
invested so much money and so many resources in their development that change
is prohibitive. Moreover, the relational model is easy to use and simple to understand.
4. Relational databases are mature and extensively tested, while the object-oriented model
is newer and there is a general shortage of experienced, quality programmers.
5. In the relational model, programmers know how to optimize for high-speed retrieval,
while object-oriented databases still raise performance concerns.
Question 15
Star and snowflake schemas offer important advantages in a data warehouse. Illustrate any 5
advantages of the star and snowflake schemas of a data warehouse.
Answer:-
STAR SCHEMA
The star schema is the simplest data warehouse schema, which consists of a fact table
with a single table for each dimension.
The centre of the star schema consists of a large fact table and the points of the star are
the dimension tables.
ADVANTAGES
Simple structure of the data: it is easy to understand how the elements are connected,
which simplifies the reporting of the information.
More effective queries: the queries in these systems are usually simpler, since the data
doesn't follow strict normalization rules; another reason is the smaller number of
tables to join.
Performance enhancements: performance gains are substantial due to the
denormalized form of the data.
Optimized for large data sets: due to the performance of the system and its queries,
the star schema is efficient on data warehouses or data marts with huge data sets.
Rapid aggregation: tasks like sum, average and count are performed quickly on these
systems.
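A minimal sketch of a star schema, with one fact table referencing its dimension tables (all table and column names are illustrative):

```sql
-- Dimension tables: the "points" of the star
CREATE TABLE dim_date    (date_key INT PRIMARY KEY, day INT, month INT, year INT);
CREATE TABLE dim_product (product_key INT PRIMARY KEY, name VARCHAR(50), category VARCHAR(30));

-- Fact table: the "centre" of the star, holding measures and foreign keys
CREATE TABLE fact_sales (
    date_key    INT REFERENCES dim_date(date_key),
    product_key INT REFERENCES dim_product(product_key),
    quantity    INT,
    amount      DECIMAL(10,2)
);
```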
SNOWFLAKE SCHEMA
The snowflake schema is a variation of the star schema which may have multiple levels of
dimension tables.
ADVANTAGES
Better data quality: the information stored in the dimensions usually has far fewer
anomalies than in a star schema.
Less storage used: due to the normalization of the dimension tables, a lot of storage
space is saved through a significant decrease in data replication.
Better specific-query performance: the structure is built to support specific views,
so such queries are optimized.
Optimized tools: there are several tools built to work with this kind of data
organization.
More structured data: the information is much more organized than in non-normalized
structures.
Question 16
List the selection criteria for choosing a suitable DBMS. Describe any 5 parameters to support
your answer.
Answer:-
Ease of use: how easily end users can learn and work with the DBMS.
Ease of administration: the effort needed for backup, tuning, user management and other
administrative tasks.
Reliability: the ability of the DBMS to run continuously and recover from failures without
losing data.
Cost: the license, hardware and maintenance costs, measured against the available budget.
Security: support for authentication, authorization and auditing to protect the data.
Compatibility: how well the DBMS works with the existing operating systems, languages
and applications.
Minimum requirements: the hardware and software resources the DBMS needs to run.
Question 17
Critically analyze how a data mart differs from a data warehouse and identify the main reasons
for implementing data marts.
Answer:-
Data Mart: - A data mart is one piece of a data warehouse in which all the information relates to
a specific business area. It is therefore considered a subset of all the data stored in that
particular database; together, the data marts make up a data warehouse.
Data Warehouse: - A data warehouse (DW) is a repository of suitable operational data (data that
document the everyday operations of an organization) gathered from multiple sources, stored
under a unified schema, at a single site. It can successfully answer any ad hoc, complex,
statistical or analytical queries.
The following comparison is divided into several key points that explain the differences
between them:
Data Scope
The first and most obvious difference is the scope of the information each one stores. On one
hand, a data warehouse saves all kinds of data related to the system. On the other hand, data
marts store only specific subject information, making them much more focused.
Size
Based on the definitions we can say that a data warehouse is usually much bigger than data
marts, because it keeps a lot more data.
Integration
As you may know, a data warehouse usually integrates several sources of data in order to feed
its database and the system's needs. In contrast, a data mart has much less integration to do,
since its data is very specific.
Creation
Creating a data warehouse is far more difficult and time-consuming than building a data mart.
Building all the structural relationships between the data is a long and very important step,
and you also need to analyze how to integrate all of your information sources. Since data
marts are smaller and subject-oriented, these actions tend to be much simpler.
However, a well-built data warehouse can support large systems in the long run, while a good
data mart is limited to its activity area.
Management
As with creation, the management of data warehouses is far more complex than that of data
marts. For the same reasons stated above, when you have much more data and many more
relationships and processes to manage, it becomes a harder task.
Cost
Overall, in terms of cost, data marts are cheaper than data warehouses. To build and maintain a
data warehouse you need significantly more physical resources such as servers, disk space,
memory and CPU. Because of its lower complexity, a data mart also requires less time to build
and operate. So, since time is money, the conclusion follows easily.
Performance
The performance of a system always depends on how it is built, the infrastructure that supports
it, the processes, the number of users, etc. However, from the previous points it is safe to
say that a data mart usually performs better than a data warehouse because of the warehouse's
inherent complexity.
The main reasons for implementing data marts include:
A mart is needed for data mining. Processing a data mining model (training,
predictive analysis) creates a heavy workload, and we don't want it to affect the
performance of the central/core warehouse.
Users want to change the data to simulate business scenarios. They can't change the data
in the core warehouse (it would affect everybody!), so we provide a data mart for them to
run their scenarios on.
To ease the query workload on the warehouse. Say 50% of the reports query one
particular fact table (and its dimensions). To lighten the burden on the warehouse we
create a copy of that fact table (and its dimensions) in a separate database located on a
different server, and point some of the reports at it. This mart is refreshed/updated
every day.
Question 18
Consider the following SQL queries. Which of these queries is faster to execute and why? Draw
the query tree also.
a) select category, count(*) from book where category != 'novel' group by category;
b) select category, count(*) from book group by category having category != 'novel';
Answer:-
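A sketch of the standard reasoning (actual timings depend on the optimizer):

```sql
-- (a) WHERE filters rows *before* grouping, so fewer tuples reach
--     the GROUP BY and aggregation step.
SELECT category, COUNT(*) FROM book
WHERE category != 'novel'
GROUP BY category;

-- (b) HAVING filters groups *after* aggregation: every row is grouped
--     and counted first, and the 'novel' group is then discarded.
SELECT category, COUNT(*) FROM book
GROUP BY category
HAVING category != 'novel';
```

Query (a) is therefore generally faster, although many optimizers rewrite (b) into (a) when the HAVING predicate involves only grouping attributes.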
Question 19
c) and then derive their optimized query trees after applying heuristics on them.
Answer:-
Question 20
Explain using schematic representation the architecture of data warehouse giving description of
each component of data warehouse.
Answer:-
Data sources: Large companies have various data sources, which include operational
databases (databases of the organizations at various sites) and external sources such as
Web, purchased data, etc.
These data sources may have been constructed independently by different groups and are
likely to have different schemas.
If the companies want to use such diverse data for making business decisions, they need
to gather these data under a unified schema for efficient execution of queries.
ETL process: After the schema is designed, the warehouse must acquire data so that it
can fulfill the required objectives.
Acquisition of data for the warehouse involves the following steps:
1. The data is extracted from the multiple, heterogeneous sources.
2. The extracted data is cleaned (errors and inconsistencies are corrected) and transformed
into the warehouse schema.
3. The cleaned and transformed data is finally loaded into the warehouse.
Data are partitioned, and indexes or other access paths are built for fast and efficient
retrieval of data. Loading is a slow process due to the large volume of data.
For instance, loading a terabyte of data sequentially can take weeks, and a gigabyte can
take hours; thus, parallelism is important for loading warehouses. The raw data
generated by a transaction-processing system may be too large to store in a data
warehouse, so some data can be stored in summarized form.
Thus, additional preprocessing, such as sorting and generation of summarized data, is
performed at this stage. This entire process of getting data into the data warehouse is
called the extract, transform and load (ETL) process. Once the data are loaded into a
warehouse, it must be periodically refreshed to reflect updates on the relations at the
data sources, and old data must be periodically purged.
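The load-with-summarization step can be sketched in SQL (all table and column names are illustrative):

```sql
-- Load cleaned staging data into the warehouse in summarized form
INSERT INTO warehouse_daily_sales (sale_date, product_id, total_qty, total_amount)
SELECT sale_date, product_id, SUM(qty), SUM(amount)
FROM   staging_sales
GROUP  BY sale_date, product_id;
```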