
FALL 2013 ASSIGNMENT

PROGRAM: BACHELOR OF COMPUTER APPLICATION
SEMESTER: 6TH SEM
SUBJECT CODE & NAME: BC0058 DATA WAREHOUSING

CONTACT ME TO GET FULLY SOLVED SMU
ASSIGNMENTS/PROJECT/SYNOPSIS/EXAM GUIDE PAPER
Email Id: mrinal833@gmail.com
Contact no: 9706665251 / 9706665232
www.smuassignmentandproject.com
COST: 100 RS PER SUBJECT

Q. No. 1: Differentiate between OLTP and Data Warehouse.


ANSWER: Data Warehousing and Online Analytical Processing
A data warehouse is often used as the basis for a decision-support system (also referred to from an
analytical perspective as a business intelligence system). It is designed to overcome some of the problems
encountered when an organization attempts to perform strategic analysis using the same database that is
used to perform online transaction processing (OLTP).
A typical OLTP system is characterized by having large numbers of concurrent users actively adding and
modifying data. The database represents the state of a particular business function at a specific point in
time, such as an airline reservation system. However, the large volume of data maintained in many OLTP
systems can overwhelm an organization. As databases grow larger with more complex data, response time
can deteriorate quickly due to competition for available resources. A typical OLTP system has many users
adding new data to the database while fewer users generate reports from the database. As the volume of
data increases, reports take longer to generate.
As organizations collect increasing volumes of data by using OLTP database systems, the need to analyze
that data becomes more acute. OLTP systems are typically designed to manage transaction processing and
to minimize disk storage requirements through a series of related, normalized tables. However, when users
need to analyze their data, a number of problems often prevent the data from being used:

- Users may not understand the complex relationships among the tables, and therefore cannot generate ad hoc queries.
- Application databases may be segmented across multiple servers, making it difficult for users to find the tables in the first place.
- Security restrictions may prevent users from accessing the detail data they need.
- Database administrators prohibit ad hoc querying of OLTP systems, to prevent analytical users from running queries that could slow down the performance of mission-critical production databases.
By copying an OLTP system to a reporting server on a regularly scheduled basis, an organization can
improve response time for reports and queries. Yet a schema optimized for OLTP is often not flexible
enough for decision support applications, largely due to the volume of data involved and the complexity
of normalized relational tables.
For example, each regional sales manager in a company may wish to produce a monthly summary of the
sales per region. Because the reporting server contains data at the same level of detail as the OLTP
system, the entire month's data is summarized each time the report is generated. The result is longer-running queries that lower user satisfaction.
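For instance, the difference is easy to see in a few lines of Python with pandas (a generic sketch; the `region`, `sale_date`, and `amount` columns are hypothetical): reporting against the detail re-summarizes every row on each run, whereas a warehouse persists the summary once and reports read the small table.

```python
import pandas as pd

# Hypothetical OLTP-style detail table: one row per sale.
detail = pd.DataFrame({
    "region":    ["North", "North", "South", "South", "North"],
    "sale_date": pd.to_datetime(["2013-09-02", "2013-09-15",
                                 "2013-09-07", "2013-09-21", "2013-10-01"]),
    "amount":    [120.0, 80.0, 200.0, 50.0, 95.0],
})

# Reporting directly against the detail: every report run re-scans
# and re-summarizes the entire month's rows.
monthly = (detail
           .assign(month=detail["sale_date"].dt.to_period("M"))
           .groupby(["region", "month"], as_index=False)["amount"].sum())

# Data-warehouse approach: compute the summary once at load time and
# store it; subsequent reports read the small pre-aggregated table.
summary_table = monthly  # persisted once, e.g. to a summary fact table
print(summary_table)
```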
Additionally, many organizations store data in multiple heterogeneous database systems. Reporting is
more difficult because data is not only stored in different places, but in different formats.
Data warehousing and online analytical processing (OLAP) provide solutions to these problems. Data
warehousing is an approach to storing data in which heterogeneous data sources (typically from multiple
OLTP databases) are migrated to a separate homogenous data store. Data warehouses provide these
benefits to analytical users:
- Data is organized to facilitate analytical queries rather than transaction processing.
- Differences among data structures across multiple heterogeneous databases can be resolved.
- Data transformation rules can be applied to validate and consolidate data when data is moved from the OLTP database into the data warehouse.
- Security and performance issues can be resolved without requiring changes in the production systems.
Sometimes organizations maintain smaller, more topic-oriented data stores called data marts. In contrast
to a data warehouse, which typically encapsulates all of an enterprise's analytical data, a data mart is
typically a subset of the enterprise data targeted at a smaller set of users or business functions.
Whereas a data warehouse or data mart is the data store for analytical data, OLAP is the technology that
enables client applications to access that data efficiently. OLAP provides these benefits to analytical users:
- Pre-aggregation of frequently queried data, enabling very fast response times to ad hoc queries.
- An intuitive multidimensional data model that makes it easy to select, navigate, and explore the data.
- A powerful tool for creating new views of data based upon a rich array of ad hoc calculation functions.
- Technology to manage security, client/server query management and data caching, and facilities to optimize system performance based upon user needs.
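As a rough illustration of the multidimensional model in the list above (a generic pandas sketch, not any particular OLAP server; the `region` and `product` dimensions are hypothetical), a pivot approximates a cube cross-tab that users can slice along either dimension:

```python
import pandas as pd

# Tiny fact table with two dimensions (region, product) and one measure.
facts = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["Books", "Toys",  "Books", "Toys"],
    "sales":   [100, 150, 90, 60],
})

# A pivot is a simple stand-in for an OLAP cross-tab: rows are one
# dimension, columns another, cells hold the aggregated measure.
cube = facts.pivot_table(index="region", columns="product",
                         values="sales", aggfunc="sum")

print(cube)               # the full cross-tab
print(cube.loc["North"])  # a "slice" for a single region
```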
The terms data warehousing and OLAP are sometimes used interchangeably. However, it is important to
understand their differences because each represents a unique set of technologies, administrative issues,
and user implications.
SQL Server Tools for Data Warehousing and OLAP

Microsoft SQL Server provides several tools for building data warehouses, data marts, and OLAP
systems. Using DTS Designer, you can define the steps, workflow, and transformations necessary to build
a data warehouse from a variety of data sources. After the data warehouse is built, you can use Microsoft
SQL Server OLAP Services, which provides a robust OLAP server that can be used to analyze data stored
in many different formats, including SQL Server and Oracle databases.
Q. No. 2: What are the key issues in planning a Data Warehouse?
ANSWER: Poor planning and improper project management are the main factors in data warehouse project
failures. First of all, make sure that your company really needs a data warehouse to support its business.
Then prepare criteria for assessing the value expected from the data warehouse. Decide on the software for
the project, and establish where the data warehouse will collect its data from. You also need to define rules
on who will use the data and who will operate the new system. Each of these planning steps is elaborated
below.
Important Key Issues. How do you make sure that the company really needs the data warehouse? The best
way to find out is to answer the following key questions while preparing your data warehouse plan.
1. Value and Expectations. Will your data warehouse help management do better planning? Will the
system help them make the right decisions? How much could it increase the company's market share?
What are management's expectations of the data warehouse? These questions are the starting point for
evaluating your project plan.
They also serve as end-to-end guidelines through every project phase: whenever the project faces
difficulties and needs direction, simply go back to these primary questions.
2. Risk Assessment. Assessing the risks of an IT project involves more than calculating the potential loss
of the project costs. You should also consider the risk to the company of not implementing the system:
how many opportunities would be missed? What would the impact on the company's business plan be if
the project were not finished on schedule? All of these belong in your risk assessment, in addition to the
potential loss of the project costs.
3. Top-down or Bottom-up. The top-down approach starts with an enterprise-wide data warehouse: data
from across the enterprise is processed in the data warehouse and then used to feed departmental and
subject-area data marts. The bottom-up approach starts from individual data marts that are later combined
into the enterprise data warehouse.
When choosing between the two approaches for your company's data warehouse, consider the following:
Do you have enough resources, time, and budget to build a corporate-wide data warehouse, and thereby
gain the advantages of a fully unified warehouse? Or does your company first need to prove the data
warehouse's usefulness by implementing a small number of data marts and then continuing with further
data marts?
4. Build or Buy. A data warehouse involves a large range of functions, such as data extraction, data
transformation, and loading of data into storage. You have to decide whether to buy all of these functions
from a vendor or to build some of them yourself, customized to your own company's business needs.
5. Single Vendor or Best-of-Breed. Choosing a single-vendor solution has a few advantages:
- High level of integration among the tools
- Consistent look and feel
- Seamless cooperation among components
- Centrally managed information exchange
- Negotiable overall price
Choosing best-of-breed products has its own advantages:
- You can build an environment that fits your organization
- There is no need to compromise between the database and the support tools
- You can select the product best suited for each function

Q. No. 3: Explain the Source Data Component and the Data Staging Component of Data Warehouse Architecture.
ANSWER: Data Warehouse Architecture
Different data warehousing systems have different structures. Some may have an ODS (operational data
store), while some may have multiple data marts. Some may have a small number of data sources, while
some may have dozens of data sources. In view of this, it is far more reasonable to present the different
layers of a data warehouse architecture rather than discussing the specifics of any one system.
In general, all data warehouse systems have the following layers:
- Data Source Layer
- Data Extraction Layer
- Staging Area
- ETL Layer
- Data Storage Layer
- Data Logic Layer
- Data Presentation Layer
- Metadata Layer
- System Operations Layer

[Figure: relationships among the different components of the data warehouse architecture]

Each component is discussed individually below:

Data Source Layer


This represents the different data sources that feed data into the data warehouse. The data sources can be
of any format: a plain text file, a relational database, another type of database, an Excel file, and so on can
all act as data sources.
Many different types of data can be a data source:
- Operations data, such as sales data, HR data, product data, inventory data, marketing data, and systems data.
- Web server logs with user browsing data.
- Internal market research data.
- Third-party data, such as census data, demographics data, or survey data.
All these data sources together form the Data Source Layer.
Data Extraction Layer
Data gets pulled from the data sources into the data warehouse system. There is likely some minimal data
cleansing at this stage, but it is unlikely that any major data transformation occurs here.
Staging Area
This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having
one common area makes it easier for subsequent data processing / integration.
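As a hedged illustration of the staging idea, assuming CSV extracts and a SQLite database as the common staging store (the file, table, and column names are all hypothetical): each raw extract is landed as-is, tagged with its origin and load time, so subsequent processing reads from one place.

```python
import csv
import sqlite3
from datetime import datetime, timezone

# Stand-in for a raw extract arriving from one source system.
with open("sales_north.csv", "w", newline="") as f:
    csv.writer(f).writerows([["region", "amount"],
                             ["N", "120.0"], ["NORTH", "bad-value"]])

conn = sqlite3.connect("staging.db")
conn.execute("""CREATE TABLE IF NOT EXISTS stg_sales
                (source_file TEXT, loaded_at TEXT,
                 raw_region TEXT, raw_amount TEXT)""")

def land(path):
    """Copy a raw extract into staging as-is: no cleansing, no transforms."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            conn.execute("INSERT INTO stg_sales VALUES (?, ?, ?, ?)",
                         (path, loaded_at, row["region"], row["amount"]))
    conn.commit()

land("sales_north.csv")
```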
ETL Layer
This is where data gains its "intelligence", as logic is applied to transform the data from a transactional
nature to an analytical nature. This layer is also where data cleansing happens. The ETL design phase is
often the most time-consuming phase in a data warehousing project, and an ETL tool is often used in this
layer.
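To make "transactional to analytical" concrete, here is a hedged sketch of one ETL step, continuing the hypothetical staging table from the previous sketch (the cleansing rules are illustrative, not a prescribed design): it standardizes a region code, validates the amount, and loads only valid rows into a warehouse table.

```python
import sqlite3

conn = sqlite3.connect("staging.db")
conn.execute("CREATE TABLE IF NOT EXISTS dw_sales (region TEXT, amount REAL)")

# Hypothetical standardization rule: unify inconsistent source codes.
REGION_MAP = {"N": "North", "NORTH": "North", "S": "South", "SOUTH": "South"}

rows = conn.execute("SELECT raw_region, raw_amount FROM stg_sales").fetchall()
for raw_region, raw_amount in rows:
    region = REGION_MAP.get((raw_region or "").strip().upper())
    try:
        amount = float(raw_amount)
    except (TypeError, ValueError):
        amount = None
    # Cleansing: reject rows that fail validation rather than load bad data.
    if region is None or amount is None or amount < 0:
        continue
    conn.execute("INSERT INTO dw_sales VALUES (?, ?)", (region, amount))
conn.commit()
```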
Data Storage Layer
This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities
can be found here: data warehouse, data mart, and operational data store (ODS). In any given system, you
may have just one of the three, two of the three, or all three types.
Data Logic Layer
This is where business rules are stored. Business rules stored here do not affect the underlying data
transformation rules, but do affect what the report looks like.
Data Presentation Layer

This refers to the information that reaches the users. It can take the form of a tabular or graphical report in
a browser, an emailed report that is generated and sent automatically every day, or an alert that warns
users of exceptions, among others. Usually an OLAP tool and/or a reporting tool is used in this layer.
Metadata Layer
This is where information about the data stored in the data warehouse system is stored. A logical data
model would be an example of something that's in the metadata layer. A metadata tool is often used to
manage metadata.
System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job status,
system performance, and user access history.
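A minimal sketch of what this layer might record, assuming a simple SQLite job-run log (the table, job names, and fields are hypothetical):

```python
import sqlite3
from datetime import datetime, timezone

ops = sqlite3.connect("operations.db")
ops.execute("""CREATE TABLE IF NOT EXISTS etl_job_runs
               (job TEXT, started_at TEXT, finished_at TEXT,
                status TEXT, rows_loaded INTEGER)""")

def record_run(job, started_at, status, rows_loaded):
    """Log one ETL run so operators can monitor status and history."""
    ops.execute("INSERT INTO etl_job_runs VALUES (?, ?, ?, ?, ?)",
                (job, started_at,
                 datetime.now(timezone.utc).isoformat(), status, rows_loaded))
    ops.commit()

record_run("load_dw_sales", "2013-10-01T01:00:00Z", "SUCCEEDED", 5)
```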

Q. No. 4: Discuss the Extraction Methods in Data Warehouses.


ANSWER: The extraction method used in a data warehouse depends on the source system, on performance,
and on business requirements. There are two types of extraction, logical and physical. We will look at the
logical and the physical designs in detail.

Logical extraction

There are two types of logical extraction methods:

Full Extraction: Full extraction is used when the data needs to be extracted and loaded for the first time.
In full extraction, the data from the source is extracted completely. This extraction reflects the current data
available in the source system.
Incremental Extraction: In incremental extraction, the changes in the source data are tracked since the
last successful extraction, and only those changed rows are extracted and loaded. Changes can be detected
from source rows that carry a last-changed timestamp; alternatively, a change table can be created in the
source system to keep track of the changes in the source data.
One more way to get the incremental changes is to extract the complete source data and then take the
difference (a minus operation) between the current extraction and the last one. This approach causes
performance issues. A sketch of both techniques follows.
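Both incremental techniques can be sketched in Python against a SQLite stand-in for the source system (the `sales` table and its `last_modified` column are hypothetical):

```python
import sqlite3

# Stand-in source system with a last-modified timestamp on each row.
src = sqlite3.connect("source.db")
src.execute("""CREATE TABLE IF NOT EXISTS sales
               (id INTEGER, region TEXT, amount REAL, last_modified TEXT)""")
src.execute("INSERT INTO sales VALUES (1, 'North', 120.0, '2013-09-15T10:00:00')")

# Technique 1: timestamp-based. Pull only rows changed since the last
# successful run, then persist the new high-water mark for next time.
last_run = "2013-09-01T00:00:00"
changed = src.execute(
    "SELECT id, region, amount FROM sales WHERE last_modified > ?",
    (last_run,)).fetchall()

# Technique 2: full extract plus a difference ("minus") against the
# previous snapshot. Simple but costly: both snapshots are scanned fully.
current = set(src.execute("SELECT id, region, amount FROM sales"))
previous = set()  # in practice, loaded from the prior full extract
delta = current - previous
print(changed, delta)
```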
Physical extraction

The data can be extracted physically by two methods:

Online Extraction: In online extraction, the data is extracted directly from the source system: the
extraction process connects to the source system and extracts the source data.

Offline Extraction: The data from the source system is dumped outside of the source system into a flat
file, and this flat file is then used to extract the data. The flat file can be created by a routine process, for
example a daily job.
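A minimal sketch of offline extraction, reusing the hypothetical SQLite source from the previous sketch and a CSV file as the transport format: the source side dumps its rows to a flat file, and the warehouse side extracts from the file without touching the source system again.

```python
import csv
import sqlite3

# Source-system side: a routine (e.g. nightly) job dumps the table.
src = sqlite3.connect("source.db")
with open("sales_dump.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "region", "amount"])
    writer.writerows(src.execute("SELECT id, region, amount FROM sales"))

# Warehouse side: extraction reads the flat file, not the source system.
with open("sales_dump.csv", newline="") as f:
    rows = list(csv.DictReader(f))
print(f"extracted {len(rows)} rows offline")
```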

Q. No. 5: Define the process of Data Profiling, Data Cleansing and Data Enrichment.
ANSWER: Data quality is a critical factor for the success of enterprise intelligence initiatives. Bad data
on one system can easily and rapidly propagate to other systems. If information shared across the
organisation is contradictory, inconsistent or inaccurate, then interactions with customers, suppliers and
others will be based on inaccurate information, resulting in higher costs, reduced credibility and lost
business.
SAS Data Integration provides a single environment that seamlessly integrates data quality within the
data integration process, taking users from profiling and rules creation through execution and monitoring
of results. Organisations can transform and combine disparate data, remove inaccuracies, standardise on
common values, parse values and cleanse dirty data to create consistent, reliable information.
Rules can be built quickly while profiling data, and then incorporated automatically into the data
transformation process. This speeds the development and implementation of cleansed data. A workflow
design environment facilitates the easy augmentation of existing data with new information to increase
the usefulness and value of all enterprise data.
Key Benefits
- Speeds the delivery of credible information by embedding data quality into batch and real-time processes.
- Reduces costly errors by preventing the propagation of bad data and correcting mistakes at the source.
- Keeps data current and accurate with regular auditing and cleansing.
- Standardises data from multiple sources and reduces redundancy in corporate data to support more accurate reporting, analysis and business decisions.
- Adds value to existing data by generating and/or appending information from other sources.
Key Features
- Database/data warehouse/data mart cleansing through a variety of techniques, including standardization, transformation and rationalization, while maintaining an accurate audit trail.
- Data profiling to identify incomplete, inaccurate or ambiguous data (see the sketch after this list).
- Data enrichment and augmentation.
- Creation of reusable data quality business rules that are callable through custom exits, message queues and Web services.
- Real-time transaction cleansing using standard business rules.
- Data summarization: compressing large static databases into representative points, making them more amenable to subsequent analysis.
- Support for more than 20 worldwide regions with specific language awareness and localizations.
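As promised above, here is a rough profiling sketch in Python with pandas; it is generic rather than specific to the SAS tooling described, and the `customers` columns are hypothetical. Profiling usually starts with per-column completeness, value-distribution, and range checks:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "country":     ["UK", "uk", None, "United Kingdom"],
    "age":         [34, -5, 51, None],
})

# Completeness: how many values are missing in each column?
print(customers.isna().sum())

# Ambiguity: inconsistent encodings of the same real-world value.
print(customers["country"].value_counts(dropna=False))

# Accuracy: a simple range rule flags suspect rows for cleansing.
print(customers[(customers["age"] < 0) | (customers["age"] > 120)])
```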

Q. No. 6: What is Metadata Management? Explain Integrated Metadata Management with a block diagram.
ANSWER: Meta-data management (also known as metadata management, without the hyphen)
involves managing data about other data, whereby this "other data" is generally referred to
as content data. The term is used most often in relation to digital media, but older forms of metadata are
catalogs, dictionaries, and taxonomies. For example, the Dewey Decimal Classification is a metadata
management system for books developed in 1876 for libraries.
Tools for data profiling, data modeling, data transformation, data quality, and business intelligence play a
key role in data integration. The integrated metadata management capabilities of IBM InfoSphere
Information Server enable these tools to work together to meet your enterprise goals.
Metadata management in InfoSphere Information Server offers many advantages:
- Sharing metadata throughout the suite from a single metadata repository creates accurate, consistent, and efficient processes.
- Changes that you make to source systems can be quickly identified and propagated throughout the flow of information.
- You can identify downstream changes and use them to revise information in the source systems.
- You can track and analyze the data flow across departments and processes.
- Metadata is shared automatically among tools.
- Glossary definitions provide business context for metadata that is used in jobs and reports.
- Data stewards take responsibility for metadata assets, such as schemas and tables, that they have authority over.
- By using data lineage, you can focus on the end-to-end integration path, from the design tool to the business intelligence (BI) report, or drill down to view any element of the lineage.
- You can eliminate duplicate or redundant metadata to create a single, reliable version that can be used by multiple tools.
Managing metadata
The metadata repository of IBM InfoSphere Information Server stores metadata from suite tools and from
external tools and databases, and enables sharing among them. You can import metadata into the
repository from various sources, export metadata by various methods, and transfer metadata assets
between design, test, and production repositories.
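The repository idea can be sketched generically (this is illustrative Python, not the InfoSphere API; all asset names are hypothetical): the repository holds shared asset definitions plus lineage edges, and walking the edges answers the impact-analysis questions listed above.

```python
# Minimal metadata repository: assets plus directed lineage edges.
assets = {
    "src.orders":    {"type": "table",  "steward": "sales team"},
    "etl.load_dw":   {"type": "job",    "steward": "BI team"},
    "dw.fact_sales": {"type": "table",  "steward": "BI team"},
    "bi.rev_report": {"type": "report", "steward": "BI team"},
}
lineage = {  # each asset -> the assets it feeds
    "src.orders":    ["etl.load_dw"],
    "etl.load_dw":   ["dw.fact_sales"],
    "dw.fact_sales": ["bi.rev_report"],
}

def downstream(asset):
    """Walk lineage edges to find everything affected by a change."""
    found, stack = [], list(lineage.get(asset, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(lineage.get(node, []))
    return found

# A change to the source table propagates to the job, table, and report.
print(downstream("src.orders"))
```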

