Data Warehousing & Mining Prof. J. N. Rajurkar MIET Bhandara.

Decision Support System (DSS)


 A Decision Support System (DSS) is a computer-based information system that supports business or
organizational decision-making activities.
 The components of a DSS are the database (or knowledge base), the model (decision context and user
criteria), and the user interface.
 As technology evolved, new computerized decision support applications were developed and studied.
Researchers used multiple frameworks to help build and understand these systems.
 The types of DSS are communication-driven DSS, data-driven DSS, document-driven DSS,
knowledge-driven DSS and model-driven DSS. One can organize the history of DSS into these
five broad DSS categories.
 The application areas include medical diagnosis, enterprise decision management, expert systems
and predictive analysis; DSS are used extensively in business and management.

DSS evolution relates to

 changing DSS features or components over time,
 changing technology on which the system is used,
 getting more efficient algorithms over time,
 evolving knowledge in the system over time,
 changing users and user preferences over time.

Model driven DSS

 The first widely discussed type of DSS is the model-driven DSS.
 A model-driven DSS emphasizes access to and manipulation of financial, optimization and/or
simulation models. Simple quantitative models provide the most elementary level of functionality.
 Model-driven DSS use limited data and parameters provided by decision makers to aid them
in analyzing a situation; in general, large databases are not needed for model-driven DSS.
 The first commercial tool for building model-driven DSS using financial and quantitative models was
called IFPS, an acronym for Interactive Financial Planning System.
 As computerized models became more numerous, research focused on model management and on
enhancing more diverse types of models for use in DSS, such as multicriteria, optimization and
simulation models.
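
As a rough illustration of the "simple quantitative model" idea above, the sketch below runs a what-if analysis on a small profit model. The function names, parameters and figures are hypothetical and chosen only for illustration; they are not taken from IFPS or any real tool.

```python
def project_profit(units_sold, unit_price, unit_cost, fixed_costs):
    """Simple quantitative model: profit for one planning period."""
    revenue = units_sold * unit_price
    variable_costs = units_sold * unit_cost
    return revenue - variable_costs - fixed_costs


def what_if(base, **changes):
    """Re-run the model with some parameters overridden by the decision maker."""
    scenario = {**base, **changes}
    return project_profit(**scenario)


# Hypothetical parameters typed in by a decision maker; no large database is involved.
base_case = {
    "units_sold": 1000,
    "unit_price": 25.0,
    "unit_cost": 15.0,
    "fixed_costs": 6000.0,
}

baseline = what_if(base_case)
price_cut = what_if(base_case, unit_price=22.0, units_sold=1300)

print("baseline profit:", baseline)
print("profit after price cut:", price_cut)
```

The model needs only a handful of parameters, which matches the point that model-driven DSS generally do not depend on large databases.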

Data Driven DSS

 In general, a data-driven DSS emphasizes access to and manipulation of a time series of internal
company data and sometimes external and real-time data.
 Simple file systems accessed by query and retrieval tools provide the most elementary level of
functionality.
 Data-driven DSS with online analytical processing provide the highest level of functionality and
decision support that is linked to analysis of large collections of historical data. Executive
Information Systems are examples of data-driven DSS.
 One of the first data-driven DSS was built using an APL-based software package called AAIMS, an
acronym for An Analytical Information Management System.
 The term Business Intelligence (BI) is sometimes used interchangeably with briefing books, report and query tools and
executive information systems. In general, Business Intelligence systems are data-driven DSS.

Communication Driven DSS

 Communication-driven DSS use network and communications technologies to facilitate decision-relevant
collaboration and communication.
 In these systems, communication technologies are the dominant architectural component. Tools such as
groupware, video conferencing and computer-based bulletin boards are the primary technologies.
 In the past few years, voice and video delivered using the Internet Protocol have greatly expanded the
possibilities for synchronous communication-driven DSS.

Document Driven DSS

 A document-driven DSS uses computer storage and processing technologies to provide document
retrieval and analysis.
 Large document databases may include scanned documents, hypertext documents, images, sounds and
videos.
 WWW technologies significantly increased the availability of documents and facilitated the
development of document-driven DSS.

Knowledge Driven DSS

 Knowledge-driven DSS can suggest or recommend actions to managers. These DSS are person-computer
systems with specialised problem-solving expertise.
 These systems have also been called suggestion DSS.
 Artificial Intelligence systems have been developed to detect fraud and expedite financial
transactions; many medical diagnostic systems have been based on AI.
 The MYCIN project for diagnosing blood infections is an example of a knowledge-driven DSS.

Web Based DSS

 Beginning in approximately 1995, the World Wide Web (WWW) and the global Internet provided a
technology platform for further extending the capabilities and deployment of computerized decision
support.
 The release of HTML with form tags and tables was a turning point in the development of Web-based DSS.
 A Web-based decision support system delivers decision support information to managers using Web
browsers like Netscape Navigator or Internet Explorer.

What is Data warehousing?


Ans:- A complete repository of historical corporate data extracted from transaction systems that is available
for ad hoc access by knowledge workers.

Or A data warehouse refers to a data repository that is maintained separately from an organization's operational
databases. DW systems allow for integration of a variety of application systems.

Or According to William H. Inmon, “A data warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management’s decision making process.”

Or A single, complete and consistent store of data obtained from a variety of different sources and made
available to end users in a form they can understand and use in a business context.

Or A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of a huge
amount of data which helps in the management decision making process.

What are the different characteristics of data warehousing?


Or
What are features of the data warehousing?

Ans:- Following are the characteristics of data warehousing.

 Subject Oriented: A data warehouse is organized around major subjects such as customer, supplier,
product and sales. Rather than concentrating on the day-to-day operations and transaction processing of
an organization, a data warehouse focuses on the modelling and analysis of data for decision makers.
Hence data warehouses typically provide a simple and concise view of particular subject issues by
excluding data that are not useful for decision support.
 Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources,
such as relational databases, flat files, and online transaction records. Data cleaning and integration
techniques are applied to ensure consistency in naming conventions, encoding structures, attribute
measures and so on.
 Time variant: Data are stored to provide information from a historical perspective (e.g. the past 5-10
years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of
time. The time-variant nature of the data in a data warehouse allows for analysis of the past, relates
information to the present, and enables forecasts for the future.
 Non Volatile: Non-volatile means that once data have entered the warehouse, they should not change. A
data warehouse is always a physically separate store of data transformed from the application data
found in the operational environment. Due to this separation, a data warehouse does not require transaction
processing, recovery and concurrency control mechanisms.
 Data Granularity: When users query the data warehouse for analysis, they usually start by
looking at summary data. Therefore it is efficient to keep data summarized at different levels.
Depending on the query, we can then go to the particular level of detail and satisfy the query.
Data granularity refers to the level of detail. The lower the level of detail, the finer the data granularity.
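
To make the idea of keeping data at different levels of granularity concrete, here is a minimal sketch that summarizes the same detail rows by day, month or year. The sample rows and the `summarize` helper are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical detail-level sales rows: (date "YYYY-MM-DD", product, amount).
daily_sales = [
    ("2015-01-03", "pen", 120),
    ("2015-01-17", "pen", 80),
    ("2015-02-05", "pen", 50),
    ("2015-02-05", "ink", 200),
    ("2016-01-09", "ink", 70),
]


def summarize(rows, level):
    """Aggregate amounts by 'day', 'month' or 'year' (a coarser level gives fewer rows)."""
    # ISO dates sort and truncate cleanly: 10 chars = day, 7 = month, 4 = year.
    key_len = {"day": 10, "month": 7, "year": 4}[level]
    totals = defaultdict(int)
    for date, _product, amount in rows:
        totals[date[:key_len]] += amount
    return dict(totals)


by_year = summarize(daily_sales, "year")    # coarse granularity, for a first look
by_month = summarize(daily_sales, "month")  # finer granularity, for drilling down

print(by_year)
print(by_month)
```

A query that starts at the yearly summary can then drill down to the monthly or daily level only when more detail is needed.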

Q.) What do you mean by Strategic Information? Describe its characteristic features. (W-2015)
Or
Explain the compelling need for data warehousing.

Ans: Strategic Information (SI) is information that helps companies change or otherwise alter their
business strategy and/or structure. It is typically used to streamline and quicken the reaction time to
environmental changes and to help the company achieve a competitive advantage.
The executives and managers who are responsible for keeping the enterprise competitive need
information to make proper decisions. They need information to formulate the business strategies, establish
goals and monitor results.
The types of information needed to make decisions in the formulation and execution of business
strategies and objectives are broad-based and encompass the entire organization. We may combine all these
types of essential information into one group and call it strategic information.
Processing large volumes of data and providing interactive analysis requires extra computing power.
The explosive increase in computing power and its lower cost make the provision of strategic information
feasible.

Characteristics features of Information System:

What are the Application areas of Data Warehouses?


Ans:
A data warehouse helps business executives to organize, analyze, and use their data for decision making.
Data warehouses are widely used in the following fields:
 Financial services
 Banking services
 Consumer goods
 Retail sectors
 Controlled manufacturing
 Weather forecasting
 Medical diagnosis
Q) Explain in detail the life cycle of a data warehouse system.
Ans:
Following are the main phases of the DW life cycle.

DW planning: This phase is aimed at determining the scope and the goals of the DW, and at determining the
number of data marts and the order in which they are to be implemented according to the business priorities and
the technical constraints. At this stage the physical architecture of the system must be defined.

Data mart design and implementation: This macro-phase is repeated for each data mart to be
implemented and is discussed in more detail below. At each iteration a new data mart is
designed and deployed. Multidimensional modelling of each data mart must be carried out considering the
available conformed dimensions and the constraints deriving from previous implementations.

DW maintenance and evolution: DW maintenance mainly concerns performance optimization, which must be
carried out periodically because user requirements change according to the problems and the opportunities
the managers run into. DW evolution, on the other hand, concerns keeping the DW schema up to date with
respect to changes in the business domain and the business requirements.

Figure. The main phases for the DW life-cycle.

Data mart design and implementation goes through the following phases.

 Requirement analysis: identifies which information is relevant to the decisional process by
considering either the user needs or the actual availability of data in the operational sources.
 Conceptual design: aims at deriving an implementation-independent and expressive conceptual
schema for the DW, according to the conceptual model chosen.
 Logical design: takes the conceptual schema and creates a corresponding logical schema on the
chosen logical model. Nowadays most DW systems are based on the relational logical
model (ROLAP).
 ETL process design: designs the mappings and the data transformations necessary to load the data
available at the operational data source level into the logical schema of the DW.
 Physical design: addresses all the issues specifically related to the suite of tools chosen for
implementation, such as indexing and allocation.

Q.) Define the Knowledge Discovery process and explain the KDD process.

Ans:

 Knowledge Discovery from Data (KDD) is the process of discovering useful knowledge from a
collection of data.
 Major KDD application areas include marketing, fraud detection and telecommunications.
 The KDD process consists of the following iterative steps.
 Data Cleaning: This step is used to remove noise and inconsistent data.
 Data Integration: In this step, multiple data sources may be combined.
 Data Selection: In this step, the data relevant to the analysis task are retrieved from the
database. Data that are not relevant to the analysis task are omitted.
 Data Transformation: In this step, data are transformed and consolidated into forms appropriate for
mining by performing summary or aggregation operations.
 Data Mining: This is the essential step in which intelligent methods are applied to extract data
patterns.
 Pattern Evaluation: This step is used to identify the truly interesting patterns representing
knowledge, based on interestingness measures.
 Knowledge Presentation: In this step, visualization and knowledge representation techniques are
used to present the mined knowledge to users.

The data cleaning, data integration, data selection and data transformation steps are the data
pre-processing steps.
The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the
user and may be stored as new knowledge in the knowledge base.
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
Data sources may include databases, data warehouses, the Web and other information repositories.
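
The KDD steps listed above can be sketched as a small pipeline. This is only an illustrative toy under stated assumptions: the two source lists, the function names and the choice of item-pair counting with a minimum support threshold as the "mining" and "interestingness" steps are all hypothetical, not a prescribed method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sources: rows of (transaction id, item, store). None marks a missing value.
store_a = [(1, "bread", "A"), (1, "milk", "A"), (2, "bread", "A"), (2, None, "A")]
store_b = [(3, "bread", "B"), (3, "milk", "B"), (4, "milk", "B"), (4, "eggs", "B")]


def clean(rows):
    """Data cleaning: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r)]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for src in sources for r in src]


def select(rows):
    """Data selection: keep only the attributes relevant to the task."""
    return [(tid, item) for tid, item, _store in rows]


def transform(rows):
    """Data transformation: consolidate rows into one item set per transaction."""
    baskets = {}
    for tid, item in rows:
        baskets.setdefault(tid, set()).add(item)
    return list(baskets.values())


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {p: c / n_baskets for p, c in counts.items() if c / n_baskets >= min_support}


def present(patterns):
    """Knowledge presentation: render the patterns as readable lines."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


rows = integrate(clean(store_a), clean(store_b))
baskets = transform(select(rows))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
report = present(patterns)
print(report)
```

The first four functions correspond to the pre-processing steps; only `mine` is the data mining step proper.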

Q.) Why do you need a separate data staging area in a DWH? Explain its function. (W-15)
Ans: A staging area is an intermediate storage area used for data processing during the extract, transform
and load (ETL) process. The data staging area sits between the data sources and the data targets, which are
often data warehouses, data marts, or other data repositories. It is also called a landing zone.
The primary motivations for its use are to increase the efficiency of ETL processes, ensure data integrity and
support data quality operations. The functions of the staging area include the following:

 Consolidation: One of the primary functions performed by a staging area is consolidation of data
from multiple source systems. In performing this function the staging area acts as a large "bucket" in
which data from multiple source systems can be temporarily placed for further processing.
 Alignment: Aligning data includes standardization of reference data across multiple source systems
and validation of relationships between records and data elements from different sources.
 Minimizing contention: The staging area and ETL processes it supports are often designed with a
goal of minimizing contention within source systems.
 Independent scheduling/multiple targets: The staging area can support hosting of data to be
processed on independent schedules, and data that is meant to be directed to multiple targets.

 Change detection: This functionality is particularly useful when the source systems do not support
reliable forms of change detection, such as system-enforced time stamping.

 Cleansing data: Data cleansing includes identification and removal (or update) of invalid data from
the source systems.

 Data archiving and troubleshooting: The staging area can be used to maintain historical records
during the load process, or it can be used to push data into a target archive structure.
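
As an illustration of the change detection function described above, the sketch below compares two successive extracts held in the staging area by hashing each record, which is one common way to find changes when the source offers no reliable timestamps. The record layout and the helper names `row_fingerprint` and `detect_changes` are assumptions made for this example.

```python
import hashlib


def row_fingerprint(row):
    """Stable hash of a record's contents (keys are sorted so field order does not matter)."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous, current, key="id"):
    """Compare two extracts and return (inserted, updated, deleted) key lists."""
    old = {r[key]: row_fingerprint(r) for r in previous}
    new = {r[key]: row_fingerprint(r) for r in current}
    inserted = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    updated = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return inserted, updated, deleted


# Hypothetical extracts of the same source table taken on two consecutive days.
yesterday = [
    {"id": 1, "city": "Nagpur"},
    {"id": 2, "city": "Pune"},
    {"id": 3, "city": "Mumbai"},
]
today = [
    {"id": 1, "city": "Nagpur"},
    {"id": 2, "city": "Bhandara"},
    {"id": 4, "city": "Delhi"},
]

inserted, updated, deleted = detect_changes(yesterday, today)
print("inserted:", inserted, "updated:", updated, "deleted:", deleted)
```

Because the comparison runs on copies in the staging area, it also fits the goal of minimizing contention within the source systems.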

Q.) Explain three tier architecture of data warehouse. (S-16)


Ans:

Data warehouses often adopt a three-tier architecture.

1. Bottom tier: The bottom tier of the architecture is the data warehouse database server. It is usually a relational database
system. Back-end tools and utilities are used to feed data into the bottom tier. These back-end tools and utilities
perform the extract, clean, load and refresh functions. The data are extracted using application program
interfaces known as gateways. A gateway is supported by the underlying DBMS and allows client programs to
generate SQL code to be executed on a server. Examples of gateways include ODBC (Open Database Connectivity) and
JDBC (Java Database Connectivity). This tier also contains a metadata repository, which stores information
about the data warehouse and its contents.
2. Middle tier: The middle tier is an OLAP server that can be implemented in either of the following ways.
 By a Relational OLAP (ROLAP) model, which is an extended relational database management system.
ROLAP maps the operations on multidimensional data to standard relational operations.
 By a Multidimensional OLAP (MOLAP) model, which directly implements the multidimensional data
and operations.
3. Top tier: This tier is the front-end client layer. It holds the query tools, reporting tools,
analysis tools and data mining tools (e.g. trend analysis, prediction and so on).
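
To illustrate how a ROLAP server maps multidimensional operations to standard relational operations, the sketch below expresses a roll-up as a GROUP BY and a slice as a WHERE clause, using Python's built-in sqlite3 module. The `sales` table, its columns and the helper names are hypothetical; a real ROLAP engine is far more elaborate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2015, "East", "pen", 100),
        (2015, "West", "pen", 150),
        (2015, "East", "ink", 40),
        (2016, "East", "pen", 120),
        (2016, "West", "ink", 60),
    ],
)

DIMENSIONS = {"year", "region", "product"}


def roll_up(dimension):
    """Roll-up along one dimension, expressed as a relational GROUP BY."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()


def slice_by(dimension, value):
    """Slice: fix one dimension to a single value, expressed as a WHERE clause."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (
        "SELECT year, region, product, amount FROM sales "
        f"WHERE {dimension} = ? ORDER BY year, region, product"
    )
    return conn.execute(sql, (value,)).fetchall()


by_region = roll_up("region")
pens = slice_by("product", "pen")
print(by_region)
print(pens)
```

A MOLAP server would instead store the aggregated cells directly in a multidimensional array rather than computing them through SQL.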

Q.) What are different Data Warehouse Models?


Ans: From the perspective of data warehouse architecture, we have the following data warehouse models:
 Virtual Warehouse
 Data mart
 Enterprise Warehouse

Virtual Warehouse
A virtual warehouse is a set of views over operational databases. It is easy to build a virtual
warehouse, but doing so requires excess capacity on the operational database servers.
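
A minimal sketch of the virtual warehouse idea, using Python's built-in sqlite3 module: the summary is defined as a view over operational tables, so no data is copied. The table names, the view name `v_sales_by_customer` and the sample rows are hypothetical.

```python
import sqlite3

ops = sqlite3.connect(":memory:")  # stands in for an operational database
ops.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT);
    CREATE TABLE order_lines (order_id INTEGER, item TEXT, qty INTEGER, price REAL);

    INSERT INTO orders VALUES (1, 'Asha', '2016-01-10');
    INSERT INTO orders VALUES (2, 'Ravi', '2016-01-12');
    INSERT INTO order_lines VALUES (1, 'pen', 10, 5.0);
    INSERT INTO order_lines VALUES (1, 'ink', 2, 20.0);
    INSERT INTO order_lines VALUES (2, 'pen', 4, 5.0);

    -- The "virtual warehouse": only a summary view is defined, nothing is materialized.
    CREATE VIEW v_sales_by_customer AS
        SELECT o.customer AS customer, SUM(l.qty * l.price) AS revenue
        FROM orders o
        JOIN order_lines l ON l.order_id = o.order_id
        GROUP BY o.customer;
    """
)

summary = ops.execute(
    "SELECT customer, revenue FROM v_sales_by_customer ORDER BY customer"
).fetchall()
print(summary)
```

Every query against the view is executed on the operational tables themselves, which is why a virtual warehouse needs spare capacity on the operational servers.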
Data Mart
Data mart contains a subset of organization-wide data. This subset of data is valuable to specific groups of
an organization.
In other words, we can claim that data marts contain data specific to a particular group. For example, the
marketing data mart may contain data related to items, customers, and sales. Data marts are confined to
subjects.
Points to remember about data marts:
 Windows-based or Unix/Linux-based servers are used to implement data marts. They are
implemented on low-cost servers.
 The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than
months or years.
 The life cycle of a data mart may become complex in the long run if its planning and design are not
organization-wide.
 Data marts are small in size.
 Data marts are customized by department.
 The source of a data mart is a departmentally structured data warehouse.
 Data marts are flexible.

Enterprise Warehouse
 An enterprise warehouse collects all the information about the subjects spanning an entire organization.
 It provides enterprise-wide data integration.
 The data is integrated from operational systems and external information providers.
 The volume of data can vary from a few gigabytes to hundreds of gigabytes, terabytes or beyond.
 This type of warehouse can be implemented on traditional mainframes, computer superservers or
parallel architecture platforms.

Q.) Explain the Role of Metadata in building the data Warehouse.(S-16)


Or
What is Metadata? State and Explain its Categories. (W-15)

Ans:- Metadata is simply defined as data about data. Data that are used to represent other data are known
as metadata. Metadata in a data warehouse defines the warehouse objects. Metadata acts as a directory. This
directory helps the decision support system to locate the contents of a data warehouse. Metadata is a road
map to the data warehouse. It is created for the data names and definitions of the given data warehouse. For example,
the index of a book serves as metadata for the contents of the book.

Metadata Repository
Metadata repository is an integral part of a data warehouse system. It contains the following metadata:
 Business metadata - It contains the data ownership information, business definitions, and changing
policies.
 Operational metadata - It includes currency of data and data lineage. Currency of data refers to the
data being active, archived, or purged. Lineage of data means the history of the data migrated and the
transformations applied to it.
 Algorithms used for summarization - These include measure and dimension definition algorithms,
partitions, subject areas, aggregation, summarization, and predefined queries and reports.
 Data for mapping from operational environment to data warehouse - This metadata includes
source databases and their contents, data extraction, data partitioning, cleaning, transformation rules,
and data refresh and purging rules.
 Data related to system performance - This includes indices and profiles that improve data access
and retrieval performance, replication cycles, etc.
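
One way to picture a metadata repository entry is as a small record per warehouse table, as in the sketch below. The `TableMetadata` and `MetadataRepository` classes, their fields and the sample values are hypothetical and cover only a few of the metadata kinds listed above.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    name: str
    business_definition: str                             # business metadata
    owner: str                                           # business metadata
    currency: str = "active"                             # operational: active, archived or purged
    lineage: list = field(default_factory=list)          # operational: transformations applied
    source_mapping: dict = field(default_factory=dict)   # warehouse column -> source field


class MetadataRepository:
    """Acts as a directory that helps locate the contents of the warehouse."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def locate(self, column):
        """Return (table, source field) pairs for every table that holds `column`."""
        return sorted(
            (e.name, e.source_mapping[column])
            for e in self._entries.values()
            if column in e.source_mapping
        )


repo = MetadataRepository()
repo.register(
    TableMetadata(
        name="sales_fact",
        business_definition="One row per product sold per day",
        owner="Sales department",
        lineage=["extracted from order_lines", "amounts converted to a single currency"],
        source_mapping={
            "amount": "orders_db.order_lines.price",
            "sold_on": "orders_db.orders.order_date",
        },
    )
)

hits = repo.locate("amount")
print(hits)
```

The `locate` lookup mirrors the directory role of metadata: given a warehouse column, it answers where the data lives and where it came from.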

Types of Metadata

Metadata in a data warehouse fall into three major categories:

 Operational Metadata
 Extraction and Transformation Metadata
 End-User Metadata

Operational Metadata: Data for the data warehouse comes from several operational systems of the
enterprise. These source systems contain different data structures. The data elements selected for the data
warehouse have various field lengths and data types. In selecting data from the source systems for the data
warehouse, we split records, combine parts of records from different source files, and deal with multiple
coding schemes and field lengths. When we deliver information to the end-users, we must be able to tie it
back to the original source data sets. Operational metadata contain all of this information about the
operational data sources.

Extraction and Transformation Metadata. Extraction and transformation metadata contain data about the
extraction of data from the source systems, namely, the extraction frequencies, extraction methods, and
business rules for the data extraction. Also, this category of metadata contains information about all the data
transformations that take place in the data staging area.

End-User Metadata. The end-user metadata is the navigational map of the data warehouse. It enables the
end-users to find information from the data warehouse. The end-user metadata allows the end-users to use
their own business terminology and look for information in those ways in which they normally think of the
business.

Q.) Differentiate between Operational and Decision-Support Systems. (W-15)

Ans:

 Operational systems such as order processing, inventory control, claims processing, outpatient
billing, and so on are not designed or intended to provide strategic information. If we need the
ability to provide strategic information, we must get the information from altogether different types
of systems. Only specially designed decision support systems or informational systems can provide
strategic information.

 Operational systems are online transaction processing (OLTP) systems. These are the systems that
are used to run the day-to-day core business of the company. They support the basic business
processes of the company. These systems typically get the data into the database.

 On the other hand, specially designed and built decision-support systems are not meant to run the
core business processes. They are used to watch how the business runs, and then make strategic
decisions to improve the business.

 From the data analyst’s point of view, decision support data differ from operational data in three
main areas: time span, granularity, and dimensionality.
 Time span: Operational data cover a short time frame. In contrast, decision support data tend to
cover a longer time frame.
 Granularity (level of aggregation): Decision support data must be presented at different levels of
aggregation, from highly summarized to near-atomic.
 Dimensionality: Operational data focus on representing individual transactions rather than on the
effects of the transactions over time. In contrast, data analysts tend to include many data dimensions
and are interested in how the data relate over those dimensions.

Benefits of DSS

 Improves personal efficiency
 Speeds up the process of decision making
 Increases organizational control
 Facilitates interpersonal communication

Benefits of Operational database system

 Quick retrieval
 The ability to share information across the company
 Support for simultaneous read/write requests through pre-defined queries
 The ability to store a large amount of data that pertains to a business

Q.) Describe in brief the evaluation of database system technology.(S-16)

Ans:
 Database systems are one of the key enabling forces behind business transformation.
 Database system technology also needs to be efficient in terms of storage and speed.
 Modern database systems thus need to build high-reliability mechanisms into their designs.
 Performance evaluation of database system technology is thus an important concern. It
is a non-trivial activity, made more complicated by the existence of different
flavors of database systems tuned for specific requirements.
 The database is the shared resource at the centre of such a system. Its function is to provide
optimal storage and to maintain the correctness of the data and the consistency of the
system at all times.
 A database management system is a complex set of software programs that controls the organization, storage,
management and retrieval of data in a database.
 It allows multiple users to access,
create, update and retrieve data to and from the database.
 The storage manager is a program module that provides the interface between the low-level data stored in
the database and the application programs and queries submitted to the system. The storage manager is
responsible for tasks such as interaction with the file manager and efficient storing, retrieving and updating of data.
 Users are differentiated by the way they interact with the system. Specialized users write
specialized database applications that do not fit into the traditional data processing framework.
 Sophisticated users form requests in a database query language.
 Naïve users invoke one of the permanent application programs that have been written previously.
 A data model is just a way of structuring the data. It also defines the set of operations that can be
performed on the data. The flat model consists of a single, two-dimensional array of data elements.
 The network model organizes data using two fundamental structures called records and sets. A relational
database contains multiple tables, each of which is similar to one flat database model.
 The dimensional model is often implemented on top of the relational model using a star schema
consisting of one table containing the facts and surrounding tables containing the dimensions.
 Object database models aim to avoid the overhead (referred to as impedance mismatch) of
converting information between its representation in the database and its representation in the application program.
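
The star schema mentioned above can be sketched with Python's built-in sqlite3 module: one fact table in the centre and dimension tables around it, joined at query time. The table names, columns and sample rows are hypothetical and kept deliberately small.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- The fact table holds the measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'pen', 'stationery');
    INSERT INTO dim_product VALUES (2, 'ink', 'stationery');
    INSERT INTO dim_product VALUES (3, 'lamp', 'furniture');
    INSERT INTO dim_date VALUES (10, 2015, 12);
    INSERT INTO dim_date VALUES (11, 2016, 1);
    INSERT INTO fact_sales VALUES (1, 10, 5, 25.0);
    INSERT INTO fact_sales VALUES (2, 10, 1, 20.0);
    INSERT INTO fact_sales VALUES (3, 11, 2, 80.0);
    INSERT INTO fact_sales VALUES (1, 11, 3, 15.0);
    """
)

# A typical star join: facts are grouped by attributes taken from the dimensions.
revenue_by_category_year = dw.execute(
    """
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
print(revenue_by_category_year)
```

Each dimension table joins directly to the fact table and to nothing else, which gives the schema its star shape.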

Que. Why do you need a separate data staging component?

Or What are the building blocks of a data warehouse?
Or Discuss the components of a data warehouse in detail.
Ans:
 Data warehouse architecture is the proper arrangement of the components.
 We build a data warehouse with software and hardware components.
 In the figure shown below, the source data component is shown on the left. The data staging component
serves as the next building block. In the middle is the data storage component, which manages the
data warehouse data. This component also keeps track of the data by means of the metadata repository.
The information delivery component shown on the right consists of all the different ways of making the
information from the data warehouse available to the users.

Figure: Data warehouse building blocks or components.


A) Source data components
Source data consist of the following data
i) Production Data
 This category of data comes from the various operational systems of the enterprise.
 Data coming from the various operational systems is in different formats, so it is a challenging
task to standardize and transform the disparate data and to convert and integrate it for storing in the data
warehouse.
ii) Internal Data
 In every organization, users keep their private spreadsheet, documents, customer profiles and even
departmental databases. This is internal data.
 We cannot ignore the internal data held in private files in our organization.
 Internal data adds additional complexity to the process of transforming and integrating the data
before it can be stored in the data warehouse.
iii) Archived Data
 Operational systems are primarily intended to run the current business. In every operational system,
we periodically take old data and store it in archived files.
 Some data is archived after a year. Sometimes data is left in the operational system databases for as
long as five years.
 Many different methods exist for archiving data.
iv) External Data


 Most executives depend on the data from external sources for a high percentage of the information
they use. They use statistics relating to their industry produced by external agencies.
 The purpose served by such external data sources cannot be fulfilled by the data available within our
organization itself.
B) Data staging component

 After we have extracted data from the various operational systems and external sources, we have to prepare
the data for storing in the data warehouse.
 The extracted data arrive in different formats, hence functions such as
transformation are applied in the staging area to prepare them for loading.
 Data staging provides a place and an area with a set of functions to clean, change, combine and convert data for
storage and use in the data warehouse.
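
The clean, change, combine and convert functions of the staging component can be sketched as follows. The two source layouts, the field names and the helper functions are hypothetical; they only illustrate bringing differently formatted extracts into one common structure before loading.

```python
from datetime import datetime

# Hypothetical extracts from two source systems that use different formats.
billing_rows = [{"cust": " asha ", "joined": "10/01/2016", "balance": "1,200.50"}]
crm_rows = [{"customer_name": "RAVI", "join_date": "2016-01-12", "balance": 300}]


def clean_name(name):
    """Clean: trim stray whitespace and normalise capitalisation."""
    return name.strip().title()


def convert_date(text, fmt):
    """Convert: turn a source-specific date string into ISO format (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


def from_billing(row):
    """Change the billing layout into the common staging layout."""
    return {
        "customer": clean_name(row["cust"]),
        "joined": convert_date(row["joined"], "%d/%m/%Y"),
        "balance": float(row["balance"].replace(",", "")),
    }


def from_crm(row):
    """Change the CRM layout into the common staging layout."""
    return {
        "customer": clean_name(row["customer_name"]),
        "joined": convert_date(row["join_date"], "%Y-%m-%d"),
        "balance": float(row["balance"]),
    }


# Combine: both sources now share one structure and can be loaded together.
staged = [from_billing(r) for r in billing_rows] + [from_crm(r) for r in crm_rows]
print(staged)
```

Keeping one small adapter per source system makes it easy to add another source without touching the existing ones.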

C) Data storage component

 Data storage for the data warehouse is a separate repository.
 In the data repository for the data warehouse, we need to keep large volumes of historical data for
analysis.
 We have to keep the data in the data warehouse in structures suitable for analysis, not for quick
retrieval of individual pieces of information. Therefore, data storage for the data warehouse is kept separate from the
data storage for operational systems.
 The data in the operational databases could change from moment to moment.
D) Information delivery component
 The novice user comes to the data warehouse with no training and needs prefabricated reports and preset
queries.
 The casual user needs information once in a while, not regularly. This type of user also needs
prepackaged information.
 The business analyst looks for the ability to do complex analysis using the information in the data
warehouse.
 Ad hoc reports are predefined reports primarily meant for novice and casual users.
 Information fed into Executive Information Systems (EIS) is meant for senior executives and high-level
managers.
 Some data warehouses also provide data to data mining applications, whose mining algorithms
help us discover trends and patterns from the usage of our data.
E) Metadata Component
 Metadata in data warehouse is similar to the data dictionary or data catalog in the database
management system.
 The data dictionary contains data about the data in the database. Similarly metadata component is the
data about the data in data warehouse.
F) Management and Control Component
 This component of the data warehouse architecture sits on top of all the other components.
 Management and Control Component coordinates the services and activities within the data
warehouse.
 This component controls the data transformation and data transfer into data warehouse storage.
 It monitors the movement of data into staging area and from there into data warehouse storage itself.
 Management and Control Component interact with the metadata component to perform the
management and control functions.

Que. Information Delivery Methods

Ans: A data warehouse is never static; it evolves as the business expands. As the business evolves, its
requirements keep changing and therefore a data warehouse must be designed to ride with these changes.
Hence a data warehouse system needs to be flexible. The delivery method is a variant of the joint application
development approach adopted for the delivery of a data warehouse.

1. Standard Reports:

Purpose: Provides a pre-made document containing the information needed by the user.

Usage: Reports that require infrequent structural changes, and can be easily accessed electronically.

2. Queries

Purpose: Provides ability to access data using a pre-defined query, or on an ad hoc basis.

Usage: Research, analysis and reporting.

3. Analytical Applications

Purpose: Provides ability to easily access key performance indicators or metrics.

Usage: Monitoring and assessing performance.

4. OLAP Analysis

Purpose: Provides ability to perform summary, detailed or trend analysis on requested data.

Usage: Research and analysis.

5. Exception Based Reporting

Purpose: Alerts users to pre-defined conditions that occur.

Usage: Notification without the need to perform detailed analysis.

6. Data Mining

Purpose: Provides ability to discover hidden trends within the data.

Usage: Research and analysis of hidden trends within the data.
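To make two of the methods above concrete, here is a minimal Python sketch of OLAP-style summary analysis and exception-based reporting. The sample rows, the function names and the threshold of 100 units are assumptions made for illustration, not part of any particular tool.

```python
from collections import defaultdict

# Hypothetical rows: (month, region, units_sold)
SALES = [
    ("2024-01", "East", 120),
    ("2024-01", "West", 40),
    ("2024-02", "East", 150),
    ("2024-02", "West", 30),
]

def summarize_by(rows, index):
    """OLAP-style summary: total units_sold grouped by one column of the row."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[index]] += row[2]
    return dict(totals)

def exceptions(totals, minimum):
    """Exception-based report: keep only the groups below a pre-defined minimum."""
    return sorted(key for key, total in totals.items() if total < minimum)

by_region = summarize_by(SALES, 1)   # summary analysis by region
by_month = summarize_by(SALES, 0)    # trend analysis over time
alerts = exceptions(by_region, 100)  # regions that need attention
print(by_region, by_month, alerts)
```

The user of the exception report sees only the regions in alerts and does not need to run the detailed analysis.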

Que. Data Warehouse Design Process

Ans: To design an effective data warehouse we need to understand and analyze business needs and construct
a business analysis framework. A data warehouse can be built using a top-down approach, a bottom-up
approach or a combination of both.

The top-down approach starts with overall design and planning. It is useful in cases where the
technology is mature and well known, and where the business problems that must be solved are clear and
well understood.

The bottom-up approach starts with experiments and prototypes. This is useful in the early stage of
business modeling and technology development. It allows an organization to move forward at considerably
less expense and to evaluate the technological benefits before making significant commitments.

In the combined approach, an organization can exploit the planned and strategic nature of the top-
down approach while retaining the rapid implementation and opportunistic application of the bottom-up
approach.

From the software engineering point of view, the design and construction of a data warehouse may consist of
the following steps: planning, requirements study, problem analysis, warehouse design, data integration and
testing, and finally deployment of the data warehouse. Large software systems can be developed using one
of two methodologies: the waterfall method or the spiral method. The waterfall method performs a
structured and systematic analysis at each step before proceeding to the next, which is like a waterfall,
falling from one step to the next. The spiral method involves the rapid generation of increasingly functional
systems, with short intervals between successive releases. This is considered a good choice for data
warehouse development, especially for data marts, because the turnaround time is short, modifications can
be done quickly, and new designs and technologies can be adapted in a timely manner.

In general, the warehouse design process consists of the following steps:

1. Choose a business process to model (e.g., orders, invoices, shipments, inventory, account administration,
sales, or the general ledger). If the business process is organizational and involves multiple complex object
collections, a data warehouse model should be followed. However, if the process is departmental and
focuses on the analysis of one kind of business process, a data mart model should be chosen.

2. Choose the business process grain, which is the fundamental, atomic level of data to be represented in the
fact table for this process (e.g., individual transactions, individual daily snapshots, and so on).

3. Choose the dimensions that will apply to each fact table record. Typical dimensions are time, item,
customer, supplier, warehouse, transaction type, and status.

4. Choose the measures that will populate each fact table record. Typical measures are numeric additive
quantities like dollars_sold and units_sold.
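The four steps above can be sketched as a tiny star schema using Python's built-in sqlite3 module. The business process (sales), the table names time_dim, item_dim and sales_fact, and the sample rows are assumptions for illustration; only the measures dollars_sold and units_sold come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 3: dimensions (hypothetical, reduced to two)
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
    -- Steps 2 and 4: the grain is one row per individual sale; measures are additive
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim(time_key),
        item_key INTEGER REFERENCES item_dim(item_key),
        dollars_sold REAL,
        units_sold INTEGER
    );
""")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "2024-01-05", "2024-01"), (2, "2024-01-06", "2024-01")])
conn.executemany("INSERT INTO item_dim VALUES (?, ?)", [(1, "pen"), (2, "book")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 10.0, 5), (1, 2, 30.0, 2), (2, 1, 4.0, 2)])

# Additive measures can be summed along any dimension, here by item.
totals = conn.execute("""
    SELECT i.item_name, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f JOIN item_dim AS i ON f.item_key = i.item_key
    GROUP BY i.item_name ORDER BY i.item_name
""").fetchall()
print(totals)
```

Choosing a finer grain (individual sales rather than daily totals) keeps the option of summarizing later along any dimension, at the cost of a larger fact table.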

Because data warehouse construction is a difficult and long-term task, its implementation scope
should be clearly defined. The goals of an initial data warehouse implementation should be specific,
achievable, and measurable. This involves determining the time and budget allocations, the subset of the
organization that is to be modeled, the number of data sources selected, and the number and types of
departments to be served.

Once a data warehouse is designed and constructed, the initial deployment of the warehouse includes
initial installation, roll-out planning, training, and orientation. Platform upgrades and maintenance must also
be considered.

Various kinds of data warehouse design tools are available. Data warehouse development tools
provide functions to define and edit metadata repository contents (e.g., schemas, scripts, or rules), answer
queries, output reports, and ship metadata to and from relational database system catalogs. Planning and
analysis tools study the impact of schema changes and of refresh performance when changing refresh rates
or time windows.

*********All the Best***********
