
Master of Business Administration (MBA) Assignment, Semester III

Name: Vishal Kaluram Tope
Registration Number: 521114602
Learning Center Name: Arihant Education Foundation
Learning Center Code: 02860
Course: MBA
Semester: III
Subject: MI0036 Business Intelligence Tools - 4 Credits
Set No: Set-1 & Set-2
Date of Submission at the Learning Center: 31st Oct 2012
Marks Awarded:

Directorate of Distance Learning, Sikkim Manipal University, II Floor, Syndicate Building, Manipal 576 104

Signature of the Coordinator    Signature of the LC    Signature of the Evaluator

Master of Business Administration-MBA Semester 3

MI0036 Business Intelligence Tools - 4 Credits
Assignment Set-1 (60 Marks)

Q.1 Define the term "business intelligence tools". Discuss the roles in a Business Intelligence project.

Business Intelligence (BI) is a generic term used to describe leveraging an organization's internal and external data and information for making the best possible business decisions. The field of Business Intelligence is very diverse and comprises the tools and technologies used to access and analyze various types of business information. These tools gather and store the data and allow the user to view and analyze the information from a wide variety of dimensions, thereby assisting the decision-makers in making better business decisions. Thus, Business Intelligence (BI) systems and tools play a vital role as far as organizations are concerned in the current cut-throat competitive scenario. In simple terms, Business Intelligence is an environment in which business users receive reliable, consistent, meaningful and timely information. This data enables the business users to conduct analyses that yield an overall understanding of how the business has been doing, how it is doing now and how it will do in the near future. Also, the BI tools monitor the financial and operational health of the organization through the generation of various types of reports, alerts, alarms, key performance indicators and dashboards.

Business intelligence tools are a type of application software designed to help in making better business decisions. These tools aid in the analysis and presentation of data in a more meaningful way and so play a key role in the strategic planning process of an organization. They deliver business intelligence in areas such as market research and segmentation, customer profiling, customer support, profitability, and inventory and distribution analysis, to name a few. Various types of BI systems, viz. Decision Support Systems, Executive Information Systems (EIS), multidimensional analysis software or OLAP (On-Line Analytical Processing) tools, and data mining tools, are discussed further. Whatever the type, the Business Intelligence capability of the system is to let its users slice and dice the information from their organization's numerous databases without having to wait for their IT departments to develop complex queries and elicit answers.

Although it is possible to build BI systems without the benefit of a data warehouse, in practice most such systems are an integral part of the user-facing end of the data warehouse. In fact, we can hardly think of building a data warehouse without BI systems. That is the reason why, sometimes, the words "data warehousing" and "business intelligence" are used interchangeably. The figure below depicts how the data at one end gets transformed into business information at the other end.

Roles in a Business Intelligence project: A typical BI project consists of the following roles, and the responsibilities of each of these roles are detailed below:

Project Manager: Monitors the progress on a continuous basis and is responsible for the success of the project.
Technical Architect: Develops and implements the overall technical architecture of the BI system, from the back-end hardware/software to the client desktop configurations.
Database Administrator (DBA): Keeps the database available for the applications to run smoothly, and is also involved in planning and executing a backup/recovery plan, as well as performance tuning.
ETL Developer: Plans, develops, and deploys the extraction, transformation, and loading routines for the data warehouse from the legacy systems.
Front End Developer: Develops the front end, whether it be client-server or over the web.
OLAP Developer: Develops the OLAP cubes.
Data Modeler: Is responsible for taking the data structure that exists in the enterprise and modeling it into a schema that is suitable for OLAP analysis.
QA Group: Ensures the correctness of the data in the data warehouse.
Trainer: Works with the end users to make them familiar with how the front end is set up so that the end users can get the most benefit out of the system.

Q.2. What do you mean by a data warehouse? What are the major concepts and terminology used in the study of data warehouses?

In computing, a data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting. A data warehouse maintains its functions in three layers: staging, integration, and access. Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to provide a level of abstraction from users. The access layer is for getting data out for users. Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse.
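The staging, integration, and access layers described above can be sketched with a toy example. This is an illustrative sketch only: the table names, the cleansing rules, and the use of SQLite are assumptions for the demonstration, not part of the definition.

```python
import sqlite3

# Sketch of the three warehouse layers: raw operational rows land in a
# staging table, are cleansed into an integration table, and an access-layer
# view exposes summarized data to end users.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_sales (raw_region TEXT, raw_amount TEXT);
    CREATE TABLE int_sales (region TEXT, amount REAL);
""")

# Staging layer: load raw data exactly as received from the operational system.
conn.executemany("INSERT INTO staging_sales VALUES (?, ?)",
                 [(" North ", "100.5"), ("south", "200")])

# Integration layer: cleanse and standardize before users ever see the data.
for region, amount in conn.execute(
        "SELECT raw_region, raw_amount FROM staging_sales"):
    conn.execute("INSERT INTO int_sales VALUES (?, ?)",
                 (region.strip().title(), float(amount)))

# Access layer: a view (or data mart) that users query for reporting.
conn.execute("CREATE VIEW mart_sales AS "
             "SELECT region, SUM(amount) AS total FROM int_sales GROUP BY region")
print(dict(conn.execute("SELECT * FROM mart_sales")))
# {'North': 100.5, 'South': 200.0}
```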

This definition of the data warehouse focuses on data storage. The data from the main sources is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata. A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon:

Subject Oriented
Integrated
Nonvolatile
Time Variant

Subject Oriented Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented. Integrated Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated. Nonvolatile Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
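The subject-oriented question above, "Who was our best customer for this item last year?", can be answered directly against a sales-focused warehouse. The schema and data below are invented purely for this sketch.

```python
import sqlite3

# A minimal sales-subject warehouse table; names and figures are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales "
             "(customer TEXT, item TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Acme", "Widget", 2011, 500.0),
    ("Bolt", "Widget", 2011, 900.0),
    ("Acme", "Widget", 2012, 300.0),
])

# "Who was our best customer for this item last year?"
best = conn.execute("""
    SELECT customer, SUM(amount) AS total FROM sales
    WHERE item = 'Widget' AND year = 2011
    GROUP BY customer ORDER BY total DESC LIMIT 1
""").fetchone()
print(best)  # ('Bolt', 900.0)
```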

Time Variant

In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.

DATA WAREHOUSE TERMINOLOGY (Bruce W. Johnson, M.S.)

Ad Hoc Query: A database search that is designed to extract specific information from a database. It is ad hoc if it is designed at the point of execution as opposed to being a canned report. Most ad hoc query software uses the structured query language (SQL).

Aggregation: The process of summarizing or combining data.

Catalog: A component of a data dictionary that describes and organizes the various aspects of a database, such as its folders, dimensions, measures, prompts, functions, queries and other database objects. It is used to create queries, reports, analyses and cubes.

Cross Tab: A type of multi-dimensional report that displays values or measures in cells created by the intersection of two or more dimensions in a table format.

Dashboard: A data visualization method and workflow management tool that brings together useful information on a series of screens and/or web pages. Some of the information that may be contained on a dashboard includes reports, web links, calendars, news, tasks, e-mail, etc. When incorporated into a DSS or EIS, key performance indicators may be represented as graphics that are linked to various hyperlinks, graphs, tables and other reports. The dashboard draws its information from multiple sources: applications, office products, databases, the Internet, etc.

Cube: A multi-dimensional matrix of data that has multiple dimensions (independent variables) and measures (dependent variables), created by an Online Analytical Processing (OLAP) system. Each dimension may be organized into a hierarchy with multiple levels. The intersection of two or more dimensional categories is referred to as a cell.

Data-based Knowledge: Factual information used in the decision-making process that is derived from data marts or warehouses using business intelligence tools. Data warehousing organizes information into a format so that it represents an organization's knowledge with respect to a particular subject area, e.g. finance or clinical outcomes.

Data Cleansing: The process of cleaning or removing errors, redundancies and inconsistencies in the data that is being imported into a data mart or data warehouse. It is part of the quality assurance process.

Data Mart: A database that is similar in structure to a data warehouse, but is typically smaller and is focused on a more limited area. Multiple, integrated data marts are sometimes referred to as an Integrated Data Warehouse. Data marts may be used in place of a larger data warehouse or in conjunction with it. They are typically less expensive to develop and faster to deploy, and are therefore becoming more popular with smaller organizations.

Data Migration: The transfer of data from one platform to another. This may include conversion from one language, file structure and/or operating environment to another.

Data Mining: The process of researching data marts and data warehouses to detect specific patterns in the data sets. Data mining may be performed on databases and multi-dimensional data cubes with ad hoc query tools and OLAP software. The queries and reports are typically designed to answer specific questions to uncover trends or hidden relationships in the data.

Data Scrubbing: See Data Cleansing.

Data Transformation: The modification of transaction data extracted from one or more data sources before it is loaded into the data mart or warehouse. The modifications may include data cleansing, translation of the data into a common format so that it can be aggregated and compared, summarizing the data, etc.

Data Warehouse: An integrated, non-volatile database of historical information that is designed around specific content areas and is used to answer questions regarding an organization's operations and environment.

Database Management System: The software that is used to create data warehouses and data marts. For the purposes of data warehousing, this typically includes relational database management systems and multi-dimensional database management systems. Both types of database management systems create the database structures, store and retrieve the data, and include various administrative functions.

Decision Support System (DSS): A set of queries, reports, rule-based analyses, tables and charts that are designed to aid management with their decision-making responsibilities. These functions are typically wrapped around a data mart or data warehouse. The DSS tends to employ more detailed-level data than an EIS.

Dimension: A variable, perspective or general category of information that is used to organize and analyze information in a multi-dimensional data cube.

Drill Down: The ability of a data-mining tool to move down into increasing levels of detail in a data mart, data warehouse or multi-dimensional data cube.

Drill Up: The ability of a data-mining tool to move back up into higher levels of data in a data mart, data warehouse or multi-dimensional data cube.

Executive Information System (EIS): A type of decision support system designed for executive management that reports summary-level information as opposed to the greater detail derived in a decision support system.

Extraction, Transformation and Loading (ETL) Tool: Software that is used to extract data from a data source like an operational system or data warehouse, modify the data, and then load it into a data mart, data warehouse or multi-dimensional data cube.

Granularity: The level of detail in a data store or report.

Hierarchy: The organization of data, e.g. a dimension, into an outline or logical tree structure. The strata of a hierarchy are referred to as levels. The individual elements within a level are referred to as categories. The next lower level in a hierarchy is the child; the next higher level containing the children is their parent.

Legacy System: Older systems developed on platforms that tend to be one or more generations behind the current state-of-the-art applications. Data marts and warehouses were developed in large part due to the difficulty in extracting data from these systems and the inconsistencies and incompatibilities among them.

Level: A tier or stratum in a dimensional hierarchy. Each lower level represents an increasing degree of detail. Levels in a location dimension might include country, region, state, county, city, zip code, etc.

Measure: A quantifiable variable or value stored in a multi-dimensional OLAP cube. It is a value in the cell at the intersection of two or more dimensions.

Member: One of the data points for a level of a dimension.

Meta Data: Information in a data mart or warehouse that describes the tables, fields, data types, attributes and other objects in the data warehouse, and how they map to their data sources. Meta data is contained in database catalogs and data dictionaries.

Multi-Dimensional Online Analytical Processing (MOLAP): Software that creates and analyzes multi-dimensional cubes to store its information.

Non-Volatile Data:

Data that is static or that does not change. In transaction processing systems the data is updated on a continual basis. In a data warehouse the database is added to or appended, but the existing data seldom changes.

Normalization: The process of eliminating duplicate information in a database by creating a separate table that stores the redundant information. For example, it would be highly inefficient to re-enter the address of an insurance company with every claim. Instead, the database uses a key field to link the claims table to the address table. Operational or transaction processing systems are typically normalized. On the other hand, some data warehouses find it advantageous to de-normalize the data, allowing for some degree of redundancy.

Online Analytical Processing (OLAP): The process employed by multi-dimensional analysis software to analyze the data resident in data cubes. There are different types of OLAP systems, named for the type of database employed to create them and the data structures produced.

Open Database Connectivity (ODBC): A database standard developed by Microsoft and the SQL Access Group Consortium that defines the rules for accessing or retrieving data from a database.

Relational Database Management System: Database management systems that have the ability to link tables of data through a common or key field. Most databases today use relational technologies and support a standard programming language called Structured Query Language (SQL).

Relational Online Analytical Processing (ROLAP): OLAP software that employs a relational strategy to organize and store the data in its database.

Replication: The process of copying data from one database table to another.

Scalable: The attribute or capability of a database to significantly expand the number of records that it can manage. It also refers to hardware systems and their ability to be expanded or upgraded to increase their processing speed and handle larger volumes of data.

Structured Query Language (SQL): A standard programming language used by contemporary relational database management systems.

Synchronization: The process by which the data in two or more separate databases is synchronized so that the records contain the same information. If the fields and records are updated in one database, the same fields and records are updated in the other.

About the Author: Bruce W. Johnson, MS, PMP is the CEO of Johnson Consulting Services, Inc. He is an information management consultant who specializes in working with social service, healthcare and government agencies. He can be reached at (800) 988-0934 or by e-mail at jcsinc@fuse.net.
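Several of the glossary terms above (cube, dimension, measure, cell, cross tab, drill up) can be illustrated with a small sketch. The dimension categories and measure values below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact records: two dimensions (region, product) and one
# measure (sales). All names and numbers are assumptions for the sketch.
facts = [
    ("East", "Widget", 100), ("East", "Gadget", 150),
    ("West", "Widget", 200), ("East", "Widget", 50),
]

# Cross tab / cube: each cell holds the aggregated measure for one
# intersection of dimension categories.
cells = defaultdict(int)
for region, product, sales in facts:
    cells[(region, product)] += sales
print(cells[("East", "Widget")])  # 150

# Drill up: aggregate away the product dimension to a higher level,
# leaving totals per region only.
by_region = defaultdict(int)
for (region, _product), sales in cells.items():
    by_region[region] += sales
print(by_region["East"])  # 300
```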

Q.3. What are the data modeling techniques used in a data warehousing environment?

Two data modeling techniques that are relevant in a data warehousing environment are ER modeling and dimensional modeling. ER modeling produces a data model of the specific area of interest, using two basic concepts: entities and the relationships between those entities. Detailed ER models also contain attributes, which can be properties of either the entities or the relationships. The ER model is an abstraction tool because it can be used to understand and simplify the ambiguous data relationships in the business world and complex systems environments. Dimensional modeling uses three basic concepts: measures, facts, and dimensions. Dimensional modeling is powerful in representing the requirements of the business user in the context of database tables. Both ER and dimensional modeling can be used to create an abstract model of a specific subject. However, each has its own limited set of modeling concepts and associated notation conventions. Consequently, the techniques look different, and they are indeed different in terms of semantic representation. The following sections describe the modeling concepts and notation conventions for both ER modeling and dimensional modeling that will be used throughout this book.

ER Modeling: A prerequisite for reading this book is a basic knowledge of ER modeling. Therefore we do not focus on that traditional technique. We simply define the necessary terms to form some consensus and present the notation conventions used in the rest of this book.

Figure 12. A Sample ER Model. Entity, relationship, and attributes in an ER diagram.

Basic Concepts

An ER model is represented by an ER diagram, which uses three basic graphic symbols to conceptualize the data: entity, relationship, and attribute.

6.3.1.1 Entity: An entity is defined to be a person, place, thing, or event of interest to the business or the organization. An entity represents a class of objects, which are things in the real world that can be observed and classified by their properties and characteristics. In some books on IE, the term entity type is used to represent classes of objects and entity for an instance of an entity type. In this book, we will use them interchangeably.

6.3.1.2 Relationship: A relationship is represented with lines drawn between entities. It depicts the structural interaction and association among the entities in a model. A relationship is designated grammatically by a verb, such as owns, belongs, and has. The relationship between two entities can be defined in terms of the cardinality. This is the maximum number of instances of one entity that are related to a single instance in another table, and vice versa. The possible cardinalities are: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:M). In a detailed (normalized) ER model, an M:M relationship is not shown because it is resolved into an associative entity.

6.3.1.3 Attributes: Attributes describe the characteristics or properties of the entities. In Figure 12, Product ID, Description, and Picture are attributes of the PRODUCT entity. For clarification, attribute naming conventions are very important. An attribute name should be unique in an entity and should be self-explanatory. For example, simply saying date1 or date2 is not allowed; we must clearly define each. As examples, they could be defined as the order date and the delivery date.

Dimensional Modeling: In some respects, dimensional modeling is simpler, more expressive, and easier to understand than ER modeling. However, dimensional modeling is a relatively new concept and is not yet firmly defined in detail, especially when compared to ER modeling techniques.
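The ER building blocks just described (entities with attributes, a 1:M relationship, and an M:M relationship resolved into an associative entity) can be sketched as relational tables. This is an illustrative sketch only: Figure 12 is not reproduced in this text, so the entity and column names below, apart from the PRODUCT attributes named above, are assumptions.

```python
import sqlite3

# Hypothetical implementation of a small ER model in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entities with attributes; PRODUCT carries the attributes named in
    -- the text (Product ID, Description, Picture).
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY,
                           description TEXT, picture BLOB);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);

    -- 1:M relationship: one customer owns many orders. Note the
    -- self-explanatory date names rather than date1/date2.
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER REFERENCES customer(customer_id),
        order_date    TEXT,
        delivery_date TEXT
    );

    -- M:M between ORDERS and PRODUCT resolved into an associative entity,
    -- as a normalized ER model requires.
    CREATE TABLE order_line (
        order_id   INTEGER REFERENCES orders(order_id),
        product_id INTEGER REFERENCES product(product_id),
        quantity   INTEGER,
        PRIMARY KEY (order_id, product_id)
    );
""")
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
# ['customer', 'order_line', 'orders', 'product']
```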

This section presents the terminology that we use in this book as we discuss dimensional modeling.

Basic Concepts

Dimensional modeling is a technique for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. It is especially useful for summarizing and rearranging the data and presenting views of the data to support data analysis. Dimensional modeling focuses on numeric data, such as values, counts, weights, balances, and occurrences. Dimensional modeling has several basic concepts: facts, dimensions, and measures (variables).

6.4.1.1 Fact: A fact is a collection of related data items, consisting of measures and context data. Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or business processes. In a data warehouse, facts are implemented in the core tables in which all of the numeric data is stored.

6.4.1.2 Dimension: A dimension is a collection of members or units of the same type of view. In a diagram, a dimension is usually represented by an axis. In a dimensional model, every data point in the fact table is associated with one and only one member from each of the multiple dimensions. That is, dimensions determine the contextual background for the facts. Many analytical processes are used to quantify the impact of dimensions on the facts. Dimensions are the parameters over which we want to perform Online Analytical Processing (OLAP).

6.4.1.3 Measure: A measure is a numeric attribute of a fact, representing the performance or behavior of the business relative to the dimensions. The actual numbers are called variables. For example, measures are the sales in money, the sales volume, the quantity supplied, the supply cost, the transaction amount, and so forth. A measure is determined by combinations of the members of the dimensions and is located on facts.

Q.4 Discuss the categories in which data is divided before structuring it into a data warehouse.

A Data Warehouse is not an individual repository product. Rather, it is an overall strategy, or process, for building decision support systems and a knowledge-based applications architecture and environment that supports both everyday tactical decision making and long-term business strategizing. The Data Warehouse environment positions a business to utilize an enterprise-wide data store to link information from diverse sources and make the information accessible for a variety of user purposes, most notably strategic analysis. Business analysts must be able to use the Warehouse for such strategic purposes as trend identification, forecasting, competitive analysis, and targeted market research. Data Warehouses and Data Warehouse applications are designed primarily to support executives, senior managers, and business analysts in making complex business decisions. Data Warehouse applications provide the business community with access to accurate, consolidated information from various internal and external sources.

The primary objective of Data Warehousing is to bring together information from disparate sources and put the information into a format that is conducive to making business decisions. This objective necessitates a set of activities that are far more complex than just collecting data and reporting against it. Data Warehousing requires both business and technical expertise and involves the following activities:

Accurately identifying the business information that must be contained in the Warehouse

Identifying and prioritizing subject areas to be included in the Data Warehouse

Managing the scope of each subject area which will be implemented into the Warehouse on an iterative basis

Developing a scalable architecture to serve as the Warehouse's technical and application foundation, and identifying and selecting the hardware/software/middleware components to implement it

Extracting, cleansing, aggregating, transforming and validating the data to ensure accuracy and consistency

Defining the correct level of summarization to support business decision making

Establishing a refresh program that is consistent with business needs, timing and cycles

Providing user-friendly, powerful tools at the desktop to access the data in the Warehouse

Educating the business community about the realm of possibilities that are available to them through Data Warehousing

Establishing a Data Warehouse Help Desk and training users to effectively utilize the desktop tools

Establishing processes for maintaining, enhancing, and ensuring the ongoing success and applicability of the Warehouse

Until the advent of Data Warehouses, enterprise databases were expected to serve multiple purposes, including online transaction processing, batch processing, reporting, and analytical processing. In most cases, the primary focus of computing resources was on satisfying operational needs and requirements. Information reporting and analysis needs were secondary considerations. As the use of PCs, relational databases, 4GL technology and end-user computing grew and changed the complexion of information processing, more and more business users demanded that their needs for information be addressed. Data Warehousing has evolved to meet those needs without disrupting operational processing.

In the Data Warehouse model, operational databases are not accessed directly to perform information processing. Rather, they act as the source of data for the Data Warehouse, which is the information repository and point of access for information processing. There are sound reasons for separating operational and informational databases, as described below.

The users of informational and operational data are different. Users of informational data are generally managers and analysts; users of operational data tend to be clerical, operational and administrative staff.

Operational data differs from informational data in context and currency. Informational data contains an historical perspective that is not generally used by operational systems.

The technology used for operational processing frequently differs from the technology required to support informational needs.

The processing characteristics for the operational environment and the informational environment are fundamentally different.

The Data Warehouse functions as a Decision Support System (DSS) and an Executive Information System (EIS), meaning that it supports informational and analytical needs by providing integrated and transformed enterprise-wide historical data from which to do management analysis. A variety of sophisticated tools are readily available in the marketplace to provide user-friendly access to the information stored in the Data Warehouse. Data Warehouses can be defined as subject-oriented, integrated, time-variant, non-volatile collections of data used to support analytical decision making. The data in the Warehouse comes from the operational environment and external sources. Data Warehouses are physically separated from operational systems, even though the operational systems feed the Warehouse with source data.

Subject Orientation Data Warehouses are designed around the major subject areas of the enterprise; the operational environment is designed around applications and functions. This difference in orientation (data vs. process) is evident in the content of the database. Data Warehouses do not contain information that will not be used for informational or analytical processing; operational databases contain detailed data that is needed to satisfy processing requirements but which has no relevance to management or analysis.

Integration and Transformation

The data within the Data Warehouse is integrated. This means that there is consistency among naming conventions, measurements of variables, encoding structures, physical attributes, and other salient data characteristics. An example of this integration is the treatment of codes such as gender codes. Within a single corporation, various applications may represent gender codes in different ways: male vs. female, m vs. f, 1 vs. 0, etc. In the Data Warehouse, gender is always represented in a consistent way, regardless of the many ways by which it may be encoded and stored in the source data. As the data is moved to the Warehouse, it is transformed into a consistent representation as required.
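The gender-code example above can be sketched as a transformation rule. The target encoding ("M"/"F", with "U" for unknown) is an assumption for illustration; a real warehouse would define its own standard.

```python
# Illustrative integration rule: unify the gender encodings the text
# mentions (male/female, m/f, 1/0) into one consistent warehouse code.
GENDER_MAP = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}

def integrate_gender(raw):
    """Map any source-system gender code to the warehouse standard."""
    return GENDER_MAP.get(str(raw).strip().lower(), "U")  # "U" = unknown

print([integrate_gender(v) for v in ["Male", "f", 1, "0", "n/a"]])
# ['M', 'F', 'M', 'F', 'U']
```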

Time Variance

All data in the Data Warehouse is accurate as of some moment in time, providing an historical perspective. This differs from the operational environment, in which data is intended to be accurate as of the moment of access. The data in the Data Warehouse is, in effect, a series of snapshots. Once the data is loaded into the enterprise data store and data marts, it cannot be updated. It is refreshed on a periodic basis, as determined by the business need. The operational data store, if included in the Warehouse architecture, may be updated.

Non-Volatility

Data in the Warehouse is static, not dynamic. The only operations that occur in Data Warehouse applications are the initial loading of data, access of data, and refresh of data. For these reasons, the physical design of a Data Warehouse optimizes the access of data, rather than focusing on the requirements of data update and delete processing.
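The three permitted operations above (initial load, access, and refresh) can be sketched as an append-only table. The snapshot dates and figures are invented for the sketch; the point is that refresh appends new rows and never updates or deletes existing ones.

```python
import sqlite3

# Append-only sketch of a non-volatile warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_snapshot "
             "(snapshot_date TEXT, region TEXT, total REAL)")

# Initial load of data.
conn.executemany("INSERT INTO sales_snapshot VALUES (?, ?, ?)",
                 [("2012-09-30", "North", 1000.0),
                  ("2012-09-30", "South", 800.0)])

# Periodic refresh: new snapshots are appended; no UPDATE or DELETE occurs,
# so existing history is preserved.
conn.executemany("INSERT INTO sales_snapshot VALUES (?, ?, ?)",
                 [("2012-10-31", "North", 1100.0),
                  ("2012-10-31", "South", 750.0)])

# Access of data: the retained snapshots support trend analysis over time.
rows = conn.execute("""SELECT snapshot_date, SUM(total) FROM sales_snapshot
                       GROUP BY snapshot_date ORDER BY snapshot_date""").fetchall()
print(rows)  # [('2012-09-30', 1800.0), ('2012-10-31', 1850.0)]
```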

Data Warehouse Configurations

A Data Warehouse configuration, also known as the logical architecture, includes the following components:

One Enterprise Data Store (EDS) - a central repository which supplies atomic (detail-level) integrated information to the whole organization
(Optional) one Operational Data Store (ODS) - a point-in-time "snapshot" of enterprise-wide data
(Optional) one or more individual Data Marts - summarized subsets of the enterprise's data specific to a functional area or department, geographical region, or time period
One or more Metadata Stores (repositories) - catalogs of reference information about the primary data; metadata is divided into two categories, information for technical use and information for business end-users

The EDS is the cornerstone of the Data Warehouse. It can be accessed for both immediate informational needs and for analytical processing in support of strategic decision making, and can provide drill-down support for the Data Marts, which contain only summarized data. It is fed by the existing subject-area operational systems and may also contain data from external sources. The EDS in turn feeds the individual Data Marts, which are accessed by end-user query tools at the user's desktop. The EDS consolidates related data from multiple sources into a single source, while the Data Marts physically distribute the consolidated data into logical categories, such as business functional departments or geographical regions. The EDS is a collection of daily "snapshots" of enterprise-wide data taken over an extended period, and thus retains, and makes available for tracking purposes, the history of changes to a given data element over time. This creates an optimum environment for strategic analysis. However, access to the EDS can be slow due to the volume of data it contains, which is a good reason for using Data Marts to filter, condense and summarize information for specific business areas. In the absence of the Data Mart layer, users can access the EDS directly.

Metadata is "data about data": a catalog of information about the primary data that defines access to the Warehouse. It is the key to providing users and developers with a road map to the information in the Warehouse. Metadata comes in two forms: end-user and transformational. End-user metadata serves a business purpose; it translates the cryptic name code that represents a data element into a meaningful description so that end-users can recognize and use the data. For example, metadata would clarify that the data element "ACCT_CD" represents "Account Code for Small Business." Transformational metadata serves a technical purpose for development and maintenance of the Warehouse. It maps each data element from its source system to the Data Warehouse, identifying it by source field name, destination field code, transformation routine, business rules for usage and derivation, format, key, size, index and other relevant transformational and structural information. Each type of metadata is kept in one or more repositories that service the Enterprise Data Store.

While an Enterprise Data Store and Metadata Store(s) are always included in a sound Data Warehouse design, the specific number of Data Marts (if any) and the need for an Operational Data Store are judgment calls. Potential Data Warehouse configurations should be evaluated and a logical architecture determined according to business requirements.
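The two forms of metadata can be illustrated with a toy catalog. The ACCT_CD example comes from the text above; the source system and field names in the transformational entry are invented for the sketch.

```python
# End-user metadata: business-friendly descriptions of cryptic element names.
end_user_metadata = {
    "ACCT_CD": "Account Code for Small Business",
}

# Transformational metadata: source-to-warehouse mapping details.
# "BILLING" and "ACC_CODE" are hypothetical names for this example.
transformational_metadata = {
    "ACCT_CD": {
        "source_system": "BILLING",
        "source_field": "ACC_CODE",
        "transformation": "uppercase; strip trailing blanks",
        "format": "CHAR(8)",
        "is_key": True,
    },
}

def describe(element):
    # End-user lookup: translate a code into a meaningful description.
    return end_user_metadata.get(element, "<no business description>")

print(describe("ACCT_CD"))
```

The same element name keys both catalogs: business users consult the first to understand a field, while Warehouse developers consult the second to maintain the extraction and transformation routines.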
The Data Warehouse Process

The james martin + co Data Warehouse Process does not encompass the analysis and identification of organizational value streams, strategic initiatives, and related business goals; rather, it is a prescription for achieving such goals through a specific architecture. The Process is conducted in an iterative fashion after the initial business requirements and architectural foundations have been developed, with each iteration populating the Data Warehouse with a "chunk" of functional subject-area information.

The Process guides the development team through identifying the business requirements, developing the business plan and Warehouse solution to those requirements, and implementing the configuration, technical, and application architecture for the overall Data Warehouse. It then specifies the iterative activities for the cyclical planning, design, construction, and deployment of each population project. The following is a description of each stage in the Data Warehouse Process. (Note: The Data Warehouse Process also includes conventional project management, startup, and wrap-up activities, which are detailed in the Plan, Activate, Control and End stages and not described here.)

Business Case Development

A variety of strategic analyses, including Value Stream Assessment, have likely already been done by the customer organization by the time it becomes necessary to develop a Business Case. The Business Case Development stage launches the Data Warehouse development in response to previously identified strategic business initiatives and "predator" (key) value streams of the organization. The organization will likely have identified more than one important value stream. In the long term it is possible to implement Data Warehouse solutions that address multiple value streams, but it is the predator value stream or highest-priority strategic initiative that usually becomes the focus of the short-term strategy and of the first-run population projects resulting in a Data Warehouse. At the conclusion of the relevant business reengineering, strategic visioning, and/or value stream assessment activities conducted by the organization, a Business Case can be built to justify the use of the Data Warehouse architecture and implementation approach to solve key business issues directed at the most important goals.
The Business Case outlines the activities, costs, benefits, and critical success factors for a multi-generation implementation plan that results in a Data Warehouse framework for an information storage/access system. The Warehouse is an iteratively designed, developed, and refined solution to the tactical and strategic business requirements. The Business Case addresses both the short-term and long-term Warehouse strategies (how multiple data stores will work together to fulfill primary and secondary business goals) and identifies both immediate and extended costs, so that the organization is better able to plan its short- and long-term budget appropriations.

Business Question Assessment

Once a Business Case has been developed, the short-term strategy for implementing the Data Warehouse is mapped out by means of the Business Question Assessment (BQA) stage. The purpose of BQA is to:

Establish the scope of the Warehouse and its intended use
Define and prioritize the business requirements and the subsequent information (data) needs the Warehouse will address
Identify the business directions and objectives that may influence the required data and application architectures
Determine which business subject areas provide the most needed information; prioritize and sequence implementation projects accordingly
Drive out the logical data model that will direct the physical implementation model
Measure the quality, availability, and related costs of needed source data at a high level
Define the iterative population projects based on business needs and data validation

The prioritized predator value stream or most important strategic initiative is analyzed to determine the specific business questions that need to be answered through a Warehouse implementation. Each business question is assessed to determine its overall importance to the organization, and a high-level analysis of the data needed to provide the answers is undertaken. The data is assessed for quality, availability, and cost associated with bringing it into the Data Warehouse. The business questions are then revisited and prioritized based upon their relative importance and the cost and feasibility of acquiring the associated data.

The prioritized list of business questions is used to determine the scope of the first and subsequent iterations of the Data Warehouse, in the form of population projects. Iteration scoping is dependent on source data acquisition issues and is guided by determining how many business questions can be answered in a three to six month implementation time frame. A "business question" is a question deemed by the business to provide useful information in determining strategic direction. A business question can be answered through objective analysis of the data that is available.
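The prioritization described above, weighing each question's importance against the cost and feasibility of acquiring its data, can be sketched as a simple scoring rule. The questions, scores, and scoring formula below are all invented for illustration; a real BQA would rank questions through workshops and analysis, not a formula.

```python
# Hypothetical business questions with illustrative scores (1-10) for business
# importance and for the cost of acquiring the data needed to answer them.
questions = [
    {"q": "Which regions are losing market share?", "importance": 9, "data_cost": 3, "feasible": True},
    {"q": "What drives warranty claims?", "importance": 7, "data_cost": 8, "feasible": True},
    {"q": "How do competitors set prices?", "importance": 8, "data_cost": 9, "feasible": False},
]

def priority(q):
    # Illustrative rule only: importance minus acquisition cost; questions whose
    # data cannot feasibly be acquired sink to the bottom regardless of importance.
    return (q["feasible"], q["importance"] - q["data_cost"])

ranked = sorted(questions, key=priority, reverse=True)
# Scope the first population project to what fits a three-to-six-month iteration.
first_iteration = [q["q"] for q in ranked[:2]]
print(first_iteration)
```

Note how the third question, despite high importance, drops out of the first iteration because its data cannot feasibly be acquired; this mirrors the revisiting of questions based on data cost and feasibility described above.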

Architecture Review and Design

The Architecture is the logical and physical foundation on which the Data Warehouse will be built. The Architecture Review and Design stage, as the name implies, is both a requirements analysis and a gap analysis activity. It is important to assess which pieces of the architecture already exist in the organization (and in what form) and which pieces are missing but needed to build the complete Data Warehouse architecture.

During this stage, the logical Data Warehouse architecture is developed. The logical architecture is a configuration map of the data stores that make up the Warehouse; it includes a central Enterprise Data Store, an optional Operational Data Store, one or more optional individual business-area Data Marts, and one or more Metadata Stores. The Metadata Store(s) hold two different kinds of metadata that catalog reference information about the primary data.

Once the logical configuration is defined, the Data, Application, Technical and Support Architectures are designed to physically implement it. The requirements of these four architectures are carefully analyzed so that the Data Warehouse can be optimized to serve the users. Gap analysis is conducted to determine which components of each architecture already exist in the organization and can be reused, and which must be developed (or purchased) and configured for the Data Warehouse. The Data Architecture organizes the sources and stores of business information and defines the quality and management standards for data and metadata.

The Application Architecture is the software framework that guides the overall implementation of business functionality within the Warehouse environment; it controls the movement of data from source to user, including the functions of data extraction, data cleansing, data transformation, data loading, data refresh, and data access (reporting and querying).

The Technical Architecture provides the underlying computing infrastructure that enables the data and application architectures. It includes the platform/server, network, communications and connectivity hardware/software/middleware, DBMS, the client/server 2-tier vs. 3-tier approach, and end-user workstation hardware/software. Technical architecture design must address the requirements of scalability, capacity and volume handling (including sizing and partitioning of tables), performance, availability, stability, chargeback, and security.

The Support Architecture includes the software components (e.g., tools and structures for backup/recovery, disaster recovery, performance monitoring, reliability/stability compliance reporting, data archiving, and version control/configuration management) and the organizational functions necessary to effectively manage the technology investment.

Architecture Review and Design applies to the long-term strategy for development and refinement of the overall Data Warehouse; it is not conducted merely for a single iteration. This stage develops the blueprint of an encompassing data and technical structure, software application configuration, and organizational support structure for the Warehouse, forming a foundation that drives the iterative Detail Design activities. Design tells you what to do; Architecture Review and Design tells you what pieces you need in order to do it. The Architecture Review and Design stage can be conducted as a separate project that runs mostly in parallel with the Business Question Assessment stage.
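The movement of data from source to user that the Application Architecture controls can be sketched as a chain of small functions. This is a toy pipeline under invented data and rules; in practice each stage would be driven by the transformational metadata rather than hard-coded logic.

```python
# Illustrative extract -> cleanse -> transform -> load chain.
def extract(source_rows):
    return list(source_rows)  # pull raw records from a source system

def cleanse(rows):
    return [r for r in rows if r.get("amount") is not None]  # drop incomplete records

def transform(rows):
    return [{**r, "amount": round(float(r["amount"]), 2)} for r in rows]  # standardize types

def load(rows, warehouse):
    warehouse.extend(rows)  # append into the target store
    return len(rows)  # loaded-row count, a simple load metric

warehouse = []
raw = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": None}]
loaded = load(transform(cleanse(extract(raw))), warehouse)
print(loaded, warehouse)
```

The second record is rejected at the cleansing stage, so only one row reaches the warehouse; capturing the loaded-row count illustrates the kind of load metric mentioned later in the Implementation stage.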
This is possible because the technical, data, application and support infrastructure that enables and supports the storage and access of information is generally independent of the business requirements that determine which data is needed to drive the Warehouse. However, the data architecture does depend on input from certain BQA activities (data source system identification and data modeling), so the BQA stage must conclude before the Architecture stage can conclude. The Architecture will be developed based on the organization's long-term Data Warehouse strategy, so that future iterations of the Warehouse will have been provided for and will fit within the overall architecture.

Tool Selection

The purpose of this stage is to identify the candidate tools for developing and implementing the Data Warehouse data and application architectures, and for performing technical and support architecture functions where appropriate. The candidate tools that best meet the business and technical requirements defined by the Data Warehouse architecture are selected and recommended to the customer organization, and procured upon its approval. It is important to note that the process of selecting tools often depends on the organization's existing technical infrastructure. Many organizations feel strongly, for various reasons, about using tools they already have in their "arsenal" for the Data Warehouse applications, and are reluctant to purchase new application packages. It is recommended that a thorough evaluation of existing tools and the feasibility of their reuse be done in the context of all tool evaluation activities. In some cases, existing tools can be form-fitted to the Data Warehouse; in other cases, the customer organization may need to be convinced that new tools would better serve its needs. It may even be feasible to skip this series of activities altogether, if the organization insists that particular tools be used (leaving no room for negotiation), or if tools have already been assessed and selected in anticipation of the Data Warehouse project.

Tools may be categorized according to the following data, technical, application, or support functions:

Source Data Extraction and Transformation
Data Cleansing
Data Load
Data Refresh
Data Access
Security Enforcement
Version Control/Configuration Management
Backup and Recovery
Disaster Recovery
Performance Monitoring
Database Management Platform
Data Modeling
Metadata Management

Iteration Project Planning

The Data Warehouse is implemented (populated) one subject area at a time, driven by the specific business questions to be answered by each implementation cycle. The first and subsequent implementation cycles of the Data Warehouse are determined during the BQA stage. At this point in the Process, the first (or next) subject-area implementation project is planned. The business requirements discovered in BQA and, to a lesser extent, the technical requirements of the Architecture Review and Design stage are now refined to the subject-area level through user interviews and focus sessions. The results are further analyzed to yield the detail needed to design and implement a single population project, whether initial or follow-on. The Data Warehouse project team is expanded to include the members needed to construct and deploy the Warehouse, and a detailed work plan for the design and implementation of the iteration project is developed and presented to the customer organization for approval.

Detail Design

In the Detail Design stage, the physical Data Warehouse model (database schema) is developed, the metadata is defined, and the source data inventory is updated and expanded to include all of the information needed for the subject-area implementation project, and is validated with users. Finally, the detailed design of all procedures for the implementation project is completed and documented. Procedures are designed for the following activities:

Warehouse Capacity Growth
Data Extraction/Transformation/Cleansing
Data Load
Security
Data Refresh
Data Access
Backup and Recovery
Disaster Recovery
Data Archiving
Configuration Management
Testing
Transition to Production
User Training
Help Desk
Change Management

Implementation

Once the Planning and Design stages are complete, the project to implement the current Data Warehouse iteration can proceed quickly. The necessary hardware, software and middleware components are purchased and installed, the development and test environment is established, and the configuration management processes are implemented. Programs are developed to extract, cleanse, transform and load the source data and to periodically refresh the existing data in the Warehouse, and each program is individually unit tested against a test database with sample source data. Metrics are captured for the load process. The metadata repository is loaded with transformational and business end-user metadata. Canned production reports are developed, sample ad-hoc queries are run against the test database, and the validity of the output is measured. User access to the data in the Warehouse is established.

Once the programs have been developed and unit tested and the components are in place, system functionality and user acceptance testing is conducted for the complete integrated Data Warehouse system. System support processes for database security, system backup and recovery, system disaster recovery, and data archiving are implemented and tested as the system is prepared for deployment. The final step is to conduct the Production Readiness Review prior to transitioning the Data Warehouse system into production. During this review, the system is evaluated for acceptance by the customer organization.

Transition to Production

The Transition to Production stage moves the Data Warehouse development project into the production environment. The production database is created, and the extraction/cleanse/transformation routines are run on the operational system source data. The development team works with the Operations staff to perform the initial load of this data to the Warehouse and execute the first refresh cycle.
The Operations staff is trained, and the Data Warehouse programs and processes are moved into the production libraries and catalogs. Rollout presentations and tool demonstrations are given to the entire

customer community, and end-user training is scheduled and conducted. The Help Desk is established and put into operation. A Service Level Agreement is developed and approved by the customer organization. Finally, the new system is positioned for ongoing maintenance through the establishment of a Change Management Board and the implementation of change control procedures for future development cycles.

Q.5 Discuss the purpose of an executive information system in an organization?

Implementing an Executive Information System (EIS)

An EIS is a tool that provides direct on-line access to relevant information about aspects of a business that are of particular interest to the senior manager.

Introduction

Many senior managers find that direct on-line access to organizational data is helpful. For example, Paul Frech, president of Lockheed-Georgia, monitored employee contributions to company-sponsored programs (United Way, blood drives) as a surrogate measure of employee morale (Houdeshel and Watson, 1987). C. Robert Kidder, CEO of Duracell, found that productivity problems were due to salespeople in Germany wasting time calling on small stores, and took corrective action (Main, 1989).

Information systems have long been used to gather and store information, to produce specific reports for workers, and to produce aggregate reports for managers. However, senior managers rarely use these systems directly, and often find the aggregate information to be of little use without the ability to explore the underlying details (Watson & Rainer, 1991; Crockett, 1992). An Executive Information System (EIS) is a tool that provides direct on-line access to relevant information in a useful and navigable format. Relevant information is timely, accurate, and actionable information about aspects of a business that are of particular interest to the senior manager. The useful and navigable format of the system means that it is specifically designed to be used by individuals with limited time, limited keyboarding skills, and little direct experience with computers. An EIS is easy to navigate, so that managers can identify broad strategic issues and then explore the information to find the root causes of those issues.

Executive Information Systems differ from traditional information systems in the following ways:

They are specifically tailored to executives' information needs
They are able to access data about specific issues and problems as well as aggregate reports
They provide extensive on-line analysis tools, including trend analysis, exception reporting and "drill-down" capability
They access a broad range of internal and external data
They are particularly easy to use (typically mouse or touchscreen driven)
They are used directly by executives without assistance
They present information in a graphical form
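The "drill-down" capability mentioned above amounts to moving from an aggregate report to the underlying detail on demand. The sketch below uses invented sales figures; a real EIS would pull the detail rows from the warehouse and render the results graphically.

```python
# Invented detail-level sales rows standing in for warehouse data.
detail = [
    {"region": "East", "product": "X", "sales": 120},
    {"region": "East", "product": "Y", "sales": 80},
    {"region": "West", "product": "X", "sales": 60},
]

def rollup(rows, dim):
    # The aggregate report the executive sees first.
    totals = {}
    for r in rows:
        totals[r[dim]] = totals.get(r[dim], 0) + r["sales"]
    return totals

def drill_down(rows, dim, value, next_dim):
    # Selecting one value of a dimension re-aggregates its detail one level down.
    return rollup([r for r in rows if r[dim] == value], next_dim)

print(rollup(detail, "region"))
print(drill_down(detail, "region", "East", "product"))
```

Starting from regional totals, one click (here, one call) on "East" reveals the product-level breakdown behind the number, which is the exploration path from broad strategic issue to root cause described in the Introduction.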

Purpose of EIS

The primary purpose of an Executive Information System is to support managerial learning about an organization, its work processes, and its interaction with the external environment. Informed managers can ask better questions and make better decisions. Vandenbosch and Huff (1992) of the University of Western Ontario found that Canadian firms using an EIS achieved better business results if their EIS promoted managerial learning. Firms with an EIS designed merely to maintain managers' "mental models" were less effective than firms with an EIS designed to build or enhance managers' knowledge. This distinction is supported by Peter Senge in The Fifth Discipline, where he illustrates the benefits of learning about the behaviour of systems versus simply learning more about their states. Learning more about the state of a system leads to reactive management fixes. Typically these reactions feed into the underlying system behaviour and contribute to a downward spiral. Learning more about system behaviour, and how various system inputs and actions interrelate, allows managers to make more proactive changes that create long-term improvement.

A secondary purpose of an EIS is to allow timely access to information. All of the information contained in an EIS can typically be obtained by a manager through traditional methods. However, the resources and time required to manually compile information in a

wide variety of formats, and in response to ever-changing and ever more specific questions, usually inhibit managers from obtaining this information. Often, by the time a useful report can be compiled, the strategic issues facing the manager have changed, and the report is never fully utilized. Timely access also influences learning. When a manager obtains the answer to a question, that answer typically sparks other related questions in the manager's mind. If those questions can be posed immediately, and the next answer retrieved, the learning cycle continues unbroken. Using traditional methods, by the time the answer is produced, the context of the question may be lost, and the learning cycle will not continue. An executive in Rockart & Treacy's 1982 study noted: "Your staff really can't help you think. The problem with giving a question to the staff is that they provide you with the answer. You learn the nature of the real question you should have asked when you muck around in the data" (p. 9).

A third purpose of an EIS is commonly misperceived. An EIS has a powerful ability to direct management attention to specific areas of the organization or to specific business problems. Some managers see this as an opportunity to discipline subordinates. Some subordinates fear the directive nature of the system and spend a great deal of time trying to outwit or discredit it. Neither of these behaviours is appropriate or productive. Rather, managers and subordinates can work together to determine the root causes of issues highlighted by the EIS. The powerful focus of an EIS is due to the maxim "what gets measured gets done." Managers are particularly attentive to concrete information about their performance when it is available to their superiors. This focus is very valuable to an organization if the information reported is actually important and represents a balanced view of the organization's objectives.
Misaligned reporting systems can result in inordinate management attention to things that are not important or to things which are important but to the exclusion of other equally important things. For example, a production reporting system might lead managers to emphasize volume of work done rather than quality of work. Worse yet, productivity might have little to do with the organization's overriding customer service objectives.

Contents of EIS

A general answer to the question of what data is appropriate for inclusion in an Executive Information System is "whatever is interesting to executives." While this advice is rather simplistic, it does reflect the variety of systems currently in use. Executive Information Systems in government have been constructed to track data about Ministerial correspondence, case management, worker productivity, finances, and human resources, to name only a few. Other sectors use EIS implementations to monitor information about competitors in the news media and in databases of public information, in addition to the traditional revenue, cost, volume, sales, market share and quality applications. Frequently, EIS implementations begin with just a few measures that are clearly of interest to senior managers, and then expand in response to questions asked by those managers as they use the system. Over time, the presentation of this information becomes stale, and the information diverges from what is strategically important for the organization. A "Critical Success Factors" approach is recommended by many management theorists (Daniel, 1961; Crockett, 1992; Watson and Frolick, 1992). Practitioners such as Vandenbosch (1993) found that: "While our efforts usually met with initial success, we often found that after six months to a year, executives were almost as bored with the new information as they had been with the old. A strategy we developed to rectify this problem required organizations to create a report of the month. That is, in addition to the regular information provided for management committee meetings, the CEO was charged with selecting a different indicator to focus on each month" (Vandenbosch, 1993, pp. 8-9). While the above indicates that selecting data for inclusion in an EIS is difficult, there are several guidelines that help to make that assessment.
A practical set of principles to guide the design of measures and indicators to be included in an EIS is presented below (Kelly, 1992b). For a more detailed discussion of methods for selecting measures that reflect organizational objectives, see the section "EIS and Organizational Objectives." EIS measures must be easy to understand and collect. Wherever possible, data should be collected naturally as part of the process of work. An EIS should not add substantially to the workload of managers or staff.

EIS measures must be based on a balanced view of the organization's objectives. Data in the system should reflect the objectives of the organization in the areas of productivity, resource management, quality and customer service.

Performance indicators in an EIS must reflect everyone's contribution in a fair and consistent manner. Indicators should be as independent as possible from variables outside the control of managers.

EIS measures must encourage management and staff to share ownership of the organization's objectives. Performance indicators must promote both team-work and friendly competition. Measures must be meaningful for all staff; people must feel that they, as individuals, can contribute to improving the performance of the organization.

EIS information must be available to everyone in the organization. The objective is to provide everyone with useful information about the organization's performance. Information that must remain confidential should not be part of the EIS or the management system of the organization.

EIS measures must evolve to meet the changing needs of the organization.

Barriers to Effectiveness

There are many ways in which an EIS can fail. Dozens of high-profile, high-cost EIS projects have been cancelled, implemented and rarely used, or implemented and used with negative results. An EIS is a high-risk project precisely because it is intended for use by the most powerful people in an organization. Senior managers can easily misuse the information in the system, with strongly detrimental effects on the organization. Senior managers can refuse to use a system if it does not respond to their immediate personal needs or is too difficult to learn and use.

Unproductive Organizational Behaviour Norms

Issues of organizational behaviour and culture are perhaps the most deadly barriers to effective Executive Information Systems.
Because an EIS is typically positioned at the top of an organization, it can create powerful learning experiences and lead to drastic changes in organizational direction. However, there is also great potential for misuse of the information. Green, Higgins and Irving (1988) found that performance monitoring can promote bureaucratic and unproductive behaviour, can unduly focus organizational

attention to the point where other important aspects are ignored, and can have a strongly negative impact on morale. The key barrier to EIS effectiveness, therefore, is the way in which the organization uses the information in the system. Managers must be aware of the dangers of statistical data, and be skilled at interpreting and using data in an effective way. Even more important is the manager's ability to communicate with others about statistical data in a non-defensive, trustworthy, and constructive manner. Argyris (1991) suggests a universal human tendency towards strategies that avoid embarrassment or threat, and avoid feelings of vulnerability or incompetence. These strategies include:

Stating criticism of others in a way that you feel is valid, but that prevents others from deciding for themselves
Failing to include any data that others could use to objectively evaluate your criticism
Stating your conclusions in ways that disguise their logical implications, and denying those implications if they are suggested

To make effective use of an EIS, managers must have the self-confidence to accept negative results and focus on the resolution of problems rather than on denial and blame. Organizations with limited exposure to planning and targeting, data-based decision-making, statistical process control, and team-based work models may not have dealt with these behavioural issues in the past, and so are more likely to react defensively and reject an EIS.

Technical Excellence

An interesting result from the Vandenbosch & Huff (1988) study was that the technical excellence of an EIS has an inverse relationship with effectiveness. Systems that are technical masterpieces tend to be inflexible, and thus discourage innovation, experimentation and mental model development. Flexibility is important because an EIS has such a powerful ability to direct attention to specific issues in an organization.
A technical masterpiece may accurately direct management attention when the system is first implemented, but on its first anniversary it will still be directing attention to issues that were important a year earlier. There is

substantial danger that the exploration of issues necessary for managerial learning will be limited to those subjects that were important when the EIS was first developed. Managers must understand that as the organization and its work change, an EIS must continually be updated to address the strategic issues of the day.

A number of explanations are possible as to why technical masterpieces tend to be less flexible. Developers who create a masterpiece EIS may become attached to the system and consciously or unconsciously dissuade managers from asking for changes. Managers who are uncertain that the benefits outweigh the initial cost of a masterpiece EIS may not want to spend more on system maintenance and improvements. The time required to create a masterpiece EIS may mean that it is outdated before it is implemented.

While usability and response time are important factors in determining whether executives will use a system, cost and flexibility are paramount. A senior manager will be more accepting of an inexpensive system that provides 20% of the needed information within a month or two than of an expensive system that provides 80% of the needed information after a year of development. The manager may also find that the inexpensive system is easier to change and adapt to the evolving needs of the business. Changing a large system involves throwing away part of a substantial investment; changing the inexpensive system means losing a few weeks of work. As a result, fast, cheap, incremental approaches to developing an EIS increase the chance of success.

Technical Problems

Paradoxically, technical problems are also frequently reported as a significant barrier to EIS success. The most difficult technical problem -- integrating data from a wide range of data sources both inside and outside the organization -- is also one of the most critical issues for EIS users.
A marketing vice-president, who had spent several hundred thousand dollars on an EIS, attended a final briefing on the system. The technical experts demonstrated the many graphs and charts of sales results, market share and profitability. However, when the vice-president asked for a graph of market share and advertising expense over the past ten years, the system was unable to access historical data. The project was cancelled in that meeting.
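In data terms, the query the vice-president asked for is simply a join across two source systems keyed on year; once both feeds land in a common store, it takes only a few lines of SQL. A minimal sketch using Python's standard sqlite3 module -- the table names, years and figures here are invented purely for illustration:

```python
import sqlite3

# Two hypothetical tables standing in for separate source systems:
# one holding market-share figures, one holding advertising spend.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE market_share (year INTEGER PRIMARY KEY, share_pct REAL);
    CREATE TABLE ad_spend     (year INTEGER PRIMARY KEY, spend_m REAL);
    INSERT INTO market_share VALUES (2010, 12.4), (2011, 13.1), (2012, 13.9);
    INSERT INTO ad_spend     VALUES (2010, 1.8),  (2011, 2.1),  (2012, 2.6);
""")

# The executive's request is a join of the two sources by year.
rows = con.execute("""
    SELECT m.year, m.share_pct, a.spend_m
    FROM market_share m
    JOIN ad_spend a ON a.year = m.year
    ORDER BY m.year
""").fetchall()

for year, share, spend in rows:
    print(year, share, spend)
```

The technical difficulty in the anecdote was not the query itself but that ten years of historical data had never been loaded into any store the EIS could reach -- which is why data integration, not report formatting, is the critical path.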

The ability to integrate data from many different systems is important because it allows managerial learning that is unavailable in other ways. The president of a manufacturing company can easily get information about sales and manufacturing from the relevant VPs. Unfortunately, the information the president receives will likely be incompatible, and learning about the ways in which sales and manufacturing processes influence each other will not be easy. An EIS will be particularly effective if it can overcome this challenge, allowing executives to learn about business processes that cross organizational boundaries and to compare business results in disparate functions.

Another technical problem that can kill EIS projects is usability. Senior managers can simply stop using a system if they find it too difficult to learn or use. They have very little time to invest in learning the system, a low tolerance for errors, and initially may have very little incentive to use it. Even if the information in the system is useful, a difficult interface will quickly result in the manager assigning an analyst to manipulate the system and print out the required reports. This is counter-productive because managerial learning is enhanced by the immediacy of the question-answer learning cycle provided by an EIS. If an analyst is interacting with the system, the analyst will acquire more learning than the manager, but will not be in a position to put that learning to its most effective use.

Usability of Executive Information Systems can be enhanced through the use of prototyping and usability evaluation methods. These methods ensure that clear communication occurs between the developers of the system and its users. Managers have an opportunity to interact with systems that closely resemble the functionality of the final system and thus can offer more constructive criticism than they might be able to after reading an abstract specification document.
Systems developers also are in a position to listen more openly to criticisms of a system since a prototype is expected to be disposable. Several evaluation protocols are available, including observation and monitoring, software logging, experiments and benchmarking (Preece et al., 1994). The most appropriate methods for EIS design are those with an ethnographic flavour, because the experience base of system developers is typically so different from that of their user population (senior executives).

Misalignment Between Objectives & EIS

A final barrier to EIS effectiveness was mentioned earlier in the section on purpose. As noted there, the powerful ability of an EIS to direct organizational attention can be destructive if the system directs attention to the wrong variables. There are many examples of this sort of destructive reporting. Grant, Higgins and Irving (1988) report the account of an employee working under a misaligned reporting system:

"I like the challenge of solving customer problems, but they get in the way of hitting my quota. I'd like to get rid of the telephone work. If (the company) thought dealing with customers was important, I'd keep it; but if it's just going to be production that matters, I'd gladly give all the calls to somebody else."

Traditional cost accounting systems are also often misaligned with organizational objectives, and placing these measures in an EIS will continue to draw attention to the wrong things. Cost accounting allocates overhead costs to direct labour hours. In some cases the overhead burden on each direct labour hour is as much as 1000%. A manager operating under this system might decide to sub-contract 100 hours of direct labour at $20 per hour. On the books, this $2,000 saving is accompanied by $20,000 of savings in overhead. If the sub-contractor charges $5,000 for the work, the book savings are $2,000 + $20,000 - $5,000 = $17,000. In reality, however, the overhead costs for an idle machine in a factory do not go down much at all. The sub-contract actually ends up costing $5,000 - $2,000 = $3,000. (Peters, 1987)

Characteristics of Successful EIS Implementations

Find an Appropriate Executive Champion

EIS projects that succeed do so because at least one member of the senior management team agrees to champion the project.
The executive champion need not fully understand the technical issues, but must be a person who works closely with all of the senior management team and understands their needs, work styles and their current methods of obtaining organizational information. The champion's commitment must include a willingness to set aside time for reviewing prototypes and implementation plans, influencing and coaching other members of the senior management team, and suggesting modifications and enhancements to the system.

Deliver a Simple Prototype Quickly

Executives judge a new EIS on the basis of how easy it is to use and how relevant the information in the system is to the current strategic issues in the organization. As a result, the best EIS projects begin as a simple prototype, delivered quickly, that provides data about at least one critical issue. If the information delivered is worth the hassle of learning the system, a flurry of requirements will shortly be generated by executives who like what they see, but want more. These requests are the best way to plan an EIS that truly supports the organization, and are more valuable than months of planning by a consultant or analyst. One caveat concerning the simple prototype approach is that, in an organization where strategic direction and objectives are not clearly defined, executive requests will quickly scatter to questions of curiosity rather than strategy. A number of methods are available to support executives in defining business objectives and linking them to performance monitors in an EIS. These are discussed further in the section on EIS and Organizational Objectives below.

Involve Your Information Systems Department

In some organizations, the motivation for an EIS project arises in the business units quite apart from the traditional information systems (IS) organization. Consultants may be called in, or managers and analysts in the business units may take the project on without consulting or involving IS. This is a serious mistake. Executive Information Systems rely entirely on the information contained in the systems created and maintained by this department. IS professionals know best what information is available in an organization's systems and how to get it. They must be involved in the team.
Involvement in such a project can also be beneficial to IS by giving them a more strategic perspective on how their work influences the organization.

Communicate & Train to Overcome Resistance

A final characteristic of successful EIS implementations is that of communication. Executive Information Systems have the potential to drastically alter the prevailing patterns of organizational communication and thus will typically be met with resistance. Some of this resistance is simply a matter of a lack of knowledge, and training on how to use statistics and performance measures can help. However, resistance can also be rooted in the feelings of fear, insecurity and cynicism experienced by individuals throughout the organization. These attitudes can only be influenced by a strong and vocal executive champion who consistently reinforces the purpose of the system and directs the attention of the executive group away from unproductive and punitive behaviours.

EIS and Organizational Culture

Henry Mintzberg (1972) has argued that impersonal statistical data is irrelevant to managers. John Dearden (1966) argued that the promise of real-time management information systems was a myth and would never be of use to top managers. Grant, Higgins, and Irving (1988) argue that computerized performance monitors undermine trust, reduce autonomy and fail to illuminate the most important issues. Many of these arguments against EISs have objective merit. Managers really do value the tangible tidbits of detail they encounter in their daily interactions more highly than abstract numerical reports. Rumours suggest a future, while numbers describe a past. Conversations are rich in detail and continuously probe the reasons for the situation, while statistics are vague approximations of reality. When these vague approximations are used to intimidate or control behaviour rather than to guide learning, they really do have a negative impact on the organization. Yet both of these objections point to a deeper set of problems -- the assumptions, beliefs, values and behaviours that people in the organization hold and use to respond to their environment.
Perhaps senior managers find statistical data to be irrelevant because they have found too many errors in previous reports? Perhaps people in the organization prefer to assign blame rather than discover the true root cause of problems? The culture of an organization can have a dramatic influence on the adoption and use of an Executive Information System. The following cultural characteristics will contribute directly to the success or failure of an EIS project.

Learning vs Blaming

A learning organization is one that seeks first to understand why a problem occurred, and not who is to blame. It is a common and natural response for managers to try to deflect responsibility for a problem on to someone else. An EIS can help to do this by indicating very specifically who failed to meet a statistical target, and by how much. A senior manager, armed with EIS data, can intimidate and blame the appropriate person. The blamed person can respond by questioning the integrity of the system, blaming someone else, or even reacting in frustration by slowing work down further. In a learning organization, any unusual result is seen as an opportunity to learn more about the business and its processes. Managers who find an unusual statistic explore it further, breaking it down to understand its components and comparing it with other numbers to establish cause and effect relationships. Together as a team, management uses numerical results to focus learning and improve business processes across the organization. An EIS facilitates this approach by allowing instant exploration of a number, its components and its relationship to other numbers.

Continuous Improvement vs Crisis Management

Some organizations find themselves constantly reacting to crises, with little time for any proactive measures. Others have managed to respond to each individual crisis with an approach that prevents other similar problems in the future. They are engaged in a continual cycle of improving business practices and finding ways to avoid crisis. Crises in government are frequently caused by questions about organizational performance raised by an auditor, the Minister, or members of the Opposition. An EIS can be helpful in responding to this sort of crisis by providing instant data about the actual facts of the situation. However, this use of the EIS does little to prevent future crises.
An organizational culture in which continual improvement is the norm can use the EIS as an early warning system pointing to issues that have not yet reached the crisis point, but are perhaps the most important areas on which to focus management attention and learning. Organizations with a culture of continuous improvement already have an appetite for the sort of data an EIS can provide, and thus will exhibit less resistance.

Team Work vs Hierarchy

An EIS has the potential to substantially disrupt an organization that relies upon adherence to a strict chain of command. The EIS provides senior managers with the ability to micro-manage details at the lowest levels in the organization. A senior manager with an EIS report who is surprised at the individual results of a front-line worker might call that person directly to understand why the result is unusual. This could be very threatening for the managers between the senior manager and the front-line worker. An EIS can also provide lower level managers with access to information about peer performance and even the performance of their superiors. Organizations that are familiar with work teams, matrix-managed projects and other forms of interaction outside the chain of command will find an EIS less disruptive. Senior managers in these organizations have learned when micro-management is appropriate and when it is not. Middle managers have learned that most interactions between their superiors and their staff are not threatening to their position. Workers are more comfortable interacting with senior managers when the need arises, and know what their supervisor expects from them in such an interaction.

Data-based Decisions vs Decisions in a Vacuum

The total quality movement, popular in many organizations today, emphasizes a set of tools referred to as Statistical Process Control (SPC). These analytical tools provide managers and workers with methods of understanding a problem and finding solutions rather than allocating blame and passing the buck. Organizations with training and exposure to SPC and analytical tools will be more open to an EIS than those who are suspicious of numerical measures and the motives of those who use them. It should be noted that data-based decision making does not deny the role of intuition, experience, or negotiation amongst a group.
Rather, it encourages decision-makers to probe the facts of a situation further before coming to a decision. Even if the final decision contradicts the data, chances are that an exploration of the data will help the decision-maker to understand the situation better before a decision is reached. An EIS can help with this decision-making process.
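As a concrete illustration of the SPC tools mentioned above, a common technique is the control chart: a process metric is flagged for investigation when it falls outside limits set at the mean plus or minus three standard deviations. A minimal sketch in Python -- the daily defect counts are invented for illustration:

```python
import statistics

# Hypothetical daily defect counts for one business process.
samples = [4, 6, 5, 7, 5, 6, 4, 5, 21, 5, 6, 4]

mean = statistics.mean(samples)
sd = statistics.pstdev(samples)  # population standard deviation

# Classic three-sigma control limits; a count cannot go below zero.
ucl = mean + 3 * sd
lcl = max(0.0, mean - 3 * sd)

# Points outside the limits signal "special cause" variation worth
# investigating -- the learning opportunity the text describes.
out_of_control = [x for x in samples if x > ucl or x < lcl]
print(f"mean={mean:.2f} UCL={ucl:.2f} LCL={lcl:.2f} flagged={out_of_control}")
```

The point of the chart is not to blame whoever produced the flagged value, but to treat it as an unusual result to be explored and explained.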

Information Sharing vs Information Hoarding

Information is power in many organizations, and managers are motivated to hoard information rather than to share it widely. For example, managers may hide information about their own organizational performance, but jump at any chance to see information about the performance of their peers. A properly designed EIS promotes information sharing throughout the organization. Peers have access to information about each other's domain; junior managers have information about how their performance contributes to overall organizational performance. An organization that is comfortable with information sharing will have developed a set of "good manners" for dealing with this broad access to information. These behavioural norms are key to the success of an EIS.

Specific Objectives vs Vague Directions

An organization that has experience developing and working toward Specific, Measurable, Achievable and Consistent (SMAC) objectives will also find an EIS to be less threatening. Many organizations are uncomfortable with specific performance measures and targets because they believe their work to be too specialized or unpredictable. Managers in these organizations tend to adopt vague generalizations and statements of the exceedingly obvious in place of SMAC objectives that actually focus and direct organizational performance. In a few cases, it may actually be true that numerical measures are completely inappropriate for a few aspects of the business. In most cases, managers with this attitude have a poor understanding of the purpose of objective and target-setting exercises. Some business processes are more difficult to measure and set targets for than others. Yet almost all business processes have at least a few characteristics that can be measured and improved through conscientious objective setting. (See the following section on EIS and Organizational Objectives.)

EIS and Organizational Objectives

A number of writers have discovered that one of the major difficulties with EIS implementations is that the information contained in the EIS either does not meet executive requirements, or meets executive requirements but fails to guide the organization towards its objectives. As discussed earlier, organizations that are comfortable in establishing and working towards Specific, Measurable, Achievable, and Consistent (SMAC) objectives will find it easier to create an EIS that actually drives organizational performance. Yet even these organizations may have difficulty because their stated objectives do not represent all of the things that are important.

Crockett (1992) suggests a four-step process for developing EIS information requirements based on a broader understanding of organizational objectives. The steps are: (1) identify critical success factors and stakeholder expectations, (2) document performance measures that monitor the critical success factors and stakeholder expectations, (3) determine reporting formats and frequency, and (4) outline information flows and how information can be used. Crockett begins with stakeholders to ensure that all relevant objectives and critical success factors are reflected in the EIS.

Kaplan and Norton (1992) suggest that goals and measures need to be developed from each of four perspectives: financial, customer, internal business, and innovation and learning. These perspectives help managers to achieve a balance in setting objectives, and presenting them in a unified report exposes the tough tradeoffs in any management system. An EIS built on this basis will not promote productivity while ignoring quality, or customer satisfaction while ignoring cost.

Meyer (1994) raises several questions that should be asked about measurement systems for teams. Four are appropriate for evaluating objectives and measures represented in an EIS. They are:

Are all critical organizational outcomes tracked?
Are all "out-of-bounds" conditions tracked? (Conditions that are serious enough to trigger a management review.)
Are all the critical variables required to reach each outcome tracked?
Is there any measure that would not cause the organization to change its behaviour?

In summary, proper definition of organizational objectives and measures is a helpful precondition for reducing organizational resistance to an EIS and is the root of effective EIS use. The benefits of an EIS will be fully realized only when it helps to focus management attention on issues of true importance to the organization.

Methodology

Implementation of an effective EIS requires clear consensus on the objectives and measures to be monitored in the system and a plan for obtaining the data on which those measures are based. The sections below outline a methodology for achieving these two results. As noted earlier, successful EIS implementations generally begin with a simple prototype rather than a detailed planning process. For that reason, the proposed planning methodologies are as simple and scope-limited as possible.

Q.6 Discuss the challenges involved in data integration and coordination process?

Data Integration Primer

Challenges to Data Integration

One of the most fundamental challenges in the process of data integration is setting realistic expectations. The term data integration conjures a perfect coordination of diversified databases, software, equipment, and personnel into a smoothly functioning alliance, free of the persistent headaches that mark less comprehensive systems of information management. Think again. The requirements analysis stage offers one of the best opportunities in the process to recognize and digest the full scope of complexity of the data integration task. Thorough attention to this analysis is possibly the most important ingredient in creating a system that will live to see adoption and maximum use. As the field of data integration progresses, however, other common impediments and compensatory solutions will be easily identified. Current integration practices have already highlighted a few familiar challenges, as well as strategies to address them, as outlined below.

Heterogeneous Data

Challenges

For most transportation agencies, data integration involves synchronizing huge quantities of variable, heterogeneous data resulting from internal legacy systems that vary in data format. Legacy systems may have been created around flat file, network, or hierarchical databases, unlike newer generations of databases which use relational data. Data in different formats from external sources continue to be added to the legacy databases to improve the value of the information. Each generation, product, and homegrown system has unique demands to fulfill in order to store or extract data. So data integration can involve various strategies for coping with heterogeneity. In some cases, the effort becomes a major exercise in data homogenization, which may not enhance the quality of the data offered.

Strategies

A detailed analysis of the characteristics and uses of data is necessary to mitigate issues with heterogeneous data. First, a model is chosen (either a federated or data warehouse environment) that serves the requirements of the business applications and other uses of the data. Then the database developer will need to ensure that various applications can use this format or, alternatively, that standard operating procedures are adopted to convert the data to another format. Bringing disparate data together in a database system or migrating and fusing highly incompatible databases is painstaking work that can sometimes feel like an overwhelming challenge. Thankfully, software technology has advanced to minimize obstacles through a series of data access routines that allow structured query languages to access nearly all DBMS and data file systems, relational or nonrelational.

Bad Data

Challenges

Data quality is a top concern in any data integration strategy. Legacy data must be cleaned up prior to conversion and integration, or an agency will almost certainly face serious data problems later. Legacy data impurities have a compounding effect; by nature, they tend to concentrate around high-volume data users. If this information is corrupt, so, too, will be the decisions made from it. It is not unusual for undiscovered data quality problems to emerge in the process of cleaning information for use by the integrated system. The issue of bad data leads to procedures for regularly auditing the quality of information used. But who holds the ultimate responsibility for this job is not always clear.

Strategies

The issue of data quality exists throughout the life of any data integration system. So it is best to establish both practices and responsibilities right from the start, and make provisions for each to continue in perpetuity. The best processes result when developers and users work together to determine the quality controls that will be put in place in both the development phase and the ongoing use of the system.

Lack of Storage Capacity

Challenges

The unanticipated need for additional performance and capacity is one of the most common challenges to data integration, particularly in data warehousing. Two storage-related requirements generally come into play: extensibility and scalability. Anticipating the extent of growth in an environment in which the need for storage can increase exponentially once a system is initiated drives fears that the storage cost will exceed the benefit of data integration. Introducing such massive quantities of data can push the limits of hardware and software. This may force developers to instigate costly fixes if an architecture for processing much larger amounts of data must be retrofitted into the planned system.

Strategies

Alternative storage is becoming routine for data warehouses that are likely to grow in size. Planning for such options helps keep expanding databases affordable. The cost per gigabyte of storage on disk drives continues to decline as technology improves. From 2000 to 2004, for instance, the cost of data storage declined tenfold. High-performance storage disks are expected to follow the downward pricing spiral.

Unanticipated Costs

Challenges

Data integration costs are fueled largely by items that are difficult for the uninitiated to quantify, and thus predict. These might include:

Labor costs for initial planning, evaluation, programming and additional data acquisition
Software and hardware purchases
Unanticipated technology changes/advances
Both labor and the direct costs of data storage and maintenance

It is important to note that, regardless of efforts to streamline maintenance, the realities of a fully functioning data integration system may demand a great deal more maintenance than could be anticipated. Unrealistic estimating can be driven by an overly optimistic budget, particularly in these times of budget shortfall and doing more with less. More users, more analysis needs and more complex requirements may drive performance and capacity problems. Limited resources may cause project timelines to be extended, without commensurate funding.

Unanticipated issues, or new issues, may call for expensive consulting help. And the dynamic atmosphere of today's transportation agency must be taken into account, in which lack of staff, changes in business processes, problems with hardware and software, and shifting leadership can drive additional expense. The investment in time and labor required to extract, clean, load, and maintain data can creep if the quality of the data presented is weak. It is not unusual for this to produce unanticipated labor costs that are rather alarmingly out of proportion to the total project budget.

Strategies

The approach to estimating project costs must be both far-sighted and realistic. This requires an investment in experienced analysts, as well as cooperation, where possible, among sister agencies on lessons learned. Special effort should be made to identify items that may seem unlikely but could dramatically impact total project cost. Extraordinary care in planning, investing in expertise, obtaining stakeholder buy-in and participation, and managing the process will each help ensure that cost overruns are minimized and, when encountered, can be most effectively resolved. Data integration is a fluid process in which such overruns may occur at each step along the way, so trained personnel with vigilant oversight are likely to return dividends instead of adding to cost.

A viable data integration approach must recognize that the better data integration works for users, the more fundamental it will become to business processes. This level of use must be supported by consistent maintenance. It might be tempting to think that a well-designed system will, by nature, function without much upkeep or tweaking. In fact, the best systems and processes tend to thrive on the routine care and support of well-trained personnel, a fact that wise managers generously anticipate in the data integration plan and budget.

Lack of Cooperation from Staff

Challenges

User groups within an agency may have developed databases on their own, sometimes independently from information systems staff, that are highly responsive to the users' particular needs. It is natural that owners of these functioning standalone units might be skeptical that the new system would support their needs as effectively. Other proprietary interests may come into play. For example, division staff may not want the data they collect and track to be at all times transparently visible to headquarters staff without the opportunity to address the nuances of what the data appear to show. Owners or users may fear that higher-ups without appreciation of the peculiarities of a given method of operation will gain more control over how data is collected and accessed organization-wide.

In some agencies, the level of personnel, consultants, and financial support emanating from the highest echelons of management may be insufficient to dispel these fears and gain cooperation. Top management must be fully invested in the project. Otherwise, the likelihood is smaller that the strategic data integration plan and the resources associated with it will be approved. The additional support required to engage and convey to everyone in the agency the need for and benefits of data integration is unlikely to flow from leaders who lack awareness of or commitment to the benefits of data integration.

Strategies

Any large-scale data integration project, regardless of model, demands that executive management be fully on board. Without it, the initiative is, quite simply, likely to fail. Informing and involving the diversity of players during the crucial requirements analysis stage, and then in each subsequent phase and step, is probably the single most effective way to gain buy-in, trust, and cooperation. Collecting and addressing each user's concerns may be a daunting proposition, particularly for knowledgeable information professionals who prefer to "cut to the chase." However, without a personal stake in the process and a sense of ownership of the final product, the long-term health of this major investment is likely to be compromised by users who feel that change has been enforced upon them rather than designed to advance their interests.

Incremental education, another benefit of stakeholder involvement, is easier to impart than after-the-fact training, particularly since it addresses both the capabilities and limitations of the system, helping to calibrate appropriate expectations along the way. Since so much of the project's success is dependent upon understanding and conveying both human and technical issues, skilled communicators are a logical component of any data integration team. Whether staff or consultants, professional communications personnel are most effective as core participants, rather than occasional or outside contributors. They are trained to recognize and ameliorate gaps in understanding and motivation. Their skills also help maximize the conditions for cooperation and enthusiastic adoption. In many transportation agencies, public information personnel actually focus a significant amount of their time and budget on internal audiences rather than external customers. This makes them well attuned to the operational realities of a variety of internal stakeholders.

Peer Perspectives...

At least three conditions were required for the success of Virginia DOT's development effort:

Upper management had to support the business objectives of the project and the creation of a new system to meet the objectives

Project managers had to receive the budget, staff, and IT resources necessary to initiate and complete the process

All stakeholders and eventual system users from the agency's districts and headquarters had to cooperate with the project team throughout the process. (22)

Lack of Data Management Expertise

Challenges

As more transportation agencies nationwide undertake the integration of data, the availability of experienced personnel increases. However, since data integration is a multiyear, highly complex proposition, even these leaders may not have the kind of expertise that evolves over a full project life-cycle. Common problems develop at different stages of the process, and these can better be anticipated and addressed when key personnel have managed the typical variables of each project phase. Also, the process of transferring historical data from its independent source to the integrated system may benefit from the knowledge of the manager who originally captured and stored the information. High turnover in such positions, along with early retirements and other personnel shifts driven by an historically tight budget environment, may complicate the mining and preparation of this data for convergence with the new system.

Strategies

A seasoned and highly knowledgeable data integration project leader and a data manager with state-of-the-practice experience are the minimum required to design a viable approach to integration. Choosing this expertise very carefully can help ensure that the resulting architecture is sufficiently modular, can be maintained, and is robust enough to support a wide range of owner and user needs while remaining flexible enough to accommodate changing transportation decision-support requirements over a period of years.

Perception of Data Integration as an Overwhelming Effort

Challenges

When transportation agencies consider data integration, one pervasive notion is that the analysis of existing information needs and infrastructure, much less the organization of data into viable channels for integration, requires a monumental initial commitment of resources and staff. Resource-scarce agencies identify this perceived major upfront overhaul as "unachievable" and "disruptive." In addition, uncertainties about funding priorities and potential shortfalls can complicate efforts to move forward.

Strategies

Methodical planning is essential in data integration. Setting incremental (or phased) goals helps ensure that each phase can be understood, achieved, and funded adequately. This approach also allows the integration process to be flexible and agile, minimizing risks associated with funding and other resource uncertainties and priority shifts. In addition, the smaller, more accurate goals will help sustain the integration effort and make it less disruptive to those using and providing data.

Master of Business Administration (MBA) Semester 3
MI0036 Business Intelligence Tools - 4 Credits

Assignment Set-2 (60 Marks)

Q.1 Explain the business development life cycle in detail.

Business Life Cycle

Your business is changing. With the passage of time, your company will go through various stages of the business life cycle. Learn what focuses, challenges and financing sources you will need to succeed at each stage. A business goes through stages of development similar to the human life cycle. Parenting strategies that work for your toddler cannot be applied to your teenager, and the same goes for your small business: it will face different challenges throughout its life. What you focus on today will change and will require different approaches to be successful.

The 7 Stages of the Business Life Cycle

Seed

The seed stage of your business life cycle is when your business is just a thought or an idea. This is the very conception or birth of a new business.

Challenge: Most seed stage companies will have to overcome the challenge of market acceptance and pursue one niche opportunity. Do not spread money and time resources too thin.

Focus: At this stage of the business, the focus is on matching the business opportunity with your skills, experience and passions. Other focal points include deciding on a business ownership structure, finding professional advisors, and business planning.

Money Sources: Early in the business life cycle, with no proven market or customers, the business will rely on cash from owners, friends and family. Other potential sources include suppliers, customers, government grants and banks.

WNB products to consider: Classic Checking Account / Business Savings Account / SBA Resources / Minnesota SBDC / Minnesota Community Capital Fund

Start-Up

Your business is born and now exists legally. Products or services are in production and you have your first customers.

Challenge: If your business is in the start-up life cycle stage, it is likely you have overestimated money needs and the time to market. The main challenge is not to burn through what little cash you have. You need to learn what profitable needs your clients have and do a reality check to see if your business is on the right track.

Focus: Start-ups require establishing a customer base and market presence along with tracking and conserving cash flow.

Money Sources: Owner, friends, family, suppliers, customers, grants, and banks.

WNB products to consider: Seed Stage Products / Working Capital Loan / Line of Credit / Equipment Financing / Business Internet Banking / Bill Payer / Credit Card Processing

Growth

Your business has made it through the toddler years and is now a child. Revenues and customers are increasing, with many new opportunities and issues. Profits are strong, but competition is surfacing.

Challenge: The biggest challenge growth companies face is dealing with the constant range of issues bidding for more time and money. Effective management is required, along with a possible new business plan. Learn how to train and delegate to conquer this stage of development.

Focus: Growth life cycle businesses are focused on running the business in a more formal fashion to deal with the increased sales and customers. Better accounting and management systems will have to be set up. New employees will have to be hired to deal with the influx of business.

Money Sources: Banks, profits, partnerships, grants and leasing options.

WNB products to consider: Line of Credit / Equipment Financing / Construction Loan / Commercial Real Estate Loan / Health Savings Account / Remote Deposit / Cash Management / Business Credit Card

Established

Your business has now matured into a thriving company with a place in the market and loyal customers. Sales growth is not explosive but manageable. Business life has become more routine.

Challenge: It is far too easy to rest on your laurels during this life stage. You have worked hard and have earned a rest, but the marketplace is relentless and competitive. Stay focused on the bigger picture. Issues like the economy, competitors or changing customer tastes can quickly end all you have worked for.

Focus: An established life cycle company will be focused on improvement and productivity. To compete in an established market, you will require better business practices along with automation and outsourcing to improve productivity.

Money Sources: Profits, banks, investors and government.

WNB products to consider: Premium Checking Account / Business Money Fund Account / Sweep Account / Private Financial / 401K Planning / Investment Brokerage / Health Savings Account / Remote Deposit / Cash Management / Business Credit Card / Line of Credit

Expansion

This life cycle stage is characterized by a new period of growth into new markets and distribution channels. This stage is often chosen by the business owner to gain a larger market share and find new revenue and profit channels.

Challenge: Moving into new markets requires the planning and research of a seed- or start-up-stage business. Focus should be on businesses that complement your existing experience and capabilities. Moving into unrelated businesses can be disastrous.

Focus: Add new products or services to existing markets, or expand the existing business into new markets and customer types.

Money Sources: Joint ventures, banks, licensing, new investors and partners.

WNB products to consider: Acquisition Financing / Private Financial / Line of Credit / Equipment Financing / Construction Loan / Commercial Real Estate Loan / Investment Brokerage

Mature

Year-over-year sales and profits tend to be stable, but competition remains fierce. Eventually sales start to fall off, and a decision is needed on whether to expand or exit the company.

Challenge: Businesses in the mature stage of the life cycle will be challenged with dropping sales, profits, and negative cash flow. The biggest issue is how long the business can support a negative cash flow. Ask whether it is time to move back to the expansion stage or move on to the final life cycle stage: exit.

Focus: Search for new opportunities and business ventures. Cutting costs and finding ways to sustain cash flow are vital for the mature stage.

Money Sources: Suppliers, customers, owners, and banks.

WNB products to consider: Private Financial / 401K Planning / Employee Stock Ownership Plans (ESOP) / Investment Brokerage / Health Savings Account / Remote Deposit / Cash Management / Line of Credit

Exit

This is the big opportunity for your business to cash out on all the effort and years of hard work. Or it can mean shutting down the business.

Challenge: Selling a business requires a realistic valuation. It may have taken years of hard work to build the company, but what is its real value in the current marketplace? If you decide to close your business, the challenge is to deal with the financial and psychological aspects of a business loss.

Focus: Get a proper valuation of your company. Look at your business operations, management and competitive barriers to make the company worth more to the buyer. Set up legal buy-sell agreements along with a business transition plan.

Money Sources: Find a business valuation partner. Consult with your accountant and financial advisors for the best tax strategy to sell or close down the business.

WNB products to consider: Acquisition Financing / Employee Stock Ownership Plans (ESOP) / Investment Brokerage / Trust

Q.2. Discuss the various components of a data warehouse.

Components of a Data Warehouse

Overall Architecture

The data warehouse architecture is based on a relational database management system server that functions as the central repository for informational data. Operational data and processing are completely separated from data warehouse processing. This central information repository is surrounded by a number of key components designed to make the entire environment functional, manageable and accessible, both by the operational systems that source data into the warehouse and by end-user query and analysis tools. Typically, the source data for the warehouse comes from the operational applications. As the data enters the warehouse, it is cleaned up and transformed into an

integrated structure and format. The transformation process may involve conversion, summarization, filtering and condensation of data. Because the data contains a historical component, the warehouse must be capable of holding and managing large volumes of data as well as different data structures for the same database over time.

The next sections look at the seven major components of data warehousing.

Data Warehouse Database

The central data warehouse database is the cornerstone of the data warehousing environment. This database is almost always implemented on relational database management system (RDBMS) technology. However, this kind of implementation is often constrained by the fact that traditional RDBMS products are optimized for transactional database processing. Certain data warehouse attributes, such as very large database size, ad hoc query processing and the need for flexible user view creation (including aggregates, multi-table joins and drill-downs), have become drivers for different technological approaches to the data warehouse database. These approaches include:

- Parallel relational database designs for scalability, including shared-memory, shared-disk, or shared-nothing models implemented on various multiprocessor configurations (symmetric multiprocessors or SMP, massively parallel processors or MPP, and/or clusters of uni- or multiprocessors).

- An innovative approach to speed up a traditional RDBMS by using new index structures to bypass relational table scans.

- Multidimensional databases (MDDBs) that are based on proprietary database technology; conversely, a dimensional data model can be implemented using a familiar RDBMS. Multidimensional databases are designed to overcome any limitations placed on the warehouse by the nature of the relational data model.
MDDBs enable on-line analytical processing (OLAP) tools that architecturally belong to a group of data warehousing components jointly categorized as the data query, reporting, analysis and mining tools.

Sourcing, Acquisition, Cleanup and Transformation Tools

A significant portion of the implementation effort is spent extracting data from operational systems and putting it in a format suitable for informational applications that run off the data warehouse. The data sourcing, cleanup, transformation and migration tools perform all of the conversions, summarizations, key changes, structural changes and condensations needed to transform disparate data into information that can be used by the decision support tool. They produce the programs and control statements, including the COBOL programs, MVS job-control language (JCL), UNIX scripts, and SQL data definition language (DDL), needed to move data into the data warehouse from multiple operational systems. These tools also maintain the meta data. The functionality includes:

- Removing unwanted data from operational databases
- Converting to common data names and definitions
- Establishing defaults for missing data
- Accommodating source data definition changes
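The cleanup functionality listed above can be illustrated with a small sketch. The field names, filter rule, and default values below are invented for illustration and do not come from any particular ETL tool.

```python
# Sketch of ETL cleanup/transformation steps: removing unwanted rows,
# converting to common data names, and establishing defaults for
# missing data. All names and rules here are hypothetical.

# Map source-specific column names to common warehouse names
COMMON_NAMES = {"cust_nm": "customer_name", "cust_no": "customer_id"}

# Defaults for missing data
DEFAULTS = {"customer_name": "UNKNOWN", "region": "UNASSIGNED"}

def transform(record):
    # Remove unwanted rows (e.g., internal test accounts)
    if record.get("cust_no", "").startswith("TEST"):
        return None
    # Convert to common data names and definitions
    out = {COMMON_NAMES.get(k, k): v for k, v in record.items()}
    # Establish defaults for missing data
    for field, default in DEFAULTS.items():
        out.setdefault(field, default)
    return out

rows = [
    {"cust_no": "C001", "cust_nm": "Acme"},
    {"cust_no": "TEST9", "cust_nm": "qa"},
]
clean = [r for r in (transform(row) for row in rows) if r is not None]
print(clean)
```

Real sourcing and cleanup tools generate and schedule such transformations rather than hand-coding them, but the steps they perform follow this shape.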

The data sourcing, cleanup, extract, transformation and migration tools have to deal with some significant issues, including:

- Database heterogeneity. DBMSs are very different in data models, data access language, data navigation, operations, concurrency, integrity, recovery, etc.

- Data heterogeneity. This is the difference in the way data is defined and used in different models: homonyms, synonyms, unit compatibility (U.S. vs. metric), different attributes for the same entity and different ways of modeling the same fact.

These tools can save a considerable amount of time and effort. However, significant shortcomings do exist. For example, many available tools are generally useful only for simpler data extracts. Frequently, customized extract routines need to be developed for the more complicated data extraction procedures.

Meta Data

Meta data is data about data that describes the data warehouse. It is used for building, maintaining, managing and using the data warehouse. Meta data can be classified into:

- Technical meta data, which contains information about warehouse data for use by warehouse designers and administrators when carrying out warehouse development and management tasks.

- Business meta data, which contains information that gives users an easy-to-understand perspective of the information stored in the data warehouse.

Equally important, meta data provides interactive access to users to help them understand content and find data. One of the issues with meta data relates to the fact that many data extraction tools' capabilities to gather meta data remain fairly immature. Therefore, there is often a need to create a meta data interface for users, which may involve some duplication of effort.

Meta data management is provided via a meta data repository and accompanying software. Meta data repository management software, which typically runs on a workstation, can be used to map the source data to the target database, generate code for data transformations, integrate and transform the data, and control moving data to the warehouse. As users' interactions with the data warehouse increase, their approaches to reviewing the results of their requests for information can be expected to evolve from relatively simple manual analysis for trends and exceptions to agent-driven initiation of the analysis based on user-defined thresholds. The definition of these thresholds, the configuration parameters for the software agents using them, and the information directory indicating where the appropriate sources for the information can be found are all stored in the meta data repository as well.

Access Tools

The principal purpose of data warehousing is to provide information to business users for strategic decision-making. These users interact with the data warehouse using front-end tools.
Many of these tools require an information specialist, although many end users develop expertise in the tools. Tools fall into four main categories: query and reporting tools, application development tools, online analytical processing (OLAP) tools, and data mining tools.

Query and Reporting Tools

Query and reporting tools can be divided into two groups: reporting tools and managed query tools. Reporting tools can be further divided into production reporting tools and report writers. Production reporting tools let companies generate regular operational reports or support high-volume batch jobs such as calculating and printing paychecks. Report writers, on the other hand, are inexpensive desktop tools designed for end users. Managed query tools shield end users from the complexities of SQL and database structures by inserting a meta layer between users and the database. These tools are designed for easy-to-use, point-and-click operations that either accept SQL or generate SQL database queries.

Application Development Tools

Often, the analytical needs of the data warehouse user community exceed the built-in capabilities of query and reporting tools. In these cases, organizations will often rely on the tried-and-true approach of in-house application development using graphical development environments such as PowerBuilder, Visual Basic and Forte. These application development platforms integrate well with popular OLAP tools and access all major database systems, including Oracle, Sybase, and Informix.

OLAP Tools

OLAP tools are based on the concepts of dimensional data models and corresponding databases, and allow users to analyze the data using elaborate, multidimensional views. Typical business applications include product performance and profitability, effectiveness of a sales program or marketing campaign, sales forecasting and capacity planning. These tools assume that the data is organized in a multidimensional model. A critical success factor for any business today is the ability to use information effectively.
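The multidimensional views that OLAP tools provide can be approximated, at their simplest, by aggregating a fact table over its dimension columns. The sketch below uses an in-memory SQLite database; the table and column names are invented for illustration and are not part of any particular OLAP product.

```python
import sqlite3

# Sketch of a multidimensional (OLAP-style) view built with plain SQL
# aggregation over dimension columns. Schema is hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("widget", "east", 100.0),
    ("widget", "west", 150.0),
    ("gadget", "east", 200.0),
])

# Two-dimensional view: totals by (product, region)
cube = {(p, r): total for p, r, total in con.execute(
    "SELECT product, region, SUM(amount) FROM sales "
    "GROUP BY product, region")}

# Rolling up the region dimension gives totals per product
by_product = dict(con.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product"))
print(cube)
print(by_product)
```

A real OLAP engine adds pre-computed aggregates, hierarchies and drill-down navigation on top of this basic idea.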
Data Mining Tools

Data mining is the process of discovering meaningful new correlations, patterns and trends by digging into large amounts of data stored in the warehouse, using artificial intelligence, statistical and mathematical techniques.

Data Marts

The concept of a data mart is causing a lot of excitement and attracts much attention in the data warehouse industry. Mostly, data marts are presented as an alternative to a data warehouse that takes significantly less time and money to build. However, the term data mart means different things to different people. A rigorous definition of this term is a data store that is subsidiary to a data warehouse of integrated data. The data mart is directed at a partition of data (often called a subject area) that is created for the use of a dedicated group of users.

A data mart might, in fact, be a set of denormalized, summarized, or aggregated data. Sometimes, such a set could be placed on the data warehouse rather than in a physically separate store of data. In most instances, however, the data mart is a physically separate store of data resident on a separate database server, often on a local area network serving a dedicated user group. Sometimes the data mart simply comprises relational OLAP technology, which creates a highly denormalized dimensional model (e.g., star schema) implemented on a relational database. The resulting hypercubes of data are used for analysis by groups of users with a common interest in a limited portion of the database. These types of data marts, called dependent data marts because their data is sourced from the data warehouse, have a high value: no matter how they are deployed and how many different enabling technologies are used, different users are all accessing information views derived from the single integrated version of the data.

Unfortunately, misleading statements about the simplicity and low cost of data marts sometimes result in organizations or vendors incorrectly positioning them as an alternative to the data warehouse. This viewpoint defines independent data marts that, in fact, represent fragmented point solutions to a range of business problems in the enterprise.
This type of implementation should rarely be deployed in the context of an overall technology or applications architecture. Indeed, it is missing the ingredient that is at the heart of the data warehousing concept: data integration. Each independent data mart makes its own assumptions about how to consolidate the data, and the data across several data marts may not be consistent. Moreover, the concept of an independent data mart is dangerous: as soon as the first data mart is created, other organizations, groups, and subject areas within the enterprise embark on the task of building their own data marts. As a result, you create an

environment where multiple operational systems feed multiple non-integrated data marts that often overlap in data content, job scheduling, connectivity and management. In other words, you have transformed the complex many-to-one problem of building a data warehouse from operational and external data sources into a many-to-many sourcing and management nightmare.

Data Warehouse Administration and Management

Data warehouses tend to be as much as four times as large as related operational databases, reaching terabytes in size depending on how much history needs to be saved. They are not synchronized in real time to the associated operational data but are updated as often as once a day if the application requires it. In addition, almost all data warehouse products include gateways to transparently access multiple enterprise data sources without having to rewrite applications to interpret and utilize the data. Furthermore, in a heterogeneous data warehouse environment, the various databases reside on disparate systems, thus requiring inter-networking tools. The need to manage this environment is obvious.

Managing data warehouses includes security and priority management; monitoring updates from the multiple sources; data quality checks; managing and updating meta data; auditing and reporting data warehouse usage and status; purging data; replicating, subsetting and distributing data; backup and recovery; and data warehouse storage management.

Q.3. Discuss the data extraction process. What are the various methods used for data extraction?

Overview of Extraction in Data Warehouses

Extraction is the operation of extracting data from a source system for further use in a data warehouse environment. This is the first step of the ETL process. After the extraction, this data can be transformed and loaded into the data warehouse. The source systems for a data warehouse are typically transaction processing applications.
For example, one of the source systems for a sales analysis data warehouse might be an order entry system that records all of the current order activities.

Designing and creating the extraction process is often one of the most time-consuming tasks in the ETL process and, indeed, in the entire data warehousing process. The source systems might be very complex and poorly documented, and thus determining which data needs to be extracted can be difficult. The data normally has to be extracted not just once, but several times in a periodic manner, to supply all changed data to the data warehouse and keep it up to date. Moreover, the source system typically cannot be modified, nor can its performance or availability be adjusted, to accommodate the needs of the data warehouse extraction process. These are important considerations for extraction and ETL in general.

This chapter, however, focuses on the technical considerations of having different kinds of sources and extraction methods. It assumes that the data warehouse team has already identified the data that will be extracted, and discusses common techniques used for extracting data from source databases. Designing this process means making decisions about the following two main aspects:

- Which extraction method do I choose? This influences the source system, the transportation process, and the time needed for refreshing the warehouse.

- How do I provide the extracted data for further processing? This influences the transportation method, and the need for cleaning and transforming the data.

Introduction to Extraction Methods in Data Warehouses

The extraction method you should choose is highly dependent on the source system and also on the business needs in the target data warehouse environment. Very often, there is no possibility to add additional logic to the source systems to enhance an incremental extraction of data, due to the performance impact or the increased workload on these systems. Sometimes even the customer is not allowed to add anything to an out-of-the-box application system.

The estimated amount of the data to be extracted and the stage in the ETL process (initial load or maintenance of data) may also impact the decision of how to extract, from a logical and a physical perspective. Basically, you have to decide how to extract data logically and physically.

Logical Extraction Methods

There are two types of logical extraction:

- Full Extraction
- Incremental Extraction

Full Extraction

The data is extracted completely from the source system. Because this extraction reflects all the data currently available on the source system, there is no need to keep track of changes to the data source since the last successful extraction. The source data will be provided as-is, and no additional logical information (for example, timestamps) is necessary on the source site. An example of a full extraction may be an export file of a distinct table or a remote SQL statement scanning the complete source table.

Incremental Extraction

At a specific point in time, only the data that has changed since a well-defined event back in history will be extracted. This event may be the last time of extraction or a more complex business event like the last booking day of a fiscal period. To identify this delta change, there must be a possibility to identify all the changed information since this specific time event. This information can either be provided by the source data itself, such as an application column reflecting the last-changed timestamp, or by a change table where an appropriate additional mechanism keeps track of the changes besides the originating transactions. In most cases, using the latter method means adding extraction logic to the source system.

Many data warehouses do not use any change-capture techniques as part of the extraction process. Instead, entire tables from the source systems are extracted to the data warehouse or staging area, and these tables are compared with a previous extract from the source system to identify the changed data. This approach may not have significant impact

on the source systems, but it clearly can place a considerable burden on the data warehouse processes, particularly if the data volumes are large. Oracle's Change Data Capture mechanism can extract and maintain such delta information. See Chapter 16, "Change Data Capture", for further details about the Change Data Capture framework.
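The full-extract comparison approach described above, diffing the current extract against the previous snapshot to find changed data, can be sketched in a few lines. The keys and row values below are invented for illustration.

```python
# Sketch of change detection by comparing full extracts: each extract is
# a mapping from primary key to row values, and the current extract is
# diffed against the previous snapshot. Data is hypothetical.

previous = {1: ("Acme", "east"), 2: ("Globex", "west")}
current  = {1: ("Acme", "north"), 3: ("Initech", "east")}

inserted = {k: current[k] for k in current.keys() - previous.keys()}
deleted  = {k: previous[k] for k in previous.keys() - current.keys()}
updated  = {k: current[k] for k in current.keys() & previous.keys()
            if current[k] != previous[k]}

print(inserted, deleted, updated)
```

This illustrates why the approach burdens the warehouse side: both snapshots must be held and compared in full, which grows expensive as data volumes increase.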

Physical Extraction Methods

Depending on the chosen logical extraction method and the capabilities and restrictions on the source side, the extracted data can be physically extracted by two mechanisms. The data can either be extracted online from the source system or from an offline structure. Such an offline structure might already exist or it might be generated by an extraction routine. There are the following methods of physical extraction:

- Online Extraction
- Offline Extraction

Online Extraction

The data is extracted directly from the source system itself. The extraction process can connect directly to the source system to access the source tables themselves, or to an intermediate system that stores the data in a preconfigured manner (for example, snapshot logs or change tables). Note that the intermediate system is not necessarily physically different from the source system. With online extractions, you need to consider whether the distributed transactions are using original source objects or prepared source objects.

Offline Extraction

The data is not extracted directly from the source system but is staged explicitly outside the original source system. The data already has an existing structure (for example, redo logs, archive logs or transportable tablespaces) or was created by an extraction routine. You should consider the following structures:

- Flat files: Data in a defined, generic format. Additional information about the source object is necessary for further processing.

- Dump files: Oracle-specific format. Information about the containing objects may or may not be included, depending on the chosen utility.

- Redo and archive logs: Information is in a special, additional dump file.

- Transportable tablespaces: A powerful way to extract and move large volumes of data between Oracle databases. A more detailed example of using this feature to extract and transport data is provided in Chapter 13, "Transportation in Data Warehouses". Oracle Corporation recommends that you use transportable tablespaces whenever possible, because they can provide considerable advantages in performance and manageability over other extraction techniques. See Oracle Database Utilities for more information on using export/import.

Change Data Capture

An important consideration for extraction is incremental extraction, also called Change Data Capture. If a data warehouse extracts data from an operational system on a nightly basis, then the data warehouse requires only the data that has changed since the last extraction (that is, the data that has been modified in the past 24 hours). Change Data Capture is also the key enabling technology for providing near real-time, or on-time, data warehousing.

When it is possible to efficiently identify and extract only the most recently changed data, the extraction process (as well as all downstream operations in the ETL process) can be much more efficient, because it must extract a much smaller volume of data. Unfortunately, for many source systems, identifying the recently modified data may be difficult or intrusive to the operation of the system. Change Data Capture is typically the most challenging technical issue in data extraction.

Because change data capture is often desirable as part of the extraction process, and it might not be possible to use Oracle's Change Data Capture mechanism, this section describes several techniques for implementing a self-developed change capture on Oracle Database source systems:

- Timestamps
- Partitioning
- Triggers
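The timestamp-based variant of the techniques listed above can be sketched with a lightweight database. The schema, timestamp values and cut-off below are invented for illustration; a production system would query the operational database directly.

```python
import sqlite3

# Sketch of timestamp-based incremental extraction: only rows modified
# since the last extraction are selected. Schema is hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, last_modified TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2023-01-01 09:00:00"),
    (2, "2023-01-02 14:30:00"),
    (3, "2023-01-03 08:15:00"),
])

# The delta is everything modified after the last successful extraction
last_extraction = "2023-01-02 00:00:00"
changed = [row[0] for row in con.execute(
    "SELECT id FROM orders WHERE last_modified > ? ORDER BY id",
    (last_extraction,))]
print(changed)   # [2, 3]
```

After each run, the extraction process records the new cut-off time so that the next run picks up only subsequent changes.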

These techniques are based upon the characteristics of the source systems, or may require modifications to the source systems. Thus, each of these techniques must be carefully evaluated by the owners of the source system prior to implementation. Each of these techniques can work in conjunction with the data extraction techniques discussed previously. For example, timestamps can be used whether the data is being unloaded to a file or accessed through a distributed query. See Chapter 16, "Change Data Capture", for further details.

Timestamps

The tables in some operational systems have timestamp columns. The timestamp specifies the time and date that a given row was last modified. If the tables in an operational system have columns containing timestamps, then the latest data can easily be identified using the timestamp columns. For example, the following query might be useful for extracting today's data from an orders table:

    SELECT * FROM orders
    WHERE TRUNC(CAST(order_date AS date),'dd') = TO_DATE(SYSDATE,'dd-mon-yyyy');

If the timestamp information is not available in an operational source system, you will not always be able to modify the system to include timestamps. Such modification would require, first, modifying the operational system's tables to include a new timestamp column, and then creating a trigger to update the timestamp column following every operation that modifies a given row.

Partitioning

Some source systems might use range partitioning, such that the source tables are partitioned along a date key, which allows for easy identification of new data. For example, if you are extracting from an orders table, and the orders table is partitioned by week, then it is easy to identify the current week's data.

Triggers

Triggers can be created in operational systems to keep track of recently updated records. They can then be used in conjunction with timestamp columns to identify the exact time and date when a given row was last modified. You do this by creating a trigger on each source table that requires change data capture. Following each DML statement that is executed on the source table, this trigger updates the timestamp column with the current time. Thus, the timestamp column provides the exact time and date when a given row was last modified.

A similar internalized trigger-based technique is used for Oracle materialized view logs. These logs are used by materialized views to identify changed data, and these logs are accessible to end users. However, the format of the materialized view logs is not documented and might change over time. Materialized view logs rely on triggers, but they provide an advantage in that the creation and maintenance of this change-data system is largely managed by the database. If you want to use a trigger-based mechanism, Oracle Corporation recommends synchronous Change Data Capture, since CDC provides an externalized interface for

accessing the change information and provides a framework for maintaining the distribution of this information to various clients Trigger-based techniques might affect performance on the source systems, and this impact should be carefully considered prior to implementation on a production source system. Data Warehousing Extraction Examples You can extract data in two ways: Extraction Using Data Files Extraction Through Distributed Operations
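The timestamp-plus-trigger technique described above can be sketched in miniature with SQLite standing in for the operational system. The orders table, its columns, and the dates below are purely illustrative, not Oracle's actual schema:

```python
import sqlite3

# Timestamp plus trigger change capture, with SQLite standing in for the
# operational source system; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        amount        REAL,
        last_modified TEXT DEFAULT (datetime('now'))
    )""")

# The trigger re-stamps a row whenever it is modified, as described above.
cur.execute("""
    CREATE TRIGGER orders_touch AFTER UPDATE ON orders
    BEGIN
        UPDATE orders SET last_modified = datetime('now')
        WHERE order_id = NEW.order_id;
    END""")

cur.execute("INSERT INTO orders (order_id, amount, last_modified) "
            "VALUES (1, 10.0, '2020-01-01 00:00:00')")
cur.execute("INSERT INTO orders (order_id, amount, last_modified) "
            "VALUES (2, 20.0, '2020-01-01 00:00:00')")
cur.execute("UPDATE orders SET amount = 25.0 WHERE order_id = 2")
conn.commit()

# Incremental extraction: pull only the rows changed since the last run.
last_extraction = "2020-06-01 00:00:00"
changed = cur.execute(
    "SELECT order_id, amount FROM orders WHERE last_modified > ?",
    (last_extraction,)).fetchall()
print(changed)  # only order 2, which the trigger re-stamped
```

The extraction query never has to scan for differences; the trigger has already marked which rows need to be captured.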

Extraction Using Data Files

Most database systems provide mechanisms for exporting or unloading data from the internal database format into flat files. Extracts from mainframe systems often use COBOL programs, but many databases, as well as third-party software vendors, provide export or unload utilities. Data extraction does not necessarily mean that entire database structures are unloaded in flat files. In many cases, it may be appropriate to unload entire database tables or objects. In other cases, it may be more appropriate to unload only a subset of a given table, such as the changes on the source system since the last extraction or the results of joining multiple tables together. Different extraction techniques vary in their capabilities to support these two scenarios. When the source system is an Oracle database, several alternatives are available for extracting data into files:
Extracting into Flat Files Using SQL*Plus
Extracting into Flat Files Using OCI or Pro*C Programs
Exporting into Export Files Using the Export Utility
Extracting into Export Files Using External Tables
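As a language-neutral illustration of the unload-to-flat-file idea, the following sketch uses SQLite in place of the source database; the customers table and its columns are invented for the example:

```python
import csv
import os
import sqlite3
import tempfile

# A generic unload-to-flat-file sketch; SQLite stands in for the source
# system, and the table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, cust_city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Seattle"), (2, "Boston")])

out_path = os.path.join(tempfile.mkdtemp(), "customers.dat")
cur = conn.execute("SELECT cust_id, cust_city FROM customers ORDER BY cust_id")
with open(out_path, "w", newline="") as f:
    # pipe-delimited output, matching the delimiter used later in the text
    writer = csv.writer(f, delimiter="|", lineterminator="\n")
    writer.writerows(cur)

print(open(out_path).read())
```

The same pattern, a cursor feeding a delimited writer, covers both the "whole table" and "subset of a table" scenarios: only the SELECT changes.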

Extracting into Flat Files Using SQL*Plus

The most basic technique for extracting data is to execute a SQL query in SQL*Plus and direct the output of the query to a file. For example, to extract a flat file, country_city.log, with the pipe sign as delimiter between column values, containing a list of the cities in the US from the tables countries and customers, the following SQL script could be run:

SET echo off
SET pagesize 0
SPOOL country_city.log
SELECT distinct t1.country_name ||'|'|| t2.cust_city
FROM countries t1, customers t2
WHERE t1.country_id = t2.country_id
AND t1.country_name = 'United States of America';
SPOOL off

The exact format of the output file can be specified using SQL*Plus system variables. This extraction technique offers the advantage of storing the result in a customized format. Note that, using the external table data pump unload facility, you can also extract the result of an arbitrary SQL operation. The previous example extracts the results of a join.

This extraction technique can be parallelized by initiating multiple, concurrent SQL*Plus sessions, each session running a separate query representing a different portion of the data to be extracted. For example, suppose that you wish to extract data from an orders table, and that the orders table has been range partitioned by month, with partitions orders_jan1998, orders_feb1998, and so on. To extract a single year of data from the orders table, you could initiate 12 concurrent SQL*Plus sessions, each extracting a single partition. The SQL script for one such session could be:

SPOOL order_jan.dat
SELECT * FROM orders PARTITION (orders_jan1998);
SPOOL OFF

These 12 SQL*Plus processes would concurrently spool data to 12 separate files. You can then concatenate them if necessary (using operating system utilities) following the extraction. If you are planning to use SQL*Loader for loading into the target, these 12 files can be used as is for a parallel load with 12 SQL*Loader sessions. See Chapter 13, "Transportation in Data Warehouses" for an example. Even if the orders table is not partitioned, it is still possible to parallelize the extraction, based on either logical or physical criteria. The logical method is based on logical ranges of column values, for example:

SELECT ... WHERE order_date
BETWEEN TO_DATE('01-JAN-99') AND TO_DATE('31-JAN-99');
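The twelve-concurrent-sessions pattern, with each session extracting one logical range into its own flat file, can be sketched generically. SQLite stands in for the source system and the month-keyed orders table is invented for the example:

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Build a small sample "orders" source; SQLite stands in for the real
# source system and all table/column names are illustrative.
work_dir = tempfile.mkdtemp()
db_path = os.path.join(work_dir, "source.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE orders (order_id INTEGER, order_month TEXT, amount REAL)")
rows = [(i, "1998-%02d" % m, float(i))
        for m in range(1, 13) for i in range(m * 10, m * 10 + 3)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()

def extract_month(month):
    """One 'session': extract a single logical range into its own flat file."""
    out_path = os.path.join(work_dir, "orders_%02d.dat" % month)
    local = sqlite3.connect(db_path)  # each worker opens its own connection
    cur = local.execute(
        "SELECT order_id, order_month, amount FROM orders WHERE order_month = ?",
        ("1998-%02d" % month,))
    with open(out_path, "w") as f:
        for r in cur:
            f.write("|".join(map(str, r)) + "\n")  # pipe-delimited, as above
    local.close()
    return out_path

# Twelve concurrent extractions, one per month, analogous to twelve
# concurrent SQL*Plus sessions each spooling to its own file.
with ThreadPoolExecutor(max_workers=12) as pool:
    files = list(pool.map(extract_month, range(1, 13)))
```

The resulting files can be concatenated afterwards, or loaded in parallel, exactly as described for the SQL*Loader scenario above.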

The physical method is based on a range of values. By viewing the data dictionary, it is possible to identify the Oracle Database data blocks that make up the orders table. Using this information, you could then derive a set of rowid-range queries for extracting data from the orders table:

SELECT * FROM orders WHERE rowid BETWEEN value1 AND value2;

Parallelizing the extraction of complex SQL queries is sometimes possible, although the process of breaking a single complex query into multiple components can be challenging. In particular, the coordination of independent processes to guarantee a globally consistent view can be difficult. Unlike the SQL*Plus approach, the new external table data pump unload functionality provides transparent parallel capabilities. Note that all parallel techniques can use considerably more CPU and I/O resources on the source system, and this impact should be evaluated before parallelizing any extraction technique.

Extracting into Flat Files Using OCI or Pro*C Programs

OCI programs (or other programs using Oracle call interfaces, such as Pro*C programs) can also be used to extract data. These techniques typically provide improved performance over the SQL*Plus approach, although they also require additional programming. Like the SQL*Plus approach, an OCI program can extract the results of any SQL query. Furthermore, the parallelization techniques described for the SQL*Plus approach can be readily applied to OCI programs as well.

When using OCI or SQL*Plus for extraction, you need additional information besides the data itself. At minimum, you need information about the extracted columns. It is also helpful to know the extraction format, such as the separator between distinct columns.

Exporting into Export Files Using the Export Utility

The Export utility allows tables (including data) to be exported into Oracle Database export files. Unlike the SQL*Plus and OCI approaches, which describe the extraction of the results of a SQL statement, Export provides a mechanism for extracting database objects. Thus, Export differs from the previous approaches in several important ways:
The export files contain metadata as well as data. An export file contains not only the raw data of a table, but also information on how to re-create the table, potentially including any indexes, constraints, grants, and other attributes associated with that table.
A single export file may contain a subset of a single object, many database objects, or even an entire schema.
Export cannot be directly used to export the results of a complex SQL query. Export can be used only to extract subsets of distinct database objects.
The output of the Export utility must be processed using the Import utility.

Oracle provides the original Export and Import utilities for backward compatibility, and the Data Pump export/import infrastructure for high-performance, scalable and parallel extraction. See Oracle Database Utilities for further details.

Extracting into Export Files Using External Tables

In addition to the Export utility, you can use external tables to extract the results of any SELECT operation. The data is stored in the platform-independent, Oracle-internal data pump format and can be processed as a regular external table on the target system. The following example extracts the result of a join operation in parallel into the four specified files. The only allowed external table type for extracting data is the Oracle-internal format ORACLE_DATAPUMP.

CREATE DIRECTORY def_dir AS '/net/dlsun48/private/hbaer/WORK/FEATURES/et';
DROP TABLE extract_cust;
CREATE TABLE extract_cust
ORGANIZATION EXTERNAL
(TYPE ORACLE_DATAPUMP
 DEFAULT DIRECTORY def_dir
 ACCESS PARAMETERS (NOBADFILE NOLOGFILE)
 LOCATION ('extract_cust1.exp', 'extract_cust2.exp',
           'extract_cust3.exp', 'extract_cust4.exp'))
PARALLEL 4
REJECT LIMIT UNLIMITED
AS
SELECT c.*, co.country_name, co.country_subregion, co.country_region
FROM customers c, countries co
WHERE co.country_id = c.country_id;

The total number of extraction files specified limits the maximum degree of parallelism for the write operation. Note that parallelizing the extraction does not automatically parallelize the SELECT portion of the statement. Unlike any kind of export/import, the metadata for the external table is not part of the created files when using the external table data pump unload. To extract the appropriate metadata for the external table, use the DBMS_METADATA package, as illustrated in the following statement:

SET LONG 2000
SELECT DBMS_METADATA.GET_DDL('TABLE', 'EXTRACT_CUST') FROM DUAL;

Extraction Through Distributed Operations

Using distributed-query technology, one Oracle database can directly query tables located in various source systems, such as another Oracle database or a legacy system connected with Oracle gateway technology. Specifically, a data warehouse or staging database can directly access tables and data located in a connected source system. Gateways are another form of distributed-query technology; they allow an Oracle database (such as a data warehouse) to access database tables stored in remote, non-Oracle databases. This is the simplest method for moving data between two Oracle databases

because it combines the extraction and transformation into a single step, and requires minimal programming. However, this is not always feasible. Suppose that you wanted to extract a list of US cities, together with their country name, from a source database and store this data in the data warehouse. Using an Oracle Net connection and distributed-query technology, this can be achieved with a single SQL statement:

CREATE TABLE country_city AS
SELECT distinct t1.country_name, t2.cust_city
FROM countries@source_db t1, customers@source_db t2
WHERE t1.country_id = t2.country_id
AND t1.country_name = 'United States of America';

This statement creates a local table in a data mart, country_city, and populates it with data from the countries and customers tables on the source system. This technique is ideal for moving small volumes of data. However, the data is transported from the source system to the data warehouse through a single Oracle Net connection; thus, the scalability of this technique is limited. For larger data volumes, file-based data extraction and transportation techniques are often more scalable and thus more appropriate.

Q.4 Discuss the needs of developing OLAP tools in detail?

MOLAP or ROLAP

OLAP tools take you a step beyond query and reporting tools. With OLAP tools, data is represented using a multidimensional model rather than the more traditional tabular data model. The traditional model defines a database schema that focuses on modeling a process or function, and the information is viewed as a set of transactions, each of which occurred at some single point in time. The multidimensional model usually defines a star schema, viewing data not as a single event but rather as the cumulative effect of events over some period of time, such as weeks, then months, then years. With OLAP tools, the user generally views the data in grids or crosstabs that can be pivoted to offer different perspectives on the data. OLAP also enables interactive querying of the data.
For example,

a user can look at information at one aggregation (such as a sales region) and then drill down to more detailed information, such as sales by state, then city, then store. OLAP tools do not dictate how the data is actually stored. Given that, it is not surprising that there are multiple ways to store the data, including storing it in a dedicated multidimensional database (also referred to as MOLAP or MDD). Examples include Arbor Software's Essbase and Oracle Express Server. The other choice involves storing the data in relational databases and having an OLAP tool work directly against the data, referred to as relational OLAP (also ROLAP or RDBMS). Examples include MicroStrategy's DSS Server and related products, Informix's Informix-MetaCube, Information Advantage's Decision Suite, and Platinum Technology's Platinum InfoBeacon. (Some also include Red Brick's Warehouse in this category, but it isn't really an OLAP tool; rather, it is a relational database optimized for performing the types of operations that ROLAP tools need.)

ROLAP versus MOLAP

Relational OLAP (ROLAP)               Multidimensional OLAP (MOLAP)
Scales to terabytes                   Under 50GB capacity
Managing of summary tables/indexes    Instant response
Platform portability                  Easier to implement
SMP and MPP                           SMP only
Secure                                Integrated metadata
Proven technology                     Data modeling required

Data warehouses can be implemented on standard or extended relational DBMSs, called relational OLAP (ROLAP) servers. These servers assume that data is stored in relational databases, and they support extensions to SQL and special access and implementation methods to efficiently implement the multidimensional data model and operations. In contrast, multidimensional OLAP (MOLAP) servers directly store multidimensional data in special data structures (such as arrays or cubes) and implement OLAP operations over these data in free-form fashion (free-form within the framework of the DBMS that holds the multidimensional data).
MOLAP servers have sparsely populated matrices, numeric data, and a rigid structure of data once the data enters the MOLAP DBMS framework.
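In the ROLAP style, the multidimensional view, including the region-to-state drill-down described above, is expressed as SQL over a star schema. The following sketch uses SQLite with an invented fact table and store dimension:

```python
import sqlite3

# A minimal star schema: one fact table plus one dimension table.
# All table, column, and place names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_dim "
             "(store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT)")
conn.execute("CREATE TABLE sales_fact (store_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?)", [
    (1, "Seattle",  "WA", "West"),
    (2, "Portland", "OR", "West"),
    (3, "Boston",   "MA", "East"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [
    (1, 100.0), (1, 50.0), (2, 75.0), (3, 40.0),
])

# View the data at the region level...
by_region = conn.execute("""
    SELECT d.region, SUM(f.amount) FROM sales_fact f
    JOIN store_dim d ON f.store_id = d.store_id
    GROUP BY d.region ORDER BY d.region""").fetchall()

# ...then drill down into the states of one region.
west_by_state = conn.execute("""
    SELECT d.state, SUM(f.amount) FROM sales_fact f
    JOIN store_dim d ON f.store_id = d.store_id
    WHERE d.region = 'West'
    GROUP BY d.state ORDER BY d.state""").fetchall()

print(by_region)       # [('East', 40.0), ('West', 225.0)]
print(west_by_state)   # [('OR', 75.0), ('WA', 150.0)]
```

Each drill-down level is simply a re-aggregation at a finer GROUP BY, which is why ROLAP calculations happen at query time rather than load time.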

Relational Databases

ROLAP servers contain both numeric and textual data, serving a much wider purpose than their MOLAP counterparts. Unlike MOLAP DBMSs, which are supported by specialized database management systems, ROLAP DBMSs (or RDBMSs) are supported by relational technology. RDBMSs support numeric, textual, spatial, audio, graphic, and video data; general-purpose DSS analysis; freely structured data; numerous indexes; and star schemas. ROLAP servers can have both disciplined and ad hoc usage and can contain both detailed and summarized data. ROLAP supports large databases while enabling good performance, platform portability, exploitation of hardware advances such as parallel processing, robust security, multi-user concurrent access (including read-write with locking), recognized standards, and openness to multiple vendors' tools. ROLAP is based on familiar, proven, and already selected technologies. ROLAP tools take advantage of parallel RDBMSs for those parts of the application processed using SQL (SQL not being a multidimensional access or processing language). So, although it is always possible to store multidimensional data in a number of relational tables (the star schema), SQL does not, by itself, support multidimensional manipulation or calculations. Therefore, ROLAP products must do these calculations either in the client software or in an intermediate server engine. Note, however, that Informix has integrated the ROLAP calculation engine into the RDBMS, effectively mitigating the above disadvantage.

Multidimensional Databases

MDDs deliver impressive query performance by pre-calculating or pre-consolidating transactional data rather than calculating on the fly. (MDDs pre-calculate every measure at every hierarchy summary level at load time and store the results in efficiently indexed cells for immediate retrieval.) However, to fully preconsolidate incoming data, MDDs require an enormous amount of overhead, both in processing time and in storage.
An input file of 200MB can easily expand to 5GB; obviously, a file this size takes many minutes to load and consolidate. As a result, MDDs do not scale, making them a

lackluster choice for the enterprise atomic-level data in the data warehouse. However, MDDs are great candidates for the <50GB department data marts.

To manage large amounts of data, MDD servers aggregate data along hierarchies. Not only do hierarchies provide a mechanism for aggregating data, they also provide a technique for navigation. The ability to navigate data by zooming in and out of detail is key. With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake. With MDDs, for example, it is common to see the structure of time separated from the repetition of time. One dimension may be the structure of a year: month, quarter, half-year, and year. A separate dimension might be the different years: 1996, 1997, and so on. Adding a new year to the MDD simply means adding a new member to the calendar dimension. Adding a new year to an RDBMS usually requires that each month, quarter, half-year and year also be added.

In General

Usually, a scalable, parallel database is used for the large, atomic, organizationally-structured data warehouse, and subsets or summarized data from the warehouse are extracted and replicated to proprietary MDDs. Because MDD vendors have enabled drill-through features, when a user reaches the limit of what is actually stored in the MDD and seeks more detailed data, he or she can drill through to the detail stored in the enterprise database. However, the drill-through functionality usually requires creating views for every possible query. As relational database vendors incorporate sophisticated analytical multidimensional features into their core database technology, the resulting capacity for higher performance, scalability and parallelism will enable more sophisticated analysis. Proprietary database and non-integrated relational OLAP query tool vendors will find it difficult to compete with this integrated ROLAP solution.
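The MOLAP-style preconsolidation described earlier, pre-calculating every summary level of a hierarchy at load time, can be sketched in a few lines. The (year, quarter, month) facts below are illustrative:

```python
from collections import defaultdict

# MOLAP-style preconsolidation: every summary level of a (year, quarter,
# month) hierarchy is calculated once at load time, so later queries are
# direct cell lookups rather than on-the-fly calculations. Data illustrative.
facts = [
    (("1998", "Q1", "Jan"), 10.0),
    (("1998", "Q1", "Feb"), 20.0),
    (("1998", "Q2", "Apr"), 30.0),
    (("1999", "Q1", "Jan"), 40.0),
]

cube = defaultdict(float)
for key, amount in facts:
    # accumulate into (), (year,), (year, quarter) and (year, quarter, month)
    for level in range(len(key) + 1):
        cube[key[:level]] += amount

print(cube[("1998",)])       # total for 1998
print(cube[("1998", "Q1")])  # total for Q1 1998
print(cube[()])              # grand total
```

This also makes the storage-explosion trade-off concrete: four input facts become ten stored cells here, and the ratio grows quickly with more dimensions and hierarchy levels.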
Both storage methods have strengths and weaknesses; the weaknesses, however, are being rapidly addressed by the respective vendors. Currently, data warehouses are predominantly built using RDBMSs. If you have a warehouse built on a relational database and you want to perform OLAP analysis against it, ROLAP is a natural fit. This isn't to say

that MDDs can't be a part of your data warehouse solution. It's just that MDDs aren't currently well suited for large volumes of data (10-50GB is fine, but anything over 50GB is stretching their capabilities). If you really want the functionality benefits that come with an MDD, consider subsetting the data into smaller MDD-based data marts.

When deciding which technology to go for, consider:

1) Performance: How fast will the system appear to the end user? MDD server vendors believe this is a key point in their favor. MDD server databases typically contain indexes that provide direct access to the data, making MDD servers quicker when trying to solve a multidimensional business problem. However, MDDs show significant performance differences due to differing abilities to hold data models in memory, sparsity handling, and use of data compression. And the relational database vendors argue that they have developed performance improvement techniques, such as IBM's DB2 Starburst optimizer and Red Brick's Warehouse VPT STARindex capabilities. (Before you use performance as an objective measure for selecting an OLAP server, remember that OLAP systems are about effectiveness (how to make better decisions), not efficiency (how to make faster decisions).)

2) Data volume and scalability: While MDD servers can handle up to 50GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes. And, although MDD servers can require up to 50% less disk space than relational databases to store the same amount of data (because of relational indexes and overhead), relational databases have more capacity. MDD advocates believe that you should perform multidimensional modeling on summary, not detail, information, thus mitigating the need for large databases.
In addition to performance, data volume, and scalability, you should consider which architecture better supports systems management and data distribution, which vendors have a better user interface and functionality, which architecture is easier to understand, which architecture better handles aggregation and complex calculations, and your perception of open versus proprietary architectures. Besides these issues, you must also consider which architecture will be the more strategic technology. In fact, MDD servers and RDBMS products can be used together: one for fast responses, the other for access to large databases.

What if?

IF:
A. You require write access for "what if?" analysis
B. Your data is under 50GB
C. Your timetable to implement is 60-90 days
D. You don't have DBA or data modeler personnel
E. You're developing a general-purpose application for inventory movement or assets management
THEN: Consider an MDD solution for your data mart (like Oracle Express, Arbor's Essbase, or Pilot's Lightship).

IF:
A. Your data is over 100GB
B. You have a "read-only" requirement
THEN: Consider an RDBMS for your data mart.

IF:
A. Your data is over 1TB
B. You need data mining at a detail level
THEN: Consider an MPP hardware platform, like IBM's SP, and the DB2 RDBMS.

If you've decided to build a data mart using an MDD, you don't need a data modeler. Rather, you need an MDD data mart application builder who will design the business model (identifying dimensions and defining business measures) based on the source systems identified. Prior to building separate stovepipe data marts, understand that at some point you will need to: 1) integrate and consolidate these data marts at the detail enterprise level; 2) load the MDD data marts; and 3) drill through from the data marts to the detail. Note that your data mart may outgrow the storage limitations of an MDD, creating the need for an RDBMS (in turn, requiring data modeling similar to constructing the detailed, atomic enterprise-level RDBMS).
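The rules of thumb above can be encoded as a small helper function. The thresholds and product names follow the text; the function itself is only a sketch of the decision logic, not a real sizing tool (for instance, the read-only case simply falls through to the RDBMS branch):

```python
# Hypothetical encoding of the IF/THEN guidelines above; thresholds in GB.
def recommend_data_mart(size_gb, needs_write_access=False,
                        needs_detail_mining=False):
    if size_gb > 1000 or needs_detail_mining:
        return "MPP platform with a parallel RDBMS (e.g. IBM SP and DB2)"
    if size_gb > 100:
        return "RDBMS data mart"
    if size_gb < 50 and needs_write_access:
        return "MDD data mart (e.g. Oracle Express, Essbase, Lightship)"
    return "RDBMS data mart"

print(recommend_data_mart(10, needs_write_access=True))
print(recommend_data_mart(500))
print(recommend_data_mart(2000, needs_detail_mining=True))
```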

Q.5 What do you understand by the term statistical analysis? Discuss the most important statistical techniques?

Data mining is a relatively new data analysis technique. It is very different from query and reporting and from multidimensional analysis in that it uses what is called a discovery technique. That is, you do not ask a particular question of the data but rather use specific algorithms that analyze the data and report what they have discovered. Unlike query and reporting and multidimensional analysis, where the user has to create and execute queries based on hypotheses, data mining searches for answers to questions that may never have been asked. This discovery could take the form of finding significance in relationships between certain data elements, a clustering together of specific data elements, or other patterns in the usage of specific sets of data elements. After finding these patterns, the algorithms can infer rules. These rules can then be used to generate a model that can predict a desired behavior, identify relationships among the data, discover patterns, and group clusters of records with similar attributes.

Data mining is most typically used for statistical data analysis and knowledge discovery. Statistical data analysis detects unusual patterns in data and applies statistical and mathematical modeling techniques to explain the patterns; the models are then used to forecast and predict. Types of statistical data analysis techniques include linear and nonlinear analysis, regression analysis, multivariate analysis, and time series analysis. Knowledge discovery extracts implicit, previously unknown information from the data, which often results in uncovering unknown business facts. Data mining is data driven (see Figure 4 on page 13). There is a high level of complexity in stored data and data interrelations in the data warehouse that are difficult to discover without data mining.
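Regression analysis, the simplest of the statistical techniques listed above, can be sketched as an ordinary least-squares fit of a line through a handful of illustrative data points:

```python
# Ordinary least-squares fit of y = a + b*x; the data points are illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.1, 8.0, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x); intercept from the means
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

print(round(b, 2), round(a, 2))  # slope ~1.96, intercept ~0.14
print(round(a + b * 6, 2))       # forecast y at x = 6
```

The fitted model is then used exactly as the text describes: to explain the pattern (the slope and intercept) and to forecast values that were never observed.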
Data mining offers new insights into the business that may not be discovered with query and reporting or multidimensional analysis; it can give answers to questions we might never have thought to ask. Even within the scope of your data warehouse project, when mining data you want to define a data scope, or possibly multiple data scopes. Because patterns are based on various forms of statistical analysis, you must define a scope in which a statistically significant

pattern is likely to emerge. For example, buying patterns that show different products being purchased together may differ greatly in different geographical locations. To simply lump all of the data together may hide all of the patterns that exist in each location. Of course, by imposing such a scope you are defining some, though not all, of the business rules. It is therefore important that data scoping be done in concert with someone knowledgeable in both the business and statistical analysis, so that artificial patterns are not imposed and real
patterns are not lost.
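Clustering, one of the discovery techniques mentioned above, can be sketched with a minimal k-means loop. The one-dimensional data points and the naive initialization are purely illustrative:

```python
# A minimal k-means clustering sketch (k = 2, one-dimensional data), one of
# the pattern-discovery techniques described above; data are illustrative.
points = [1.0, 1.2, 0.8, 8.0, 8.4, 7.6]
centers = [points[0], points[3]]  # naive initialization: pick two points

for _ in range(10):  # a few refinement passes are plenty for this data
    clusters = [[], []]
    for p in points:
        nearest = min((0, 1), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    # recompute each center as the mean of its cluster
    # (with this data neither cluster ever becomes empty)
    centers = [sum(c) / len(c) for c in clusters]

print(sorted(round(c, 2) for c in centers))  # two well-separated centers
```

No hypothesis was supplied; the algorithm itself discovered that the records fall into two groups, which is the essence of the discovery approach described above.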

Data architecture modeling and advanced modeling techniques, such as those suitable for multimedia databases and statistical databases, are beyond the scope of this discussion.

Q.6 What are the methods for determining the executive needs?

An EIS is a tool that provides direct on-line access to relevant information about aspects of a business that are of particular interest to the senior manager.

Contents of EIS

A general answer to the question of what data is appropriate for inclusion in an Executive Information System is "whatever is interesting to executives." While this advice is rather simplistic, it does reflect the variety of systems currently in use. Executive Information Systems in government have been constructed to track data about Ministerial correspondence, case management, worker productivity, finances, and human resources, to name only a few. Other sectors use EIS implementations to monitor information about competitors in the news media and databases of public information, in addition to the traditional revenue, cost, volume, sales, market share and quality applications. Frequently, EIS implementations begin with just a few measures that are clearly of interest to senior managers and then expand in response to questions asked by those managers as they use the system. Over time, the presentation of this information becomes stale, and the information diverges from what is strategically important for the organization. A "Critical Success Factors" approach is recommended by many management theorists (Daniel, 1961; Crockett, 1992; Watson and Frolick, 1992). Practitioners such as Vandenbosch (1993) found that:

While our efforts usually met with initial success, we often found that after six months to a year, executives were almost as bored with the new information as they had been with the old. A strategy we developed to rectify this problem required organizations to create a report of the month. That is, in addition to the regular information provided for management committee meetings, the CEO was charged with selecting a different indicator to focus on each month (Vandenbosch, 1993, pp. 8-9).

While the above indicates that selection of data for inclusion in an EIS is difficult, there are several guidelines that help to make that assessment. A practical set of principles to guide the design of measures and indicators to be included in an EIS is presented below (Kelly, 1992b). For a more detailed discussion of methods for selecting measures that reflect organizational objectives, see the section "EIS and Organizational Objectives."

EIS measures must be easy to understand and collect. Wherever possible, data should be collected naturally as part of the process of work. An EIS should not add substantially to the workload of managers or staff.

EIS measures must be based on a balanced view of the organization's objectives. Data in the system should reflect the objectives of the organization in the areas of productivity, resource management, quality and customer service.

Performance indicators in an EIS must reflect everyone's contribution in a fair and consistent manner. Indicators should be as independent as possible from variables outside the control of managers.

EIS measures must encourage management and staff to share ownership of the organization's objectives. Performance indicators must promote both teamwork and friendly competition. Measures must be meaningful for all staff; people must feel that they, as individuals, can contribute to improving the performance of the organization.

EIS information must be available to everyone in the organization.
The objective is to provide everyone with useful information about the organization's performance. Information that must remain confidential should not be part of the EIS or the management system of the organization.

EIS measures must evolve to meet the changing needs of the organization.

Barriers to Effectiveness

There are many ways in which an EIS can fail. Dozens of high-profile, high-cost EIS projects have been cancelled, implemented and rarely used, or implemented and used with negative results. An EIS is a high-risk project precisely because it is intended for use by the most powerful people in an organization. Senior managers can easily misuse the information in the system, with strongly detrimental effects on the organization. Senior managers can refuse to use a system if it does not respond to their immediate personal needs or is too difficult to learn and use.

Unproductive Organizational Behaviour Norms

Issues of organizational behaviour and culture are perhaps the most deadly barriers to effective Executive Information Systems. Because an EIS is typically positioned at the top of an organization, it can create powerful learning experiences and lead to drastic changes in organizational direction. However, there is also great potential for misuse of the information. Green, Higgins and Irving (1988) found that performance monitoring can promote bureaucratic and unproductive behaviour, can unduly focus organizational attention to the point where other important aspects are ignored, and can have a strongly negative impact on morale.

Technical Excellence

An interesting result from the Vandenbosch & Huff (1988) study was that the technical excellence of an EIS has an inverse relationship with effectiveness. Systems that are technical masterpieces tend to be inflexible, and thus discourage innovation, experimentation and mental model development. Flexibility is important because an EIS has such a powerful ability to direct attention to specific issues in an organization. A technical masterpiece may accurately

direct management attention when the system is first implemented, but continue to direct attention to issues that were important a year ago on its first anniversary. There is substantial danger that the exploration of issues necessary for managerial learning will be limited to those subjects that were important when the EIS was first developed. Managers must understand that as the organization and its work changes, an EIS must continually be updated to address the strategic issues of the day. A number of explanations as to why technical masterpieces tend to be less flexible are possible. Developers who create a masterpiece EIS may become attached to the system and consciously or unconsciously dissuade managers from asking for changes. Managers who are uncertain that the benefits outweigh the initial cost of a masterpiece EIS may not want to spend more on system maintenance and improvements. The time required to create a masterpiece EIS may mean that it is outdated before it is implemented.

While usability and response time are important factors in determining whether executives will use a system, cost and flexibility are paramount. A senior manager will be more accepting of an inexpensive system that provides 20% of the needed information within a month or two than of an expensive system that provides 80% of the needed information after a year of development. The manager may also find that the inexpensive system is easier to change and adapt to the evolving needs of the business. Changing a large system would involve throwing away parts of a substantial investment; changing the inexpensive system means losing a few weeks of work. As a result, fast, cheap, incremental approaches to developing an EIS increase the chance of success.

Methodology

Implementation of an effective EIS requires clear consensus on the objectives and measures to be monitored in the system and a plan for obtaining the data on which those measures are based. The sections below outline a methodology for achieving these two results. As noted earlier, successful EIS implementations generally begin with a simple prototype rather than a detailed planning process. For that reason, the proposed planning methodologies are as simple and scope-limited as possible.

EIS Project Team

The process of establishing organizational objectives and measures is intimately linked with the task of locating relevant data in existing computer systems to support those measures. Objectives must be specific and measurable, and data availability is critical to measuring progress against objectives. Since there is little use in defining measures for which data is not available, it is recommended that an EIS project team including technical staff be established at the outset. This cross-functional team can provide early warning if data is not available to support objectives or if senior managers' expectations for the system are impractical.

A preliminary EIS project team might consist of as few as three people:
- An EIS Project Leader, who organizes and directs the project.
- An Executive Sponsor, who promotes the project in the organization, contributes senior management requirements on behalf of the senior management team, and reviews project progress regularly.
- A Technical Leader, who participates in requirements gathering, reviews plans, and ensures the technical feasibility of all proposals during EIS definition.

As the focus of the project becomes more technical, the team may be complemented by additional technical staff who will be directly involved in extracting data from legacy systems and constructing the EIS data repository and user interface.

Establishing Measures & EIS Requirements

Most organizations have a number of high-level objectives and direction statements that help to shape organizational behaviour and priorities. In many cases, however, these direction statements have not yet been linked to performance measures and targets. As well, senior managers may have other critical information requirements that would not be reflected in a simple analysis of existing direction statements. It is therefore essential that EIS requirements be derived directly from interaction with the senior managers who will use the system.
It is also essential that practical measures of progress towards organizational objectives be established during these interactions.
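The link between objectives, measures, and data availability described above can be sketched as a simple measures registry that the EIS project team might maintain. This is an illustrative sketch only: the field names, the sample objectives and targets, and the system name "call_center_db" are invented assumptions, not part of any EIS standard or any specific product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Measure:
    """One performance measure derived from an organizational objective."""
    objective: str                      # direction statement the measure supports
    name: str                           # what is actually tracked
    target: float                       # specific, measurable goal
    data_source: Optional[str] = None   # legacy system holding the data, if known

def audit_measures(measures):
    """Split proposed measures into those backed by data and those that are not.

    Measures without a data source are the early warnings the cross-functional
    team watches for: either locate the data or redefine the measure.
    """
    supported = [m for m in measures if m.data_source]
    unsupported = [m for m in measures if not m.data_source]
    return supported, unsupported

# Hypothetical examples -- objectives, targets, and system names are invented.
proposed = [
    Measure("Improve customer service", "Avg. call wait time (min)", 2.0,
            "call_center_db"),
    Measure("Grow market share", "Share of regional market (%)", 15.0, None),
]

ok, missing = audit_measures(proposed)
for m in missing:
    print(f"No data available for measure: {m.name!r}")
```

Running the audit before finalizing requirements mirrors the early-warning role of the project team: the second measure above would be flagged because no source system has been identified for it.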

Measures and EIS requirements are best established through a three-stage process. First, the EIS team solicits the input of the most senior executives in the organization in order to establish a broad, top-down perspective on EIS requirements. Second, interviews are conducted with the managers who will be most directly involved in the collection, analysis, and monitoring of data in the system to assess bottom-up requirements. Third, a summary of results and recommendations is presented to senior executives and operational managers in a workshop where final decisions are made.

Interview Format

The focus of the interviews would be to establish all of the measures managers require in the EIS. Questions would include the following:
- What are the five most important pieces of information you need to do your job?
- What expectations does the Board of Directors have for you?
- What results do you think the general public expects you to accomplish?
- On what basis would consumers and customers judge your effectiveness?
- What expectations do other stakeholders impose on you?
- What is it that you have to accomplish in your current position?

Senior Management Workshop