
Data Warehousing Mid-Term Answers (Tentative)

1. (a) Describe the star schema.

The entity-relationship data model is commonly used in the design of relational databases, where a database schema consists of a set of entities and the relationships between them. Such a data model is appropriate for on-line transaction processing. A data warehouse, however, requires a concise, subject-oriented schema that facilitates on-line data analysis. The most popular data model for a data warehouse is a multidimensional model. Such a model can exist in the form of a star schema, a snowflake schema, or a fact constellation schema.

Star Schema: The most common modeling paradigm is the star schema, in which the data warehouse contains (1) a large central table (fact table) containing the bulk of the data, with no redundancy, and (2) a set of smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.

Example: A star schema for AllElectronics sales is shown in Figure 3.4. Sales are considered along four dimensions, namely time, item, branch, and location. The schema contains a central fact table for sales that contains keys to each of the four dimensions, along with two measures: dollars sold and units sold. To minimize the size of the fact table, dimension identifiers (such as time key and item key) are system-generated identifiers.
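The AllElectronics example can be sketched as table definitions. The following is a minimal illustration using Python's built-in sqlite3 module, not the schema from Figure 3.4 itself: the four keys and two measures follow the description above, while the table names and the single descriptive column in each dimension table (year, item_name, branch_name, city) are assumptions made for the sketch.

```python
import sqlite3

# Keys and measures follow the text; table names and the descriptive
# dimension columns are hypothetical placeholders.
DDL = """
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE item_dim     (item_key     INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key   INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);

-- Central fact table: one key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER NOT NULL REFERENCES time_dim(time_key),
    item_key     INTEGER NOT NULL REFERENCES item_dim(item_key),
    branch_key   INTEGER NOT NULL REFERENCES branch_dim(branch_key),
    location_key INTEGER NOT NULL REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the fact table's columns: four dimension keys, then two measures.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
print(fact_columns)
```

Each dimension table is referenced by exactly one key in the fact table, which is what gives the schema graph its star shape.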

(b) What are the multiple approaches to data integration?

ETL (Extract-Transform-Load): typically performed in an application server tier. Other flavors include:
ELT (Extract-Load-Transform): with MPP database engines, an alternative is to do the transformation inside the MPP data warehouse.
ETLT (Extract-Transform-Load-Transform): do some processing in the application server tier and the rest in the data warehouse.

EII (Enterprise Information Integration): an optimized and transparent data access and transformation layer providing a single relational interface across all enterprise data (structured, semi-structured, and unstructured).

EAI (Enterprise Application Integration): message-based and transactional integration at both the business process and data levels, including application-to-application integration.

2. How many types of schemas are there? Draw an essential schema using the star schema and write an SQL statement using that schema.

There are three types of schemas in data warehousing:
1. Star schema
2. Snowflake schema
3. Fact constellation schema

1. Star Schema: The most common modeling paradigm is the star schema, in which the data warehouse contains (1) a large central table (fact table) containing the bulk of the data, with no redundancy, and (2) a set of smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.

2. Snowflake Schema: The snowflake schema is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake.

3. Fact Constellation Schema: Sophisticated applications may require multiple fact tables to share dimension tables. This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.

Star Schema Diagram: Consider a database of sales, perhaps from a store chain, classified by date, store, and product. Fact_Sales is the fact table, and there are three dimension tables: Dim_Date, Dim_Store, and Dim_Product.

Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id). The non-primary key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).

Query for the Star Schema:


SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.Year = 1997 AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
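To see what this query returns, it can be run against a small in-memory database. The sketch below uses Python's sqlite3 module. The table and column names come from the answer above, but every row of data is made up for illustration, and the ORDER BY clause is an addition so that the printed order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);
CREATE TABLE Fact_Sales (
    Date_Id    INTEGER REFERENCES Dim_Date(Id),
    Store_Id   INTEGER REFERENCES Dim_Store(Id),
    Product_Id INTEGER REFERENCES Dim_Product(Id),
    Units_Sold INTEGER,
    PRIMARY KEY (Date_Id, Store_Id, Product_Id)  -- compound primary key
);

-- Hypothetical sample data, not taken from the source.
INSERT INTO Dim_Date VALUES (1, 1997), (2, 1998);
INSERT INTO Dim_Store VALUES (1, 'France'), (2, 'Canada');
INSERT INTO Dim_Product VALUES
    (1, 'BrandA', 'tv'), (2, 'BrandB', 'tv'), (3, 'BrandA', 'radio');
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 10),   -- 1997, France, BrandA tv
    (1, 2, 1, 5),    -- 1997, Canada, BrandA tv
    (1, 1, 2, 7),    -- 1997, France, BrandB tv
    (2, 1, 1, 99),   -- 1998: filtered out by the year condition
    (1, 1, 3, 42);   -- radio: filtered out by the category condition
""")

rows = conn.execute("""
    SELECT P.Brand, S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
    WHERE D.Year = 1997 AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
    ORDER BY P.Brand, S.Country
""").fetchall()

for row in rows:
    print(row)
```

With this sample data, the 1998 sale and the radio sale are excluded, leaving one total per brand and country for televisions sold in 1997.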


4) A table of transactions is given, with questions on the support and confidence of association rules.
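For reference, the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule A => B is support(A and B together) divided by support(A). A minimal sketch of both measures follows. The transaction table here is invented for illustration, since the exam's table is not reproduced in these notes.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent => consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical transactions, not the exam's table.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule: bread => milk
rule_support = support(transactions, {"bread", "milk"})          # 2 of 4 transactions
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 with bread
print(rule_support, rule_confidence)
```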

5) A table of 5 transactions is given, along with a minimum support level and a minimum confidence level. Using (i) the Apriori algorithm and (ii) the FP-growth algorithm, find all frequent itemsets and compare the efficiency of the two algorithms.
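The Apriori part of such a question can be sketched as follows. This is a compact illustration of the level-wise join, prune, and count steps, not a tuned implementation. The five transactions and the minimum support count of 3 are assumptions standing in for the exam's table, and FP-growth is not implemented here.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {frequent itemset: support count} using level-wise search."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical table of 5 transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
frequent = apriori(transactions, min_support_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this example {a, b, c} survives the prune step but appears in only two transactions, so it is rejected at the count step. On the efficiency comparison: Apriori generates candidates and rescans the transactions once per level, whereas FP-growth scans the data twice to build a compact FP-tree and mines it without candidate generation.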
