
Overview

Let us take a simple STAR schema designed for order analysis.

Assume this is the schema for a manufacturing company whose marketing department wants to determine how the company is doing with the orders it receives. The figure shows this simple STAR schema. The orders fact table sits in the middle of the schema diagram, surrounded by four dimension tables: customer, salesperson, order date, and product.

Let us examine this STAR schema from the point of view of the marketing department. The users in this department will analyze the orders using dollar amounts, cost, profit margin, and quantity sold. These measurements are found in the fact table. The users will analyze them by breaking down the numbers in combinations of customer, salesperson, date, and product. All of these dimensions are found in the structure.

The STAR schema is a structure that users can easily understand and work with comfortably, because it mirrors how they normally view their critical measures along their business dimensions. When you look at the order dollars, the STAR schema intuitively answers the questions of what, when, by whom, and to whom. From the STAR schema, the users can easily visualize the answers to these questions: For a given amount of dollars, what was the product sold? Who was the customer? Which salesperson brought in the order? When was the order placed?
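The structure described above can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table names, column names, and sample rows below are assumptions made for the example and do not come from the original schema diagram.

```python
import sqlite3

# In-memory database holding a hypothetical orders star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Four dimension tables (names and columns are illustrative assumptions).
CREATE TABLE customer    (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE salesperson (salesperson_key INTEGER PRIMARY KEY, salesperson_name TEXT);
CREATE TABLE order_date  (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE product     (product_key INTEGER PRIMARY KEY, product_name TEXT);

-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE orders_fact (
    customer_key    INTEGER REFERENCES customer(customer_key),
    salesperson_key INTEGER REFERENCES salesperson(salesperson_key),
    date_key        INTEGER REFERENCES order_date(date_key),
    product_key     INTEGER REFERENCES product(product_key),
    order_dollars   REAL,
    cost            REAL,
    margin_dollars  REAL,
    quantity_sold   INTEGER
);

INSERT INTO customer    VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO salesperson VALUES (1, 'Lee'), (2, 'Kim');
INSERT INTO order_date  VALUES (20240105, '2024-01-05'), (20240210, '2024-02-10');
INSERT INTO product     VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO orders_fact VALUES
    (1, 1, 20240105, 1, 500.0, 300.0, 200.0, 10),
    (2, 1, 20240210, 1, 250.0, 150.0, 100.0, 5),
    (2, 2, 20240210, 2, 900.0, 600.0, 300.0, 3);
""")

# Break a measure down along one dimension: order dollars by product.
dollars_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.order_dollars)
    FROM orders_fact f
    JOIN product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

# For a given dollar amount, answer what, to whom, by whom, and when.
order_detail = conn.execute("""
    SELECT p.product_name, c.customer_name, s.salesperson_name, d.calendar_date
    FROM orders_fact f
    JOIN product p     ON p.product_key = f.product_key
    JOIN customer c    ON c.customer_key = f.customer_key
    JOIN salesperson s ON s.salesperson_key = f.salesperson_key
    JOIN order_date d  ON d.date_key = f.date_key
    WHERE f.order_dollars = 900.0
""").fetchone()

print(dollars_by_product)
print(order_detail)
```

Every analytical query follows the same shape: join the fact table to whichever dimensions the question involves, then filter or group on dimension attributes.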

Figure: STAR schema for order analysis

The data warehouse stores all the atomic data necessary to capture all important historical details. The data mart provides a mechanism to take a slice of this data for easy analysis by the end user. Each data mart usually takes a portion of the data warehouse and is designed to handle the needs of a specific department or part of the enterprise. A data warehouse is generally developed one subject area at a time. Here we discuss possible star schema structures for a departmental warehouse covering the subject area of sales analysis.

The purpose of the designs is to provide an example of a department-specific data warehouse (or data mart) that could be developed from the data warehouse. Each enterprise may choose to modify this sales analysis departmental warehouse to meet its own specific needs. The models are presented in a standard star schema format to allow for multidimensional analysis. A star schema is a database design that contains a central table, called a fact table, with relationships to several look-up tables called dimensions. When the schema is diagrammed, it often forms a pattern resembling a star, hence the name star schema. The star schema design may vary depending on the business questions that need to be answered.

Sales Rep Performance Data Mart

An example of a sales analysis data mart with very specific goals is one that measures salesperson performance. Take, for example, the need to analyze the performance of sales reps on a monthly basis. This may be done by the previously mentioned regional sales manager or, in other enterprises, by a human resources manager. These managers may need to evaluate the performance of the sales reps who report to them, or to develop and monitor sales incentive plans. They generally do not care what products are being sold, nor do they care about customer demographics. Their concern is sales performance and perhaps how much is sold to various customers, to determine how diverse each sales rep's market is. To answer these needs, the table SALES_REP_SALES was designed. This table provides presummarized data by sales rep, customer, and address, by month.

For example, suppose the following questions need to be answered: What was the sales volume for a specific sales rep over the last 12 months? Which customer bought the highest volume through a particular sales rep for a specific month? What was the distribution of sales across a sales rep's customers? How much does each sales rep sell across each of his or her assigned states?

Figure: Sales rep performance

To answer these questions, the model in the figure above may be the best sales analysis star schema for this application. It may exist in addition to the previous star schema, because a more summarized version may be needed for quicker access. Another reason is that different departments may have different needs regarding access to the data, so there may be several different data mart designs to meet each department's needs. They all, however, should extract data from the same company-wide data warehouse, so that one consistent, integrated source of decision support data can feed many data mart designs.

Customer Rep Sales Fact

Notice that there is no longer a PRODUCT dimension table, because the table CUSTOMER_REP_SALES is a summarization of data about the performance of the sales reps, regardless of the individual products. Customer demographics are also deemed unimportant, so the CUSTOMER_DEMOGRAPHICS dimension has been dropped as well. This table therefore represents a slightly higher level of summarization than the previous ones, because two of the dimensions have been eliminated. The measures quantity and product cost have also been dropped, because their values depend on product and cannot be summarized correctly in this context.

Time Dimension

In this star schema, there is a slight variation in the time dimension table: it is now summarized by month. When the IS department interviewed the sales manager, it became clear that daily information was not required to answer the questions posed and that a monthly view of the data would be most useful. Therefore, the fact table has a column (month_id) that contains the ID uniquely identifying a specific month within the enterprise's fiscal year. This is matched in the time dimension table.
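As a rough sketch of this monthly summarization, the following Python/sqlite3 example rolls hypothetical daily rows up into a CUSTOMER_REP_SALES table keyed by month_id. The detail table daily_sales, all column layouts, the sample data, and the simplifying assumption that the fiscal year coincides with the calendar year are inventions for illustration, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Monthly time dimension (assumes fiscal year = calendar year for simplicity).
CREATE TABLE time_by_month (
    month_id INTEGER PRIMARY KEY, month INTEGER, quarter INTEGER, fiscal_year INTEGER
);
-- Hypothetical daily detail rows standing in for the warehouse source.
CREATE TABLE daily_sales (
    sales_rep_id INTEGER, customer_id INTEGER, address_id INTEGER,
    product_id INTEGER, sale_date TEXT, gross_sales REAL
);
-- Summarized fact table: no product dimension, monthly grain.
CREATE TABLE customer_rep_sales (
    sales_rep_id INTEGER, customer_id INTEGER, address_id INTEGER,
    month_id INTEGER REFERENCES time_by_month(month_id), gross_sales REAL
);

INSERT INTO time_by_month VALUES (1, 1, 1, 2024), (2, 2, 1, 2024);
INSERT INTO daily_sales VALUES
    (7, 100, 1000, 1, '2024-01-03', 200.0),
    (7, 100, 1000, 2, '2024-01-17', 300.0),
    (7, 101, 1001, 1, '2024-01-20', 150.0),
    (7, 100, 1000, 1, '2024-02-02', 400.0),
    (8, 102, 1002, 2, '2024-02-09', 250.0);

-- Roll up: drop product_id, map each date to its month_id, sum the measure.
INSERT INTO customer_rep_sales
SELECT d.sales_rep_id, d.customer_id, d.address_id, t.month_id, SUM(d.gross_sales)
FROM daily_sales d
JOIN time_by_month t
  ON t.fiscal_year = CAST(strftime('%Y', d.sale_date) AS INTEGER)
 AND t.month       = CAST(strftime('%m', d.sale_date) AS INTEGER)
GROUP BY d.sales_rep_id, d.customer_id, d.address_id, t.month_id;
""")

# Sales volume for one sales rep, month by month.
rep_by_month = conn.execute("""
    SELECT month_id, SUM(gross_sales)
    FROM customer_rep_sales
    WHERE sales_rep_id = 7
    GROUP BY month_id
    ORDER BY month_id
""").fetchall()

# Customer who bought the highest volume through that rep in a given month.
top_customer = conn.execute("""
    SELECT customer_id, SUM(gross_sales) AS total
    FROM customer_rep_sales
    WHERE sales_rep_id = 7 AND month_id = 1
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 1
""").fetchone()

print(rep_by_month)
print(top_customer)
```

Because product_id is not part of the grouping, sales of different products to the same customer in the same month collapse into one row, which is why product-dependent measures such as quantity and product cost are left out of the summary table.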

Because the data is to be summarized only by month, the time dimension needs only the columns month, quarter, and fiscal_year. The column week is no longer needed, as it makes no sense in a monthly summary view of the data. Note also that summarizing data to the monthly level represents yet another, higher level of summarization. Thus, this table gives the DSS analyst or department manager another very flexible means of viewing the data.

Product Analysis Data Mart

Suppose a product analyst for ABC Corporation needs information to assess product performance. The information will be used to make strategic decisions on product offerings for various geographic areas. Interviews with the analyst determine that specific customer, customer address, or customer demographic information is unimportant for this type of analysis. It is also determined that monthly summaries will provide an appropriate level of granularity.

The figure shows a schema containing PRODUCT_SALES as the central fact table. This table is considered more highly summarized because it contains fewer dimensions than the previously discussed tables and because records are summarized by month. Thus, this table can be used to hold presummarized data for product sales by geographic area by month. While previous tables also included customer and sales rep information, this one does not (as the analysis indicated it was not required). Thus, the columns sales_rep_id, customer_id, and address_id are not included in this table. The only dimension tables needed in this schema are GEOGRAPHIC_BOUNDARIES, PRODUCTS, and TIME_BY_MONTH.

Figure: Product analysis star schema

Product Sales Facts

The measures in this table are again quantity, gross_sales, and product_cost. Data in this table could be created by summing all the information in CUSTOMER_INVOICES by product_id, city_name (from the ADDRESSES dimension), month, and year. The cities selected would be referenced by a new column, geo_id. As in the previous examples, this data could also be taken directly from the main warehouse by selecting and summing data from the data warehouse CUSTOMER_INVOICES table for the products of interest. An additional restriction on the data extracted would need to be made through a join to CUSTOMER_ADDRESSES, with a summarization based on the city name via the column geo_id.

Geographic Boundaries Dimension

What is this geo_id? In the case of this warehouse, these IDs are tied to city names, so the data can be summed to the level of city-related data. The table GEOGRAPHIC_BOUNDARIES contains a hierarchy of geographic areas, namely cities, states, and countries. By using this dimension, an analyst could gather data for all the cities within a selected state or country. Additionally, data could be selected for multiple states or countries.
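To make the role of geo_id concrete, here is a small Python/sqlite3 sketch that stores PRODUCT_SALES at city level and rolls it up through the geographic hierarchy. The flattened layout of GEOGRAPHIC_BOUNDARIES (one row per city carrying its state and country), the helper sales_by_level, and the sample rows are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed flattened hierarchy: each geo_id is a city with its state and country.
CREATE TABLE geographic_boundaries (
    geo_id INTEGER PRIMARY KEY, city_name TEXT, state_name TEXT, country_name TEXT
);
-- Fact table at product / city / month grain with the three measures.
CREATE TABLE product_sales (
    product_id INTEGER,
    geo_id INTEGER REFERENCES geographic_boundaries(geo_id),
    month_id INTEGER, quantity INTEGER, gross_sales REAL, product_cost REAL
);

INSERT INTO geographic_boundaries VALUES
    (1, 'Denver',  'Colorado', 'USA'),
    (2, 'Boulder', 'Colorado', 'USA'),
    (3, 'Toronto', 'Ontario',  'Canada');
INSERT INTO product_sales VALUES
    (10, 1, 1, 5, 500.0, 300.0),
    (10, 2, 1, 2, 200.0, 120.0),
    (10, 3, 1, 4, 400.0, 240.0),
    (20, 1, 1, 1, 900.0, 700.0);
""")

# Only these hierarchy columns may be used as a roll-up level.
LEVELS = {"city_name", "state_name", "country_name"}

def sales_by_level(level):
    """Sum quantity and gross_sales per product at the chosen geographic level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    # The column name is validated against LEVELS above before interpolation.
    sql = f"""
        SELECT p.product_id, g.{level}, SUM(p.quantity), SUM(p.gross_sales)
        FROM product_sales p
        JOIN geographic_boundaries g ON g.geo_id = p.geo_id
        GROUP BY p.product_id, g.{level}
        ORDER BY p.product_id, g.{level}
    """
    return conn.execute(sql).fetchall()

by_state = sales_by_level("state_name")
by_country = sales_by_level("country_name")
print(by_state)
print(by_country)
```

The fact table never changes; choosing a different column of the dimension is enough to move the same totals from city to state to country level.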

Thus, for each product in each city, all the sales dollars and quantities for that product would be added up to give a grand total by product and city. Once compiled, this data could be very useful for product analysts and other executives who need a quick, high-level view of how their products are performing across the various geographic areas. Even getting a view of total sales by product would be quick using this table, because there are far fewer rows to sum.

Questions that could be answered from this data include the following: What was the sales revenue from a specific product over the last 12 months? What was the highest volume of a product for a specific month? Which product had the greatest revenue across all sales? Within that product, which geographic region had the greatest revenue? What was the profitability for each product or product category in a certain country for a specific year? Which country has generated the greatest average annual revenue by product?

This discussion presented details of the design of sample star schemas built to support sales analysis. The movement of data from the enterprise data warehouse to a data mart was discussed. The resulting design contained various levels of granularity to assist in answering the questions posed. The models presented could be effectively used to support DSS, OLAP, and multidimensional analysis.

Although there might be other attributes that you store in the relational database, data warehouses might not need all of them. For example, customer telephone numbers, email addresses, and other contact information would not be necessary for the warehouse. Keep in mind that data warehouses are used to make strategic decisions by analyzing trends. They are not meant to be tools for daily business operations. On the other hand, you might have some reports that do include data elements that aren't necessary for data analysis.

Most data warehouses will have one or more time dimensions. Since the warehouse will be used for finding and examining trends, data analysts will need to know when each fact occurred. The most common time dimension is calendar time. However, your business might also need a fiscal time dimension if your fiscal year does not start on January 1, as the calendar year does.

It should again be noted that what has been presented is one of many possible designs that could result from building departmental warehouses based on the enterprise warehouse. The structure of a schema will be influenced greatly by the questions it is designed to answer. With a thorough end-user interview process, the resulting design should provide the departmental analyst with useful information. When this is not the case, more questions must be asked and another design developed. This is why building a data warehouse or data mart must be an iterative process.

As may be obvious from the discussion so far, even though getting the data into the enterprise data warehouse may have been difficult, once in place the warehouse provides an excellent basis for developing departmental data marts (star schemas). All the major transformation and integration work has already been done and documented. This is why a properly designed data warehouse can be of such great benefit to executives and analysts in an enterprise. It allows them to get data extracts for viewing trends more easily, in ways that previously required a substantial amount of time and effort on the part of the IS staff. With the data prearranged as described, the amount of time and system resources needed to produce various data marts can be reduced. The accuracy of data is also increased, because there is an integrated source (from the data warehouse data model) that provides consistent decision support information for use in many departmental data warehouses.
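The fiscal time dimension mentioned earlier can be illustrated with a short Python sketch that derives fiscal attributes from a calendar date. The function name fiscal_period, the July start month used in the example, and the convention of labeling a fiscal year by the calendar year in which it ends are assumptions for illustration; enterprises differ on all of these.

```python
from datetime import date

def fiscal_period(d, fiscal_start_month=7):
    """Return (fiscal_year, fiscal_quarter, fiscal_month) for a calendar date.

    Assumed convention: the fiscal year is labeled by the calendar year
    in which it ends. With fiscal_start_month=1 it equals the calendar year.
    """
    # Months elapsed since the start of the fiscal year (0-based).
    offset = (d.month - fiscal_start_month) % 12
    fiscal_month = offset + 1
    fiscal_quarter = offset // 3 + 1
    # Dates on or after the start month belong to the fiscal year ending next year.
    if fiscal_start_month > 1 and d.month >= fiscal_start_month:
        fiscal_year = d.year + 1
    else:
        fiscal_year = d.year
    return fiscal_year, fiscal_quarter, fiscal_month

# Build a few rows of a time dimension carrying both calendar and fiscal attributes.
time_rows = []
for d in (date(2024, 1, 15), date(2024, 6, 30), date(2024, 7, 1)):
    fy, fq, fm = fiscal_period(d)
    time_rows.append({
        "calendar_date": d.isoformat(),
        "calendar_year": d.year,
        "calendar_month": d.month,
        "fiscal_year": fy,
        "fiscal_quarter": fq,
        "fiscal_month": fm,
    })

for row in time_rows:
    print(row)
```

Storing both sets of attributes on each time dimension row lets the same facts be grouped by calendar period or by fiscal period without any change to the fact table.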
