A data warehouse is a relational database that is designed for query and business analysis rather than for transaction processing. It contains historical data derived from transaction data. Business analysts use this historical data to understand the business in detail.
A data warehouse should have the following characteristics:
Subject oriented: The data gives information about a particular subject. For example, to learn about a company's sales, a data warehouse needs to be built on sales data; using this data warehouse, we can find last year's sales. This ability to define a data warehouse by a subject, such as "sales", is what makes it subject oriented.
Integrated: A data warehouse brings data from multiple sources together and puts it into a consistent format. This includes resolving differences in units of measure, naming conflicts, and so on. For example, source A and source B may identify a product in different ways, but in the data warehouse there will be only a single way of identifying a product.
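The sketch below shows the kind of transformation integration involves, in Python with hypothetical data: source A identifies a product by an SKU string and reports weight in pounds, while source B uses a numeric code and kilograms. The cross-reference table, the field names and the choice of kilograms as the warehouse unit are all assumptions for illustration.

```python
# Hypothetical cross-reference: (source, source's own product id) -> warehouse product key.
PRODUCT_KEY_MAP = {
    ("A", "SKU-001"): 1,
    ("B", 5001): 1,  # the same product, identified differently in source B
    ("A", "SKU-002"): 2,
    ("B", 5002): 2,
}

LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound


def integrate_row(source: str, row: dict) -> dict:
    """Convert one source row into the warehouse's single, consistent format."""
    if source == "A":
        product_id, weight_kg = row["sku"], row["weight_lb"] * LB_TO_KG
    elif source == "B":
        product_id, weight_kg = row["code"], row["weight_kg"]
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "product_key": PRODUCT_KEY_MAP[(source, product_id)],  # one identifier
        "weight_kg": round(weight_kg, 3),                      # one unit
    }


rows = [
    integrate_row("A", {"sku": "SKU-001", "weight_lb": 2.0}),
    integrate_row("B", {"code": 5001, "weight_kg": 0.907}),
]
print(rows)
```

Both input rows end up with the same product key and the same unit, which is what "a single way of identifying a product" means in practice. In a real load process the cross-reference would more likely live in a mapping table than in code.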
Non-volatile: Once data enters the data warehouse, it is not updated or changed, so historical data in a data warehouse should never be altered. New data is added to the warehouse, but data that has already been loaded stays as it is.
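One way to make non-volatility explicit is sketched here with Python's built-in `sqlite3` module: triggers reject any UPDATE or DELETE on a hypothetical `sales_fact` table, so the only way to change its contents is to append new rows. This is an illustration only; real warehouse platforms usually enforce the rule through their load processes and access permissions rather than triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    sale_date   TEXT    NOT NULL,
    product_key INTEGER NOT NULL,
    amount      REAL    NOT NULL
);
-- Abort any attempt to modify or remove rows that have already been loaded.
CREATE TRIGGER sales_fact_no_update BEFORE UPDATE ON sales_fact
BEGIN
    SELECT RAISE(ABORT, 'sales_fact is append-only');
END;
CREATE TRIGGER sales_fact_no_delete BEFORE DELETE ON sales_fact
BEGIN
    SELECT RAISE(ABORT, 'sales_fact is append-only');
END;
""")

# Loading new data (INSERT) is allowed.
conn.execute("INSERT INTO sales_fact VALUES ('2023-01-15', 1, 120.0)")

# Changing history is not: the trigger aborts the UPDATE.
try:
    conn.execute("UPDATE sales_fact SET amount = 0 WHERE product_key = 1")
    update_blocked = False
except sqlite3.DatabaseError as exc:
    update_blocked = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
stored_amount = conn.execute("SELECT amount FROM sales_fact").fetchone()[0]
```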
Time variant: All data in the data warehouse is identified with a particular time period.
To analyze the business, analysts need large amounts of data, so the data warehouse should contain historical data.
Historical data is kept in a data warehouse. For example, one can retrieve data from 3, 6 or 12 months ago, or even older data, from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
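A small Python sketch of this contrast, using hypothetical names and data: the transaction-style dictionary overwrites a customer's address, while the warehouse-style history appends each address together with the date it took effect, so the address in force on any past date can still be looked up.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Transaction-system view: only the current address per customer survives.
current_address = {}

# Data-warehouse view: every address is kept with the date it took effect.
address_history = {}  # customer_id -> list of (valid_from, address), sorted by date


def record_move(customer_id: int, valid_from: date, address: str) -> None:
    current_address[customer_id] = address      # overwrite: the old value is lost
    history = address_history.setdefault(customer_id, [])
    history.append((valid_from, address))        # append: old values are kept
    history.sort()


def address_as_of(customer_id: int, when: date) -> Optional[str]:
    """Return the address valid on `when`, or None if none was recorded yet."""
    history = address_history.get(customer_id, [])
    dates = [valid_from for valid_from, _ in history]
    i = bisect_right(dates, when)  # number of addresses that took effect on or before `when`
    return history[i - 1][1] if i > 0 else None


record_move(42, date(2021, 3, 1), "12 Oak St")
record_move(42, date(2023, 6, 15), "7 Elm Ave")
print(current_address[42], address_as_of(42, date(2022, 1, 1)))
```

The transaction-style store can only answer "where does customer 42 live now?", while the history can also answer "where did customer 42 live in January 2022?".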
The grain of a fact table defines the level of detail that is stored, and the dimensions that are included make up this grain.
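A minimal `sqlite3` sketch, assuming a hypothetical grain of one row per day, product and store: the three dimension keys that make up the grain are declared together as the fact table's primary key, so a second row at the same level of detail is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE daily_sales_fact (
    date_key    INTEGER NOT NULL,  -- e.g. 20240105
    product_key INTEGER NOT NULL,
    store_key   INTEGER NOT NULL,
    units_sold  INTEGER NOT NULL,
    revenue     REAL    NOT NULL,
    -- The grain: exactly one row per (day, product, store).
    PRIMARY KEY (date_key, product_key, store_key)
)
""")

conn.execute("INSERT INTO daily_sales_fact VALUES (20240105, 1, 10, 3, 29.97)")
# A different store is a different combination of grain keys, so this is allowed.
conn.execute("INSERT INTO daily_sales_fact VALUES (20240105, 1, 11, 1, 9.99)")

try:
    # Same day, product and store: this would duplicate the grain, so it is rejected.
    conn.execute("INSERT INTO daily_sales_fact VALUES (20240105, 1, 10, 2, 19.98)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM daily_sales_fact").fetchone()[0]
```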
A query against a snowflake schema is more complex than the same query against a star schema. Because the dimension tables are normalized, we need to dig deeper to get the name of the product type and the city, adding another JOIN for every new level inside the same dimension.
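To make the extra joins concrete, here is a small `sqlite3` sketch with hypothetical tables and data. In the star version, product type and city are stored directly in the product and store dimensions; in the snowflake version they are normalized into their own tables, so the same question (sales by product type and city) needs two more JOINs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: denormalized dimensions.
CREATE TABLE star_product (product_key INTEGER PRIMARY KEY, name TEXT, product_type TEXT);
CREATE TABLE star_store   (store_key INTEGER PRIMARY KEY, city TEXT);

-- Snowflake schema: the same dimensions, normalized one level further.
CREATE TABLE product_type (type_key INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE city         (city_key INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE snow_product (product_key INTEGER PRIMARY KEY, name TEXT, type_key INTEGER);
CREATE TABLE snow_store   (store_key INTEGER PRIMARY KEY, city_key INTEGER);

-- One fact table, shared by both variants for brevity.
CREATE TABLE sales_fact (product_key INTEGER, store_key INTEGER, amount REAL);

INSERT INTO star_product VALUES (1, 'Brake pad', 'Brakes'), (2, 'Oil filter', 'Filters');
INSERT INTO star_store   VALUES (10, 'Lyon'), (11, 'Paris');
INSERT INTO product_type VALUES (100, 'Brakes'), (200, 'Filters');
INSERT INTO city         VALUES (1000, 'Lyon'), (2000, 'Paris');
INSERT INTO snow_product VALUES (1, 'Brake pad', 100), (2, 'Oil filter', 200);
INSERT INTO snow_store   VALUES (10, 1000), (11, 2000);
INSERT INTO sales_fact   VALUES (1, 10, 40.0), (2, 10, 15.0), (1, 11, 80.0);
""")

STAR_QUERY = """
SELECT p.product_type, s.city, SUM(f.amount)
FROM sales_fact f
JOIN star_product p ON f.product_key = p.product_key
JOIN star_store   s ON f.store_key   = s.store_key
GROUP BY p.product_type, s.city
ORDER BY p.product_type, s.city
"""

# Each normalized level (product -> product type, store -> city) costs one more JOIN.
SNOWFLAKE_QUERY = """
SELECT t.type_name, c.city_name, SUM(f.amount)
FROM sales_fact f
JOIN snow_product p ON f.product_key = p.product_key
JOIN product_type t ON p.type_key    = t.type_key
JOIN snow_store   s ON f.store_key   = s.store_key
JOIN city         c ON s.city_key    = c.city_key
GROUP BY t.type_name, c.city_name
ORDER BY t.type_name, c.city_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result == snowflake_result)
```

Both queries return the same answer; the snowflake version simply pays for its normalized dimensions with extra joins.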
Galaxy Schema:
A galaxy schema contains multiple fact tables that share some common dimensions (conformed dimensions). This schema is a combination of many data marts.
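A galaxy schema sketched in `sqlite3` with hypothetical tables: a sales fact table and a shipment fact table share one conformed product dimension. Because both facts use the same product keys, each can be summarized separately per product and the results compared side by side, a technique often called drilling across.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension shared by both fact tables.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);

-- Two fact tables, e.g. from a sales data mart and a logistics data mart.
CREATE TABLE sales_fact    (product_key INTEGER REFERENCES product_dim, units_sold INTEGER);
CREATE TABLE shipment_fact (product_key INTEGER REFERENCES product_dim, units_shipped INTEGER);

INSERT INTO product_dim   VALUES (1, 'Brake pad'), (2, 'Oil filter');
INSERT INTO sales_fact    VALUES (1, 5), (1, 3), (2, 4);
INSERT INTO shipment_fact VALUES (1, 6), (2, 2), (2, 2);
""")

# Aggregate each fact table on its own, then join the results through the
# shared dimension. Joining the raw fact rows directly would multiply rows
# and inflate the sums.
DRILL_ACROSS = """
WITH s AS (SELECT product_key, SUM(units_sold)    AS sold    FROM sales_fact    GROUP BY product_key),
     h AS (SELECT product_key, SUM(units_shipped) AS shipped FROM shipment_fact GROUP BY product_key)
SELECT d.name, COALESCE(s.sold, 0), COALESCE(h.shipped, 0)
FROM product_dim d
LEFT JOIN s ON s.product_key = d.product_key
LEFT JOIN h ON h.product_key = d.product_key
ORDER BY d.product_key
"""

result = conn.execute(DRILL_ACROSS).fetchall()
print(result)
```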
The fact table contains business facts (or measures) and foreign keys that refer to candidate keys (normally primary keys) in the dimension tables. In contrast to fact tables, dimension tables contain descriptive attributes (or fields) that are typically textual fields (or discrete numbers that behave like text).
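A minimal `sqlite3` sketch of that structure, with hypothetical tables: the product dimension holds descriptive, mostly textual attributes, while the fact table holds numeric measures plus a foreign key to the dimension's primary key. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; once that is set, a fact row pointing to a missing product is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
-- Dimension: a key plus descriptive attributes.
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    name        TEXT,
    color       TEXT,
    category    TEXT
);

-- Fact: a foreign key to the dimension plus numeric measures.
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product_dim (product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL    NOT NULL
);

INSERT INTO product_dim VALUES (1, 'Brake pad', 'black', 'Brakes');
INSERT INTO sales_fact  VALUES (1, 2, 59.90);
""")

try:
    # Product 99 does not exist in the dimension table.
    conn.execute("INSERT INTO sales_fact VALUES (99, 1, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```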
Dimension tables are used to describe dimensions; they contain dimension keys, values and attributes. For example, the time dimension would contain every hour, day, week, month, quarter and year that has occurred since you started your business operations. The product dimension could contain the name and description of the products you sell, their unit price, color, weight and other attributes as applicable. Dimension tables are typically small, ranging from a few to several thousand rows, although occasionally dimensions can grow fairly large.
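A time dimension is usually generated rather than entered by hand. This Python sketch builds one row per day (a daily grain, rather than the hourly one mentioned above) with week, month, quarter and year attributes; the column names and the YYYYMMDD integer key are assumptions, not a standard.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list:
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        iso_year, iso_week, _ = day.isocalendar()
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20240105
            "full_date": day.isoformat(),
            "day_name": day.strftime("%A"),  # locale-dependent weekday name
            "iso_week": iso_week,
            "iso_week_year": iso_year,
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(dim_date), dim_date[0])
```

December 31, 2024 falls in ISO week 1 of 2025, which is why the sketch stores the ISO week-numbering year separately from the calendar year.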
Although the operational relational database might store other attributes, a data warehouse might not need all of them. For example, customer telephone numbers, email addresses and other contact information would not be necessary for the warehouse. Keep in mind that data warehouses are used to make strategic decisions by analyzing trends; a data warehouse is not meant to be a tool for daily business operations. On the other hand, you might have some reports that do include data elements that aren't necessary for data analysis.
Fact tables contain keys to dimension tables as well as measurable facts that data analysts would want to examine. For example, a store selling automotive parts might have a fact table recording each sale of each item. The fact table of an educational entity could track credit hours awarded to students. A bakery could have a fact table that records the manufacturing of various baked goods.
Fact tables can grow very large, with millions or even billions of rows. It is important to identify the lowest level of facts that makes sense to analyze for your business; this is often referred to as the fact table "grain". For instance, for a healthcare billing company it might be sufficient to track revenues by month; daily and hourly data might not exist or might not be relevant.
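As a sketch of working at a coarser grain, assume some hypothetical daily billing records: summing them per (year, month) produces the monthly revenue grain that, in the healthcare billing example, might be all the business needs. The reverse is not possible, because daily detail cannot be recovered from monthly totals.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily-grain billing facts: (billing date, revenue).
daily_revenue = [
    (date(2024, 1, 3), 1200.00),
    (date(2024, 1, 17), 800.50),
    (date(2024, 2, 2), 950.25),
    (date(2024, 2, 28), 400.00),
    (date(2024, 3, 9), 1500.00),
]


def roll_up_to_month(rows):
    """Aggregate daily facts into one total per (year, month), sorted by month."""
    monthly = defaultdict(float)
    for day, amount in rows:
        monthly[(day.year, day.month)] += amount
    return dict(sorted(monthly.items()))


monthly_revenue = roll_up_to_month(daily_revenue)
for (year, month), total in monthly_revenue.items():
    print(f"{year}-{month:02d}: {total:.2f}")
```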