
Fact table

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is
often located at the centre of a star schema or a snowflake schema, surrounded by dimension tables.
Fact tables provide the (usually) additive values that act as independent variables by which dimensional
attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the
most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as
"Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by
a day, product and store. Other dimensions might be members of this fact table (such as location/region)
but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for
additional slices of the independent facts but generally provide insights at a higher level of aggregation (a
region contains many stores).
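To make the grain concrete, here is a minimal sketch of such a fact table in SQL; the table and column names are hypothetical, and the composite primary key mirrors the stated grain:

-- Hypothetical SALES fact table at the day/product/store grain.
-- Each foreign key references a dimension table; together they
-- form the composite primary key, so one row exists per day,
-- product and store combination.
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL REFERENCES date_dim (date_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim (product_key),
    store_key    INTEGER NOT NULL REFERENCES store_dim (store_key),
    sales_amount DECIMAL(12,2),  -- additive measure
    units_sold   INTEGER,        -- additive measure
    PRIMARY KEY (date_key, product_key, store_key)
);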
Example
If the business process is SALES, then the corresponding fact table will typically contain both raw facts and aggregations, stored in rows such as:
• $12,000, being "sales for New York store for 15-Jan-2005"
• $34,000, being "sales for Los Angeles store for 15-Jan-2005"
• $22,000, being "sales for New York store for 16-Jan-2005"
• $50,000, being "sales for Los Angeles store for 16-Jan-2005"
• $21,000, being "average daily sales for Los Angeles Store for Jan-2005"
• $65,000, being "average daily sales for Los Angeles Store for Feb-2005"
• $33,000, being "average daily sales for Los Angeles Store for year 2005"
"average monthly sales" is a measurement which is stored in the fact table. The fact table also contains
foreign keys from the dimension tables, where time series (e.g. dates) and other dimensions (e.g. store
location, salesperson, product) are stored.
All foreign keys between fact and dimension tables should be surrogate keys, not reused keys from
operational data.
The centralized table in a star schema is called a fact table. A fact table typically has two types of
columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a
fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the
content of the data warehouse and store different types of measures: additive, non-additive, and semi-additive.
Measure types
• Additive - Measures that can be added across all dimensions.
• Non-additive - Measures that cannot be added across any dimension.
• Semi-additive - Measures that can be added across some dimensions but not others (see the sketch below).
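To illustrate semi-additivity, consider a hedged sketch with hypothetical table and column names: an account balance can be summed across the account dimension at a single point in time, but not across the time dimension:

-- Valid: summing the semi-additive balance across accounts
-- for one snapshot date.
SELECT snapshot_date, SUM(balance) AS total_balance
FROM account_balance_fact
WHERE snapshot_date = DATE '2005-01-15'
GROUP BY snapshot_date;

-- Summing balances across time would double-count; an average
-- over the period is used instead.
SELECT account_key, AVG(balance) AS avg_daily_balance
FROM account_balance_fact
GROUP BY account_key;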
A fact table might contain either detail level facts or facts that have been aggregated (fact tables that
contain aggregated facts are often instead called summary tables).
Special care must be taken when handling ratios and percentages. One good design rule[1] is never to store percentages or ratios in fact tables, but to calculate them in the data access tool instead. Store only the numerator and denominator in the fact table; these can then be aggregated, and the aggregated values used to calculate the ratio or percentage in the data access tool.
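A minimal sketch of this rule, assuming a hypothetical sales_fact table holding returned_units and sold_units: both components are stored and aggregated separately, and the ratio is computed only at query time:

-- Numerator and denominator are additive and are stored in the
-- fact table; the return rate itself is never stored.
SELECT store_key,
       SUM(returned_units) AS returned_units,
       SUM(sold_units)     AS sold_units,
       SUM(returned_units) * 1.0 / SUM(sold_units) AS return_rate
FROM sales_fact
GROUP BY store_key;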
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are
called "factless fact tables", or "junction tables".
The "Factless fact tables" can for example be used for modeling many-to-many relationships or capture
events[1]
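As a sketch of a factless fact table (hypothetical names), an attendance-event table holds only dimension keys, and analysis consists of counting rows:

-- A factless fact table: foreign keys only, no measures.
CREATE TABLE attendance_fact (
    date_key    INTEGER NOT NULL,
    student_key INTEGER NOT NULL,
    class_key   INTEGER NOT NULL,
    PRIMARY KEY (date_key, student_key, class_key)
);

-- "Facts" are derived by counting the recorded events.
SELECT class_key, COUNT(*) AS attendance_count
FROM attendance_fact
GROUP BY class_key;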
Types of fact tables
There are essentially three fundamental measurement events, which characterize all fact tables.[2]
• Transactional
A transactional table is the most basic and fundamental. The grain associated with a transactional
fact table is usually specified as "one row per line in a transaction", e.g., every line on a receipt.
Typically a transactional fact table holds data of the most detailed level, causing it to have a great
number of dimensions associated with it.
• Periodic snapshots
The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment
could be any defined period of time, e.g. a performance summary of a salesman over the previous
month. A periodic snapshot table is dependent on the transactional table, as it needs the detailed
data held in the transactional fact table in order to deliver the chosen performance output.
• Accumulating snapshots
This type of fact table is used to show the activity of a process that has a well-defined beginning
and end, e.g., the processing of an order. An order moves through specific steps until it is fully
processed. As steps towards fulfilling the order are completed, the associated row in the fact table
is updated. An accumulating snapshot table often has multiple date columns, each representing a
milestone in the process. Therefore, it's important to have an entry in the associated date dimension
that represents an unknown date, as many of the milestone dates are unknown at the time of the
creation of the row.
Steps in designing a fact table
• Identify a business process for analysis (like sales).
• Identify measures or facts (sales dollar), by asking questions like "How many XX are relevant for the business process?", replacing XX and testing whether the question makes business sense.
• Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension), by asking questions that make business sense, like "Analyse by XX", where XX is replaced with the subject to test.
• List the columns that describe each dimension (region name, branch name, business unit name).
• Determine the lowest level (granularity) of summary in a fact table (e.g. sales dollar).
Alternatively, use the four-step design process described by Kimball.[1]

Measure (data warehouse)


In a data warehouse, a measure is a property on which calculations (e.g., sum, count, average, minimum,
maximum) can be made using precomputed aggregates.
Example
For example, if a retail store sold a specific product, the quantity and price of each item sold could be added or averaged to find the total number of items sold and the total or average price of the goods.
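A minimal sketch of such calculations, assuming a hypothetical sales_fact table with quantity and price columns:

-- Aggregating measures: total items sold, total and average price.
SELECT product_key,
       SUM(quantity) AS total_items_sold,
       SUM(price)    AS total_price,
       AVG(price)    AS average_price
FROM sales_fact
GROUP BY product_key;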

Early-arriving fact
In the data warehouse practice of ETL, an early fact or early-arriving fact[1] denotes the detection of a
dimensional natural key during fact table source loading, prior to the assignment of a corresponding
primary key or surrogate key in the dimension table. Hence, the fact which cites the dimension arrives
early, relative to the definition of the dimension value.
Procedurally, an early fact can be treated as:
• an error, on the presumption that the dimensional attribute values should have been collected
before fact source loading.
• a valid fact, whose collection pauses whilst the missing dimensional attribute value itself is
collected.
• a valid fact, where a primary key value is generated on the dimension with no attributes (a stub or dummy row), the fact completes processing, and the dimension attributes are populated (overwritten) later in the load processing on the new row, as sketched below.
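A hedged sketch of the third treatment, with hypothetical table and column names:

-- The incoming fact cites natural key 'XYZ', not yet present in
-- customer_dim. A stub row is generated so a surrogate key exists
-- and the fact can complete processing.
INSERT INTO customer_dim (customer_key, customer_code, customer_name)
VALUES (987, 'XYZ', 'Unknown - awaiting attributes');

-- Later in the load, when the dimension attributes arrive, the
-- stub row is populated (overwritten) in place.
UPDATE customer_dim
SET customer_name = 'Actual Customer Name'
WHERE customer_key = 987;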

Dimension table
In data warehousing, a dimension table is one of the set of companion tables to a fact table.
The fact table contains business facts or measures and foreign keys which refer to candidate keys
(normally primary keys) in the dimension tables.
Contrary to fact tables, the dimension tables contain descriptive attributes (or fields) which are typically
textual fields or discrete numbers that behave like text. These attributes are designed to serve two critical
purposes: query constraining/filtering and query result set labeling.
Dimension attributes are supposed to be:
• Verbose - labels consisting of full words,
• Descriptive,
• Complete - no missing values,
• Discretely valued - only one value per row in the dimension table,
• Quality assured - no misspelling, no impossible values.
Dimension table rows are uniquely identified by a single key field. It is recommended that the key field be a simple integer, because the key value is meaningless and is used only to join the fact and dimension tables.
The usage of surrogate dimension keys brings several advantages, among them:
• Performance - join processing is much more efficient if a single-field surrogate key is used,
• A buffer from operational key management practices - prevents situations in which removed data rows could reappear when their natural keys are reused or reassigned after a long period of dormancy,
• Mapping to integrate disparate sources,
• Handling of unknown or not-applicable connections,
• Tracking of changes in dimension attribute values.
Using surrogate keys also brings additional costs, due to the burden placed on the ETL system. Still, pipeline processing can be improved, and ETL tools offer built-in surrogate key processing.
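A minimal sketch of a dimension table keyed by a simple integer surrogate (hypothetical names); the natural key from the operational system is retained as an ordinary attribute:

-- The surrogate key is a meaningless integer used only for joins;
-- the operational natural key is stored alongside the descriptive
-- attributes.
CREATE TABLE store_dim (
    store_key   INTEGER PRIMARY KEY,   -- surrogate key
    store_code  VARCHAR(10) NOT NULL,  -- natural key from the source system
    store_name  VARCHAR(100),
    region_name VARCHAR(50),
    city_name   VARCHAR(50)
);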
The goal of a dimension table is to create standardized, conformed dimensions that can be shared across the enterprise's data warehouse environment and joined to multiple fact tables representing various business processes.
Conformed dimensions are important to the enterprise nature of a DW/BI system for the following reasons:
• Consistency - every fact table is filtered consistently, so query results are labeled consistently,
• Integration - queries can drill into different process fact tables separately and then join the results on common dimension attributes,
• Reduced time to market - the common dimensions are available without reinventing the wheel.
Over time, the attributes of a given row in a dimension table may change. For example, the shipping
address for a company may change. Kimball refers to this phenomenon as Slowly Changing Dimensions.
Strategies for dealing with this kind of change are divided into three categories:
• Type One - Simply overwrite the old value(s).
• Type Two - Add a new row containing the new value(s), and distinguish between the rows using
Tuple-versioning techniques.
• Type Three - Add a new attribute to the existing row.

Degenerate dimension
According to Ralph Kimball [1], in a data warehouse, a degenerate dimension is a dimension which is
derived from the fact table and doesn't have its own dimension table. Degenerate dimensions are often
used when a fact table's grain represents transaction-level data and one wishes to maintain system-specific identifiers such as order numbers, invoice numbers (bill numbers), and the like without forcing their
inclusion in their own dimension. The decision to use degenerate dimensions is often based on the desire
to provide a direct reference back to a transactional system without the overhead of maintaining a separate
dimension table.
Other Definitions
While many sources agree with Kimball, other definitions of degenerate dimension can be found both
online and in other reference works:
• "A degenerate dimension is data that is dimensional in nature but stored in a fact table." [2]
• "Any values in the fact table that don’t join to dimensions are either considered degenerate
dimensions or measures." [3]
• "A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a
separate dimension table." [4]
• "A degenerate dimension acts as a dimension key in the fact table but does not join a
corresponding dimension table because all its interesting attributes have already been placed in
other analytic dimensions." [5]

Slowly changing dimension


Dimension is a term in data management and data warehousing that refers to logical groupings of data
such as geographical location, customer information, or product information. Slowly Changing
Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-
based, regular schedule.[1]
For example, you may have a dimension in your database that tracks the sales records of your company's
salespeople. Creating sales reports seems simple enough, until a salesperson is transferred from one
regional office to another. How do you record such a change in your sales dimension?
You could sum or average the sales by salesperson, but if you use that to compare the performance of salespeople, it might give misleading information. If the salesperson who was transferred used to work in a
hot market where sales were easy, and now works in a market where sales are infrequent, her totals will
look much stronger than the other salespeople in her new region, even if they are just as good. Or you
could create a second salesperson record and treat the transferred person as a new sales person, but that
creates problems also.
Dealing with these issues involves SCD management methodologies referred to as Type 0 through 6. Type
6 SCDs are also sometimes called Hybrid SCDs.
Type 0
The Type 0 method is a passive approach to managing dimension value changes, in which no action is taken. Values remain as they were at the time the dimension record was first entered. In some circumstances history is preserved with Type 0 incidentally, but higher-order SCD types are employed when history preservation must be guaranteed; Type 0 provides the least control over managing a slowly changing dimension.
The most common slowly changing dimensions are Types 1, 2, and 3.
Type 1
The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at
all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name.
(Assuming you won't ever need to know how it used to be misspelled in the past.)
Here is an example of a database table that keeps supplier information:
Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State
123          | ABC           | Acme Supply Co | CA
In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the
surrogate key is not necessary, since the table will be unique by the natural key (Supplier_Code).
However, the joins will perform better on an integer than on a character string.
Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply
overwrite this record:
Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State
123          | ABC           | Acme Supply Co | IL
The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in the
data warehouse. You can't tell if your suppliers are tending to move to the Midwest, for example. But an
advantage to Type 1 SCDs is that they are very easy to maintain.
If you have calculated an aggregate table summarizing facts by state, it will need to be recalculated when
the Supplier_State is changed.[1]
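In SQL terms, Type 1 processing amounts to a single overwrite; this sketch assumes the Supplier table shown above:

-- Type 1: overwrite in place; no history is kept.
UPDATE supplier
SET supplier_state = 'IL'
WHERE supplier_code = 'ABC';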
Type 2
The Type 2 method tracks historical data by creating multiple records for a given natural key in the
dimensional tables with separate surrogate keys and/or different version numbers. With Type 2, we have
unlimited history preservation as a new record is inserted each time a change is made.
In the same example, if the supplier moves to Illinois, the table could look like this, with incremented
version numbers to indicate the sequence of changes:

Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State | Version
123          | ABC           | Acme Supply Co | CA             | 0
124          | ABC           | Acme Supply Co | IL             | 1
Another popular method for tuple versioning is to add effective date columns.
Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State | Start_Date  | End_Date
123          | ABC           | Acme Supply Co | CA             | 01-Jan-2000 | 21-Dec-2004
124          | ABC           | Acme Supply Co | IL             | 22-Dec-2004 |
The null End_Date in row two indicates the current tuple version. In some cases, a standardized surrogate
high date (e.g. 9999-12-31) may be used as an end date, so that the field can be included in an index, and
so that null-value substitution is not required when querying.
Transactions that reference a particular surrogate key (Supplier_Key) are then permanently bound to the
time slices defined by that row of the slowly changing dimension table. An aggregate table summarizing
facts by state continues to reflect the historical state, i.e. the state the supplier was in at the time of the
transaction; no update is needed.
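A hedged sketch of the Type 2 load step for the move to Illinois, using the effective-date variant above (surrogate key generation is assumed to be handled elsewhere in the ETL):

-- Expire the current version by closing its end date...
UPDATE supplier
SET end_date = DATE '2004-12-21'
WHERE supplier_code = 'ABC'
  AND end_date IS NULL;

-- ...then insert a new version carrying the changed attribute.
INSERT INTO supplier
    (supplier_key, supplier_code, supplier_name, supplier_state,
     start_date, end_date)
VALUES
    (124, 'ABC', 'Acme Supply Co', 'IL', DATE '2004-12-22', NULL);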
If there are retrospective changes made to the contents of the dimension, or if new attributes are added to
the dimension (for example a Sales_Rep column) which have different effective dates from those already
defined, then this can result in the existing transactions needing to be updated to reflect the new situation.
This can be an expensive database operation, so Type 2 SCDs are not a good choice if the dimensional
model is subject to change.[1]
Type 3
The Type 3 method tracks changes using separate columns. Whereas Type 2 had unlimited history
preservation, Type 3 has limited history preservation, as it's limited to the number of columns we
designate for storing historical data. Where the original table structure in Type 1 and Type 2 was very
similar, Type 3 will add additional columns to the tables:
Supplier_Key | Supplier_Code | Supplier_Name  | Original_Supplier_State | Effective_Date | Current_Supplier_State
123          | ABC           | Acme Supply Co | CA                      | 22-Dec-2004    | IL
Note that this record cannot track all historical changes, such as when a supplier moves twice.
One variation of this type is to create the field Previous_Supplier_State instead of Original_Supplier_State, which will then track only the most recent historical change.[1]
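A sketch of the Type 3 change, assuming the additional columns already exist:

-- Type 3: preserve the prior value in a dedicated column and
-- overwrite the current one. With Previous_Supplier_State this
-- copy happens on every change; with Original_Supplier_State,
-- only on the first.
UPDATE supplier
SET original_supplier_state = current_supplier_state,
    current_supplier_state  = 'IL',
    effective_date          = DATE '2004-12-22'
WHERE supplier_code = 'ABC';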
Type 4
The Type 4 method is usually referred to as using "history tables", where one table keeps the current data,
and an additional table is used to keep a record of some or all changes.
Following the example above, the original table might be called Supplier and the history table might be
called Supplier_History.
Supplier
Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State
123          | ABC           | Acme Supply Co | IL

Supplier_History
Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State | Create_Date
123          | ABC           | Acme Supply Co | CA             | 22-Dec-2004
This method resembles how database audit tables and change data capture techniques function.
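A sketch of the Type 4 load step, assuming the history row is written before the current table is overwritten:

-- Copy the outgoing version into the history table...
INSERT INTO supplier_history
    (supplier_key, supplier_code, supplier_name, supplier_state, create_date)
SELECT supplier_key, supplier_code, supplier_name, supplier_state,
       DATE '2004-12-22'
FROM supplier
WHERE supplier_code = 'ABC';

-- ...then overwrite the current table, as in Type 1.
UPDATE supplier
SET supplier_state = 'IL'
WHERE supplier_code = 'ABC';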
Type 6 / Hybrid
The Type 6 method combines the approaches of types 1, 2 and 3 (1 + 2 + 3 = 6). One possible explanation
of the origin of the term was that it was coined by Ralph Kimball during a conversation with Stephen Pace
from Kalido. Ralph Kimball calls this method "Unpredictable Changes with Single-Version Overlay" in
The Data Warehouse Toolkit[1].
The Supplier table starts out with one record for our example supplier:
Supplier_Key | Supplier_Code | Supplier_Name  | Current_State | Historical_State | Start_Date  | End_Date    | Current_Flag
123          | ABC           | Acme Supply Co | CA            | CA               | 01-Jan-2000 | 31-Dec-9999 | Y
The Current_State and the Historical_State are the same. The Current_Flag attribute indicates that this is
the current or most recent record for this supplier.
When Acme Supply Company moves to Illinois, we add a new record, as in Type 2 processing:
Supplier_Key | Supplier_Code | Supplier_Name  | Current_State | Historical_State | Start_Date  | End_Date    | Current_Flag
123          | ABC           | Acme Supply Co | IL            | CA               | 01-Jan-2000 | 21-Dec-2004 | N
124          | ABC           | Acme Supply Co | IL            | IL               | 22-Dec-2004 | 31-Dec-9999 | Y
We overwrite the Current_State information in the first record (Supplier_Key = 123) with the new
information, as in Type 1 processing. We create a new record to track the changes, as in Type 2
processing. And we store the history in a second State column (Historical_State), which incorporates Type
3 processing.
If our example supplier company were to relocate again, we would add another record to the Supplier
dimension, and we would once again overwrite the contents of the Current_State column:
Supplier_Key | Supplier_Code | Supplier_Name  | Current_State | Historical_State | Start_Date  | End_Date    | Current_Flag
123          | ABC           | Acme Supply Co | NY            | CA               | 01-Jan-2000 | 21-Dec-2004 | N
124          | ABC           | Acme Supply Co | NY            | IL               | 22-Dec-2004 | 03-Feb-2008 | N
125          | ABC           | Acme Supply Co | NY            | NY               | 04-Feb-2008 | 31-Dec-9999 | Y
Note that, for the current record (Current_Flag = 'Y'), the Current_State and the Historical_State are
always the same.[1]
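A hedged sketch of one Type 6 load step (the move from CA to IL), combining the three mechanisms just described:

-- Type 2 element: expire the outgoing current row.
UPDATE supplier
SET current_flag = 'N',
    end_date     = DATE '2004-12-21'
WHERE supplier_code = 'ABC'
  AND current_flag = 'Y';

-- Type 1 element: overwrite Current_State on every row for this
-- supplier so all versions show the latest state.
UPDATE supplier
SET current_state = 'IL'
WHERE supplier_code = 'ABC';

-- New current row; Historical_State carries the Type 3-style history.
INSERT INTO supplier
    (supplier_key, supplier_code, supplier_name, current_state,
     historical_state, start_date, end_date, current_flag)
VALUES
    (124, 'ABC', 'Acme Supply Co', 'IL', 'IL',
     DATE '2004-12-22', DATE '9999-12-31', 'Y');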
Type 2 / Type 6 Fact Implementation
Surrogate Key Alone
In many Type 2 and Type 6 SCD implementations, the surrogate key from the dimension is put into the
fact table in place of the natural key when the fact data is loaded into the data repository.[1] The surrogate
key is selected for a given fact record based on its effective date and the Start_Date and End_Date from
the dimension table. This allows the fact data to be easily joined to the correct dimension data for the
corresponding effective date.
Here is the Supplier table as we created it above using Type 6 methodology:
Supplier_Key | Supplier_Code | Supplier_Name  | Current_State | Historical_State | Start_Date  | End_Date    | Current_Flag
123          | ABC           | Acme Supply Co | NY            | CA               | 01-Jan-2000 | 21-Dec-2004 | N
124          | ABC           | Acme Supply Co | NY            | IL               | 22-Dec-2004 | 03-Feb-2008 | N
125          | ABC           | Acme Supply Co | NY            | NY               | 04-Feb-2008 | 31-Dec-9999 | Y
The following SQL retrieves the correct Supplier Surrogate_Key for each Delivery fact record, based on
the primary effective date, Delivery_Date:
SELECT supplier.supplier_key
FROM supplier
INNER JOIN delivery
    ON supplier.supplier_code = delivery.supplier_code
    AND delivery.delivery_date >= supplier.start_date
    AND delivery.delivery_date <= supplier.end_date
A fact record with an effective date of August 9, 2001 will be linked to Surrogate_Key 123, with a
Historical_State of 'CA'. A fact record with an effective date of October 11, 2007 will be linked to
Surrogate_Key 124, with a Historical_State of 'IL'.
Once the Delivery table contains the correct Supplier_Key, it can easily be joined to the Supplier table
using that key. The following SQL retrieves, for each fact record, the correct supplier name and the state
the supplier was located in at the time of the delivery:
SELECT delivery.delivery_cost,
       supplier.supplier_name,
       supplier.historical_state
FROM delivery
INNER JOIN supplier
    ON delivery.supplier_key = supplier.supplier_key
If you've utilized Type 6 processing for your dimension, then you can easily retrieve the state the company
is currently located in, using the same Supplier_Key:
SELECT delivery.delivery_cost,
       supplier.supplier_name,
       supplier.current_state
FROM delivery
INNER JOIN supplier
    ON delivery.supplier_key = supplier.supplier_key
Both Surrogate and Natural Key
An alternate implementation is to place both the surrogate key and the natural key into the fact table.[2]
This allows the user to select the appropriate dimension records based on:
• the primary effective date on the fact record (above),
• the most recent or current information,
• any other date associated with the fact record.
This method allows more flexible links to the dimension, even if you have used the Type 2 approach
instead of Type 6.
Here is the Supplier table as we might have created it using Type 2 methodology:
Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State | Start_Date  | End_Date    | Current_Flag
123          | ABC           | Acme Supply Co | CA             | 01-Jan-2000 | 21-Dec-2004 | N
124          | ABC           | Acme Supply Co | IL             | 22-Dec-2004 | 03-Feb-2008 | N
125          | ABC           | Acme Supply Co | NY             | 04-Feb-2008 | 31-Dec-9999 | Y
The following SQL retrieves the most current Supplier_Name and Supplier_State for each fact record:
SELECT delivery.delivery_cost,
       supplier.supplier_name,
       supplier.supplier_state
FROM delivery
INNER JOIN supplier
    ON delivery.supplier_code = supplier.supplier_code
WHERE supplier.current_flag = 'Y'
If there are multiple dates on the fact record, the fact can be joined to the dimension using another date
instead of the primary effective date. For instance, the Delivery table might have a primary effective date
of Delivery_Date, but might also have an Order_Date associated with each record.
The following SQL retrieves the correct Supplier_Name and Supplier_State for each fact record based on
the Order_Date:
SELECT delivery.delivery_cost,
       supplier.supplier_name,
       supplier.supplier_state
FROM delivery
INNER JOIN supplier
    ON delivery.supplier_code = supplier.supplier_code
    AND delivery.order_date >= supplier.start_date
    AND delivery.order_date <= supplier.end_date
Some cautions:
• If the join query is not written correctly, it may return duplicate rows and/or give incorrect answers.
• The date comparison might not perform well.
• Some business intelligence tools do not handle generating complex joins well.
• The ETL processes needed to create the dimension table need to be carefully designed to ensure that there are no overlaps in the time periods for each distinct item of reference data.
Combining Types
Different SCD Types can be applied to different columns of a table. For example, we can apply Type 1 to
the Supplier_Name column and Type 2 to the Supplier_State column of the same table, the Supplier table.

Online analytical processing


In computing, online analytical processing, or OLAP (pronounced /ˈoʊlæp/) is an approach to swiftly
answer multi-dimensional analytical queries.[1] OLAP is part of the broader category of business
intelligence, which also encompasses relational reporting and data mining.[2] Typical applications of
OLAP include business reporting for sales, marketing, management reporting, business process
management (BPM),[3] budgeting and forecasting, financial reporting and similar areas, with new
applications coming up, such as agriculture.[4] The term OLAP was created as a slight modification of the
traditional database term OLTP (Online Transaction Processing).[5]
Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and
ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and
hierarchical databases that are faster than relational databases.[6]
The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form
the rows and columns of the matrix; the measures form the values.

Concept
The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It
consists of numeric facts called measures which are categorized by dimensions. The cube metadata is
typically created from a star schema or snowflake schema of tables in a relational database. Measures are
derived from the records in the fact table and dimensions are derived from the dimension tables.
Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is
what describes these labels; it provides information about the measure.
A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a
dimension. Each Sale has a Date/Time label that describes more about that sale.
Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by adding a
foreign key column to the fact table. This allows an analyst to view the measures along any combination
of the dimensions.
For example:
Sales Fact Table
+-------------+----------+
| sale_amount | time_id |
+-------------+----------+ Time Dimension
| 2008.10| 1234 |---+ +---------+-------------------+
+-------------+----------+ | | time_id | timestamp |
| +---------+-------------------+
+---->| 1234 | 20080902 12:35:43 |
+---------+-------------------+
Multidimensional databases
Multidimensional structure is defined as “a variation of the relational model that uses multidimensional
structures to organize data and express the relationships between data”.[7] The structure is broken into
cubes and the cubes are able to store and access data within the confines of each cube. “Each cell within a
multidimensional structure contains aggregated data related to elements along each of its dimensions”.[8]
Even when data is manipulated, it remains easy to access, and the database stays compact. The data remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications (O'Brien & Marakas, 2009), because of its ability to deliver answers to complex business queries swiftly. Data can be viewed from different perspectives, which gives a broader picture of a problem than other models provide.[9]
Aggregations
It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the
time for the same query on OLTP relational data.[10][11] The most important mechanism in OLAP which
allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact
table by changing the granularity on specific dimensions and aggregating up data along these dimensions.
The number of possible aggregations is determined by every possible combination of dimension
granularities.
The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data.[12]
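In relational terms, an aggregation can be pictured as a materialized GROUP BY at a coarser granularity; a hedged sketch with hypothetical names:

-- One possible aggregation: coarsen the date dimension from day
-- to month while keeping the store dimension at full detail.
CREATE TABLE sales_by_month_store AS
SELECT d.year_number, d.month_number, f.store_key,
       SUM(f.sales_amount) AS sales_amount
FROM sales_fact AS f
INNER JOIN date_dim AS d ON f.date_key = d.date_key
GROUP BY d.year_number, d.month_number, f.store_key;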
Because usually there are many aggregations that can be calculated, often only a predetermined number
are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations
(views) to calculate is known as the view selection problem. View selection can be constrained by the
total size of the selected set of aggregations, the time to update them from changes in the base data, or
both. The objective of view selection is typically to minimize the average time to answer OLAP queries,
although some studies also minimize the update time. View selection is NP-Complete. Many approaches
to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms
and A* search.
Types
OLAP systems have been traditionally categorized using the following taxonomy.[13]
Multidimensional
Main article: MOLAP
MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP stores data in optimized multidimensional array storage, rather than in a relational database. It therefore requires the pre-computation and storage of information in the cube - the operation known as processing.
Relational
Main article: ROLAP
ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information; this depends on a specialized schema design. The methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a WHERE clause to the SQL statement.
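For instance, a sketch of a ROLAP "slice" (hypothetical names): the WHERE clause fixes one member of the store dimension, and the remaining dimensions are aggregated as usual:

-- Slicing on the store dimension via a WHERE clause.
SELECT d.year_number, SUM(f.sales_amount) AS sales_amount
FROM sales_fact AS f
INNER JOIN date_dim  AS d ON f.date_key  = d.date_key
INNER JOIN store_dim AS s ON f.store_key = s.store_key
WHERE s.store_name = 'New York'
GROUP BY d.year_number;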
Hybrid
Main article: HOLAP
There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a
database will divide data between relational and specialized storage. For example, for some vendors, a
HOLAP database will use relational tables to hold the larger quantities of detailed data, and use
specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed
data.
Comparison
Each type has certain benefits, although there is disagreement about the specifics of the benefits between
providers.
• Some MOLAP implementations are prone to database explosion, a phenomenon
causing vast amounts of storage space to be used by MOLAP databases when certain
common conditions are met: high number of dimensions, pre-calculated results and
sparse multidimensional data.
• MOLAP generally delivers better performance due to specialized indexing and storage
optimizations. MOLAP also needs less storage space compared to ROLAP because the
specialized storage typically includes compression techniques.[14]
• ROLAP is generally more scalable.[14] However, large volume pre-processing is difficult
to implement efficiently so it is frequently skipped. ROLAP query performance can
therefore suffer tremendously.
• Since ROLAP relies more on the database to perform calculations, it has more
limitations in the specialized functions it can use.
• HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and
MOLAP. It can generally pre-process swiftly, scale well, and offer good function
support.
Other types
The following acronyms are also sometimes used, although they are not as widespread as the ones above:
• WOLAP - Web-based OLAP
• DOLAP - Desktop OLAP
• RTOLAP - Real-Time OLAP

Online transaction processing


Online transaction processing, or OLTP, refers to a class of systems that facilitate and manage
transaction-oriented applications, typically for data entry and retrieval transaction processing. The term is
somewhat ambiguous; some understand a "transaction" in the context of computer or database
transactions, while others (such as the Transaction Processing Performance Council) define it in terms of
business or commercial transactions.[1] OLTP has also been used to refer to processing in which the system
responds immediately to user requests. An automatic teller machine (ATM) for a bank is an example of a
commercial transaction processing application.
The technology is used in a number of industries, including banking, airlines, mail order, supermarkets,
and manufacturing. Applications include electronic banking, order processing, employee time clock
systems, e-commerce, and eTrading. The most widely used OLTP system is probably IBM's CICS.[2]
Requirements
Online transaction processing increasingly requires support for transactions that span a network and may include more than one company. For this reason, modern online transaction processing software uses client/server processing and brokering software that allows transactions to run on different computer platforms in a network.
In large applications, efficient OLTP may depend on sophisticated transaction management software (such
as CICS) and/or database optimization tactics to facilitate the processing of large numbers of concurrent
updates to an OLTP-oriented database.
For even more demanding decentralized database systems, OLTP brokering programs can distribute
transaction processing among multiple computers on a network. OLTP is often integrated into service-
oriented architecture (SOA) and Web services.
Benefits
Online transaction processing has two key benefits: simplicity and efficiency. Reduced paper trails and faster, more accurate forecasts of revenues and expenses are examples of how OLTP makes things simpler for businesses.
Disadvantages
As with any information processing system, security and reliability are considerations. Online transaction
systems are generally more susceptible to direct attack and abuse than their offline counterparts. When
organizations choose to rely on OLTP, operations can be severely impacted if the transaction system or
database is unavailable due to data corruption, systems failure, or network availability issues.
Additionally, like many modern online information technology solutions, some systems require offline
maintenance which further affects the cost-benefit analysis.
