data mart?
This is a heavily debated issue. There are inherent similarities between the basic constructs used
to design a data warehouse and a data mart. In general, a data warehouse is used at the
enterprise level, while data marts are used at the business division/department level. A data mart
contains only the subject-specific data required for local analysis.
Warehouses are Time Referenced, Subject-Oriented, Non-volatile (read only) and Integrated.
OLTP databases are designed to maintain atomicity, consistency, isolation and durability (the
"ACID" properties). Since a data warehouse is not updated by online transactions, these
constraints are relaxed.
ROLAP stands for Relational OLAP. Users see their data organized in cubes with dimensions,
but the data is really stored in a relational database (RDBMS) like Oracle. Because the RDBMS
stores data at a fine-grain level, response times are usually slow.
MOLAP stands for Multidimensional OLAP. Users see their data organized in cubes with
dimensions, but the data is stored in a multidimensional database (MDBMS) like Oracle Express
Server. In a MOLAP system a lot of queries have a finite answer; performance is usually critical,
and responses are fast.
HOLAP stands for Hybrid OLAP; it is a combination of both worlds. Seagate Software's Holos is
an example of a HOLAP environment. In a HOLAP system one will find queries on aggregated data
as well as on detailed data.
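The contrast between ROLAP and MOLAP can be sketched in a few lines of Python. This is only an illustration, not how any OLAP product is implemented: the `FACTS` rows, the `sales` table and the function names are hypothetical, and the assumption that a MOLAP store answers from precomputed cells is a simplification.

```python
import sqlite3
from collections import defaultdict

# Hypothetical fine-grain sales facts: (region, product, amount).
FACTS = [
    ("EU", "widget", 10),
    ("EU", "gadget", 5),
    ("US", "widget", 7),
    ("US", "widget", 3),
]


def rolap_total(region, product):
    """ROLAP-style: keep detail rows in a relational table, aggregate per query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", FACTS)
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE region = ? AND product = ?",
        (region, product),
    ).fetchone()
    conn.close()
    return row[0]


def build_cube(facts):
    """MOLAP-style (assumed): precompute every (region, product) cell once."""
    cube = defaultdict(int)
    for region, product, amount in facts:
        cube[(region, product)] += amount
    return dict(cube)


CUBE = build_cube(FACTS)


def molap_total(region, product):
    """Answer a query by reading a single precomputed cell."""
    return CUBE.get((region, product), 0)
```

Both functions return the same totals; the difference is that the relational version scans and sums detail rows on every query, while the cube version pays that cost once when the cube is built.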
A warehouse typically contains years of data (Time Referenced). Data warehouses group data by
subject rather than by activity (subject-oriented). Other properties are: Non-volatile (read only)
and Integrated.
Q. Why should the OLTP database be different from the data warehouse database?
OLTP and data warehousing require two very differently configured systems
Isolation of Production System from Business Intelligence System
Significant and highly variable resource demands of the data warehouse
Cost of disk space no longer a concern
Production systems not designed for query processing
A data warehouse usually contains historical data that is derived from transaction data, but it can
include data from other sources. Having separate databases separates the analysis workload from
the transaction workload and enables an organization to consolidate data from several sources.
Q. What is the main difference between Data Warehousing and Business Intelligence?
DW - a way of storing data and creating information by leveraging data marts. DMs are
segments or categories of information and/or data that are grouped together to provide
'information' about that segment or category. DW does not require BI to work. Reporting tools can
generate reports from the DW.
OLAP - Online Analytical Processing: mainly required for DSS (decision support systems). Data is
stored in a denormalized manner, is mainly non-volatile, and is highly indexed to improve query
response time.
OLTP - Online Transaction Processing: dominated by DML, and highly normalized to reduce
deadlocks and increase concurrency.
What is the difference between a sequential file and a dataset? When to use
the Copy stage?
The Sequential File stage stores a small amount of data in a file with any extension,
whereas a Data Set is used to store a huge amount of data and opens only with the
.ds extension. The Copy stage copies a single input data set to a number of output
data sets. Each record of the input data set is copied to every output data set.
Records can be copied without modification, or you can drop columns or change
their order.
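The Copy stage behaviour described above can be modelled with a small Python function. This is a sketch of the idea only, not DataStage code; `copy_stage` and its parameters are hypothetical names chosen for the illustration.

```python
def copy_stage(records, output_columns):
    """Copy every input record to every output.

    records: list of dicts, one dict per record.
    output_columns: one entry per output link. None means "copy the
    record unchanged"; a list of column names keeps only those columns
    (dropping the rest) in the order given.
    """
    outputs = []
    for columns in output_columns:
        if columns is None:
            outputs.append([dict(rec) for rec in records])
        else:
            outputs.append(
                [{col: rec[col] for col in columns} for rec in records]
            )
    return outputs


rows = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
]
# First output is an unmodified copy; second drops "city" and reorders columns.
full_copy, trimmed = copy_stage(rows, [None, ["name", "id"]])
```

Every output receives the same number of records as the input; only the column layout differs between outputs.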
DataStage doesn't know how large your data is, so it cannot make an informed choice
whether to combine data using a Join stage or a Lookup stage. Here's how to decide
which to use:
If the reference datasets are big enough to cause trouble, use a join. A join does a
high-speed sort on the driving and reference datasets. This can involve I/O if the
data is big enough, but the I/O is all highly optimized and sequential. Once the sort
is over, the join processing is very fast and never involves paging or other I/O.
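The trade-off can be illustrated with two small Python functions. This is a sketch, not DataStage internals: it assumes a lookup holds the whole reference set in memory, that reference keys are unique, and that both stages perform an inner join. The function names are hypothetical.

```python
def lookup_join(driving, reference, key):
    """Lookup-style: the whole reference set must fit in memory."""
    index = {ref[key]: ref for ref in reference}  # assumes unique reference keys
    return [{**row, **index[row[key]]} for row in driving if row[key] in index]


def sort_merge_join(driving, reference, key):
    """Join-style: sort both inputs on the key, then merge in a single pass."""
    left = sorted(driving, key=lambda r: r[key])
    right = sorted(reference, key=lambda r: r[key])
    out = []
    j = 0
    for row in left:
        # Advance the reference cursor past keys smaller than the current key.
        while j < len(right) and right[j][key] < row[key]:
            j += 1
        if j < len(right) and right[j][key] == row[key]:
            out.append({**row, **right[j]})
    return out
```

The merge loop only ever moves forward through the sorted inputs, which is why the processing after the sort is sequential and cheap; the lookup version avoids the sort but pays for it by keeping the reference index in memory.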
Unlike Join stages and Lookup stages, the Merge stage allows you to specify several
reject links, as many as there are update input links.
A hash file stores data based on a hash algorithm and a key value. A sequential
file is just a file with no key column. A hash file can be used as a reference for a
lookup; a sequential file cannot.
The difference between a hash file and a sequential file is that searching for a record
is very fast in a hash file: based on the hash key, we can get the address of the record
directly. In a sequential file the search can only run in sequential mode, record by
record. We can also remove duplicate records based on the hash key in a hash file;
we cannot in a sequential file.
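A Python dict behaves much like the hash file described here, which makes the contrast easy to sketch. The function names are hypothetical, and the "last record written for a key wins" behaviour is an assumption of this illustration.

```python
def sequential_find(records, key_field, wanted):
    """Sequential file: scan record by record until the key matches."""
    for position, rec in enumerate(records):
        if rec[key_field] == wanted:
            return position, rec
    return None


def build_hashed(records, key_field):
    """Hashed file: one slot per key, so records sharing a key collapse.

    Assumption: a later record with the same key replaces the earlier one.
    """
    hashed = {}
    for rec in records:
        hashed[rec[key_field]] = rec
    return hashed


rows = [
    {"id": 1, "v": "a"},
    {"id": 2, "v": "b"},
    {"id": 1, "v": "c"},  # duplicate key
]
hashed = build_hashed(rows, "id")
```

Here `hashed[2]` goes straight to the record, while `sequential_find(rows, "id", 2)` has to walk the list from the start; the duplicate `id` 1 survives in `rows` but occupies a single slot in `hashed`.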
Hashed files have a default size established by their modulus and separation
when you create them, and they can be static or dynamic.
Overflow space is used only when data grows beyond the reserved size of one of
the groups (sectors) within the file. There are as many groups as specified by the
modulus.
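The relationship between modulus, groups and overflow can be pictured with a toy model. This is a simplified sketch, not the real hashed-file format: the class name is hypothetical, group capacity is counted in records rather than in disk blocks set by the separation, and integer keys are mapped to groups with a plain remainder.

```python
class StaticHashedFile:
    """Toy model: `modulus` groups, each with room for `group_capacity` records."""

    def __init__(self, modulus, group_capacity):
        self.modulus = modulus
        self.group_capacity = group_capacity
        self.groups = [dict() for _ in range(modulus)]
        self.overflow = [dict() for _ in range(modulus)]

    def _group_of(self, key):
        # Integer keys keep the example deterministic; real hashing differs.
        return key % self.modulus

    def write(self, key, record):
        g = self._group_of(key)
        if key in self.groups[g]:
            self.groups[g][key] = record      # update in place
        elif key in self.overflow[g]:
            self.overflow[g][key] = record    # update in overflow
        elif len(self.groups[g]) < self.group_capacity:
            self.groups[g][key] = record      # room left in the group
        else:
            self.overflow[g][key] = record    # group is full: spill to overflow

    def read(self, key):
        g = self._group_of(key)
        if key in self.groups[g]:
            return self.groups[g][key]
        return self.overflow[g].get(key)


f = StaticHashedFile(modulus=3, group_capacity=2)
for k in (0, 3, 6, 1):
    f.write(k, "rec%d" % k)
```

Keys 0, 3 and 6 all land in group 0, which holds only two records, so the third one spills into that group's overflow while group 1 still has free space.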
Use a dynamic file when we do not know how much data will come from the
source side; it allows the file to grow automatically as data is loaded.
Use a static file only when we know the fixed amount of data we are loading into
the target DB.
These are the scenarios for using each type.