Datastage Interview Question


In simple terms, the level of granularity defines the extent of detail. As an example, consider
geographical granularity: we may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY
and STREET. Here STREET is the finest (most detailed) level of granularity and COUNTRY the coarsest.
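
To make the idea of granularity concrete, here is a minimal sketch that rolls the same data up to coarser levels. The sample rows, the column names and the `roll_up` helper are all assumptions made for illustration; they are not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical sales rows recorded at the finest grain (STREET).
SALES = [
    {"country": "US", "city": "Boston", "street": "Main St", "amount": 100},
    {"country": "US", "city": "Boston", "street": "Elm St", "amount": 50},
    {"country": "US", "city": "Austin", "street": "Oak St", "amount": 70},
]

def roll_up(rows, levels):
    """Sum `amount` at the grain given by `levels` (a tuple of column names)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[level] for level in levels)
        totals[key] += row["amount"]
    return dict(totals)

by_city = roll_up(SALES, ("country", "city"))   # coarser than STREET
by_country = roll_up(SALES, ("country",))       # coarsest level here
```

Data stored at the STREET grain can always be rolled up to CITY or COUNTRY, but data stored only at the COUNTRY grain cannot be broken back down.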

A star schema is created when all the dimension tables link directly to the fact table. Since the
graphical representation resembles a star, it is called a star schema. Note that the foreign
keys in the fact table link to the primary keys of the dimension tables. This sample provides the star
schema for a sales_fact table for the year 1998. The dimensions created are Store, Customer, Product_class
and time_by_day. The Product table links to the product_class table through the primary key and
indirectly to the fact table. The fact table contains foreign keys that link to the dimension tables.

A fact table contains the measurements, metrics or facts of a business process. If your business process
is "Sales", then a measurement of this business process, such as "monthly sales number", is captured
in the fact table. The fact table also contains the foreign keys to the dimension tables.

A data warehouse is a repository of integrated information, available for queries and analysis. Data and
information are extracted from heterogeneous sources as they are generated. This makes it much
easier and more efficient to run queries over data that originally came from different sources. Typical
relational databases are designed for on-line transactional processing (OLTP) and do not meet the
requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are
designed differently from traditional relational databases.

While the ER model lists and defines the constructs required to build a data model, there is no standard
process for doing so. Some methodologies, such as IDEF1X, specify a bottom-up

Data modeling is probably the most labor-intensive and time-consuming part of the development
process. Why bother, especially if you are pressed for time? A common

Dimensional modelling is a design concept used by many data warehouse designers to build their
data warehouse. In this design model, all the data is stored in two types of tables: fact tables and
dimension tables. The fact table contains the facts/measurements of the business, and the dimension table
contains the context of the measurements, i.e. the dimensions by which the facts are analyzed.

There are mainly two methods:
1. Ralph Kimball model
2. Inmon model
The Kimball model is typically structured as a denormalised (dimensional) structure.
The Inmon model is structured as a normalised structure.
Depending on its requirements, a company will choose one of the above models for its DWH.

On the fact table it is usually best to use bitmap indexes on the low-cardinality foreign key columns.
Dimension tables can use bitmap and/or the other types of indexes: clustered/non-clustered, unique/non-unique.
To my knowledge, SQL Server does not support user-defined bitmap indexes; Oracle does.
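
As a rough illustration of why bitmap indexes suit low-cardinality columns, the sketch below keeps one bitmap per distinct value and answers a combined filter with a bitwise AND. It is a conceptual model only, using Python integers as bitmaps; it is not how Oracle or any other database actually stores its indexes, and the column values are made up.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Return the row ids whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality columns of a five-row fact table.
region = ["EAST", "WEST", "EAST", "EAST", "WEST"]
channel = ["WEB", "WEB", "STORE", "WEB", "STORE"]

region_idx = build_bitmap_index(region)
channel_idx = build_bitmap_index(channel)

# WHERE region = 'EAST' AND channel = 'WEB' becomes a single bitwise AND.
east_web = matching_rows(region_idx["EAST"] & channel_idx["WEB"])
```

With few distinct values, each column needs only a handful of bitmaps, and filters on several columns combine cheaply.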

Normalization can be defined as splitting a table into two or more related tables so as to avoid
duplication of values.

Semi-additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the
fact table, but not the others. For example:
Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes
sense to add balances up across all accounts (what is the total current balance for all accounts in the bank?),
but it does not make sense to add them up through time (adding up all current balances for a given
account for each day of the month does not give us any useful information).

A factless fact table captures the many-to-many relationships between
dimensions, but contains no numeric or textual facts. They are often used to record events or
coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn't sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university
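
The first example in the list above (promoted products that didn't sell) can be sketched with an in-memory SQLite database. The table names, column names and sample rows are assumptions chosen for illustration; the point is only that the coverage table holds keys and no measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, name TEXT);
-- Factless fact table: only dimension keys, no numeric measures.
CREATE TABLE promotion_coverage (date_key INTEGER, product_key INTEGER);
CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER, amount REAL);
INSERT INTO product VALUES (1, 'Soap'), (2, 'Tea'), (3, 'Rice');
INSERT INTO promotion_coverage VALUES (20240101, 1), (20240101, 2);
INSERT INTO sales_fact VALUES (20240101, 1, 9.5), (20240101, 3, 4.0);
""")

# Promoted products with no matching sales row on the promotion date.
promoted_not_sold = [
    row[0]
    for row in conn.execute("""
        SELECT p.name
        FROM promotion_coverage c
        JOIN product p ON p.product_key = c.product_key
        LEFT JOIN sales_fact s
               ON s.date_key = c.date_key AND s.product_key = c.product_key
        WHERE s.product_key IS NULL
        ORDER BY p.name
    """)
]
```

The sales fact table alone cannot answer this question, because an unsold product simply has no row there; the factless table records what was promoted.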

Yes, it is reasonable to develop a data mart using an ODS, because the ODS stores transaction
data with only a few days of history (little historical data). Where that is what the data mart requires,
it is correct to build the data mart from the ODS.

A degenerate dimension is a dimension which has only a single attribute.
This dimension is typically represented as a single field in a fact table.
Data items that are not facts and that do not fit into the existing dimensions are termed
degenerate dimensions.
Degenerate dimensions are the fastest way to group similar transactions.
Degenerate dimensions are used when fact tables represent transactional data.
They can be used as the primary key of the fact table, but they cannot act as foreign keys.

View: stores the SQL statement in the database and lets you use it as a table. Every time you access the
view, the SQL statement executes.
Materialized view: stores the results of the SQL in table form in the database. The SQL statement
executes only when the view is created or refreshed; after that, every time you run the query the stored
result set is used. Pros include quick query results; the trade-off is that the stored results can be
stale until the next refresh.
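
The difference can be sketched with SQLite from Python. SQLite has ordinary views but no materialized views, so the stored result is emulated here with `CREATE TABLE ... AS SELECT` and a hypothetical `refresh` helper; a database with real materialized views would use its own syntax instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('EAST', 10), ('WEST', 20);
-- A view stores only the query text.
CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM sales;
-- Stand-in for a materialized view: the query result is stored as a table.
CREATE TABLE mv_total AS SELECT SUM(amount) AS total FROM sales;
""")

conn.execute("INSERT INTO sales VALUES ('EAST', 5)")

# The view re-runs its query; the stored result still reflects the old data.
view_total = conn.execute("SELECT total FROM v_total").fetchone()[0]
stored_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]

def refresh(connection):
    """Hypothetical manual refresh: recompute and overwrite the stored result."""
    connection.executescript("""
    DELETE FROM mv_total;
    INSERT INTO mv_total SELECT SUM(amount) FROM sales;
    """)

refresh(conn)
refreshed_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]
```

Reading the stored table avoids recomputing the aggregate, at the cost of needing a refresh whenever the base data changes.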

A fact table typically has two types of columns: those that contain numeric facts (often called
measurements), and those that are foreign keys to dimension tables.
A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that
contain aggregated facts are often called summary tables. A fact table usually contains facts with the
same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be
aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts
cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along
some of the dimensions and not along others. An example of this is inventory levels, which can be summed
across products or locations at a point in time but not across time periods.
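
A small sketch of a semi-additive fact, using made-up daily inventory snapshots. The data and the helper names `total_on` and `closing_level` are assumptions for illustration only.

```python
# Hypothetical daily inventory snapshots: (date, warehouse, units_on_hand).
SNAPSHOTS = [
    ("2024-01-01", "W1", 100), ("2024-01-01", "W2", 200),
    ("2024-01-02", "W1", 120), ("2024-01-02", "W2", 180),
]

def total_on(day, rows):
    """Additive across the warehouse dimension: sum the levels for one date."""
    return sum(units for d, _, units in rows if d == day)

def closing_level(warehouse, rows):
    """Not additive across time: take the latest snapshot instead of a sum."""
    latest = max((d, units) for d, w, units in rows if w == warehouse)
    return latest[1]

company_stock = total_on("2024-01-02", SNAPSHOTS)
w1_closing = closing_level("W1", SNAPSHOTS)

# Summing W1 over both days gives a number that matches no real stock level.
misleading_sum = sum(units for _, w, units in SNAPSHOTS if w == "W1")
```

Across time, a semi-additive fact is usually reported as a closing value, an average, or a minimum/maximum rather than a total.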

Conventional load:
Before the data is loaded, all the table constraints are checked against the data.
Direct load (faster loading):
All the constraints are disabled and the data is loaded directly. Later the data is checked against
the table constraints, and the bad data is not indexed.

A cube can be stored on a single Analysis server and then defined as a linked cube on other Analysis
servers. End users connected to any of these Analysis servers can then access the cube. This
arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple
Analysis servers. Linked cubes can be connected using TCP/IP or HTTP. To end users, a linked cube
looks like a regular cube.

Values of a dimension that are stored in the fact table are called degenerate dimensions. These dimensions
do not have their own dimension tables.


Star schema:
A centralized fact table surrounded by different dimension tables.
Snowflake schema:
The same as a star schema, but with dimensions split into further dimension tables.
A star schema contains highly denormalized data.
A snowflake schema contains partially normalized data.
In a star schema, dimension tables have no parent tables.
In a snowflake schema, dimension tables can have parent tables.
Why go for a star schema:
1) It needs fewer joins
2) It is a simpler database design
3) It supports drill-up options
Why go for a snowflake schema:
Sometimes we need to separate new dimensions out of existing dimensions; in that case we go
for a snowflake schema.
Disadvantage of snowflake:
Query performance is lower because more joins are needed.

Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product
price changes over time; people change their names for some reason; country and state names may
change over time. These are a few examples of Slowly Changing Dimensions, since changes
happen to them over a period of time.
If the data in a dimension table changes only rarely, it is called a slowly changing
dimension.
Ex: a change to a person's name or address, which happens rarely.

1. MS Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, PowerPlay)
4. MicroStrategy
5. MS Reporting Services
6. Informatica PowerAnalyzer
7. Actuate
8. Hyperion (Brio)
9. Oracle Express OLAP
10. ProClarity

A star schema is a way of organising the tables so that results can be retrieved from the database
easily and quickly in a warehouse environment. Usually a star schema consists of one or more
dimension tables around a fact table, which looks like a star; that is how it got its name.
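
A minimal star schema can be sketched with an in-memory SQLite database. The `store`, `product` and `sales_fact` tables, their columns and the sample rows are illustrative assumptions, not a schema taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per store / product.
CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Fact table: foreign keys to each dimension plus a numeric measure.
CREATE TABLE sales_fact (
    store_id   INTEGER REFERENCES store(store_id),
    product_id INTEGER REFERENCES product(product_id),
    units_sold INTEGER
);
INSERT INTO store VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO product VALUES (10, 'Pen'), (11, 'Book');
INSERT INTO sales_fact VALUES (1, 10, 5), (1, 11, 2), (2, 10, 7);
""")

# A typical star query: join the fact table to one dimension and aggregate.
units_by_city = dict(conn.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM sales_fact f
    JOIN store s ON s.store_id = f.store_id
    GROUP BY s.city
"""))
```

Every dimension is one join away from the fact table, which is what keeps star queries short.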

The two differ in their concept of building the data warehouse.
According to Kimball:
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering
business objectives for departments in the organization, and the data warehouse is the combination of
these data marts, tied together by conformed dimensions. Hence a unified view of the enterprise can be
obtained from the dimension modeling on a local departmental level.
According to Inmon:
Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development
of the data warehouse can start with data from the online store. Other subject areas can be added to
the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management
decides it is necessary.
i.e.,
Kimball: first data marts, then combined into a data warehouse
Inmon: first the data warehouse, later data marts

Basically, the fact table consists of the index keys of the dimension/lookup tables and the measures.
So whenever we have the keys in a table, that itself implies that the table is in normal form.

To my knowledge, these are called object types in Business Objects. An alias is different from a
view in the universe: a view is at the database level, whereas an alias is a different name given to the
same table to resolve loops in the universe.
The different data types in Business Objects are: 1. Character 2. Date 3. Long text 4. Number

Metadata (or meta data): Metadata is data about data. Examples of metadata include data element
descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, and
process/method descriptions. The repository environment encompasses all corporate metadata
resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like
the name, length, valid values, and description of a data element. Metadata is stored in a data
dictionary and repository. It insulates the data warehouse from changes in the schema of operational
systems.

Metadata synchronization: The process of consolidating, relating and synchronizing data
elements with the same or similar meaning from different systems. Metadata synchronization joins
these differing elements together in the data warehouse to allow for easier access.

Every data warehouse maintains a time dimension. It is kept at the most granular level at which the
business runs (e.g. weekday, day of the month and so on). Depending on the data loads, these time
dimensions are updated: a weekly process is updated every week and a monthly process every month.
Generally we load the time dimension by using a Sequential File (a passive stage) as the source stage
and a Transformer stage, in which we manually write functions such as month and year functions to load
the time dimension; for the lower level, i.e. day, there is also a function to implement the loading of the
time dimension.
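
Outside DataStage, the same idea of deriving day, month and year attributes from each date can be sketched in plain Python. This is not DataStage Transformer code; the `build_time_dimension` function and its column names are assumptions for illustration.

```python
from datetime import date, timedelta

def build_time_dimension(start, end):
    """Return one row per calendar day from `start` to `end`, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240227
            "day_of_month": current.day,
            "iso_weekday": current.isoweekday(),          # Monday=1 .. Sunday=7
            "month": current.month,
            "year": current.year,
        })
        current += timedelta(days=1)
    return rows

# A short range that crosses a leap day and a month boundary.
time_dim = build_time_dimension(date(2024, 2, 27), date(2024, 3, 1))
```

Because the rows depend only on the calendar, a time dimension like this can be generated ahead of time for many years and loaded once.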

These tools are used for data/dimension modeling:
Oracle Designer
ERwin (Entity Relationship for Windows)
Informatica (Cubes/Dimensions)
Embarcadero
PowerDesigner (Sybase)

RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Extraction and complex (analytical) queries are hard to solve
* Poorly modelled for analysis
DWH Schema
* Used for OLAP systems
* New generation schema
* Denormalized
* Easy to understand and navigate
* Extraction and complex (analytical) queries can be solved easily
* Well modelled for analysis

ODS stands for Operational Data Store.

The basic purpose of the scheduling tool in a DW application is to streamline the flow of data from
source to target at a specific time or based on some condition.

A surrogate key is an artificial identifier for an entity. Surrogate key values are generated by the
system sequentially (like the Identity property in SQL Server or a Sequence in Oracle). They do not
describe anything. A natural primary key, by contrast, is a natural identifier for an entity: its values are
entered by the user and uniquely identify each row, so there is no repetition of data.

If a column is made a primary key and later its data type or length needs to change,
then all the foreign keys that depend on that primary key must be changed as well,
making the database unstable. Surrogate keys make the database more stable because they insulate
the primary and foreign key relationships from changes in data types and lengths.
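
A minimal sketch of this insulation, using SQLite from Python. The `customer_dim` and `sales_fact` tables and the `add_customer` helper are hypothetical; SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` stands in for an Identity column or a Sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_code TEXT UNIQUE,                        -- natural/business key
    name          TEXT
);
-- The fact table references only the surrogate key.
CREATE TABLE sales_fact (
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    amount       REAL
);
""")

def add_customer(code, name):
    """Insert a customer and return the system-generated surrogate key."""
    cur = conn.execute(
        "INSERT INTO customer_dim (customer_code, name) VALUES (?, ?)",
        (code, name),
    )
    return cur.lastrowid

first_key = add_customer("CUST-A17", "Asha")
second_key = add_customer("CUST-B02", "Ben")
conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (first_key, 42.0))
```

If the format or length of `customer_code` changes later, only `customer_dim` is affected; the foreign key in `sales_fact` stays an integer.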

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has
been grouped into multiple tables instead of one large table. For example, a product dimension table in
a star schema might be normalized into a products table, a product_category table, and a
product_manufacturer table in a snowflake schema. While this saves space, it increases the number of
dimension tables and requires more foreign key joins. The result is more complex queries and reduced
query performance.
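
The extra join can be seen in a small SQLite sketch. The table names follow the example in the text (`products`, `product_category`); the columns and sample rows are assumptions, and the manufacturer table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: the category lives in its own table.
CREATE TABLE product_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES product_category(category_id)
);
CREATE TABLE sales_fact (product_id INTEGER, amount INTEGER);
INSERT INTO product_category VALUES (1, 'Stationery'), (2, 'Food');
INSERT INTO products VALUES (10, 'Pen', 1), (11, 'Ink', 1), (12, 'Tea', 2);
INSERT INTO sales_fact VALUES (10, 3), (11, 4), (12, 5);
""")

# Two joins are needed to reach the category; a star schema would need one.
sales_by_category = dict(conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM sales_fact f
    JOIN products p ON p.product_id = f.product_id
    JOIN product_category c ON c.category_id = p.category_id
    GROUP BY c.category
"""))
```

In a star schema the category name would be repeated on every product row, trading some storage for one join fewer.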

OLTP stands for OnLine Transaction Processing: a system that contains normalised tables and online
data with frequent inserts/updates/deletes.

Conformed dimensions are dimension tables in a star schema data mart that adhere to a common structure,
and therefore allow queries to be executed across star schemas. For example, the Calendar dimension is
commonly needed in most data marts. By making this Calendar dimension adhere to a single structure,
regardless of which data mart in your organization it is used in, you can query by date/time from one data
mart to another.
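
A sketch of such a cross-mart query, using SQLite from Python. The shared `calendar` table, the two fact tables and the sample rows are assumptions for illustration; each fact table is aggregated separately and the results are lined up on the shared month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One calendar dimension shared by two hypothetical data marts.
CREATE TABLE calendar (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE sales_fact (date_key INTEGER, revenue INTEGER);
CREATE TABLE returns_fact (date_key INTEGER, refunds INTEGER);
INSERT INTO calendar VALUES (20240105, '2024-01'), (20240210, '2024-02');
INSERT INTO sales_fact VALUES (20240105, 100), (20240210, 80);
INSERT INTO returns_fact VALUES (20240105, 10);
""")

# Aggregate each fact table on its own, then combine on the shared month.
net_by_month = {
    month: revenue - refunds
    for month, revenue, refunds in conn.execute("""
        SELECT c.month,
               (SELECT COALESCE(SUM(s.revenue), 0)
                  FROM sales_fact s
                  JOIN calendar cs ON cs.date_key = s.date_key
                 WHERE cs.month = c.month),
               (SELECT COALESCE(SUM(r.refunds), 0)
                  FROM returns_fact r
                  JOIN calendar cr ON cr.date_key = r.date_key
                 WHERE cr.month = c.month)
        FROM (SELECT DISTINCT month FROM calendar) c
        ORDER BY c.month
    """)
}
```

This only works because both fact tables use the same `date_key` values with the same meaning; with two differently structured calendars the months could not be matched reliably.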

Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension.
Change fact table and DW population routines.

A conformed dimension is a single, coherent view of the same piece of data throughout the
organization. The same dimension is used in all subsequent star schemas defined. This enables
reporting across the complete data warehouse in a simple format.

Data mining is used for estimating the future. For example, if we take a company/business
organization, by using data mining we can predict the future of the business in terms of
revenue, employees, customers, orders, etc.
Traditional approaches use simple algorithms for estimating the future, but they do not give results as
accurate as those of data mining.

No tool-based testing is done in a DWH; only manual testing is done.

A degenerate dimension is a dimension key without a corresponding dimension table. Example:
In the PointOfSale Transaction fact table, we have:
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction Number.
The Date dimension corresponds to Date Key and the Product dimension corresponds to Product Key. In a
traditional parent-child database, POS Transaction Number would be the key to the transaction
header record that contains all the info valid for the transaction as a whole, such as the transaction
date and store identifier. But in this dimensional model, we have already extracted this info into other
dimensions. Therefore, POS Transaction Number looks like a dimension key in the fact table but does
not have a corresponding dimension table.
Therefore, POS Transaction Number is a degenerate dimension.

For every change made to a record in the source, whether an UPDATE or an INSERT, a new entry is
made on the target side, so that the target maintains the history.
Let me give an example to make the point clear.
Account information is usually maintained in two categories:
current account and time-of-event account, i.e. we have two sets of tables. CUR_ACCT is a
fast-moving dimension containing information such as balance, while the TOE_ACCT table
contains information such as contact details and phone number, where history is important and the data
is considered to change slowly.
In this respect the TOE_ACCT table qualifies as a slowly changing dimension.
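
Keeping a new target row for every source change is commonly called a Type 2 slowly changing dimension; that name is not used in the answer above. The sketch below shows the idea on an in-memory list standing in for TOE_ACCT. The `apply_change` function and the column names are assumptions for illustration.

```python
from datetime import date

def apply_change(dimension, customer_id, new_phone, change_date):
    """Expire the current row for `customer_id` and append a new version."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            if row["phone"] == new_phone:
                return  # nothing changed, keep the current row
            row["end_date"] = change_date  # close the old version
    dimension.append({
        "customer_id": customer_id,
        "phone": new_phone,
        "start_date": change_date,
        "end_date": None,  # None marks the current version
    })

# Hypothetical stand-in for the TOE_ACCT table.
toe_acct = []
apply_change(toe_acct, 7, "555-0100", date(2024, 1, 1))
apply_change(toe_acct, 7, "555-0199", date(2024, 6, 1))
```

After the second call the list holds both versions: the old phone number with an end date, and the new one marked as current.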

Normally surrogate keys are sequences that keep increasing as new records are inserted into
the table. The standard datatype is integer.

1. Understand the business requirements.
2. Once the business requirements are clear, identify the grains (levels).
3. Once the grains are defined, design the dimension tables with the lower-level grains.
4. Once the dimensions are designed, design the fact table with the key performance
indicators (facts).
5. Once the dimension and fact tables are designed, define the relationships between the tables by
using primary keys and foreign keys. In the logical phase the database design looks like a star, so
it is named star schema design.
