
DATASTAGE QUESTIONS

1. What is the flow of loading data into fact & dimensional tables?
A) Fact table - A table with a collection of foreign keys corresponding to the primary keys in
the dimension tables. It consists of fields with numeric (measure) values.
Dimension table - A table with a unique primary key.
Load - Data should first be loaded into the dimension tables. Based on the primary key values in
the dimension tables, the data is then loaded into the fact table.
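For example (table names hypothetical): first load CUSTOMER_DIM, generating a surrogate key for
each new customer; then, while loading SALES_FACT, look up each incoming business key against
CUSTOMER_DIM to fetch the surrogate key that is stored in the fact row's foreign-key column.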

2. What is the default cache size? How do you change the cache size if needed?
A. The default cache size is 256 MB. We can change it by going into the DataStage Administrator,
selecting the Tunables tab and specifying a new cache size there.

3. What are types of Hashed File?


A) Hashed files are broadly classified into 2 types.
a) Static - subdivided into 17 types based on the primary key pattern.
b) Dynamic - subdivided into 2 types:
i) Generic ii) Specific.

Dynamic files do not perform as well as a well-designed static file, but they do perform better
than a badly designed one. When creating a dynamic file you can specify several parameters,
although all of them have default values.

By default, a hashed file is dynamic (Type 30).

4. What does a Config File in parallel extender consist of?


A) The config file consists of the following:
a) The number of processing nodes.
b) The actual disk storage locations (resource disk and scratch disk).
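A minimal sketch of a one-node configuration file (the node name, host name and paths are
hypothetical):

{
    node "node1"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/ds/disk0" {pools ""}
        resource scratchdisk "/data/ds/scratch0" {pools ""}
    }
}

Adding further node blocks increases the degree of parallelism without changing the job design.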

5. What is Modulus and Splitting in Dynamic Hashed File?


A. In a dynamic hashed file, the number of groups in the file is called the modulus, and it
changes as data is added or removed. When the file grows, an overflowing group is split into two
and the modulus increases; this is called splitting. When the file shrinks, groups are merged and
the modulus decreases.

6. What are Stage Variables, Derivations and Constraints?


A. Stage Variable - An intermediate processing variable that retains its value across rows and is
not passed to a target column.
Derivation - An expression that specifies the value to be passed on to a target column.
Constraint - A condition that evaluates to true or false and controls whether data flows along a
link.

7. Types of views in Datastage Director?


There are 3 types of views in DataStage Director:
a) Job View - the jobs and the dates on which they were compiled and run.
b) Status View - the status of a job's last run.
c) Log View - warning messages, event messages and program-generated messages.

8. Types of Parallel Processing?


A) Parallel Processing is broadly classified into 2 types.
a) SMP - Symmetrical Multi Processing.
b) MPP - Massively Parallel Processing.

9. Orchestrate Vs Datastage Parallel Extender?


A) Orchestrate was itself an ETL tool with extensive parallel processing capabilities, running on
UNIX platforms. DataStage XE (the beta version of 6.0) used Orchestrate to incorporate parallel
processing capabilities. Ascential then acquired Orchestrate, integrated it with DataStage XE,
and released DataStage 6.0 with the Parallel Extender.

10. Importance of Surrogate Key in Data warehousing?


A) A Surrogate Key is the primary key of a dimension table. Its main advantage is that it is
independent of the underlying source systems, i.e. the surrogate key is not affected by changes
going on in the source database.

11. How to run a Shell Script within the scope of a Data stage job?
A) By using the "ExecSH" routine in the Before/After job properties.
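For example (the script path is hypothetical), in the job properties select ExecSH as the
before-job subroutine and supply the command in the input value field:

Before-job subroutine: ExecSH
Input value: sh /home/dsadm/scripts/prep_files.sh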

12. How to handle Date conversions in Datastage? Convert a mm/dd/yyyy format to yyyy-
dd-mm?
A) We use a) the "Iconv" function - converts a date to the internal format.
b) the "Oconv" function - converts from the internal format to an external output format.

The function to convert mm/dd/yyyy format to yyyy-dd-mm is:

Oconv(Iconv(FieldName, "D/MDY[2,2,4]"), "D-YDM[4,2,2]")
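A minimal sketch of the two steps taken separately, assuming the D-YDM conversion code for the
yyyy-dd-mm layout (the literal date and variable names are only illustrative):

InternalDate = Iconv("12/31/2024", "D/MDY[2,2,4]")  ;* external date to internal day number
OutDate = Oconv(InternalDate, "D-YDM[4,2,2]")       ;* internal day number to "2024-31-12"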

13. How do you execute a DataStage job from the command line prompt?


A) Using "dsjob" command as follows.
dsjob -run -jobstatus projectname jobname
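For example, with a hypothetical project dwproj and job LoadSalesFact:

dsjob -run -jobstatus dwproj LoadSalesFact

The -jobstatus option makes dsjob wait for the job to finish and reflect the job's completion
status in its exit code, which is useful when calling DataStage jobs from shell scripts or
schedulers.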

14. Functionality of Link Partitioner and Link Collector?


Link Partitioner: It splits data into multiple partitions (data flows) using one of several
partitioning methods.
Link Collector: It collects the data coming from the partitions, merges it into a single data flow
and loads it to the target.

15. Types of Dimensional Modeling?


A) Dimensional modeling is subdivided into 2 types.
a) Star Schema - simple and much faster; denormalized form.
b) Snowflake Schema - more complex, with dimension hierarchies normalized into separate tables;
more normalized form.

16. Differentiate Primary Key and Partition Key?


A Primary Key is a combination of unique and not-null constraints. It can be a collection of key
columns, called a composite primary key. A Partition Key is just a part of the primary key. There
are several partitioning methods such as Hash, DB2, Random, etc. When using hash partitioning we
specify the partition key.

17. Differentiate Database data and Data warehouse data?


A) Data in a database is
a) Detailed or transactional.
b) Both readable and writable.
c) Current.

Data in a data warehouse is
a) Summarized and subject-oriented.
b) Primarily read-only.
c) Historical.
18. Containers Usage and Types?


A Container is a collection of stages grouped together for reusability.
There are 2 types of containers.
a) Local Container: job specific.
b) Shared Container: can be used in any job within the project.

19. Compare and Contrast ODBC and Plug-In stages?


ODBC: a) Poor performance.
b) Can be used for a variety of databases.
c) Can handle stored procedures.

Plug-In: a) Good performance.
b) Database specific (only one database).
c) Cannot handle stored procedures.

20. Dimension Modelling types along with their significance


Data modelling is broadly classified into 2 types.
a) E-R Modelling (Entity-Relationship diagrams) - used for transactional (OLTP) systems.
b) Dimensional Modelling - used for data warehouses, to optimize query performance.

Q 21 What are the Ascential DataStage products and connectivity options?


Ans:
Ascential Products

Ascential DataStage
Ascential DataStage EE
Ascential DataStage EE MVS
Ascential DataStage TX
Ascential QualityStage
Ascential MetaStage
Ascential RTI
Ascential ProfileStage
Ascential AuditStage
Ascential Commerce Manager
Industry Solutions
Connectivity
Files
RDBMS
Real-time
PACKs
EDI
Other

Q 22 Explain Data Stage Architecture?


DataStage contains two sets of components:
Client Components.
Server Components.
Client Component:
 Data Stage Administrator.
 Data Stage Manager
 Data Stage Designer
 Data Stage Director

Server Components:
 Data Stage Engine
 Meta Data Repository
 Package Installer

Data Stage Administrator:


Used to create projects.
Contains a set of project properties.
We can set the buffer size (by default 128 MB) and increase it if needed.
We can set the environment variables.
In the Tunables tab we can choose between in-process and inter-process row buffering:
In-process - rows are passed between stages within a single process.
Inter-process - each stage runs as a separate process and rows are passed between them through
buffers.
The Administrator itself only interfaces to the metadata repository.

Data Stage Manager:


We can view and edit the metadata repository.
We can import table definitions.
We can export DataStage components in .xml or .dsx format.
We can create routines and transforms.
We can compile multiple jobs.

Data Stage Designer:


We can create, compile and run jobs. We can declare stage variables in the Transformer, and we
can call routines, transforms, macros and functions.
We can write constraints.

Data Stage Director:


We can run the jobs.
We can schedule the jobs. (Schedule can be done daily, weekly, monthly, quarterly)
We can monitor the jobs.
We can release the jobs.

Q 23 What is Meta Data Repository?


Meta data is data about data.
It also contains
 Query statistics
 ETL statistics
 Business subject area
 Source Information
 Target Information
 Source to Target mapping Information.

Q 24 What is Data Stage Engine?


It is the server engine (based on the UniVerse database engine) that runs in the background and
executes DataStage jobs.

Q 25 What is Dimensional Modeling?


Dimensional Modeling is a logical design technique that seeks to present the data in a standard
framework that is intuitive and allows for high-performance access.

Q 26 What is Star Schema?


Star Schema is a de-normalized multi-dimensional model. It contains centralized fact tables
surrounded by dimension tables.
Dimension Table: contains a primary key and descriptive attributes referenced by the fact table.
Fact Table: contains foreign keys to the dimension tables, measures and aggregates.
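An illustrative star layout (all table and column names hypothetical):

SALES_FACT(date_key, product_key, store_key, units_sold, revenue)
DATE_DIM(date_key, calendar_date, month, year)
PRODUCT_DIM(product_key, product_name, category)
STORE_DIM(store_key, store_name, region)

Each *_key column in SALES_FACT is a foreign key referencing the primary key of the matching
dimension table.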

Q 27 What is surrogate Key?


It is typically a 4-byte integer that replaces the transaction / business / OLTP key in the
dimension table. A 4-byte integer can hold up to about 2 billion positive values.

Q 28 Why we need surrogate key?


It helps integrate data from multiple sources and performs better than a natural primary key. It
also simplifies index maintenance, joins, table size, key updates, disconnected inserts and
partitioning.

Q 29 What is Snowflake schema?


It is a partially normalized dimensional model in which at least one dimension hierarchy is split
into two or more related tables.

Q 30 Explain Types of Fact Tables?


Factless Fact: contains only foreign keys to the dimension tables, with no measures.
Additive Fact: measures can be added across all dimensions.
Semi-Additive: measures can be added across some dimensions but not others, e.g. account balances
(not additive across time).
Non-Additive: measures cannot be added across any dimension, e.g. percentages, discounts, averages.
Conformed Fact: a measure with the same definition and units in every fact table in which it
appears, so it can be compared across those fact tables.
Q 31 Explain the Types of Dimension Tables?
Conformed Dimension: a dimension table connected to more than one fact table, whose granularity
and meaning are the same across those fact tables.
Junk Dimension: a dimension table that contains only flags and indicators.
Monster Dimension: a very large dimension that also changes rapidly.
Degenerate Dimension: a dimension key (such as an order or invoice number) stored in the fact
table with no corresponding dimension table; typical of line-item-oriented fact table designs.

Q 32 What are stage variables?


Stage variables are declarations in the Transformer stage used to store values across rows. They
are active only at run time (because memory is allocated at run time).
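For example (link and variable names hypothetical), a stage variable svTotal with initial value 0
and the derivation

svTotal + InLink.Amount

keeps a running total across rows, because a stage variable retains its value from one row to the
next while ordinary output-column derivations do not.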

Q 33 What is sequencer?
It sets the sequence of execution of server jobs.

Q 34 What are Active and Passive stages?


Active Stage: Active stages model the flow of data and provide mechanisms for combining data
streams, aggregating data and converting data from one data type to another. E.g. Transformer,
Aggregator, Sort, Row Merger, etc.
Passive Stage: A passive stage handles access to databases or files for the extraction or writing
of data. E.g. IPC stage, file stages, Universe, UniData, DRS stage, etc.

Q 35 What is ODS?
Operational Data Store is a staging area where data can be rolled back.

Q 36 What are Macros?


They are built from Data Stage functions and do not require arguments.
A number of macros are provided in the JOBCONTROL.H file to facilitate getting information
about the current job, and links and stages belonging to the current job. These can be used in
expressions (for example for use in Transformer stages), job control routines, filenames and table
names, and before/after subroutines.

These macros provide the functionality of using the DSGetProjectInfo, DSGetJobInfo,
DSGetStageInfo, and DSGetLinkInfo functions with the DSJ.ME token as the JobHandle and can be
used in all active stages and before/after subroutines. The macros provide the functionality for
all the possible InfoType arguments for the DSGet…Info functions. See the Function call help
topics for more details.

The available macros are:
DSHostName
DSProjectName
DSJobStatus
DSJobName
DSJobController
DSJobStartDate
DSJobStartTime
DSJobStartTimestamp
DSJobWaveNo
DSJobInvocations
DSJobInvocationId
DSStageName
DSStageLastErr
DSStageType
DSStageInRowNum
DSStageVarList
DSLinkRowCount
DSLinkLastErr
DSLinkName

Examples

To obtain the name of the current job:

MyName = DSJobName

To obtain the full current stage name:

MyName = DSJobName : "." : DSStageName
