
1. What is a data-warehouse?

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision-making process.

2. What are Data Marts?

A data mart is a repository of data designed to serve a particular community of
knowledge workers. The difference between a data warehouse (DWH) and a data mart is
that the DWH is a central database serving the whole organization, whereas a data
mart is used to meet the needs of a specific community.

3. What is an ER Diagram?

An Entity Relationship (ER) diagram is a type of flowchart that illustrates how
“entities” such as people, objects or concepts relate to each other within a system.
ER diagrams are most often used to design or debug relational databases in the fields
of software engineering, business information systems, education and research.

4. What is a Star Schema?

A star schema contains a central fact table surrounded by dimension tables; the
resulting layout resembles a star, hence the name.
• A star schema has exactly one dimension table for each dimension.
• It is a good choice when the dimension tables contain relatively few rows.
• Dimension tables are denormalized.
• Queries need fewer joins.
• Good for data marts with simple relationships (1:1 or 1:many).
• Fewer foreign keys, and hence shorter query execution time.
• Lower query complexity; easy to understand.
• Top-down approach.
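
As a minimal sketch of the layout described above, the following uses Python's built-in sqlite3 module to create one fact table with two dimension tables and run a typical star join. The table and column names (fact_sales, dim_product, dim_store) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; every table and column name here is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name   TEXT, city     TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_store   VALUES (1, 'Central', 'Paris'), (2, 'North', 'Lille');
INSERT INTO fact_sales  VALUES (1, 1, 10, 15.0), (1, 2, 4, 6.0), (2, 1, 1, 30.0);
""")

# Each dimension is exactly one join away from the fact table.
star_rows = conn.execute("""
    SELECT p.category, s.city, SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store   s ON s.store_key   = f.store_key
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
print(star_rows)
```

Because the category is stored directly on dim_product, the query needs only one join per dimension.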

5. What is Dimensional Modelling?

Dimensional modeling is a database design technique that helps business users query
data in a data warehouse system. It is oriented toward improving query performance
and ease of use.

6. What is a Snowflake Schema?

A snowflake schema contains a central fact table surrounded by dimension tables, with
the dimension tables themselves having sub-dimension tables.
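
The sketch below illustrates the sub-dimension idea with Python's sqlite3 module: a product dimension points to a separate category sub-dimension, so reaching the category from the fact table takes two joins. All table names, column names, and rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical snowflake layout: fact_sales -> dim_product -> dim_category.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
INSERT INTO dim_product  VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
INSERT INTO fact_sales   VALUES (1, 15.0), (2, 5.0), (3, 30.0);
""")

# The category lives in a sub-dimension, so the query walks two joins.
snowflake_rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_rows)
```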

7. What are the Different methods of loading Dimension tables?

• Slowly Changing Dimension
• Rapidly Changing Dimension
• Junk Dimension
• Conformed Dimension
• Degenerate Dimension
• Role-Playing Dimension
• Inferred Dimension
• Static Dimension

8. What are Aggregate tables?

An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the
dimensions. Retrieving the required data from the underlying table, which may hold millions of records,
takes more time and also affects server performance. To avoid this, the table can be aggregated to the
required level and the aggregate used instead. Such tables reduce the load on the database server,
improve query performance, and return results much faster.
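
A minimal sketch of this idea, using Python's sqlite3 module: the detail rows are summarized once into a monthly aggregate table, and reports then read the small table instead of the large one. The names fact_sales and agg_sales_monthly, and the monthly level itself, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical detail-level fact table (one row per sale).
CREATE TABLE fact_sales (sale_date TEXT, product_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES
    ('2024-01-05', 1, 10.0), ('2024-01-20', 1, 5.0),
    ('2024-02-03', 1, 7.5),  ('2024-02-10', 2, 2.5);

-- Pre-compute the summary once, at month/product level.
CREATE TABLE agg_sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS month,
           product_key,
           SUM(amount) AS total_amount,
           COUNT(*)    AS sale_count
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), product_key;
""")

# Reports query the aggregate table rather than scanning the detail rows.
monthly = conn.execute("""
    SELECT month, product_key, total_amount, sale_count
    FROM agg_sales_monthly
    ORDER BY month, product_key
""").fetchall()
print(monthly)
```

In a real warehouse the aggregate would have to be refreshed whenever new detail rows are loaded; that step is left out of the sketch.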

9. What is the Difference between OLTP and OLAP?

OLTP (Online Transaction Processing):
• Maintains current data (typically a period of less than one year).
• Performs INSERT, UPDATE and DELETE operations.
• Data is volatile.
• Response time is fast.
• Data is normalized.

OLAP (Online Analytical Processing):
• Maintains current and historical data.
• Only SELECT operations are used, for reporting and analysis purposes.
• Data is non-volatile.
• Response time is slower.
• Data is denormalized.

10. What is ETL?

ETL (Extract, Transform and Load) is the data warehousing process responsible for
pulling data out of the source systems and placing it into a data warehouse. ETL involves
the following tasks:

- Extracting the data from source systems (SAP, ERP, other operational systems). Data
from the different source systems is converted into one consolidated data warehouse format
that is ready for transformation processing.

- Transforming the data, which may involve the following tasks:

   • applying business rules (so-called derivations, e.g., calculating new measures and
     dimensions),
   • cleaning (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F"),
   • filtering (e.g., selecting only certain columns to load),
   • splitting a column into multiple columns and vice versa,
   • joining together data from multiple sources (e.g., lookup, merge),
   • transposing rows and columns,
   • applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row
     are empty, reject the row from processing).

- Loading the data into a data warehouse, data repository, or other reporting applications.
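
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source is a hypothetical CSV extract held in a string, the target is an in-memory SQLite table named customer_sales, and the column names are invented. It demonstrates the cleaning, filtering, and validation examples from the list, not a production pipeline.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real one would come from an operational system.
SOURCE_CSV = """customer_id,name,gender,amount,notes
1,Alice,Female,120.50,vip
2,Bob,Male,,new
,,,,orphan row
3,Chandra,Female,75,
"""

GENDER_CODES = {"Male": "M", "Female": "F"}

def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: validate, clean, and filter the extracted rows."""
    cleaned, rejected = [], []
    for row in rows:
        # Validation: reject the row if its first three columns are all empty.
        if not any(row[col] for col in ("customer_id", "name", "gender")):
            rejected.append(row)
            continue
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "name": row["name"],
            # Cleaning: map "Male"/"Female" to "M"/"F".
            "gender": GENDER_CODES.get(row["gender"], row["gender"]),
            # Cleaning: map a missing amount to 0.
            "amount": float(row["amount"]) if row["amount"] else 0.0,
            # Filtering: the "notes" column is deliberately not carried over.
        })
    return cleaned, rejected

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE customer_sales (customer_id INTEGER, name TEXT, gender TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customer_sales VALUES (:customer_id, :name, :gender, :amount)", rows
    )

conn = sqlite3.connect(":memory:")
cleaned_rows, rejected_rows = transform(extract(SOURCE_CSV))
load(conn, cleaned_rows)
loaded = conn.execute("SELECT * FROM customer_sales ORDER BY customer_id").fetchall()
print(loaded, len(rejected_rows))
```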

11. What are the various ETL tools in the Market?

• Informatica - PowerCenter
• IBM - WebSphere DataStage (formerly known as Ascential DataStage)
• SAP - BusinessObjects Data Integrator
• IBM - Cognos Data Manager (formerly known as Cognos DecisionStream)
• Microsoft - SQL Server Integration Services
• Oracle - Data Integrator (formerly known as Sunopsis Data Conductor)
• SAS - Data Integration Studio
• Oracle - Warehouse Builder
• Ab Initio
• Information Builders - Data Migrator
• Pentaho - Pentaho Data Integration


12. What are the various Reporting tools in the Market?

Cognos (Impromptu, PowerPlay), MS Excel, Tableau, QlikView, Oracle Express OLAP, MS Reporting
Services (SSRS), Informatica PowerAnalyzer

13. What is a fact table?

A fact table contains facts (measures). Every fact table is surrounded by a number of dimension
tables, chosen according to the requirements.

14. What is a dimension table?

A dimension table stores attributes, or dimensions, that describe the objects in a fact table.

16. What is a lookup table?

When a table is used to check whether some data is present before loading other data, or the
same data, into another table, that table is called a lookup table.
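
The presence check described above might look like the following sketch, written with Python's sqlite3 module. The lookup table lkp_product, the target table fact_sales, and the incoming rows are all hypothetical. Rows whose code is missing from the lookup table are set aside rather than loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup table: maps a source product code to a warehouse key.
CREATE TABLE lkp_product (product_code TEXT PRIMARY KEY, product_key INTEGER);
CREATE TABLE fact_sales  (product_key INTEGER, amount REAL);
INSERT INTO lkp_product VALUES ('PEN', 1), ('LAMP', 2);
""")

incoming = [("PEN", 15.0), ("DESK", 200.0), ("LAMP", 30.0)]
unmatched = []

for code, amount in incoming:
    # Check the lookup table before loading the row into the target table.
    hit = conn.execute(
        "SELECT product_key FROM lkp_product WHERE product_code = ?", (code,)
    ).fetchone()
    if hit is None:
        unmatched.append(code)  # not present in the lookup: do not load
        continue
    conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (hit[0], amount))

loaded_facts = conn.execute(
    "SELECT product_key, amount FROM fact_sales ORDER BY product_key"
).fetchall()
print(loaded_facts, unmatched)
```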

17. What is a general purpose scheduling tool? Name some of them?

The basic purpose of a scheduling tool in a DW application is to streamline the flow of data
from source to target, either at a specific time or based on some condition.

18. What are modeling tools available in the Market? Name some of them?

19. What is real time data-warehousing?

20. What is data mining?

Data mining is defined as extracting information from huge sets of data. In other
words, data mining is the procedure of mining knowledge from data. The information
or knowledge extracted can be used for any of the following applications:

• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration

Data Mining Applications

Data mining is highly useful in the following domains:

• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection

21. What is Normalization? First Normal Form, Second Normal Form, Third Normal Form?

22. What is ODS?

An ODS (Operational Data Store) is designed for relatively simple queries on small
amounts of data (such as finding the status of a customer order), rather than the
complex queries on large amounts of data typical of the data warehouse. An ODS is
similar to your short-term memory in that it stores only very recent information; in
comparison, the data warehouse is more like long-term memory in that it stores
relatively permanent information.

23. What type of Indexing mechanism do we need to use for a typical Data warehouse?

Bitmap index
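
To show why bitmap indexes suit the low-cardinality columns typical of a warehouse, here is a toy sketch in plain Python. It is not how any particular database implements them. Each distinct column value gets one integer used as a bit string, with bit i set when row i holds that value. The rows and column names are invented for the example.

```python
from collections import defaultdict

# Hypothetical rows with two low-cardinality columns.
rows = [
    {"region": "EU", "status": "open"},
    {"region": "US", "status": "closed"},
    {"region": "EU", "status": "closed"},
    {"region": "EU", "status": "open"},
]

def build_bitmap_index(rows, column):
    """Return {value: bitmap}, where bit i is set if rows[i][column] == value."""
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[column]] |= 1 << position
    return dict(index)

region_index = build_bitmap_index(rows, "region")
status_index = build_bitmap_index(rows, "status")

# WHERE region = 'EU' AND status = 'open' becomes a single bitwise AND.
matches = region_index["EU"] & status_index["open"]
matching_positions = [i for i in range(len(rows)) if (matches >> i) & 1]
print(matching_positions)
```

Combining predicates is cheap because it reduces to bitwise operations on the per-value bitmaps.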

23. Which columns go to the fact table and which columns go to the dimension table? (My user needs to see
<data element>, <data element> broken by <data element>, <data element>.)

All elements before "broken by" = fact measures

All elements after "broken by" = dimension elements
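
As an illustration of this rule, take the hypothetical request "units and revenue broken by region and month". The elements before "broken by" become aggregated measures, and the elements after it become the grouping columns. The sketch uses Python's sqlite3 module with an invented sales table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flat table, used only to show the measure/dimension split.
CREATE TABLE sales (region TEXT, month TEXT, units INTEGER, revenue REAL);
INSERT INTO sales VALUES
    ('EU', '2024-01', 3, 30.0), ('EU', '2024-01', 2, 20.0),
    ('EU', '2024-02', 1, 10.0), ('US', '2024-01', 5, 55.0);
""")

# "units and revenue broken by region and month":
#   before "broken by" -> measures (aggregated); after it -> dimensions (GROUP BY).
breakdown = conn.execute("""
    SELECT region, month, SUM(units), SUM(revenue)
    FROM sales
    GROUP BY region, month
    ORDER BY region, month
""").fetchall()
print(breakdown)
```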

24. What is the level of granularity of a fact table? What does this signify? (With weekly-level
summarization, there is no need to keep the invoice number in the fact table any more.)

25. How are the dimension tables designed?

Denormalized, wide, and short; they use surrogate keys and contain additional date fields and flags.

26. What are slowly changing dimensions?

27. What are non-additive facts? (Inventory, account balances in a bank)

28. What are conformed dimensions?

29. What is a VLDB (Very Large Database)? (If a database is too large to back up within the available time frame, it is a VLDB.)

30. What are SCD1, SCD2 and SCD3?
