
DWH Concepts and Informatica Junior Level FAQs

1. What is ER model and Dimensional Model?
A: ER Model - relational modeling. Dimensional Model - Star Schema (a central fact table with numeric data, to which all other tables are linked directly; faster, but denormalized), Snowflake Schema (one fact table, with the dimension tables normalized), Fact Constellation (different fact tables sharing dimensions, combined from one data mart to another).

2. What is Metadata?
A: Data about the data, or the structure of the data, is called metadata.

3. What are the different types of Dimensional Modeling?
A: Star Schema (a central fact table with numeric data, to which all other tables are linked directly; faster, but denormalized), Snowflake Schema (one fact table, with the dimension tables normalized), Fact Constellation (different fact tables sharing dimensions, combined from one data mart to another).

4. Difference between Star & Snowflake Schema
A: Snowflaking is a star schema design technique in which logical attributes, usually of low cardinality, are stored separately in their own tables using a loose normalization approach. For example, you could snowflake the gender of your customers in order to track changes on that attribute if your customer dimension is too large to handle as a slowly changing dimension.
-The technique is not really recommended if you are going to use OLAP tools for your front end, due to speed issues.
-Snowflaking allows for easy update and load of data, as redundancy of data is avoided to some extent, but browsing capabilities are greatly compromised. Sometimes, though, it may become a necessary evil.
-To add a little to this, snowflaking often becomes necessary when you need data for which there is a one-to-many relationship with a dimension table. Trying to consolidate this data into the dimension table would necessarily lead to redundancy (a violation of second normal form, which effectively produces a Cartesian product). This sort of redundancy can cause misleading results in queries, since the count of rows is artificially large (due to the Cartesian product). A simple example of such a situation might be a "customer" dimension for which there is a need to store multiple contacts. If the contact information is brought into the customer table, there would be one row for each contact (i.e., one for each customer/contact combination). In this situation, it is better to create a "contact" snowflake table with a foreign key to the customer. In general, it is better to avoid snowflaking if possible, but sometimes the consequences of avoiding it are much worse.
-In a star schema, all your dimensions are linked directly to your fact table. In a snowflake schema, on the other hand, dimensions may be interlinked or may have one-to-many relationships with other tables. As noted above, this isn't always a desirable situation, but you can make the best choice once you have gathered all the requirements. The snowflake is designed like a star, but with connecting tables among the dimension tables, i.e., relationships between two dimensions.
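A minimal SQL sketch of the two layouts just described, using hypothetical table and column names (dim_customer, dim_product, fact_sales, contact and their columns are illustrative only): the star keeps each dimension in one denormalized table, while the snowflake moves the one-to-many customer contacts into their own table with a foreign key back to the customer.

```sql
-- Star schema: one central fact table, denormalized dimensions
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY,   -- surrogate key
    customer_code  VARCHAR(20),           -- natural/business key
    customer_name  VARCHAR(100),
    city           VARCHAR(50),
    gender         CHAR(1)
);

CREATE TABLE dim_product (
    product_key    INTEGER PRIMARY KEY,
    product_name   VARCHAR(100),
    category       VARCHAR(50)
);

CREATE TABLE fact_sales (
    customer_key   INTEGER REFERENCES dim_customer (customer_key),
    product_key    INTEGER REFERENCES dim_product (product_key),
    dollar_sales   DECIMAL(12,2),         -- additive facts
    unit_sales     INTEGER
);

-- Snowflaking the one-to-many contacts out of dim_customer,
-- instead of repeating one customer row per contact
CREATE TABLE contact (
    contact_key    INTEGER PRIMARY KEY,
    customer_key   INTEGER REFERENCES dim_customer (customer_key),
    contact_name   VARCHAR(100),
    phone          VARCHAR(20)
);
```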

5. Which is better, Star or Snowflake? Why?
A: Strict data warehousing rules would have you use a star schema, but in reality most designs tend to become snowflakes. Each has its pros and cons, but both are far better than trying to use a transactional-system third-normal-form design.

6. What is the necessity of having dimensional modeling instead of ER modeling?
A: Compared to entity/relation modeling, it's less rigorous (allowing the designer more discretion in organizing the tables) but more practical, because it accommodates database complexity and improves performance.

7. What are Dimensions and Facts?
A: Dimensional modeling begins by dividing the world into measurements and context. Measurements are usually numeric and taken repeatedly. Numeric measurements are facts. Facts are always surrounded by mostly textual context that is true at the moment the fact is recorded. Facts are very specific, well-defined numeric attributes. By contrast, the context surrounding the facts is open-ended and verbose. It's not uncommon for the designer to add context to a set of facts partway through the implementation. Dimensional modeling divides the world of data into two major types: measurements and descriptions of the context surrounding those measurements. The measurements, which are typically numeric, are stored in fact tables, and the descriptions of the context, which are typically textual, are stored in the dimension tables. A fact table in a pure star schema consists of multiple foreign keys, each paired with a primary key in a dimension, together with the facts containing the measurements.

Every foreign key in the fact table has a match to a unique primary key in the respective dimension (referential integrity). This allows the dimension table to possess primary keys that aren't found in the fact table. Therefore, a product dimension table might be paired with a sales fact table in which some of the products are never sold. Dimensional models are full-fledged relational models, where the fact table is in third normal form and the dimension tables are in second normal form. The main difference between second and third normal form is that repeated entries are removed from a second normal form table and placed in their own snowflake. Thus the act of removing the context from a fact record and creating dimension tables places the fact table in third normal form. E.g. fact tables: Sales, Cost, Profit. E.g. dimensions: Customer, Product, Store, Time.

8. What are Additive Facts? Give one example.
A: Fact tables are mostly very large, and we almost never fetch a single record into our answer set. We fetch a very large number of records on which we then do adding, counting, averaging, or taking the min or max. The most common of these is adding. Applications are simpler if they store facts in an additive format as often as possible. Thus, in the grocery example, we don't need to store the unit price; we compute the unit price by dividing the dollar sales by the unit sales whenever necessary.

9. What is a Conformed Dimension? Give one example.
A: When the enterprise decides to create a set of common labels across all the sources of data, the separate data mart teams (or a single centralized team) must sit down to create master dimensions that everyone will use for every data source. These master dimensions are called conformed dimensions. Two dimensions are conformed if the fields that you use as row headers have the same domain.

10. What is a Conformed Fact?
A: If the definitions of measurements (facts) are highly consistent, we call them conformed facts.

11. What are some important features in data warehouse reporting? (Optional question)
A: Drilling down, rolling up, drilling across, slicing/dicing.

12. What is meant by Slowly Changing Dimensions and what are the different types of SCDs? Explain with an example.
A: Dimensions don't change in predictable ways. Individual customers and products evolve slowly and episodically. Some of the changes are true physical changes: customers change their addresses because they move, or a product is manufactured with different packaging. Other changes are actually corrections of mistakes in the data. And finally, some changes are changes in how we label a product or customer and are more a matter of opinion than physical reality. We call these variations Slowly Changing Dimensions (SCDs). The three fundamental choices for handling a slowly changing dimension are:
-Overwrite the changed attribute, thereby destroying previous history (e.g., useful when correcting an error).

-Issue a new record for the customer, keeping the customer natural key, but creating a new surrogate primary key.
-Create an additional field in the existing customer record, store the old value of the attribute in the additional field, and overwrite the original attribute field.
A Type 1 SCD is an overwrite of a dimensional attribute. History is definitely lost. We overwrite when we are correcting an error in the data or when we truly don't want to save history.
A Type 2 SCD creates a new dimension record and requires a generalized or surrogate key for the dimension. We create surrogate keys when a true physical change occurs in a dimension entity at a specific point in time, such as the customer address change or the product packaging change. We often add a timestamp and a reason code in the dimension record to precisely describe the change. The Type 2 SCD records changes of values of dimensional entity attributes over time. The technique requires adding a new row to the dimension each time there's a change in the value of an attribute (or group of attributes) and assigning a unique surrogate key to the new row.
A Type 3 SCD adds a new field in the dimension record but does not create a new record. We might change the designation of the customer's sales territory because we redraw the sales territory map, or we arbitrarily change the category of the product from confectionary to candy. In both cases, we augment the original dimension attribute with an "old" attribute so we can switch between these alternate realities.

13. What are the techniques for handling SCDs?
A: Overwriting; creating another dimension record; creating a current-value field.

14. What is a Surrogate Key and where do you use it?
A: A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key. It is just a unique identifier or number for each row that can be used as the primary key of the table. It is useful because the natural primary key (i.e. Customer Number in the Customer table) can change, and this makes updates more difficult. Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users), but not only can these change, indexing on a numerical value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and, as far as the client is concerned, you may display only the AIRPORT_NAME.
Another benefit you can get from surrogate keys (SIDs) is in tracking a Slowly Changing Dimension. A classical example: on the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee dimension). This employee has turnover allocated to him on Business Unit 'BU1'. But on the 2nd of June the Employee 'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the old turnover should belong to Business Unit 'BU1'. If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to Business Unit 'BU2', even what actually belongs to 'BU1'. If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee dimension with a new surrogate key. This way, in your fact table, your old data (before the 2nd of June) carries the SID of Employee 'E1' + 'BU1', and all new data (after the 2nd of June) takes the SID of Employee 'E1' + 'BU2'. You could consider a Slowly Changing Dimension as an enlargement of your natural key: the natural key of the employee was Employee Code 'E1', but for you it becomes Employee Code + Business Unit, i.e. 'E1' + 'BU1' or 'E1' + 'BU2'. The difference with the natural key enlargement process is that you might not have all parts of your new key within your fact table, so you might not be able to do the join on the new enlarged key, and you therefore need another id. Every join between dimension tables and fact tables in a data warehouse environment should be based on surrogate keys, not natural keys.
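A minimal SQL sketch of the Type 2 / surrogate key pattern described above, under assumed table and column names (dim_employee, employee_sid, effective_from/effective_to are illustrative, not from any particular system):

```sql
-- Hypothetical Type 2 employee dimension: one row per version,
-- each with its own surrogate key (employee_sid)
CREATE TABLE dim_employee (
    employee_sid   INTEGER PRIMARY KEY,   -- surrogate key
    employee_code  VARCHAR(10),           -- natural key, e.g. 'E1'
    business_unit  VARCHAR(10),
    effective_from DATE,
    effective_to   DATE                   -- NULL for the current row
);

-- Before the change: E1 belongs to BU1
INSERT INTO dim_employee VALUES (101, 'E1', 'BU1', DATE '2002-01-01', NULL);

-- On 2002-06-02: close the old row and add a new version for BU2
UPDATE dim_employee
   SET effective_to = DATE '2002-06-01'
 WHERE employee_code = 'E1' AND effective_to IS NULL;

INSERT INTO dim_employee VALUES (102, 'E1', 'BU2', DATE '2002-06-02', NULL);

-- Fact rows loaded before June reference surrogate key 101 (BU1),
-- rows loaded afterwards reference 102 (BU2), so turnover stays
-- allocated to the correct business unit.
```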

15. What is the necessity of having surrogate keys?
A:
-Production may reuse keys that it has purged but that you are still maintaining.
-Production might legitimately overwrite some part of a product description or a customer description with new values but not change the product key or the customer key to a new value; we might be wondering what to do about the revised attribute values (the slowly changing dimension crisis).
-Production may generalize its key format to handle some new situation in the transaction system, e.g. changing the production keys from integers to alphanumeric, or the 12-byte keys you are used to may become 20-byte keys.
-Acquisition of companies.

16. What are the advantages of using Surrogate Keys (integer, non-natural keys)?
A:
-We can save substantial storage space with integer-valued surrogate keys.
-Eliminate administrative surprises coming from production.
-Potentially adapt to big surprises like a merger or an acquisition.
-Have a flexible mechanism for handling slowly changing dimensions.

17. What is the difference between OLTP and OLAP?
A: OLAP - Online Analytical Processing: mainly required for DSS; data is kept in a denormalized manner, is mainly non-volatile and highly indexed, to improve query response time. OLTP - Online Transaction Processing: DML, highly normalized to reduce deadlocks and increase concurrency.

18. What is the difference between OLTP and a data warehouse?
A: Operational system vs. data warehouse:
-Transaction processing vs. query processing
-Time sensitive vs. history oriented
-Operator view vs. managerial view
-Organized by transactions (Order, Input, Inventory) vs. organized by subject (Customer, Product)
-Relatively smaller database vs. large database size
-Many concurrent users vs. relatively few concurrent users
-Volatile data vs. non-volatile data
-Stores all data vs. stores relevant data
-Not flexible vs. flexible

19. What is the life cycle of a DW?
A:
-Extracting the data from the OLTP systems, from different data sources
-Analysis & staging: putting the data in a staging layer - cleaning, purging, assigning surrogate keys, SCD handling, dimensional modeling
-Loading
-Writing of metadata

20. What is a data warehouse?
A: A data warehouse is a database designed to support a broad range of decision tasks in a specific organization. It is usually batch updated and structured for rapid online queries and managerial summaries. Data warehouses contain large amounts of historical data which are derived from transaction data, but they can include data from other sources also. A data warehouse is designed for query and analysis rather than for transaction processing. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. The term data warehousing is often used to describe the process of creating, managing and using a data warehouse.

21. What is a data mart?
A: A data mart is a selected part of the data warehouse which supports specific decision support application requirements of a company's department or geographical region. It usually contains simple replicates of warehouse partitions or data that has been further summarized or derived from base warehouse data. Instead of running ad hoc queries against a huge data warehouse, data marts allow the efficient execution of predictable queries over a significantly smaller database.

22. How do I differentiate between a data warehouse and a data mart?
A: A data warehouse is for very large databases (VLDBs) and a data mart is for smaller databases. The difference lies in the scope of the things with which they deal. A data mart is an implementation of a data warehouse with a small and more tightly restricted scope of data and data warehouse functions. A data mart serves a single department or part of an organization. In other words, the scope of a data mart is smaller than that of the data warehouse. It is a data warehouse for a smaller group of end users.

23. What is the aim/objective of having a data warehouse? And who needs a data warehouse?
A: Data warehousing technology comprises a set of new concepts and tools which support executives, managers and analysts with information material for decision making. The fundamental reason for building a data warehouse is to improve the quality of information in the organization. The main goal of a data warehouse is to report and present the information in a very user-friendly form.

24. What is the architecture of a data warehouse?
A: A data warehouse system (DWS) comprises the data warehouse and all components used for building, accessing and maintaining the DWH. The center of a data warehouse system is the data warehouse itself. The data import and preparation component is responsible for data acquisition: it includes all programs, applications and legacy-system interfaces that are responsible for extracting data from operational sources, preparing it and loading it into the warehouse. The access component includes all the different applications (OLAP or data mining applications) that make use of the information stored in the warehouse. Additionally, a metadata management component is responsible for the management, definition and access of all the different types of metadata. In general, metadata is defined as data about data, or data describing the meaning of data. In data warehousing there are various types of metadata, e.g., information about the operational sources, the structure and semantics of the DWH data, and the tasks performed during the construction, maintenance and access of a DWH. The need for metadata is well known. Statements like "A data warehouse without adequate metadata is like a filing cabinet stuffed with papers, but without any folders or labels" characterize the situation. Thus, the quality of metadata and the resulting quality of information gained using a data warehouse solution are tightly linked.

Implementing a concrete DWS is a complex task comprising two major phases. In the DWS configuration phase, a conceptual view of the warehouse is first specified according to user requirements (data warehouse design). Then, the involved data sources and the way data will be extracted and loaded into the warehouse (data acquisition) are determined. Finally, decisions about persistent storage of the warehouse using database technology and the various ways data will be accessed during analysis are made. After the initial load (the first load of the DWH according to the DWH configuration), during the DWS operation phase, warehouse data must be regularly refreshed, i.e., modifications of operational data since the last DWH refreshment must be propagated into the warehouse such that data stored in the DWH reflect the state of the underlying operational systems. Besides DWH refreshment, DWS operation includes further tasks like archiving and purging of DWH data or DWH monitoring. 25. When should a company consider implementing a data warehouse?

A: Data warehouses, or a more focused database called a data mart, should be considered when a significant number of potential users are requesting access to a large amount of related historical information for analysis and reporting purposes. So-called active or real-time data warehouses can provide advanced decision support capabilities.

26. What data is stored in a data warehouse?
A: In general, organized data about business transactions and business operations is stored in a data warehouse. But any data used to manage a business, or any type of data that has value to a business, should be evaluated for storage in the warehouse. Some static data may be compiled for initial loading into the warehouse. Any data that comes from mainframe, client/server, or web-based systems can then be periodically loaded into the warehouse. The idea behind a data warehouse is to capture and maintain useful data in a central location. Once data is organized, managers and analysts can use software tools like OLAP to link different types of data together and potentially turn that data into valuable information that can be used for a variety of business decision support needs, including analysis, discovery, reporting and planning.

27. Why should the OLTP database be different from the data warehouse database?
A: OLTP and data warehousing require two very differently configured systems:
-Isolation of the production system from the business intelligence system
-Significant and highly variable resource demands of the data warehouse
-Cost of disk space no longer a concern
-Production systems not designed for query processing
The data warehouse usually contains historical data that is derived from transaction data, but it can include data from other sources. Having separate databases separates the analysis workload from the transaction workload and enables an organization to consolidate data from several sources.

28. What is a logical data model?
A: A logical design is a conceptual and abstract design. We do not deal with the physical implementation details yet; we deal only with defining the types of information that we need. The process of logical design involves arranging data into a series of logical relationships called entities and attributes.

29. What are an Entity, Attribute and Relationship?
A: An entity represents a chunk of information. In relational databases, an entity often maps to a table. An attribute is a component of an entity and helps define the uniqueness of the entity. In relational databases, an attribute maps to a column. The entities are linked together using relationships.

30. What are the different types of Relationships?
A: One-to-one, many-to-one, many-to-many.

31. What is a Star Schema?
A: A star schema is a set of tables comprised of a single, central fact table surrounded by denormalized dimensions. Each dimension is represented in a single table. Star schemas implement dimensional data structures with denormalized dimensions; the snowflake schema is an alternative to the star schema. A star schema is a relational database schema for representing multidimensional data: the data is stored in a central fact table, with one or more tables holding information on each dimension. Dimensions have levels, and all levels are usually shown as columns in each dimension table.

32. What is a Snowflake Schema?
A: A snowflake schema is a set of tables comprised of a single, central fact table surrounded by normalized dimension hierarchies. Each dimension level is represented in a table.

Snowflake schemas implement dimensional data structures with fully normalized dimensions; the star schema is an alternative to the snowflake schema. An example would be to break down the Time dimension and create tables for each level: years, quarters, months, weeks, days. These additional branches on the ERD create more of a snowflake shape than a star (a minimal SQL sketch of such a Time hierarchy appears after question 33 below).

33. What is OLAP?
A: OLAP is software for manipulating multidimensional data from a variety of sources. The data is often stored in a data warehouse. OLAP software helps a user create queries, views, representations and reports. OLAP tools can provide a "front-end" for a data-driven DSS. On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated enterprise data supporting end-user analytical and navigational activities.
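As referenced in question 32, a minimal sketch of a snowflaked Time hierarchy. The table and column names (dim_year, dim_quarter, dim_month) are assumptions for illustration only.

```sql
-- Hypothetical snowflaked Time hierarchy: each level gets its own table
CREATE TABLE dim_year (
    year_key    INTEGER PRIMARY KEY,
    year_number INTEGER
);

CREATE TABLE dim_quarter (
    quarter_key    INTEGER PRIMARY KEY,
    year_key       INTEGER REFERENCES dim_year (year_key),
    quarter_number INTEGER            -- 1..4
);

CREATE TABLE dim_month (
    month_key    INTEGER PRIMARY KEY,
    quarter_key  INTEGER REFERENCES dim_quarter (quarter_key),
    month_number INTEGER,             -- 1..12
    month_name   VARCHAR(10)
);

-- In a star schema these levels would simply be columns
-- (year, quarter, month, ...) of a single dim_time table.
```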

34. What are Factless Fact tables?
A: Fact tables which do not have any facts are called factless fact tables. They may consist of nothing but keys. There are two kinds of fact tables that do not have any facts at all.
-The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. E.g. a student tracking system that detects each student attendance event each day.
-The second type of factless fact table is called a coverage table. Coverage tables are frequently needed when a primary fact table in a dimensional data warehouse is sparse. E.g. a sales fact table that records the sales of products in stores on particular days under each promotion condition. The sales fact table does answer many interesting questions, but it cannot answer questions about things that did not happen. For instance, it cannot answer the question "Which products were on promotion but did not sell?", because it contains only the records of products that did sell. In this case the coverage table comes to the rescue: a record is placed in the coverage table for each product in each store that is on promotion in each time period.
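A minimal SQL sketch of how such a coverage table answers the "on promotion but did not sell" question. The table and column names (promotion_coverage, fact_sales and their keys) are hypothetical.

```sql
-- promotion_coverage: one row per product / store / day on promotion
-- fact_sales: one row per product / store / day that actually sold

SELECT c.product_key, c.store_key, c.date_key
FROM   promotion_coverage c
LEFT JOIN fact_sales s
       ON  s.product_key = c.product_key
       AND s.store_key   = c.store_key
       AND s.date_key    = c.date_key
WHERE  s.product_key IS NULL;   -- on promotion, but no matching sales record
```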

Informatica
35. How can you define a transformation? What are the different types of transformations available in Informatica? What are frequently used transformations?
A: A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. The transformations available in Informatica are: Aggregator, Application Source Qualifier, Custom, Expression, External Procedure, Filter, Input, Joiner, Lookup, Normalizer, Output, Rank, Router, Sequence Generator, Sorter, Source Qualifier, Stored Procedure, Transaction Control, Union, Update Strategy, XML Generator, XML Parser, XML Source Qualifier.

36. What is a Source Qualifier? What is meant by Query Override?
A: The Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation. The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier has the capability to override this default query by changing the default settings of the transformation properties. The list of selected ports, and the order in which they appear in the default query, should not be changed in the overridden query (a minimal sketch appears after question 37 below).

37. What is the Aggregator transformation?
A: The Aggregator transformation allows performing aggregate calculations, such as averages and sums. Unlike the Expression transformation, the Aggregator transformation can only be used to perform calculations on groups; the Expression transformation permits calculations on a row-by-row basis only. The Aggregator transformation contains group-by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties. The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
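A sketch of the query override idea from question 36, under a hypothetical CUSTOMERS/ORDERS source (table and column names are assumptions): the overridden query keeps the same select list and port order, but adds its own join and filter.

```sql
-- Default query generated for a hypothetical CUSTOMERS source
-- (all connected source columns, in port order):
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME, CUSTOMERS.CITY
FROM   CUSTOMERS;

-- A possible SQL override: same column list and order,
-- with an added join and filter
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME, CUSTOMERS.CITY
FROM   CUSTOMERS, ORDERS
WHERE  CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
AND    ORDERS.ORDER_DATE >= DATE '2002-01-01';
```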


38. How is the Union transformation used?
A: The Union transformation is a multiple-input-group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements (see the sketch after question 45 below).

39. Can two flat files be joined with the SQ Transformation?
A: No, we need to use the Joiner transformation.

40. What is a Lookup transformation?
A: This transformation is used to look up data in a flat file or a relational table, view or synonym. It compares Lookup transformation ports (input ports) to the source column values based on the lookup condition. The returned values can then be passed to other transformations.

41. Can a lookup be done on flat files?
A: Yes.

42. What is the difference between a connected lookup and an unconnected lookup?
A: A connected lookup takes input values directly from other transformations in the pipeline. An unconnected lookup doesn't take inputs directly from any other transformation, but it can be used in any transformation (like an Expression) and can be invoked as a function using the :LKP expression. So, an unconnected lookup can be called multiple times in a mapping.

43. What is a mapplet?
A: A mapplet is a reusable object that is created using the Mapplet Designer. The mapplet contains a set of transformations and allows us to reuse that transformation logic in multiple mappings.

44. What does reusable transformation mean? How do you create a reusable transformation?
A: Reusable transformations can be used multiple times in a mapping. The reusable transformation is stored as metadata, separate from any mapping that uses the transformation. Whenever any changes to a reusable transformation are made, all the mappings where the transformation is used will be invalidated.

45. What is the Update Strategy and what are the options for the Update Strategy?
A: Informatica processes the source data row by row. By default every row is marked to be inserted into the target table. If the row has to be updated or inserted based on some logic, the Update Strategy transformation is used. The condition can be specified in the Update Strategy to mark the processed row for update or insert. The following options are available for the Update Strategy:
DD_INSERT: flags the row for insertion. The equivalent numeric value of DD_INSERT is 0.
DD_UPDATE: flags the row for update. The equivalent numeric value of DD_UPDATE is 1.
DD_DELETE: flags the row for deletion. The equivalent numeric value of DD_DELETE is 2.
DD_REJECT: flags the row for rejection. The equivalent numeric value of DD_REJECT is 3.
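The SQL analogue referenced in question 38, with hypothetical source tables (customers_east, customers_west): the Union transformation behaves like UNION ALL, keeping duplicates and requiring matching column lists.

```sql
-- Two hypothetical source pipelines with matching column lists,
-- merged the way a Union transformation would (duplicates kept):
SELECT customer_id, customer_name, city FROM customers_east
UNION ALL
SELECT customer_id, customer_name, city FROM customers_west;
```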


46. What is the difference between Informatica 7.1 and 8.1? (optional)
A:
1) The architecture of PowerCenter 8 has changed a lot; PC8 is service-oriented for modularity, scalability and flexibility.
2) The Repository Service and Integration Service (as replacements for the Repository Server and Informatica Server) can be run on different computers in a network (so-called nodes), even redundantly.
3) Management is centralized, which means services can be started and stopped on nodes via a central web interface.
4) Client tools access the repository via that centralized machine; resources are distributed dynamically.
5) Running all services on one machine is still possible, of course.
6) It has support for unstructured data, which includes spreadsheets, email, Microsoft Word files, presentations and PDF documents. It provides high availability and seamless failover, eliminating single points of failure.
7) It has added performance improvements (to bump up system performance, Informatica has added "pushdown optimization", which moves data transformation processing to the native relational database I/O engine whenever it is most appropriate).
8) Informatica has now added more tightly integrated data profiling, cleansing and matching capabilities.
9) Informatica has added a new web-based administrative console.
10) Ability to write a Custom transformation in C++ or Java.
11) A midstream SQL transformation has been added in 8.1.1 (not in 8.1).
12) Dynamic configuration of caches and partitioning.
13) The Java transformation is introduced.
14) User-defined functions.
15) The PowerCenter 8 release has "Append to Target file".

47. What is the difference between the Joiner and Lookup transformations?
A: Joiner transformation: the Joiner joins two different data sources based on a join condition, passes only the rows which satisfy that condition and discards the remaining rows.
-The Joiner transformation supports 4 types of joins at the Informatica level: Normal, Master Outer, Detail Outer, Full Outer.
Lookup transformation: the Lookup transformation is basically for reference, based on the lookup condition. When you want some data based on target data, you take a lookup on that particular table and retrieve the corresponding fields from it.
-We can override the lookup query using SQL.

48. What are the limitations of the Joiner transformation?
A: You cannot use the Joiner transformation in the following situations:
1. Both pipelines begin with the same original data source.
2. Both input pipelines originate from the same Source Qualifier transformation.
3. Both input pipelines originate from the same Normalizer transformation.
4. Either input pipeline contains an Update Strategy transformation.
5. Either input pipeline contains a connected or unconnected Sequence Generator transformation.

49. How do you load the time dimension?
A: Usually manually, by using stored procedures (in a few cases .csv files may be used to load the time dimension). A minimal SQL sketch appears after question 50 below.

50. In a flat file, how could we get the first record and the last record?
A: Use the FIRST and LAST functions of the Aggregator transformation.
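A sketch of the set-based time/date dimension load mentioned in question 49. The table and column names (dim_date and its columns) are assumptions, and recursive-CTE and date-arithmetic syntax varies slightly by database (shown here in PostgreSQL-style SQL).

```sql
CREATE TABLE dim_date (
    date_key     INTEGER PRIMARY KEY,   -- e.g. 20020101
    calendar_dt  DATE,
    month_number INTEGER,
    year_number  INTEGER
);

-- Generate one row per calendar day of 2002 and load the dimension
WITH RECURSIVE days (calendar_dt) AS (
    SELECT DATE '2002-01-01'
    UNION ALL
    SELECT CAST(calendar_dt + INTERVAL '1' DAY AS DATE)
    FROM   days
    WHERE  calendar_dt < DATE '2002-12-31'
)
INSERT INTO dim_date (date_key, calendar_dt, month_number, year_number)
SELECT CAST(EXTRACT(YEAR  FROM calendar_dt) * 10000
          + EXTRACT(MONTH FROM calendar_dt) * 100
          + EXTRACT(DAY   FROM calendar_dt) AS INTEGER),
       calendar_dt,
       CAST(EXTRACT(MONTH FROM calendar_dt) AS INTEGER),
       CAST(EXTRACT(YEAR  FROM calendar_dt) AS INTEGER)
FROM   days;
```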


51. What is an Active/Passive Transformation? Give examples.
A: A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.
-Transformations in a mapping represent the operations the Informatica Server performs on the data. Data passes into and out of transformations through ports that you connect in a mapping or mapplet.
-Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the configured filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.
-Transformations can be connected to the data flow, or they can be unconnected. An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

Transformation descriptions (name: type - description):
-Advanced External Procedure: Active/Connected - Calls a procedure in a shared library or in the COM layer of Windows NT.
-Aggregator: Active/Connected - Performs aggregate calculations.
-ERP Source Qualifier: Active/Connected - Represents the rows that the Informatica Server reads from an ERP source when it runs a session.
-Expression: Passive/Connected - Calculates a value.
-External Procedure: Passive/Connected or Unconnected - Calls a procedure in a shared library or in the COM layer of Windows NT.
-Filter: Active/Connected - Filters records.
-Input: Passive/Connected - Defines mapplet input rows. Available only in the Mapplet Designer.
-Joiner: Active/Connected - Joins records from different databases or flat file systems.
-Lookup: Passive/Connected or Unconnected - Looks up values.
-Normalizer: Active/Connected - Normalizes records, including those read from COBOL sources.
-Output: Passive/Connected - Defines mapplet output rows. Available only in the Mapplet Designer.
-Rank: Active/Connected - Limits records to a top or bottom range.
-Sequence Generator: Passive/Connected - Generates primary keys.
-Source Qualifier: Active/Connected - Represents the rows that the Informatica Server reads from a relational or flat file source when it runs a session.
-Router: Active/Connected - Routes data into multiple transformations based on a group expression.
-Stored Procedure: Passive/Connected or Unconnected - Calls a stored procedure.
-Update Strategy: Active/Connected - Determines whether to insert, delete, update, or reject records.
-XML Source Qualifier: Passive/Connected - Represents the rows that the Informatica Server reads from an XML source when it runs a session.

52. Give me an overview of all the transformations.
A:
Aggregator:
-The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. The Aggregator transformation is unlike the Expression transformation, in that you can use the Aggregator transformation to perform calculations on groups; the Expression transformation permits you to perform calculations on a row-by-row basis only.
-When using the transformation language to create aggregate expressions, you can use conditional clauses to filter records, providing more flexibility than the SQL language (see the SQL sketch after the Filter section below).
-The Informatica Server performs aggregate calculations as it reads, and stores the necessary group and row data in an aggregate cache.
-After you create a session that includes an Aggregator transformation, you can enable the session option Incremental Aggregation. When the Informatica Server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.
Filter:
-The Filter transformation provides the means for filtering rows in a mapping. You pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation.
-In some cases, you need to filter data based on one or more conditions before writing it to targets. For example, if you have a human resources data warehouse containing information about current employees, you might want to filter out employees who are part-time and hourly.
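A rough SQL picture of what an Aggregator with a conditional clause and a Filter accomplish, using hypothetical tables and columns (fact_sales, store_key, promo_flag, year_number are illustrative only):

```sql
-- Roughly what an Aggregator with a group-by port and a conditional
-- clause computes, expressed in SQL:
SELECT   store_key,
         SUM(dollar_sales)                          AS total_sales,
         SUM(CASE WHEN promo_flag = 'Y'
                  THEN dollar_sales ELSE 0 END)     AS promo_sales
FROM     fact_sales
WHERE    year_number = 2002          -- Filter-style row restriction
GROUP BY store_key;
```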


Joiner:
-While a Source Qualifier transformation can join data originating from a common source database, the Joiner transformation joins two related heterogeneous sources residing in different locations or file systems. The combination of sources can be varied. You can use the following sources:
-Two relational tables existing in separate databases
-Two flat files in potentially different file systems
-Two different ODBC sources
-Two instances of the same XML source
-A relational table and a flat file source
-A relational table and an XML source
-You use the Joiner transformation to join two sources with at least one matching port. The Joiner transformation uses a condition that matches one or more pairs of ports between the two sources.
-For example, you might want to join a flat file with in-house customer IDs and a relational database table that contains user-defined customer IDs. You could import the flat file into a temporary database table and then perform the join in the database. However, if you use the Joiner transformation, there is no need to import or create temporary tables.
-If two relational sources contain keys, then a Source Qualifier transformation can easily join the sources on those keys. Joiner transformations typically combine information from two different sources that do not have matching keys, such as flat file sources. The Joiner transformation allows you to join sources that contain binary data.
-The Joiner transformation supports the following join types, which you set in the Properties tab: Normal (default), Master Outer, Detail Outer, Full Outer.
Source Qualifier:
-When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the records that the Informatica Server reads when it runs a session.
-You can use the Source Qualifier to perform the following tasks (a sketch of the generated SQL follows this list):
Join data originating from the same source database: you can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier.
Filter records when the Informatica Server reads source data: if you include a filter condition, the Informatica Server adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join: if you include a user-defined join, the Informatica Server replaces the join information specified by the metadata in the SQL query.
Specify sorted ports: if you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source: if you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Informatica Server to read source data: for example, you might use a custom query to perform aggregate calculations or execute a stored procedure.
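A hedged illustration of how these Source Qualifier properties shape the generated SQL, under a hypothetical ORDERS source (table and column names are assumptions):

```sql
-- Default generated query for a hypothetical ORDERS source:
SELECT ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID, ORDERS.ORDER_DATE
FROM   ORDERS;

-- With a source filter, one sorted port, and Select Distinct enabled,
-- the generated query takes roughly this shape:
SELECT DISTINCT ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID, ORDERS.ORDER_DATE
FROM   ORDERS
WHERE  ORDERS.ORDER_DATE >= DATE '2002-01-01'
ORDER BY ORDERS.ORDER_ID;
```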


Stored Procedure:
-A Stored Procedure transformation is an important tool for populating and maintaining databases. Database administrators create stored procedures to automate time-consuming tasks that are too complicated for standard SQL statements.
-A stored procedure is a precompiled collection of Transact-SQL statements and optional flow-control statements, similar to an executable script. Stored procedures are stored and run within the database. You can run a stored procedure with the EXECUTE SQL statement in a database client tool, just as you can run SQL statements. Unlike standard SQL, however, stored procedures allow user-defined variables, conditional statements, and other powerful programming features. Not all databases support stored procedures, and database implementations vary widely in their syntax. You might use stored procedures to:
-Drop and recreate indexes.
-Check the status of a target database before moving records into it.
-Determine if enough space exists in a database.
-Perform a specialized calculation.
-Database developers and programmers use stored procedures for various tasks within databases, since stored procedures allow greater flexibility than SQL statements. Stored procedures also provide error handling and logging necessary for mission-critical tasks. Developers create stored procedures in the database using the client tools provided with the database.
-The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server.
-You might use a stored procedure to perform a query or calculation that you would otherwise make part of a mapping. For example, if you already have a well-tested stored procedure for calculating sales tax, you can perform that calculation through the stored procedure instead of recreating the same calculation in an Expression transformation.
Sequence Generator:
-The Sequence Generator transformation generates numeric values. You can use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.
-The Sequence Generator transformation is a connected transformation. It contains two output ports that you can connect to one or more transformations. The Informatica Server generates a value each time a row enters a connected transformation, even if that value is not used. When NEXTVAL is connected to the input port of another transformation, the Informatica Server generates a sequence of numbers. When CURRVAL is connected to the input port of another transformation, the Informatica Server generates the NEXTVAL value plus one.
-You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target.
-For example, if you have a large input file that you separate into three sessions running in parallel, you can use a Sequence Generator to generate primary key values. If you use different Sequence Generators, the Informatica Server might accidentally generate duplicate key values. Instead, you can use the same reusable Sequence Generator for all three sessions to provide a unique value for each target row.
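For the Sequence Generator just described, a rough SQL analogue is a database sequence. The names (customer_key_seq, dim_customer) are hypothetical, and the NEXT VALUE FOR syntax is standard SQL that varies by database (Oracle uses customer_key_seq.NEXTVAL, PostgreSQL uses nextval('customer_key_seq')).

```sql
-- A database sequence plays a similar role to NEXTVAL on a
-- Sequence Generator: each reference returns the next unique value.
CREATE SEQUENCE customer_key_seq START WITH 1 INCREMENT BY 1;

-- Using it to supply a surrogate key for a hypothetical dimension row
INSERT INTO dim_customer (customer_key, customer_code, customer_name)
VALUES (NEXT VALUE FOR customer_key_seq, 'C1001', 'Acme Ltd');
```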


Rank:
-The Rank transformation allows you to select only the top or bottom rank of data. You can use a Rank transformation to return the largest or smallest numeric value in a port or group. You can also use a Rank transformation to return the strings at the top or the bottom of a session sort order. During the session, the Informatica Server caches input data until it can perform the rank calculations.
-The Rank transformation differs from the transformation functions MAX and MIN in that it allows you to select a group of top or bottom values, not just one value. For example, you can use Rank to select the top 10 salespersons in a given territory. Or, to generate a financial report, you might use a Rank transformation to identify the three departments with the lowest expenses in salaries and overhead. While the SQL language provides many functions designed to handle groups of data, identifying top or bottom strata within a set of rows is not possible using standard SQL functions.
-You connect all ports representing the same row set to the transformation. Only the rows that fall within that rank, based on some measure you set when you configure the transformation, pass through the Rank transformation. You can also write expressions to transform data or perform calculations.
Lookup:
-Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. Import a lookup definition from any relational database to which both the Informatica Client and Server can connect. You can use multiple Lookup transformations in a mapping.
-The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. Use the result of the lookup to pass to other transformations and the target. You can use the Lookup transformation to perform many tasks, including:
Get a related value: for example, if your source table includes an employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
Perform a calculation: many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables: you can use a Lookup transformation to determine whether records already exist in the target.
-You can configure the Lookup transformation to perform different types of lookups. You can configure the transformation to be connected or unconnected, cached or uncached:
Connected or unconnected: connected and unconnected transformations receive input and send output in different ways.
Cached or uncached: sometimes you can improve session performance by caching the lookup table. If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts rows into the cache during the session. Informatica recommends that you cache the target table as the lookup. This enables you to look up values in the target and insert them if they do not exist.
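A minimal SQL picture of the "get a related value" lookup task just listed, with hypothetical tables and columns (src_sales, emp_master and their columns are assumptions): the outer join mirrors a lookup that returns NULL when no matching row is found.

```sql
-- Bring the employee name into the output, the way a connected
-- Lookup would, based on a lookup condition on employee_id
SELECT s.employee_id,
       e.employee_name,      -- looked-up value
       s.sales_amount
FROM   src_sales s
LEFT JOIN emp_master e
       ON e.employee_id = s.employee_id;   -- lookup condition
```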


Expression:
-You can use the Expression transformation to calculate values in a single row before you write to the target. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations. Note: to perform calculations involving multiple rows, such as sums or averages, use the Aggregator transformation. Unlike the Expression transformation, the Aggregator allows you to group and sort data. For details, see Aggregator Transformation.
Router:
-A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.
-If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient when you design a mapping and when you run a session. For example, to test data based on three conditions, you need only one Router transformation instead of three Filter transformations to perform this task. Likewise, when you use a Router transformation in a mapping, the Informatica Server processes the incoming data only once. When you use multiple Filter transformations in a mapping, the Informatica Server processes the incoming data for each transformation.
Update Strategy:
-When you design your data warehouse, you need to decide what type of information to store in targets. As part of your target table design, you need to determine whether to maintain all the historic data or just the most recent changes. For example, you might have a target table, T_CUSTOMERS, that contains customer data. When a customer address changes, you may want to save the original address in the table instead of updating that portion of the customer record. In this case, you would create a new record containing the updated address, and preserve the original record with the old customer address. This illustrates how you might store historical information in a target table. However, if you want the T_CUSTOMERS table to be a snapshot of current customer data, you would update the existing customer record and lose the original address.

53. How do you import a source definition?
A: Using the Source Analyzer.

54. While importing a relational source definition from a database, what metadata of the source do you import?
A: Source name, database location, column names, data types, key constraints.

55. How many ways can you update a relational source definition and what are they?
A: Two ways:
-Edit the definition
-Reimport the definition


56. Where should you place the flat file to import the flat file definition into the Designer?
A: Place it in a local folder.

57. Which transformation do we need while using COBOL sources as source definitions?
A: The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.

58. How can you create or import a flat file definition into the Warehouse Designer?
A: You cannot create or import a flat file definition into the Warehouse Designer directly. Instead you must analyze the file in the Source Analyzer, then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica Server runs the session, it creates and loads the flat file.

59. What are the designer tools for creating transformations?
A: Mapping Designer, Transformation Developer, Mapplet Designer.

60. What are connected and unconnected transformations?
A: An unconnected transformation is not connected to other transformations in the mapping. A connected transformation is connected to other transformations in the mapping.

61. How many ways can you create ports?
A: Two ways:
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.

