
Erwin Data Modelling, Data Warehousing, Business Intelligence, Oracle Database, Dimensional Modeling Questions and Answers - Part 4

Q) Erwin Tutorial
All Fusion Erwin Data Modeler, commonly known as Erwin, is a powerful and leading data modeling tool from Computer Associates. Computer Associates delivers several software products for enterprise management, storage management, security, application life cycle management, data management and business intelligence. Erwin makes database creation very simple by generating DDL (SQL) scripts from a data model using its Forward Engineering technique, or it can be used to create data models from an existing database using its Reverse Engineering technique. The Erwin workspace consists of the following main areas:
Logical: In this view, the data model represents business requirements such as entities, attributes etc.
Physical: In this view, the data model represents physical structures such as tables, columns, datatypes etc.
ModelMart: Many users can work on the same data model concurrently.

Q) What can be done with Erwin?
1. Logical, physical and dimensional data models can be created.
2. Data models can be created from existing systems (RDBMS, DBMS, files etc.).
3. Different versions of a data model can be compared.
4. A data model and a database can be compared.
5. SQL scripts can be generated to create databases from a data model.
6. Reports can be generated in different file formats like .html, .rtf, and .txt.
7. Data models can be opened and saved in several different file types like .er1, .ert, .bpx, .xml, .ers, .sql, .cmt, .df, .dbf, and .mdb files.
8. By using ModelMart, concurrent users can work on the same data model.
In order to create data models in Erwin, you need to have All Fusion Erwin Data Modeler installed on your system. If ModelMart is also installed, more than one user can work on the same model.

Q) What is the Data Modeling Development Cycle?
Gathering Business Requirements - First Phase: Data modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs.
Conceptual Data Modeling (CDM) - Second Phase: This data model includes all major entities and relationships, does not contain much detail about attributes, and is often used in the INITIAL PLANNING PHASE.
Logical Data Modeling (LDM) - Third Phase: This is the actual implementation of the conceptual model in a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization.

Physical Data Modeling (PDM) - Fourth Phase: This is a complete model that includes all required tables, columns, relationships and database properties for the physical implementation of the database.

Database - Fifth Phase: DBAs instruct the data modeling tool to create SQL code from the physical data model. The SQL code is then executed on the server to create the databases.

Q) Standardization Needs | Modeling data
Several data modelers may work on different subject areas of a data model, and all of them should use the same naming conventions, definitions and business rules. Nowadays, business to business (B2B) transactions are quite common, and standardization helps in understanding the business in a better way. Inconsistency across column names and definitions would create chaos across the business. For example, when a data warehouse is designed, it may get data from several source systems, and each source may have its own names, datatypes etc. These anomalies can be eliminated if proper standardization is maintained across the organization.
Table Name Standardization: Giving a full name to a table gives an idea of what its data is about. In general, do not abbreviate table names; however, this may differ according to an organization's standards. If a table name exceeds the length allowed by the database, then try to abbreviate it. Some general guidelines that may be used as a prefix or suffix for tables are listed below.
Lookup - LKP: Used for code and type tables by which a fact table can be directly accessed. e.g. Credit Card Type Lookup - CREDIT_CARD_TYPE_LKP
Fact - FCT: Used for transaction tables. e.g. Credit Card Fact - CREDIT_CARD_FCT
Cross Reference - XREF: Tables that resolve many to many relationships. e.g. Credit Card Member XREF - CREDIT_CARD_MEMBER_XREF
History - HIST: Tables that store history. e.g. Credit Card Retired History - CREDIT_CARD_RETIRED_HIST
Statistics - STAT: Tables that store statistical information. e.g. Credit Card Web Statistics - CREDIT_CARD_WEB_STAT
Column Name Standardization: Some general guidelines that may be used as a prefix or suffix for columns are listed below.
Key - KEY: System generated surrogate key. e.g. Credit Card Key - CRDT_CARD_KEY
Identifier - ID: Character column that is used as an identifier. e.g. Credit Card Identifier - CRDT_CARD_ID
Code - CD: Numeric or alphanumeric column that is used as an identifying attribute. e.g. State Code - ST_CD
Description - DESC: Description for a code, identifier or key. e.g. State Description - ST_DESC
Indicator - IND: Denotes indicator columns. e.g. Gender Indicator - GNDR_IND
Database Parameter Standardization: Some general guidelines that may be used for other physical object names are listed below.
Index - IDX: For index names. e.g. Credit Card Fact IDX01 - CRDT_CARD_FCT_IDX01
Primary Key - PK: For primary key constraint names. e.g. Credit Card Fact PK01 - CRDT_CARD_FCT_PK01
Alternate Key - AK: For alternate key constraint names. e.g. Credit Card Fact AK01 - CRDT_CARD_FCT_AK01
Foreign Key - FK: For foreign key constraint names. e.g. Credit Card Fact FK01 - CRDT_CARD_FCT_FK01
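As a small, hedged illustration of these conventions, a lookup table and a fact table might be defined as follows in Oracle-style DDL; all table names, column names and datatypes here are invented for the example rather than taken from any real model.

    -- Illustrative only: lookup and fact tables following the naming standards above.
    CREATE TABLE CRDT_CARD_TYPE_LKP
    (
        CRDT_CARD_TYPE_KEY   NUMBER(10)     NOT NULL,   -- surrogate key (KEY)
        CRDT_CARD_TYPE_CD    VARCHAR2(10)   NOT NULL,   -- code (CD)
        CRDT_CARD_TYPE_DESC  VARCHAR2(100),             -- description (DESC)
        CONSTRAINT CRDT_CARD_TYPE_LKP_PK01 PRIMARY KEY (CRDT_CARD_TYPE_KEY)
    );

    CREATE TABLE CRDT_CARD_FCT
    (
        CRDT_CARD_KEY        NUMBER(10)     NOT NULL,   -- surrogate key (KEY)
        CRDT_CARD_TYPE_KEY   NUMBER(10)     NOT NULL,   -- foreign key to the lookup
        ST_CD                VARCHAR2(2),               -- state code (CD)
        GNDR_IND             CHAR(1),                   -- gender indicator (IND)
        TXN_AMT              NUMBER(12,2),              -- measure
        CONSTRAINT CRDT_CARD_FCT_PK01 PRIMARY KEY (CRDT_CARD_KEY),
        CONSTRAINT CRDT_CARD_FCT_FK01 FOREIGN KEY (CRDT_CARD_TYPE_KEY)
            REFERENCES CRDT_CARD_TYPE_LKP (CRDT_CARD_TYPE_KEY)
    );

    -- Index names follow <table>_IDX<nn>.
    CREATE INDEX CRDT_CARD_FCT_IDX01 ON CRDT_CARD_FCT (ST_CD);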

Q) Steps to create a Data Model
These are general guidelines to create a standard data model; in real time, a data model may not be created in exactly the same sequence shown below. Based on the enterprise's requirements, some of the steps may be excluded, or additional steps included. Sometimes a data modeler may be asked to develop a data model based on an existing database. In that situation, the data modeler has to reverse engineer the database and create a data model.
1. Get business requirements.
2. Create a high level conceptual data model.
3. Create the logical data model.
4. Select the target DBMS where the data modeling tool will create the physical schema.
5. Create a standard abbreviation document according to business standards.
6. Create domains.
7. Create entities and add definitions.
8. Create attributes and add definitions.
9. Based on the analysis, try to create surrogate keys, super types and sub types.
10. Assign a datatype to each attribute. If a domain is already present, then the attribute should be attached to the domain.
11. Create primary or unique keys for attributes.
12. Create check constraints or defaults for attributes.
13. Create unique or bitmap indexes for attributes.
14. Create foreign key relationships between entities.
15. Create the physical data model.
16. Add database properties to the physical data model.
17. Create SQL scripts from the physical data model and forward them to the DBA.
18. Maintain the logical and physical data models.
19. For each release (version of the data model), try to compare the present version with the previous version of the data model. Similarly, try to compare the data model with the database to find out the differences.
20. Create a change log document for differences between the current version and the previous version of the data model.

Q) Data Modeler Role
Business Requirement Analysis: Interact with business analysts to get the functional requirements. Interact with end users and find out the reporting needs. Conduct interviews and brainstorming discussions with the project team to gather additional requirements. Gather accurate data through data analysis and functional analysis.
Development of data models: Create a standard abbreviation document for logical, physical and dimensional data models. Create logical, physical and dimensional data models (data warehouse data modelling). Document the logical, physical and dimensional data models.
Reports: Generate reports from the data models.
Review: Review the data models with the functional and technical teams.
Creation of database: Create SQL code from the data model and coordinate with DBAs to create the database. Check that the data models and databases are in sync.
Support & Maintenance: Assist developers, the ETL team, the BI team and end users in understanding the data models. Maintain a change log for each data model.
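One small, hedged example of the "data models and databases are in sync" check: assuming the model's table list has been exported into a hypothetical staging table called MODEL_TABLES (an invented name, not an Oracle dictionary view), the following queries list the discrepancies in either direction.

    -- Tables present in the database schema but missing from the model export.
    SELECT table_name FROM user_tables
    MINUS
    SELECT table_name FROM model_tables;

    -- Tables present in the model export but not yet created in the database.
    SELECT table_name FROM model_tables
    MINUS
    SELECT table_name FROM user_tables;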

Q) What is Conceptual Data Modeling?
A conceptual data model includes all major entities and relationships, does not contain much detail about attributes, and is often used in the INITIAL PLANNING PHASE. The conceptual data model is created by gathering business requirements from various sources like business documents, discussions with functional teams, business analysts, subject matter experts and end users who do the reporting on the database. Data modelers create the conceptual data model and forward it to the functional team for review.
Conceptual Data Model - Highlights
The CDM is the first step in constructing a data model in the top-down approach and is a clear and accurate visual representation of the business of an organization. The CDM visualizes the overall structure of the database and provides high-level information about the subject areas or data structures of an organization. The CDM discussion starts with the main subject areas of an organization, and then all the major entities of each subject area are discussed in detail. The CDM comprises entity types and relationships. The relationships between the subject areas and between the entities within a subject area are drawn with a symbolic notation (IDEF1X or IE). In a data model, cardinality represents the relationship between two entities, i.e. a one to one, one to many, or many to many relationship. The CDM contains data structures that have not yet been implemented in the database. In the CDM discussion, the technical as well as non-technical teams project their ideas for building a sound logical data model.

Q) What is Enterprise Data Modeling?
The development of a common, consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. This type of data modeling provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models. Enterprise Data Modeling is sometimes called the global business model, and the entire information about the enterprise is captured in the form of entities.
Data Model Highlights
When an enterprise logical data model is transformed to a physical data model, super types and sub types may not carry over as is; i.e. the logical and physical structures of super types and sub types may be entirely different, and a data modeler has to change them according to the physical and reporting requirements. When an enterprise logical data model is transformed to a physical data model, the length of table names, column names etc. may exceed the maximum number of characters allowed by the database, so a data modeler has to manually edit the physical names according to database or organization standards. One of the important things to note is the standardization of the data model: since the same attribute may be present in several entities, attribute names and datatypes should be standardized, and a conformed dimension should be used to connect to the same attribute present in several tables. A standard abbreviation document is a must so that all data structure names are consistent across the data model.

Q) Logical vs Physical Data Model
When a data modeler works with the client, his title may be logical data modeler, physical data modeler, or a combination of both.
A logical data modeler designs the data model to suit business requirements, creates and maintains the lookup data, compares versions of the data model, maintains the change log, and generates reports from the data model, whereas a physical data modeler also has to know about the properties of the source and target databases.

A physical data modeler should have the technical know-how to create data models from existing databases and to tune the data models with referential integrity, alternate keys and indexes, and should know how to match indexes to SQL code. It also helps if the physical data modeler knows about replication, clustering and so on.
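To make "matching indexes to SQL code" concrete, here is a hedged sketch; the query, bind variables and the POST_DT column are invented for the example and simply assume a fact table like the one sketched earlier.

    -- A frequent application query filters on card type and posting date.
    SELECT txn_amt
    FROM   crdt_card_fct
    WHERE  crdt_card_type_key = :card_type
    AND    post_dt >= :from_dt;

    -- A composite index whose leading columns match the WHERE clause gives the
    -- optimizer an access path that avoids a full table scan for this query.
    CREATE INDEX crdt_card_fct_idx02
        ON crdt_card_fct (crdt_card_type_key, post_dt);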

The differences between a logical data model and a physical data model are shown below.
Logical vs Physical Data Modeling
LDM: Represents business information and defines business rules | PDM: Represents the physical implementation of the model in a database
LDM: Entity | PDM: Table
LDM: Attribute | PDM: Column
LDM: Primary Key | PDM: Primary Key Constraint
LDM: Alternate Key | PDM: Unique Constraint or Unique Index
LDM: Inversion Key Entry | PDM: Non Unique Index
LDM: Rule | PDM: Check Constraint, Default Value
LDM: Relationship | PDM: Foreign Key
LDM: Definition | PDM: Comment

Q) Relational vs Dimensional
Relational data modeling is used in OLTP systems, which are transaction oriented, and dimensional data modeling is used in OLAP systems, which are analysis oriented. In a data warehouse environment, the staging area is designed on OLTP concepts, since data has to be normalized, cleansed and profiled before being loaded into a data warehouse or data mart. In an OLTP environment, lookups are stored as independent, detailed tables, whereas in an OLAP environment like a data warehouse these independent tables are merged into a single dimension, as the sketch below illustrates.
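A minimal sketch of that difference, with all table and column names invented for the example: in the OLTP source the product attributes sit in separate normalized tables, while in the warehouse they are merged into one denormalized product dimension.

    -- OLTP source: normalized product table plus an independent lookup table.
    CREATE TABLE PRODUCT_CATEGORY
    (
        CATEGORY_ID    NUMBER(6)      NOT NULL PRIMARY KEY,
        CATEGORY_NAME  VARCHAR2(50)   NOT NULL
    );

    CREATE TABLE PRODUCT
    (
        PRODUCT_ID     NUMBER(10)     NOT NULL PRIMARY KEY,
        PRODUCT_NAME   VARCHAR2(100)  NOT NULL,
        CATEGORY_ID    NUMBER(6)      NOT NULL
                       REFERENCES PRODUCT_CATEGORY (CATEGORY_ID)
    );

    -- OLAP target: a single denormalized product dimension with the lookup folded in.
    CREATE TABLE PRODUCT_DIM
    (
        PRODUCT_KEY    NUMBER(10)     NOT NULL PRIMARY KEY,   -- surrogate key
        PRODUCT_ID     NUMBER(10)     NOT NULL,               -- natural key from the source
        PRODUCT_NAME   VARCHAR2(100)  NOT NULL,
        CATEGORY_NAME  VARCHAR2(50)                           -- merged in from the lookup
    );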

Relational vs Dimensional
RDM: Data is stored in an RDBMS | DDM: Data is stored in an RDBMS or in multidimensional databases
RDM: Tables are the units of storage | DDM: Cubes are the units of storage
RDM: Data is normalized and optimized for OLTP processing | DDM: Data is denormalized, used in data warehouses and data marts, and optimized for OLAP
RDM: Several tables and chains of relationships among them | DDM: Few tables; fact tables are connected to dimension tables
RDM: Volatile (several updates) and time variant | DDM: Non volatile and time invariant
RDM: Detailed level of transactional data | DDM: Summary of bulky transactional data (aggregates and measures) used in business decisions
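As a hedged illustration of the "aggregates and measures" point, a typical dimensional query groups a fact table measure by attributes of the dimensions; SALES_FCT and TIME_DIM below are invented names, and PRODUCT_DIM refers to the sketch above.

    -- Total sales amount by product category and calendar year: the fact table
    -- supplies the measure, the dimension tables supply the grouping attributes.
    SELECT d.category_name,
           t.calendar_year,
           SUM(f.sales_amt) AS total_sales_amt
    FROM   sales_fct    f
    JOIN   product_dim  d ON d.product_key = f.product_key
    JOIN   time_dim     t ON t.time_key    = f.time_key
    GROUP  BY d.category_name, t.calendar_year;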

Q) Data Warehouse & Data Mart
A data warehouse is a relational/multidimensional database that is designed for query and analysis rather than transaction processing. A data warehouse usually contains historical data that is derived from transaction data. It separates the analysis workload from the transaction workload and enables a business to consolidate data from several sources. In addition to a relational/multidimensional database, a data warehouse environment often consists of an ETL solution, an OLAP engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
There are three types of data warehouses:
1. Enterprise Data Warehouse - An enterprise data warehouse provides a central database for decision support throughout the enterprise.
2. ODS (Operational Data Store) - This has a broad, enterprise wide scope, but unlike a true enterprise data warehouse, data is refreshed in near real time and used for routine business activity.
3. Data Mart - A data mart is a subset of a data warehouse that supports a particular region, business unit or business function.
Data warehouses and data marts are built on dimensional data modeling, where fact tables are connected to dimension tables. This makes it easy for users to access data, since the database can be visualized as a cube of several dimensions; a data warehouse provides the opportunity to slice and dice that cube along each of its dimensions.
Data Mart: A data mart is a subset of a data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, data can be derived

from an enterprise-wide data warehouse. In an independent data mart, data can be collected directly from the sources.

Q) Star Schema in detail
In general, an organization is started to earn money by selling a product or by providing services related to that product. An organization may be at one place or may have several branches. When we consider the example of an organization selling products throughout the world, the four major dimensions are product, location, time and organization. Dimension tables are explained in detail under the section Dimensions. With this example, we will try to provide a detailed explanation of the STAR SCHEMA.

Q) What is a Star Schema?
A star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more dimension and fact tables. It is called a star schema because the entity-relationship diagram between the dimension and fact tables resembles a star, with one fact table connected to multiple dimensions. The center of the star schema consists of a large fact table, which points towards the dimension tables. The advantages of a star schema are slicing down, increased performance and easy understanding of the data.
Steps in designing a star schema: Identify a business process for analysis (like sales). Identify measures or facts (sales dollar). Identify the dimensions for the facts (product dimension, location dimension, time dimension, organization dimension). List the columns that describe each dimension (e.g. product name, branch name, region name). Determine the lowest level of summary in the fact table (sales dollar).
Important aspects of Star Schema & Snowflake Schema: In a star schema, every dimension has a primary key. In a star schema, a dimension table will not have any parent table, whereas in a snowflake schema a dimension table will have one or more parent tables. Hierarchies for the dimensions are stored in the dimension table itself in a star schema, whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill the data down from the topmost level to the lowermost level.
Glossary:
Hierarchy - A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example, in a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, and from the Quarter level to the Year level. A hierarchy can also be used to define a navigational drill path, regardless of whether the levels in the hierarchy represent aggregated totals or not.
Level - A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter, and Year levels.
Fact Table - A table in a star schema that contains facts and is connected to the dimensions. A fact table typically

has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key made up of all of its foreign keys. A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead). A fact table usually contains facts with the same level of aggregation.

Q) Snowflake Schema in detail
A snowflake schema describes a star schema structure that has been normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables. In the star schema example we had 4 dimensions (location, product, time, organization) and a fact table (sales). In the snowflake version of the same example there are 4 dimension tables, 4 lookup tables and 1 fact table, because the hierarchies (category, branch, state, and month) are broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME respectively) and stored separately. In OLAP, this snowflake approach increases the number of joins and degrades performance when retrieving data. A few organizations normalize the dimension tables to save space, but since dimension tables take up relatively little space, the snowflake schema approach may be avoided.

Q) ETL Tools - what to learn?
With the help of ETL tools, we can create powerful target data warehouses without much difficulty. Following are the various options that we have to know and learn in order to use ETL tools. Software:
