
Deploying IBM Industry Data Models on a Netezza appliance

Whitepaper

About This Paper

The purpose of this paper is to provide guidance on how to deploy the IBM Industry Data Models on a Netezza appliance.

Contents
This paper is divided into the following chapters:

Executive Summary

Chapter 1, “Selecting a subset of IBM Industry Data Model content for Netezza deployment”, explains the components of the IBM Industry Data Models, describes how they are related, and introduces the concept of identifying a subset of the data warehouse design model to be deployed.

Chapter 2, “Transforming the Logical Data Model to a Physical Data Model / DDL for deployment on a Netezza appliance”, outlines the steps required to transform the subset of the Logical Data Model to a Physical Data Model, and from there generate Data Definition Language (DDL) that can be run on a Netezza appliance.

Glossary of Abbreviated Terms

References

Who Should Read This Document

- IT Architects
- IT Specialists
- Data Modelers
- Database Administrators
- Business Intelligence practitioners

Executive Summary

IBM Industry Models combine deep expertise and industry best practice in a form usable by both business and IT communities to accelerate industry solutions. Part of the IBM InfoSphere portfolio, the industry models are based on the experience of more than 500 clients and more than ten years of development.

The comprehensive IBM Industry Data Models contain data warehouse design models, business terminology models and analysis templates to accelerate the development of business intelligence applications. The data warehouse design models can be transformed to Data Definition Language (DDL) that can be deployed on a Netezza appliance.

Netezza provides a family of data warehouse appliances and software products that allow you to easily deploy high-performance data analytics across your entire enterprise. Netezza appliances architecturally integrate database, server and storage into a single, easy-to-manage system that requires minimal set-up and ongoing administration.

Chapter 1: Selecting a subset of IBM Industry Data Model content for Netezza deployment

IBM Industry Data Models contain a number of integrated models at different levels of abstraction. At the highest level of abstraction is a business terminology data model (conceptual model). This model provides an integrated reference point for business terms and descriptions of information used across the architecture. The fundamental purpose of the conceptual model is to enable the business concepts, which make up a particular issue, to be clearly understood and communicated.

Industry business solution templates (requirements model) are a collection of business-centric Key Performance Indicators (KPIs) that make up the enterprise’s consolidated business reporting metrics and represent reporting best practices in areas such as finance, risk and compliance.

The core of the model’s data-warehouse solution is the entity-relationship design model (data warehouse design model) that specifies the structures required to capture and store information over time from across the enterprise. This extensive model supports the integration of enterprise data for a wide range of business issues, providing a re-usable source of data for ongoing and future business intelligence projects.

The data warehouse design model is a Logical Data Model that can
be customized to meet the specific requirements of the business
and then transformed into a Physical Data Model / DDL to create the
data warehouse database.

The models provide very broad coverage of an industry. In a typical deployment scenario, one of the key first steps is to identify the subset of the data warehouse design model that is in scope for the deployment project. A number of features in the Industry Models tooling support this process.

The following IBM developerWorks article details the precise steps involved in identifying a subset of the model to work with. The focus of the article is on deriving a subset data mart model from the IBM Banking Data Warehouse.
http://www.ibm.com/developerworks/data/tutorials/dm-1003bankindustrymodel/

This subset data mart (logical) model forms the basis for the
transformation to a Physical Data Model, and subsequently to
DDL that can be executed on the Netezza appliance.
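As a rough illustration of why scoping usually pulls in more entities than the ones you start from, the following Python sketch computes the entities needed once a few starting entities are selected. It assumes, hypothetically, that the logical model is available as a simple map from each entity to the entities it references through relationships; the entity names and the MODEL_REFERENCES structure are invented for this example, and the Industry Models tooling described in the article performs scoping in its own way, which this sketch does not reproduce.

```python
from collections import deque

# Hypothetical fragment of a logical model: each entity maps to the entities
# it references through relationships (invented names, not taken from BDW).
MODEL_REFERENCES: dict[str, list[str]] = {
    "Account": ["Customer", "Product", "Currency"],
    "Customer": ["Branch"],
    "Product": ["Currency"],
    "Branch": [],
    "Currency": [],
    "Campaign": ["Product"],  # nothing in an Account-centred mart references it
}


def subset_closure(model: dict[str, list[str]], seeds: set[str]) -> set[str]:
    """Return the seed entities plus every entity they reference, directly or indirectly."""
    selected = set(seeds)
    queue = deque(seeds)
    while queue:
        entity = queue.popleft()
        for referenced in model.get(entity, []):
            if referenced not in selected:
                selected.add(referenced)
                queue.append(referenced)
    return selected


if __name__ == "__main__":
    print(sorted(subset_closure(MODEL_REFERENCES, {"Account"})))
```

Selecting only Account yields Account plus Branch, Currency, Customer and Product, so that every relationship in the subset can be resolved, while Campaign stays out of scope because nothing in the subset references it.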

Chapter 2: Transforming the Logical Data Model to a Physical Data Model / DDL for deployment on a Netezza appliance

The subset of the Logical Data Model identified in the previous chapter goes through a series of transformation steps, resulting in a DDL script that can be run on the Netezza appliance to create relational database structures.

A logical data model (LDM) is a representation of an organization's data, organized in terms of entities, attributes and relationships, that is independent of any particular data management technology.

The set of transformation steps required to produce the DDL script is as follows:
1. Transform the Logical Data Model to a Physical Data Model
2. Define a partitioning strategy for distributing the data
3. Generate a DDL script for Netezza
4. Execute the DDL script on the Netezza appliance

To begin the deployment process, the LDM must be transformed to a Physical Data Model: a model which takes into account the facilities and constraints of a given database management system (e.g. Netezza).
Step 1 – Transform the Logical Data Model to a Physical Data Model

IBM’s data modeling product, InfoSphere Data Architect (IDA), can transform a Logical Data Model into a Physical Data Model using the following procedure:

In IDA, select the menu option Data > Transform > Physical Data Model.
In the resulting wizard, choose the following options:
• Database: DB2 for Linux, UNIX, and Windows
• Version: 9.7
Note: DB2 is chosen because its DDL is syntactically similar to Netezza DDL.

Click Next, followed by Finish, to complete the creation of the Physical Data Model.

Step 2 – Define a partitioning strategy for distributing the data


One of the strengths of the Netezza appliance is that performance tuning is dramatically simplified. Many of the configuration and tuning steps required in other RDBMSs (for example, tablespace configuration, physical and logical log sizing, block sizing, extent sizing and temp space allocation) are not required on Netezza.

Good distribution is a fundamental element of performance. A distribution method that distributes data evenly across all data slices (the partitions across which Netezza spreads each table's rows for parallel processing) is the single most important factor that can influence the overall performance of the Netezza database.

Netezza benefits from a simple partitioning strategy for distributing data, which includes three options:
1. Explicitly define a distribution key in the Physical Data Model. Each table in a Netezza database can have one distribution key, which consists of one to four columns. Although up to four columns are supported, the recommendation is to use one column for the distribution key.
2. Use a round-robin distribution. This distributes data randomly, resulting in even distribution across the data slices.
3. Let Netezza choose a distribution key. There is no guarantee what that key will be, and it can vary depending on the Netezza software release, so this option is not recommended.
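The difference between the first two options can be sketched in Python. This is a simplified model under stated assumptions, not Netezza's implementation: an appliance with eight data slices (NUM_DATA_SLICES is an assumed value), CRC32 standing in for the internal hash function, and a hypothetical rows list standing in for a table. Hash distribution always sends equal key values to the same slice; round-robin ignores row content and deals rows out in turn.

```python
import zlib
from collections import Counter
from itertools import count

NUM_DATA_SLICES = 8  # assumed number of data slices, for illustration only


def hash_distribute(row: dict, key_columns: list[str], num_slices: int = NUM_DATA_SLICES) -> int:
    """Option 1: explicit distribution key of one to four columns.

    Rows with equal key values always land on the same slice. CRC32 stands in for
    Netezza's internal hash function, which this sketch does not reproduce.
    """
    if not 1 <= len(key_columns) <= 4:
        raise ValueError("a distribution key must have one to four columns")
    # Join with a separator so that keys such as (1, 23) and (12, 3) hash differently.
    key = "\x1f".join(str(row[c]) for c in key_columns)
    return zlib.crc32(key.encode("utf-8")) % num_slices


class RoundRobin:
    """Option 2: rows are dealt to the slices in turn, regardless of their content."""

    def __init__(self, num_slices: int = NUM_DATA_SLICES) -> None:
        self._counter = count()
        self._num_slices = num_slices

    def __call__(self, _row: dict) -> int:
        return next(self._counter) % self._num_slices


def rows_per_slice(rows: list[dict], assign, num_slices: int = NUM_DATA_SLICES) -> list[int]:
    """Count how many rows an assignment function places on each slice."""
    counts = Counter(assign(row) for row in rows)
    return [counts[s] for s in range(num_slices)]


# Hypothetical table: 10,000 accounts spread over 40 branches.
rows = [{"account_id": i, "branch_id": i % 40} for i in range(10_000)]

if __name__ == "__main__":
    print("hash(account_id):           ", rows_per_slice(rows, lambda r: hash_distribute(r, ["account_id"])))
    print("hash(branch_id, account_id):", rows_per_slice(rows, lambda r: hash_distribute(r, ["branch_id", "account_id"])))
    print("round-robin:                ", rows_per_slice(rows, RoundRobin()))
```

The round-robin line always shows 1,250 rows on each of the eight slices. The hashed lines should be close to that because both keys have many distinct values, but the exact counts depend on the hash function.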

When choosing a distribution key, consider the following factors:
• The more distinct the distribution key values, the better.
• Tables used together should use the same columns for their distribution key where possible.
• If a particular key is used heavily in equi-join clauses, then that key is a good choice for the distribution key.
• Even when records are well distributed, check that there is no accidental processing skew: for example, a date column used as the distribution key when most queries select a single date, so that only a few data slices do the work.
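The first three factors can be checked with a small Python experiment. It is a sketch under stated assumptions: eight data slices, CRC32 in place of Netezza's hash function, and hypothetical account, status and transaction data. It compares the skew of a high-cardinality key with that of a three-valued status code, and measures how many transaction rows would sit on the same data slice as the account row they join to.

```python
import zlib
from collections import Counter

NUM_DATA_SLICES = 8  # assumed number of data slices, for illustration only


def hash_slice(key_value: str, num_slices: int = NUM_DATA_SLICES) -> int:
    # CRC32 is a stand-in for Netezza's internal hash function, which this sketch does not reproduce.
    return zlib.crc32(key_value.encode("utf-8")) % num_slices


def skew_ratio(key_values: list[str], num_slices: int = NUM_DATA_SLICES) -> float:
    """Rows on the fullest slice divided by the ideal rows per slice (1.0 means perfectly even)."""
    counts = Counter(hash_slice(v, num_slices) for v in key_values)
    return max(counts.values()) / (len(key_values) / num_slices)


def colocated_fraction(pairs: list[tuple[str, str]], num_slices: int = NUM_DATA_SLICES) -> float:
    """Share of (left key, right key) row pairs that hash to the same slice, i.e. can join locally."""
    same = sum(hash_slice(a, num_slices) == hash_slice(b, num_slices) for a, b in pairs)
    return same / len(pairs)


# Hypothetical data: 10,000 accounts, a three-valued status code, and
# 5,000 (transaction_id, account_id) pairs referencing 500 of the accounts.
account_ids = [f"ACC{i:06d}" for i in range(10_000)]
status_codes = ["OPEN", "CLOSED", "DORMANT"] * 3_334
transactions = [(f"TXN{t:07d}", f"ACC{t % 500:06d}") for t in range(5_000)]

if __name__ == "__main__":
    # Distinct values: a unique key spreads evenly; three codes can fill at most three slices.
    print(f"skew on account_id : {skew_ratio(account_ids):.2f}")
    print(f"skew on status_code: {skew_ratio(status_codes):.2f}")
    # Co-location: transactions distributed on account_id, like the accounts table.
    print(colocated_fraction([(acc, acc) for _txn, acc in transactions]))  # always 1.0
    # Transactions distributed on transaction_id: only chance collisions are co-located.
    print(colocated_fraction(transactions))
```

The status-code key can never use more than three of the eight slices, so its skew ratio is at least 2.67, whereas the unique account key stays close to 1.0. Distributing transactions on the join column co-locates every pair; distributing them on their own identifier co-locates only the pairs whose hashes happen to coincide, so the rest of the join has to move data between slices.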

A random distribution is often the best choice. To accommodate this, the Netezza DDL generation utility (described in the next step) provides the option to automatically generate a random distribution clause on each table.

In some cases, for example when co-location of data is desired (rows that are joined on the same key value are stored on the same data slice), the random distribution can be overridden by explicitly defining a distribution key in the Physical Data Model.

To explicitly define a distribution key in the IDA Physical Data Model:
• Select a table.
• Select Properties > Distribution Key.
• Maintained by: choose HASHING.
• Use the ellipsis (...) button to add one or more distribution columns. Up to four columns can be used to define the distribution key.

Always review the distribution of data after implementation to ensure data is distributed evenly and that data skew has been minimized.
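One way to review the distribution is to count rows per data slice. Each Netezza table exposes a hidden DATASLICEID column, so a query such as SELECT DATASLICEID, COUNT(*) FROM <table> GROUP BY DATASLICEID returns the row count on each slice that holds data. The Python sketch below summarises such a result; it assumes the counts have already been fetched into a dict, and its skew ratio and 1.10 threshold are illustrative choices rather than Netezza-defined limits.

```python
from dataclasses import dataclass

SKEW_THRESHOLD = 1.10  # illustrative tolerance, not a Netezza-defined limit


@dataclass
class SkewReport:
    table: str
    min_rows: int
    max_rows: int
    skew_ratio: float  # rows on the fullest slice / average rows per slice; 1.0 is perfectly even

    @property
    def is_skewed(self) -> bool:
        return self.skew_ratio > SKEW_THRESHOLD


def skew_report(table: str, rows_per_slice: dict[int, int], num_slices: int) -> SkewReport:
    """Summarise per-slice row counts, e.g. the result of
    SELECT DATASLICEID, COUNT(*) FROM <table> GROUP BY DATASLICEID.

    Slices that hold no rows are absent from that result, so they are padded with zeros.
    """
    counts = list(rows_per_slice.values()) + [0] * (num_slices - len(rows_per_slice))
    total = sum(counts)
    ratio = max(counts) / (total / num_slices) if total else 1.0
    return SkewReport(table, min(counts), max(counts), ratio)


if __name__ == "__main__":
    # Hypothetical counts for two tables on a four-slice system.
    for report in (skew_report("ACCOUNT", {1: 2510, 2: 2495, 3: 2502, 4: 2493}, 4),
                   skew_report("ACCOUNT_STATUS", {1: 9000, 2: 1000}, 4)):
        flag = "SKEWED" if report.is_skewed else "ok"
        print(f"{report.table:15} min={report.min_rows} max={report.max_rows} "
              f"ratio={report.skew_ratio:.2f} {flag}")
```

In this hypothetical run, ACCOUNT is close to even, while ACCOUNT_STATUS leaves two slices empty and has a ratio of 3.6, a pattern that usually points to a low-cardinality distribution key.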

Step 3 – Generate DDL for the Netezza RDBMS


The Physical Data Model can be used to generate DDL that is syntactically
compatible with Netezza RDBMS. This is accomplished using a simple
Netezza DDL generation utility that takes the Physical Data Model as input,
and outputs a Netezza DDL script.

Downloading the Netezza DDL generation utility


The Netezza DDL generation utility is available from the IBM Fix Central
support site. Full details on how to download the utility are available at the
following URL:
http://www-01.ibm.com/support/docview.wss?uid=swg24028954

Executing the Netezza DDL generation utility


General syntax for the Netezza DDL generation utility:
generate_ddl_from_dbm.ps1 <input file> <output file> <statement separator> <distribution method> <max comment length> <name generation>

Input                  Description

input file             Specifies the name of the input file, i.e. the
                       Physical Data Model (.dbm)

output file            Specifies the name of the output file, i.e. the
                       DDL script

statement separator    Specifies the SQL statement separator. The
                       default value is a semicolon (;)

distribution method    Specifies the distribution details.
                       Note: this setting only takes effect if a
                       distribution key is not explicitly defined in
                       the input model.
                       RANDOM: uses a round-robin distribution in the
                       event that an explicit distribution key is not
                       defined.
                       PK: uses a table’s primary key in the event
                       that an explicit distribution key is not
                       defined.

max comment length     Imposes a limit on the maximum length of
                       table/column comments. Valid values range from
                       1 to 999.

name generation        Column and table name generation options.
                       ENCLOSE: encloses table/column names in
                       quotation marks, avoiding issues with special
                       characters.
                       REPLACE: table/column names are not enclosed in
                       quotation marks. Special characters (e.g. '-')
                       are replaced by a valid alternative to avoid
                       DDL execution errors.
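How an explicit distribution key, the distribution method fallback and the name generation option combine in each CREATE TABLE statement can be sketched in Python. This is not the utility's code: the Table structure is hypothetical, comments and the max comment length parameter are not modelled, and REPLACE is assumed to substitute an underscore for each special character (the utility only promises "a valid alternative"). The DISTRIBUTE ON RANDOM and DISTRIBUTE ON (column, ...) clauses are standard Netezza CREATE TABLE syntax.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical, simplified view of one table in the Physical Data Model."""
    name: str
    columns: list[tuple[str, str]]  # (column name, SQL data type)
    primary_key: list[str] = field(default_factory=list)
    distribution_key: list[str] = field(default_factory=list)  # explicit key, if defined in the model


def format_name(name: str, name_generation: str) -> str:
    if name_generation.upper() == "ENCLOSE":
        # Quoted identifier; any embedded double quote is doubled, as in standard SQL.
        return '"' + name.replace('"', '""') + '"'
    # REPLACE (assumption): every character other than a letter, digit or underscore becomes "_".
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def distribution_clause(table: Table, method: str, name_generation: str) -> str:
    key = table.distribution_key  # an explicit key in the model always takes precedence
    if not key and method.upper() == "PK":
        key = table.primary_key  # assumption: a table without a primary key falls back to RANDOM
    if not key:
        return "DISTRIBUTE ON RANDOM"
    if len(key) > 4:
        raise ValueError(f"{table.name}: a distribution key can have at most four columns")
    return "DISTRIBUTE ON (" + ", ".join(format_name(c, name_generation) for c in key) + ")"


def create_table(table: Table, method: str = "RANDOM", name_generation: str = "REPLACE",
                 separator: str = ";") -> str:
    columns = ",\n".join(f"    {format_name(c, name_generation)} {sql_type}"
                         for c, sql_type in table.columns)
    return (f"CREATE TABLE {format_name(table.name, name_generation)} (\n{columns}\n)\n"
            f"{distribution_clause(table, method, name_generation)}{separator}")


if __name__ == "__main__":
    balance = Table("ACCOUNT-BALANCE",
                    [("ACCOUNT-ID", "INTEGER"), ("BALANCE", "NUMERIC(18,2)")],
                    primary_key=["ACCOUNT-ID"])
    print(create_table(balance, "random", "REPLACE"))
    print(create_table(balance, "pk", "ENCLOSE"))
```

With RANDOM and REPLACE, the hypothetical ACCOUNT-BALANCE table becomes ACCOUNT_BALANCE with a DISTRIBUTE ON RANDOM clause; with PK and ENCLOSE, the names stay quoted and the clause becomes DISTRIBUTE ON ("ACCOUNT-ID"). A table with an explicit distribution key keeps that key whichever method is passed.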

Usage Examples
1) generate_ddl_from_dbm.ps1 'bdw_model.dbm' 'bdw_nz_ddl.sql' ';' 'random' '999' 'REPLACE'
Generates a DDL script named bdw_nz_ddl.sql based on the content of the Physical Data Model bdw_model.dbm, with the following characteristics:
• Statement delimiter: ';'
• Data distribution: round-robin (if no explicit distribution key is defined in the data model)
• Maximum table/column comment length: 999 characters
• Special characters (e.g. '-') are replaced with a valid alternative

2) generate_ddl_from_dbm.ps1 'bdw_model.dbm' 'bdw_nz_ddl.sql' '!' 'pk' '255' 'ENCLOSE'
Generates a DDL script named bdw_nz_ddl.sql based on the content of the Physical Data Model bdw_model.dbm, with the following characteristics:
• Statement delimiter: '!'
• Data distribution: the table’s primary key is used as the distribution key (if no explicit distribution key is defined in the data model)
• Maximum table/column comment length: 255 characters
• Table/column names are enclosed in quotation marks, to avoid any issues with special characters
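When the utility is run from a build script rather than interactively, the arguments can be assembled and checked against the ranges in the parameter table before PowerShell is invoked. The Python sketch below assumes that Windows PowerShell is on the PATH as powershell (pwsh for PowerShell 7), that the script sits in the working directory, and that the machine's execution policy allows it to run; the function names are hypothetical. Passing the arguments as a list through -File avoids the shell quoting used in the usage examples.

```python
import subprocess

DISTRIBUTION_METHODS = {"RANDOM", "PK"}
NAME_GENERATION_OPTIONS = {"ENCLOSE", "REPLACE"}


def ddl_generator_command(input_dbm: str, output_sql: str, separator: str = ";",
                          method: str = "RANDOM", max_comment_length: int = 999,
                          name_generation: str = "REPLACE",
                          script: str = "generate_ddl_from_dbm.ps1") -> list[str]:
    """Build the argument list for the DDL generation utility, validating each parameter first."""
    if method.upper() not in DISTRIBUTION_METHODS:
        raise ValueError(f"distribution method must be RANDOM or PK, got {method!r}")
    if not 1 <= max_comment_length <= 999:
        raise ValueError("max comment length must be between 1 and 999")
    if name_generation.upper() not in NAME_GENERATION_OPTIONS:
        raise ValueError(f"name generation must be ENCLOSE or REPLACE, got {name_generation!r}")
    # Assumption: Windows PowerShell is on the PATH (use "pwsh" for PowerShell 7).
    return ["powershell", "-File", script, input_dbm, output_sql, separator,
            method, str(max_comment_length), name_generation]


def generate_ddl(command: list[str]) -> None:
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Usage example 1 from the text; the call that actually runs it is left commented out.
    cmd = ddl_generator_command("bdw_model.dbm", "bdw_nz_ddl.sql", ";", "random", 999, "REPLACE")
    print(" ".join(cmd))
    # generate_ddl(cmd)
```

Calling ddl_generator_command("bdw_model.dbm", "bdw_nz_ddl.sql", "!", "pk", 255, "ENCLOSE") produces the arguments of usage example 2, while an out-of-range value such as a max comment length of 1000 is rejected before PowerShell starts.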

Step 4 – Execute the DDL on the Netezza appliance


Finally, the DDL script that has been generated from the IBM Industry Data
Model can be executed on the Netezza RDBMS database to create the
relational database structures.

The following command can be used to execute the DDL script:

nzsql -d dbname -u user -pw password -f bdw_nz_ddl.sql
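In an automated deployment, the same nzsql invocation can be wrapped in Python. The sketch below uses exactly the options shown in the command above, adds a rough count of CREATE TABLE statements as a pre-flight check, and raises if nzsql exits with a non-zero status. The function names are hypothetical, the script is assumed to be UTF-8 encoded, and dbname, user and password remain placeholders.

```python
import re
import subprocess
from pathlib import Path


def nzsql_command(dbname: str, user: str, password: str, ddl_file: str) -> list[str]:
    # The same options as the documented command: database, user, password and input file.
    return ["nzsql", "-d", dbname, "-u", user, "-pw", password, "-f", ddl_file]


def count_create_tables(ddl_text: str) -> int:
    """Rough pre-flight check: count lines that begin a CREATE TABLE statement."""
    return len(re.findall(r"^\s*CREATE\s+TABLE\b", ddl_text, flags=re.IGNORECASE | re.MULTILINE))


def deploy_ddl(dbname: str, user: str, password: str, ddl_file: str) -> None:
    tables = count_create_tables(Path(ddl_file).read_text(encoding="utf-8"))
    print(f"Executing {ddl_file} ({tables} CREATE TABLE statements) against {dbname}")
    # check=True only detects a non-zero exit status; still review nzsql's output,
    # because a failing individual statement may not change the exit status.
    subprocess.run(nzsql_command(dbname, user, password, ddl_file), check=True)


if __name__ == "__main__":
    # Print the command rather than run it; call deploy_ddl(...) against a real appliance.
    print(" ".join(nzsql_command("dbname", "user", "password", "bdw_nz_ddl.sql")))
```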



Glossary of Abbreviated Terms

The following abbreviated terms are used in this paper:

DDL Data Definition Language
IDA InfoSphere Data Architect
KPI Key Performance Indicator
LDM Logical Data Model
RDBMS Relational Database Management System
SPU Snippet Processing Unit

References
“Scoping the IBM Industry Model for banking using Enterprise Model Extender and InfoSphere Data Architect” by Hermann Voellinger. Available on IBM’s developerWorks site at:
http://www.ibm.com/developerworks/data/tutorials/dm-1003bankindustrymodel/

IBM Industry Models & Assets


IBM Ireland
Building 6
Dublin Technology Campus
Damastown Industrial Estate
Mulhuddart
Dublin 15
Ireland

The IBM home page can be found on the Internet at ibm.com

IBM is a registered trademark of International Business Machines Corporation.

References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program or service is not intended to imply that only IBM’s product, program or service may be used. Any functionally equivalent product, program or service may be used instead.
This publication is for general guidance only.
© Copyright IBM Corp. 2010. All Rights Reserved.
