

Capacity Planning -
Version 8.6

Murat Yesilsirt, Principal Consultant


• Key Architecture Points

• Environment Sizing vs. Capacity Planning
• Environment Sizing
• Capacity Planning

• Tools
• Sizing Exercise
• Customer Examples

Capacity Planning

• Ensures that sufficient capacity is available at all times to meet business requirements
• Integration capacity is not simply the sum of
capacity needs of each application
• Time dimension - Involves more than
performance of the system’s components,
individually or collectively
• Also deals with resolving incidents and
identifying problems relating to capacity issues

Have you ever asked yourself?

• Can I save money with server consolidation?

• Could I move my data faster with an expanded
• Is my Informatica Server ‘big’ enough?
• How much more data could I move with my existing
system configuration?
• How much faster could I execute my existing loads?
• If I had to add 1 more project – do I have sufficient
• How about x more projects?

Server Sizing/Capacity Planning Goals

• Meet future requirements

• Meet performance requirements
• Satisfy load window requirements
• Minimize contentions due to lack of resources
• Lower maintenance cost and cost of ownership
• Optimize capital expenditures

Key Architecture Points

Data Integration Environment:
Key Architecture Points
[Figure: Sources, Source File Server and Network, Target File Server and Network, Targets]


Informatica Real Time Data Integration
[Figure: Transactional System – Relational Source (Oracle, DB2, etc), Customer Portal Web Application Server, Integrated Customer Portal, Mainframe System, Administration Portal, Orchestration Engine, Acquired Mainframe, Data Steward Portal, Acquired Mid-Range AS/400 System]

PowerCenter Data Integration
System Characteristics
• Block processing
• Parallelism – Multiple threads, partitioning
• 64-bit Option and Caching
• Random Reads and Sequential Reads
• Database and File processing
• String Manipulation and Unicode
• Pushdown Optimization
• Shared File System
• Checkpoint Recovery
• Web Services

Environment Sizing

Environment Sizing vs. Capacity Planning

• Environment Sizing
• New software implementation/install
• Extremely rudimentary models to predict estimated need
• Rarely perfect

• Capacity Planning (Existing Environment)

• Accuracy of exercise is based on statistics from existing
• New projects on existing environment
• Ensure existing projects are not affected
• Load window
• Load times
• Performance
• PowerCenter upgrade on existing environment
• Use of new PowerCenter enhancements for performance gains on
existing hardware after upgrade

Environment Sizing

• New Environment
• Conversion of custom code and stored procedures
• Estimation process because there is no existing environment
• Consider various architectural options and how they will affect the
• Windows vs. UNIX/LINUX
• Shared PowerCenter Environment vs. Dedicated Environment
• Virtualization
• Hardware sizing considerations
• Memory
• I/O Capacity
• Disk Space
• Network Bandwidth
• Repository Database

Environment Sizing Inputs

• Data volumes
• Mapping complexity
• Number of mappings
• Concurrent work load
• Peak work load
• Expected growth

Environment Sizing Methodology
• Gather performance requirements (Volume, load window etc.)
• Document assumptions e.g. planned architecture, usage period,
geographical distribution of data & users
• Evaluate alternatives – Commodity hardware vs. High-End SMP
• 75% CPU Utilization or less
• Minimize memory paging
• Consider future growth
• Use proof of concept benchmark testing to validate
• Based on high level estimation factors –
• 2MB per CPU per second avg.
• 2-4 GB Memory per Core
• Cross check with other implementations
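
The high-level estimation factors above can be turned into a quick first-pass calculation. The following is a minimal sketch, not an Informatica sizing tool: it assumes that "CPU" and "core" can be treated as the same unit, that 1 GB = 1024 MB, and that the 2 MB per CPU per second figure applies at full utilization. The function names are hypothetical.

```python
import math

# Rule-of-thumb factors quoted on the slide; treat them as starting points only.
MB_PER_CPU_PER_SEC = 2.0        # average throughput per CPU
TARGET_CPU_UTILIZATION = 0.75   # plan for 75% CPU utilization or less
GB_RAM_PER_CORE = (2, 4)        # low and high memory estimate per core


def estimate_cores(gb_per_hour: float) -> int:
    """Estimate cores needed to sustain a data rate at the target utilization."""
    mb_per_sec = gb_per_hour * 1024 / 3600           # assumes 1 GB = 1024 MB
    raw_cores = mb_per_sec / MB_PER_CPU_PER_SEC      # cores if run at 100%
    return max(1, math.ceil(raw_cores / TARGET_CPU_UTILIZATION))


def estimate_ram_gb(cores: int) -> tuple[int, int]:
    """Return the (low, high) memory range in GB for a core count."""
    low, high = GB_RAM_PER_CORE
    return cores * low, cores * high


cores = estimate_cores(gb_per_hour=8)
print(cores, estimate_ram_gb(cores))
```

An estimate like this is only a starting point; the methodology above still calls for a proof-of-concept benchmark and a cross-check against other implementations.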

Environment Sizing Example
Informatica PowerCenter Sizing Questions

Data Volume Rate
Data volumes are a critical aspect as the CPU cycles must be available to handle the data volume in the appropriate timeframe. Getting a reasonable estimate of the volume of data to be moved on a nightly/daily basis is the cornerstone of a sizing effort.
Method 1 (Volume based)
• Number of gigabytes per hour: 8
• Number of simultaneous jobs on average: 5
Method 2 (Existing load process)
• How many loads? 10
• How much data is being moved (in GB)? 0.5
• What is your load window in minutes? 720
Method 3 (Expected load process)
• How many target tables do you have to load? 250
• Size of data to load (source data) in GB? 10
• What is the load window in minutes? 480

Aggregates and Sorting
What is the expected use of aggregates/sort? (Enter a "1")
• Low (25% or less): 0
• Medium (25% to 75%): 1
• High (75% or more): 0

Data Volume Growth
• What is the expected yearly data growth (%)? 20%

Operating System
• Unix or NT: Unix
• 64 bit or 32 bit: 64

Continuous and/or Real Time
The assumption is that continuous and/or real-time workloads will require more CPU and memory. This is because there is less flexibility in workload management. RT sessions must run, and they must run now and they should not be slowed by other processes
What percent of sessions/loads will be real time? (Enter a '1')
• 25% or less: 1
• 25-60%: 0
• 60-90%: 0
• 90%+: 0

Lookup Sizing
Lookups (caching data tables to match values) require additional CPU and RAM. It is an important factor in sizing the box. Use an educated guess as to the size of your lookup requirements. If you are loading a warehouse, think in terms of the size of th
Percent of lookups with > 250k rows? (Enter a "1")
• Low (25% or less): 0
• Medium (25% to 75%): 1
• High (75% or more): 0

Load Window Criticality
How critical is it that the load window is always met? (Enter a '1')
• Not at all important: 0
• Somewhat Important: 0
• Very Important: 0
• Critical: 1

Application Type(s)
What sort of application(s) will PowerCenter be used for?

Other Considerations
Please include any other considerations you feel are important to the sizing effort. Any environmental information, restrictions, needs should be listed here.

Capacity Planning


• Existing Environment
• Measure actual performance in YOUR environment
• Use real world performance information to understand
current unused capacity
• Use linear scalability to predict future needs
• Key review points:
• Current performance
• Data growth projections
• Future integration needs
• Consider Impacts of any technology shift/change
• Web Services
• Grid/HA
• XML Processing etc.
• New server technologies

Capacity Planning Methodology
• Gather performance information
• Volume (data/records)
• CPU Usage
• Memory Usage
• Network Usage
• File System Usage
• System Characteristics (CPU speed etc.)
• Document future assumptions e.g. planned architecture, usage period,
geographical distribution of data & users
• Review future growth needs
• Review data growth projections
• Plan for 75% CPU utilization or less
• Determine required capacity
• Update/expand environment as needed
• Use benchmark testing with real production like data


• Operations
• Informatica Administrator
• System Administrator
• Network Specialist
• Developer
• Business Analyst



• Monitoring tools to help determine how the servers are performing
• Reports to provide metrics about how
PowerCenter is being used
• Analysis to find out your current maximum
• Estimation to determine required capacity for
future growth


• Repository Reports
• Repository Queries

• Key Results
• Number of records from SQ per node
• Number of session runs per node per day
• Number of concurrent session runs per node per hour
• CPU/Memory used per session

vmstat
• Reports information about processes, memory, paging, block I/O, traps, and CPU activity
• vmstat 5 10 – run with a 5-second delay, 10 times
• Processes in the run queue (procs r): a value consistently greater than the number of CPUs indicates a CPU bottleneck
• Idle time (cpu id): a value consistently at 0 indicates a CPU issue
• Scan rate (sr): a rate continuously over 200 pages per second indicates a memory shortage
• Key Results
• Memory usage statistics

iostat
• Reports CPU and input/output statistics for devices and partitions
• iostat 5 10 – run with a 5-second delay, 10 times
• Reads/writes per second (r/s, w/s): consistently high reads/writes indicate disk issues
• Percentage busy (%b): %b > 5 may point to an I/O bottleneck
• Service time (svc_t): svc_t > 30 milliseconds requires a faster disk/controller
• Key Results
• Disk usage results
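
The vmstat and iostat thresholds above lend themselves to a simple automated check. The sketch below assumes the column values have already been parsed into the hypothetical `VmstatSample` and `IostatSample` records, and it reads "consistently" as "true for every sample in the run"; that interpretation, and applying it to the iostat thresholds as well, is an assumption rather than something the slides state.

```python
from typing import NamedTuple


class VmstatSample(NamedTuple):
    run_queue: int   # "procs r" column
    cpu_idle: int    # "cpu id" column, percent
    scan_rate: int   # "sr" column, pages per second


class IostatSample(NamedTuple):
    busy_pct: float    # "%b" column
    service_ms: float  # "svc_t" column, milliseconds


def vmstat_findings(samples: list[VmstatSample], cpu_count: int) -> list[str]:
    """Flag a condition only when every sample shows it ("consistently")."""
    findings: list[str] = []
    if not samples:
        return findings
    if all(s.run_queue > cpu_count for s in samples):
        findings.append("run queue above CPU count: CPU bottleneck")
    if all(s.cpu_idle == 0 for s in samples):
        findings.append("idle time at 0: CPU issue")
    if all(s.scan_rate > 200 for s in samples):
        findings.append("scan rate over 200 pages/s: memory shortage")
    return findings


def iostat_findings(samples: list[IostatSample]) -> list[str]:
    """Apply the %b and svc_t thresholds to every sample in the run."""
    findings: list[str] = []
    if not samples:
        return findings
    if all(s.busy_pct > 5 for s in samples):
        findings.append("%b above 5: possible I/O bottleneck")
    if all(s.service_ms > 30 for s in samples):
        findings.append("svc_t above 30 ms: faster disk/controller needed")
    return findings


# Two samples from a hypothetical 4-CPU server under heavy load.
busy = [VmstatSample(6, 0, 250), VmstatSample(5, 0, 300)]
print(vmstat_findings(busy, cpu_count=4))
```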


• Displays information about network interfaces on the
• Network connections, routing tables, and interface
• Shows a list of hosts using the network
• Provides information about traffic generated by each host


sar – System Activity Reporter

• Exists on many UNIX platforms
• Examine live statistics
• sar [options…] t n
• t is number of seconds per sample
• n is number of samples

• Save sar data for later analysis

• sar -o filename t n
• Recall CPU usage: sar -u -f filename
• Recall disk usage: sar -d -f filename
• You can also specify time windows (-s, -e) and an alternate interval with -i

• Key Results
• Consolidated CPU/Memory/Disk usage statistics


sar – Disk Utilization

• sar -d t n
• Average I/O size in bytes = (blks/s*512 bytes)/(r+w/s)
• % busy is a good indicator of disk bottleneck
• Shows disk devices -- can be tough to trace back to specific logical

vega7077-root-># sar -d 60 1

HP-UX vega7077 B.11.23 U ia64 10/25/07

10:25:24 device %busy avque r+w/s blks/s avwait avserv

10:26:24 c2t6d0 0.65 0.50 1 23 0.00 9.14
c76t4d3 0.02 0.50 0 0 0.01 10.03
c140t2d0 3.13 0.50 2 180 0.00 18.21
c142t2d0 3.88 0.50 2 180 0.00 22.38
c148t2d0 0.28 0.50 2 180 0.00 1.69
c150t2d0 0.42 0.50 2 176 0.00 2.37
c108t2d0 3.03 0.50 2 179 0.00 17.67
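
As a worked instance of the average I/O size formula, the sketch below applies it to two rows of the sample output above. The function name is hypothetical; the 512-byte block size comes from the formula on the slide.

```python
BLOCK_BYTES = 512  # block size used in the slide's formula


def average_io_size_bytes(blks_per_sec: float, ops_per_sec: float) -> float:
    """Average I/O size in bytes = (blks/s * 512) / (r+w/s)."""
    if ops_per_sec == 0:
        return 0.0  # idle device: no operations to average over
    return blks_per_sec * BLOCK_BYTES / ops_per_sec


# c140t2d0 row: 180 blks/s at 2 r+w/s
print(average_io_size_bytes(180, 2))   # 46080.0 bytes, i.e. 45 KB per operation
# c2t6d0 row: 23 blks/s at 1 r+w/s
print(average_io_size_bytes(23, 1))    # 11776.0 bytes
```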

sar – CPU utilization
• sar -u t n
• %sys is system/kernel time
• %usr is user space time
• %wio is the percentage of time spent “waiting on I/O”
• %wio is the best indicator of whether I/O is a bottleneck
• Directly reflects how much performance is lost waiting on I/O

vega7077-root-># sar -u 60 1

HP-UX vega7077 B.11.23 U ia64 10/25/07

10:49:31 %usr %sys %wio %idle

10:50:31 1 5 6 87


• Provides a dynamic real-time view of a running
• Displays system summary information as well as
a list of tasks currently being managed
• Useful for shared environments to identify each
application process and their CPU/memory

Windows perfmon


Capacity Planning Example

• Before upgrade to PowerCenter 8.6, planning for the new environment is initiated
• Current hardware on Unix
• Business activity is expected to increase 20% annually
• Two new Business Units are expected to use Informatica
• Explore PowerCenter 8.6 performance enhancements

Capacity Planning Example
• Peak Load Time – 1:00 AM to 1:35 AM
• Number of Sessions – 45
• Most concurrent sessions – 15
• Total Data Processed – 10 GB
• Primarily flat file to DBMS and DBMS to DBMS data load
• Server has 4 CPUs and 16 GB of RAM
• Most sessions include lookups, but with fairly reasonable cache sizes (i.e. no 8 GB customer master)
• Total Load Window requirement is 2 hrs (done by 3:00 AM)

Capacity Planning Steps
• Using repository reports, establish a timeline for loads
• Daily
• Weekly
• Monthly
• Determine the complexity of
• High: Multiple sources or targets, 5 or more lookups, complex logic
• Medium: Multiple sources or targets, 2-5 lookups or an aggregator, full update
• Low: Straight-through mapping, fewer than 3 lookups

[Figure: load timeline from 1:00 AM to 3:00 AM in 10-minute steps, with Extract Files, Dimension Loads, Daily Fact Loads, and Extract + Audit Completed]

Capacity Planning Steps
• Link the results of system metrics to the load timeline
• Identify the peaks in CPU/memory/disk utilizations
Time   CPU 1   CPU 2   CPU 3   CPU 4   Avg   RAM   I/O
1:01   95%     90%     85%     25%     74%   90%   Ok
1:11   90%     90%     65%     3%      62%   35%   Good
1:21   90%     50%     10%     3%      38%   50%   Good
1:31   75%     25%     3%      3%      25%   25%   Good
Avg    87%     64%     41%     9%      50%   50%   Good

Data    Seconds   Data/Sec   Data/CPU/Sec   Max Expected
10 GB   2100      4.8 MB     1.2 MB         2.4 MB/CPU
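
The throughput figures in the table can be reproduced from the example's inputs. This is a rough sketch under two assumptions that the slides do not spell out: 1 GB is taken as 1000 MB, and the "Max Expected" value is read as the observed per-CPU rate extrapolated linearly from the 50% average CPU utilization to full utilization. Function names are hypothetical.

```python
def per_cpu_throughput_mb(data_gb: float, seconds: float, cpus: int) -> float:
    """Observed MB per second per CPU over the peak load period."""
    total_mb_per_sec = data_gb * 1000 / seconds   # assumes 1 GB = 1000 MB
    return total_mb_per_sec / cpus


def max_expected_mb(per_cpu_mb: float, avg_cpu_utilization: float) -> float:
    """Extrapolate to 100% utilization, assuming linear scalability."""
    return per_cpu_mb / avg_cpu_utilization


# Example inputs: 10 GB in 35 minutes (2100 s) on 4 CPUs at 50% average CPU.
observed = per_cpu_throughput_mb(data_gb=10, seconds=2100, cpus=4)
ceiling = max_expected_mb(observed, avg_cpu_utilization=0.50)
print(round(observed, 1), round(ceiling, 1))   # 1.2 2.4
```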

Capacity Planning Steps

• Review bottlenecks to reveal areas of improvement with additional CPU/memory/disks
• This may also result in code fixes, but performance tuning is
only a short term fix
• Value of new Informatica features e.g. using OS profiles for
more granular information and process ownership
• Consider architectural changes in the new environment such as
Enterprise Grid Option
• Start making projections based on the input available
• My current peak CPU/memory utilization is at 50% and I am expecting
20% growth per group and 2 new groups will join
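
A projection like the one described above can be sketched in a few lines. The slides give the 50% peak utilization, the 20% growth rate, the two new groups, and the 75% planning target; they do not say how much load each new group brings, so `new_group_share` (each new group's load as a fraction of today's load) is a purely hypothetical input, as is the 0.25 value used here. Linear scalability is assumed throughout.

```python
import math

TARGET_UTILIZATION = 0.75  # plan for 75% CPU utilization or less


def projected_utilization(current_peak: float, annual_growth: float, years: int,
                          new_groups: int, new_group_share: float) -> float:
    """Project peak utilization, assuming load scales linearly.

    new_group_share is a hypothetical input: the load each new group adds,
    expressed as a fraction of today's total load.
    """
    load_with_new_groups = current_peak * (1 + new_groups * new_group_share)
    return load_with_new_groups * (1 + annual_growth) ** years


projected = projected_utilization(current_peak=0.50, annual_growth=0.20, years=1,
                                  new_groups=2, new_group_share=0.25)
print(f"{projected:.0%}", "over target" if projected > TARGET_UTILIZATION else "within target")
```

Under these assumed inputs the server would exceed the 75% target within a year, which is the kind of result that prompts the questions on the next slide.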

Questions for the Example:
• Do you need more CPU?
• Do you need more RAM?
• How much more expected capacity do you have without
extending the current load window?
• How much more capacity do you have until you no longer
meet load window?
• What could you do to ‘free up’ more capacity?

Pitfalls and Common Mistakes
• Apples to Apples
• “I talked to <customer> at the user group and they are moving 1,000 rows a second – why aren’t we experiencing the same?”
• “I read an Informatica benchmark and they moved a terabyte in 38 min, which showed 4 MB a second per processor – mine should be the same performance, right?”

• Growth Projections
• “Every day we process 100,000 records that equal 5 MB of data, thus our warehouse is increasing by 5 MB a day.”
• “Every year our warehouse grows by 25%, so our daily capacity must be growing by 25%.”

• Adding Horsepower
• “If I add more CPU and RAM my loads will be faster.”
• “My hardware vendor promised their new CPUs are 2x faster, so my load should finish in ½ the time.”

• Root Cause
• “My performance is poor, it must be the Informatica Platform.”
• “I’m seeing very low rows per second processed, I must have a slow server.”

Capacity Planning Results

• Better to start low, observe the adoption rates and usage, and then adjust upward as necessary
• Vertical – Expandable servers
• Horizontal – Grid Architecture

• Start with adding CPU and memory to existing server

• Then increase number of servers with Grid Architecture
• Allocate abundant storage for infa_shared directory
• Use a flexible storage architecture, e.g. start with 4 stripes over 4 LUNs, then grow to 4 stripes over 8 LUNs to expand from 100 GB to 200 GB

Customer Examples

Customer Example 1
• Planning for release to production for PowerExchange CDC
• First a benchmark test was conducted with a subset of the data
• Projected data volumes were used for the estimation
• Assumptions were documented: Projected data volumes and benchmark
• The disk space used for the file system during the benchmark test was
• For actual data volumes, session logs are expected to use about 13 GB
• There will be a process to purge log files older than two days
• Based on this 26 GB will be allocated for Session Log directory
• Based on the number of lookups, the sizes of the lookup tables, and
concurrent sessions, 20 GB for Cache directory should be allocated
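
The session log allocation above follows from the log volume and the purge policy. A minimal sketch, assuming the 13 GB figure is a per-day volume (the slide does not state the period) and using a hypothetical function name:

```python
import math


def log_directory_gb(daily_log_gb: float, retention_days: int) -> int:
    """Space needed to hold logs for the retention period, rounded up to whole GB."""
    return math.ceil(daily_log_gb * retention_days)


# About 13 GB of session logs, purged after two days.
print(log_directory_gb(daily_log_gb=13, retention_days=2))   # 26
```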

Customer Example 2
• Provide capacity planning assistance for upgrade and server purchase
Some key questions
What is the total volume of data to move?
• Current task – Data volume is less than 2 GB per Month.
What is the largest table (bytes and rows)? Is there any key on this table that could
be used to partition load sessions, if needed?
• Existing task – Largest table 6 M rows with average record length 1000 bytes
What is the batch window available for the load?
• Existing task – Batch window is around 6 hours. Future task – 3 hours on weekdays, 10 hours on weekends
What is the expected growth?
• The percentage of data volume growth has been projected to be 25% each year.
• Currently there are 50 interfaces loading an approximate average of 200MB of data each
• Compute a “base size” using the key driving factors for CPU and RAM. Then, adjust this base size
according to some key attributes of the job load
• The key driving factors for calculating the base CPU size are “cpu mb per sec” (data rate) and “cpu per
session” (job load)
• Data Rate: CPU mb per sec = cpu mb per sec factor * number of GB/hour

Customer Example 3
• Upgrade, Server Consolidation, and ICC Organization

Informatica PowerCenter Sizing Results

Component                    Details                      RAM Factor  Initial CPU  Adjusted CPU
Data Volume Rate Method 1    Using CPU/MB sec factor      4.9         2.9          4.9
Data Volume Rate Method 2    Using CPU/MB sec factor      4.9         2.9          4.9
Data Volume Rate Method 3    Using CPU/MB sec factor      4.9         2.9          4.9
Base Size                                                 4.9         2.9          4.9
Continuous And/Or Real Time  Ranges (0,40%,60%,100%)      0%          0%           0%
Load Window Criticality      Ranges (-30%,0,50%,75%)      0%          0%           0%
Aggregates and Sorting                                    0%          0%           0%
Data Volume Growth                                        0%          0%           0%
Operating System             Unix vs NT and 32 vs 64 bit  100%        -25%         -25%
Lookup Sizing                Ranges (Ram = -20%,0,50%)    0%          0%           0%
Application Types + Other    Subjective Factor (in %)
Total Adjustment Factor                                   100%        -25%         -25%

Final Sizing Raw                                          9.8         2.175        3.675
Final Sizing Adjusted                                     10          2            4
Sizing Upper Range                                        15          4            6

Informatica would recommend a PowerCenter server(s) with 4 to 6 CPUs and 10 to 15 GB of RAM.
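
The "Final Sizing Raw" and "Final Sizing Adjusted" rows can be reproduced from the base size and the total adjustment factor. The sketch below assumes the raw figure is the base size multiplied by (1 + total adjustment) and that the adjusted figure is the raw figure rounded to the nearest whole number; both readings match the numbers in the table, but the sizing spreadsheet's actual formulas are not shown in the slides. The "Sizing Upper Range" row is not modelled, and the dictionary keys are hypothetical names.

```python
# Base size and total adjustment factor per column, as shown in the results table.
BASE_SIZE = {"ram_gb": 4.9, "initial_cpu": 2.9, "adjusted_cpu": 4.9}
TOTAL_ADJUSTMENT = {"ram_gb": 1.00, "initial_cpu": -0.25, "adjusted_cpu": -0.25}


def final_sizing_raw(base: dict[str, float], adjustment: dict[str, float]) -> dict[str, float]:
    """Scale each base figure by its total adjustment factor."""
    return {name: base[name] * (1 + adjustment[name]) for name in base}


def final_sizing_adjusted(raw: dict[str, float]) -> dict[str, int]:
    """Round each raw figure to a whole number of GB or CPUs."""
    return {name: round(value) for name, value in raw.items()}


raw = final_sizing_raw(BASE_SIZE, TOTAL_ADJUSTMENT)
print({name: round(value, 3) for name, value in raw.items()})
print(final_sizing_adjusted(raw))
```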

Customer Example 4
• Upgrade, High Business Growth, and End of Life for Servers

4 Nodes with 2 Dual Core CPU and 32 GB Memory each


• Capacity planning is a complicated process that requires input from various sources
• Testing the PowerCenter loads in your environment is the
most effective way to estimate system behavior
• Choose a flexible architecture to allow incremental growth
• Validate your conclusions with Informatica Professional
• Informatica HACOE at your service for reference
architecture and testing


Thank you