Capacity Planning PDD Final

1
Capacity Planning -
Version 8.6
Murat Yesilsirt, Principal Consultant
2
Agenda
• Key Architecture Points
• Environment Sizing vs. Capacity Planning
• Environment Sizing
• Capacity Planning
• Tools
• Sizing Exercise
• Customer Examples
3
Capacity Planning
• Ensures that sufficient capacity is available at all
times to meet business requirements
• Integration capacity is not simply the sum of
capacity needs of each application
• Time dimension - Involves more than
performance of the system’s components,
individually or collectively
• Also deals with resolving incidents and
identifying problems relating to capacity issues
4
Have you ever asked yourself?
• Can I save money with server consolidation?
• Could I move my data faster with an expanded
environment?
• Is my Informatica Server ‘big’ enough?
• How much more data could I move with my existing
system configuration?
• How much faster could I execute my existing loads?
• If I had to add 1 more project – do I have sufficient
capacity?
• How about x more projects?
5
Server Sizing/Capacity Planning Goals
• Meet future requirements
• Meet performance requirements
• Satisfy load window requirements
• Minimize contentions due to lack of resources
• Lower maintenance cost and cost of ownership
• Optimize capital expenditures
6
Key Architecture Points
7
Data Integration Environment:
Key Architecture Points
[Diagram: Sources (source file server and RDBMS, each with CPU/RAM) -> Network -> PowerCenter Server (CPU/RAM) -> Network -> Targets (target file server and RDBMS, each with CPU/RAM)]
8
Informatica Real Time Data Integration
[Diagram: PowerCenter Orchestration Engine shown with the following systems]
• Transactional System – Relational Source (Oracle, DB2, etc.)
• Customer Portal Web Application Server
• Integrated Customer Portal Database
• Mainframe System
• Administration Portal
• Acquired Mainframe System
• Data Steward Portal
• Acquired Mid-Range AS/400 System
• Exception Management Database
9
PowerCenter Data Integration
System Characteristics
• Block processing
• Parallelism – Multiple threads, partitioning
• 64-bit Option and Caching
• Random Reads and Sequential Reads
• Database and File processing
• String Manipulation and Unicode
• Pushdown Optimization
• Shared File System
• Checkpoint Recovery
• Web Services
10
Environment Sizing
11
Environment Sizing vs. Capacity Planning
• Environment Sizing
• New software implementation/install
• Extremely rudimentary models to predict estimated need
• Rarely perfect
• Capacity Planning (Existing Environment)
• Accuracy of exercise is based on statistics from existing
environment
• New projects on existing environment
• Ensure existing projects are not affected
• Load window
• Load times
• Performance
• PowerCenter upgrade on existing environment
• Use of new PowerCenter enhancements for performance gains on
existing hardware after upgrade
12
Environment Sizing
• New Environment
• Conversion of custom code and stored procedures
• Estimation process because there is no existing environment
• Consider various architectural options and how they will affect the
sizing
• GRID/HA
• Windows vs. UNIX/LINUX
• Shared PowerCenter Environment vs. Dedicated Environment
• Virtualization
• Hardware sizing considerations
• CPU
• Memory
• I/O Capacity
• Disk Space
• Network Bandwidth
• Repository Database
13
Environment Sizing Inputs
• Data volumes
• Mapping complexity
• Number of mappings
• Concurrent work load
• Peak work load
• Expected growth
14
Environment Sizing Methodology
• Gather performance requirements (Volume, load window etc.)
• Document assumptions e.g. planned architecture, usage period,
geographical distribution of data & users
• Evaluate alternatives – Commodity hardware vs. High-End SMP
• Plan for 75% CPU utilization or less
• Minimize memory paging
• Consider future growth
• Use proof of concept benchmark testing to validate
• Base the estimate on high-level estimation factors:
• 2 MB per CPU per second on average
• 2-4 GB of memory per core
• Cross check with other implementations
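As a rough illustration of how the high-level factors above might combine, the sketch below turns an hourly volume estimate into a starting CPU count and memory range. The function name, its parameters, and the choice to divide by the 75% utilization target are assumptions made for illustration; this is not an Informatica formula, and a proof-of-concept benchmark should still validate the result.

```python
import math

def estimate_initial_size(gb_per_hour, mb_per_cpu_per_sec=2.0,
                          target_utilization=0.75, gb_ram_per_core=(2, 4)):
    """Rough first-pass sizing from the rule-of-thumb factors on the slide."""
    # Convert the hourly volume into a sustained MB/sec rate.
    mb_per_sec = gb_per_hour * 1024 / 3600
    # CPUs needed if every CPU ran flat out at the rule-of-thumb rate.
    raw_cpus = mb_per_sec / mb_per_cpu_per_sec
    # Leave headroom so the planned load stays at or below the target.
    cpus = math.ceil(raw_cpus / target_utilization)
    ram_low, ram_high = (cpus * gb for gb in gb_ram_per_core)
    return {"cpus": cpus, "ram_gb_range": (ram_low, ram_high)}

# Hypothetical input: 8 GB per hour.
estimate = estimate_initial_size(gb_per_hour=8)
print(estimate)
```

The result is only a starting point to cross-check against other implementations.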
15
Environment Sizing Example
Informatica PowerCenter Sizing Questions

Data Volume Rate
Data volumes are a critical aspect, as the CPU cycles must be available to handle the
data volume in the appropriate timeframe. Getting a reasonable estimate of the volume
of data to be moved on a nightly/daily basis is the cornerstone of a sizing effort.
Method 1 (Volume based)
• Number of gigabytes per hour: 8
• Number of simultaneous jobs on average: 5
Method 2 (Existing load process)
• How many loads? 10
• How much data is being moved (in GB)? 0.5
• What is your load window in minutes? 720
Method 3 (Expected load process)
• How many target tables do you have to load? 250
• Size of data to load (source data) in GB? 10
• What is the load window in minutes? 480

Aggregates and Sorting
What is the expected use of aggregates/sort? (Enter a "1")
• Low (25% or less): 0
• Medium (25% to 75%): 1
• High (75% or more): 0

Data Volume Growth
• What is the expected yearly data growth (%)? 20%

Operating System
• Unix or NT: Unix
• 64 bit or 32 bit: 64

Continuous and/or Real Time
The assumption is that continuous and/or real-time workloads will require more CPU
and memory. This is because there is less flexibility in workload management. RT
sessions must run, they must run now, and they should not be slowed by other
processes.
What percent of sessions/loads will be real time? (Enter a "1")
• 25% or less: 1
• 25-60%: 0
• 60-90%: 0
• 90%+: 0

Lookup Sizing
Lookups (caching data tables to match values) require additional CPU and
RAM. It is an important factor in sizing the box. Use an educated guess as to
the size of your lookup requirements. If you are loading a warehouse, think in
terms of the size of th…
Percent of lookups with > 250k rows? (Enter a "1")
• Low (25% or less): 0
• Medium (25% to 75%): 1
• High (75% or more): 0

Load Window Criticality
How critical is it that the load window is always met? (Enter a "1")
• Not at all important: 0
• Somewhat important: 0
• Very important: 0
• Critical: 1

Application Type(s)
What sort of application(s) will PowerCenter be used for?

Other Considerations
Please include any other considerations you feel are important to the sizing effort. Any environmental information, restrictions,
or needs should be listed here.
16
Capacity Planning
17
Capacity Planning
• Existing Environment
• Measure actual performance in YOUR environment
• Use real world performance information to understand
current unused capacity
• Use linear scalability to predict future needs
• Key review points :
• Current performance
• Data growth projections
• Future integration needs
• Consider Impacts of any technology shift/change
• Web Services
• Grid/HA
• XML Processing etc.
• New server technologies
18
Capacity Planning Methodology
• Gather performance information
• Volume (data/records)
• CPU Usage
• Memory Usage
• Network Usage
• File System Usage
• System Characteristics (CPU speed etc.)
• Document future assumptions e.g. planned architecture, usage period,
geographical distribution of data & users
• Review future growth needs
• Review data growth projections
• Plan for 75% CPU utilization or less
• Determine required capacity
• Update/expand environment as needed
• Use benchmark testing with real, production-like data
19
Roles
• DBA
• Operations
• Informatica Administrator
• System Administrator
• Network Specialist
• Developer
• Business Analyst
20
Tools
21
Tools
• Monitoring tools to help determine how the
servers are performing
• Reports to provide metrics about how
PowerCenter is being used
• Analysis to find out your current maximum
capacity
• Estimation to determine required capacity for
future growth
22
Tools
• Repository Reports
• Repository Queries
• OPB_SWIDGINST_LOG, OPB_TASK_INST_RUN,
OPB_WFLOW_RUN, OPB_TASK_STATS
• Key Results
• Number of records from SQ per node
• Number of session runs per node per day
• Number of concurrent session runs per node per hour
• CPU/Memory used per session
23
Tools
vmstat
• Reports information about processes, memory, paging, block I/O, traps, and CPU activity
• vmstat 5 10: sample every 5 seconds, 10 times
• Processes in the run queue (procs r): a value consistently greater than the number of CPUs
indicates a CPU bottleneck
• Idle time (cpu id): a value consistently at 0 indicates a CPU issue
• Scan rate (sr): a rate continuously over 200 pages per second indicates a memory shortage
• Key Results
• Memory usage statistics
iostat
• Reports CPU statistics and input/output statistics for devices and partitions
• iostat 5 10: sample every 5 seconds, 10 times
• Reads/writes per second (r/s, w/s): consistently high values indicate disk issues
• Percentage busy (%b): %b > 5 may point to an I/O bottleneck
• Service time (svc_t): svc_t > 30 milliseconds calls for a faster disk/controller
• Key Results
• Disk usage results
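The vmstat and iostat thresholds listed on this slide can be collected into a simple check. The sketch below is illustrative only: the function name and the dictionary keys are hypothetical, and the input is assumed to hold values already averaged over a sustained interval, since the thresholds matter only when they are exceeded consistently. Parsing real vmstat or iostat output is left out because column layouts differ by platform.

```python
def flag_bottlenecks(sample, cpu_count):
    """Return warnings for one averaged vmstat/iostat sample (a plain dict).

    The keys are hypothetical names for the columns discussed on the slide:
    procs r, cpu id, sr, %b, and svc_t.
    """
    warnings = []
    if sample["run_queue"] > cpu_count:
        warnings.append("CPU: run queue exceeds number of CPUs")
    if sample["cpu_idle_pct"] <= 0:
        warnings.append("CPU: no idle time")
    if sample["scan_rate"] > 200:
        warnings.append("Memory: scan rate over 200 pages/sec")
    if sample["disk_busy_pct"] > 5:
        warnings.append("I/O: %b over 5")
    if sample["service_time_ms"] > 30:
        warnings.append("I/O: service time over 30 ms")
    return warnings

# Hypothetical averaged values for a 4-CPU server.
example_sample = {
    "run_queue": 6,
    "cpu_idle_pct": 12,
    "scan_rate": 250,
    "disk_busy_pct": 3,
    "service_time_ms": 18,
}
example_warnings = flag_bottlenecks(example_sample, cpu_count=4)
for warning in example_warnings:
    print(warning)
```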
24
Tools
netstat
• Displays information about network interfaces on the
system
• Network connections, routing tables, and interface
statistics
ntop
• Shows a list of hosts using the network
• Provides information about traffic generated by each host
25
Tools
sar – System Activity Reporter

• Exists on many UNIX platforms
• Examine live statistics
• sar [options…] t n
• t is number of seconds per sample
• n is number of samples
• Save sar data for later analysis
• sar -o filename t n
• Recall CPU usage: sar -u -f filename
• Recall disk usage: sar -d -f filename
• You can also specify a time window (-s, -e) and an alternate interval with -i
• Key Results
• Consolidated CPU/Memory/Disk usage statistics
26
Tools
sar – Disk Utilization

• sar -d t n
• Average I/O size in bytes = (blks/s * 512 bytes) / (r+w/s)
• %busy is a good indicator of a disk bottleneck
• Shows disk devices, which can be tough to trace back to a specific logical
volume
vega7077-root-># sar -d 60 1
HP-UX vega7077 B.11.23 U ia64 10/25/07
10:25:24 device %busy avque r+w/s blks/s avwait avserv

10:26:24 c2t6d0 0.65 0.50 1 23 0.00 9.14
c76t4d3 0.02 0.50 0 0 0.01 10.03
c140t2d0 3.13 0.50 2 180 0.00 18.21
c142t2d0 3.88 0.50 2 180 0.00 22.38
c148t2d0 0.28 0.50 2 180 0.00 1.69
c150t2d0 0.42 0.50 2 176 0.00 2.37
c108t2d0 3.03 0.50 2 179 0.00 17.67
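The average I/O size formula on this slide can be applied directly to a row of the sar -d output. A minimal sketch, assuming 512-byte blocks as the formula states; the function name is hypothetical, and because sar prints r+w/s as a rounded integer, the result is only approximate.

```python
def average_io_size_bytes(blks_per_sec, rw_per_sec, block_size=512):
    """Average I/O size = (blks/s * block size) / (r+w/s)."""
    if rw_per_sec == 0:
        # No I/O operations in the interval, so there is no meaningful average.
        return 0.0
    return blks_per_sec * block_size / rw_per_sec

# Values taken from the c140t2d0 row: r+w/s = 2, blks/s = 180.
io_size = average_io_size_bytes(blks_per_sec=180, rw_per_sec=2)
print(f"Average I/O size: {io_size:.0f} bytes")
```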
27
Tools
sar – CPU utilization
• sar -u t n
• %sys is system/kernel time
• %usr is user space time
• %wio is the percentage of time spent waiting on I/O
• %wio is the best indicator of whether I/O is a bottleneck
• Directly reflects how much performance is lost waiting on I/O
operations
vega7077-root-># sar -u 60 1
HP-UX vega7077 B.11.23 U ia64 10/25/07
10:49:31 %usr %sys %wio %idle

10:50:31 1 5 6 87
28
Tools
top
• Provides a dynamic real-time view of a running
system
• Displays system summary information as well as
a list of tasks currently being managed
• Useful for shared environments to identify each
application process and their CPU/memory
consumption
29
Tools
Windows perfmon
30
Example
31
Capacity Planning Example
• Before the upgrade to PowerCenter 8.6, planning for the new
environment is initiated
• Current hardware on Unix
• Business activity is expected to increase 20% annually
• Two new Business Units are expected to use Informatica
platform
• Explore PowerCenter 8.6 performance enhancements
32
Capacity Planning Example
• Peak Load Time – 1am to 1:35 am
• Number of Sessions – 45
• Most concurrent sessions – 15
• Total Data Processed – 10 GB
• Primarily flat file to DBMS and DBMS to DBMS data load
• Server has 4 CPUs and 16 GB of RAM
• Most sessions include lookups, but with fairly reasonable
cache sizes (i.e. no 8 GB customer master)
• Total Load Window requirement is 2 hrs (done by 3am)
33
Capacity Planning Steps
• Using repository reports
establish a timeline for loads
• Daily
• Weekly
• Monthly
• Determine the complexity of
mappings
• High: Multiple sources or
targets, 5 or more lookups,
complex logic
• Medium: Multiple sources
or targets, 2-5 lookups or
an aggregator, full update
strategy
• Low: Straight-through mapping,
less than 3 lookups
[Figure: load timeline from 1:00 AM to 3:00 AM in 10-minute steps, showing Extract Files, Validations, Dimension Loads, Daily Fact Loads, and Extract + Audit Completed]
34
• Link the results of system metrics to the load timeline
• Identify the peaks in CPU/memory/disk utilizations
Time   CPU 1   CPU 2   CPU 3   CPU 4   Avg   RAM   I/O
1:01   95%     90%     85%     25%     74%   90%   Ok
1:11   90%     90%     65%     3%      62%   35%   Good
1:21   90%     50%     10%     3%      38%   50%   Good
1:31   75%     25%     3%      3%      25%   25%   Good
Avg    87%     64%     41%     9%      50%   50%   Good

Data    Seconds   Data/Sec   Data/CPU/Sec   Max Expected
10 GB   2100      4.8 MB     1.2 MB         2.4 MB/CPU
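The throughput figures on this slide can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 1 GB = 1000 MB (which matches the slide's rounded figures) and reading "Max Expected" as the per-CPU rate scaled from the 50% average utilization up to 100%. That reading is an interpretation, not something the slide states.

```python
# Inputs from the example: 10 GB moved in 2100 seconds on a 4-CPU server
# running at 50% average CPU utilization.
data_mb = 10 * 1000          # assumption: 1 GB = 1000 MB
elapsed_sec = 2100
cpu_count = 4
avg_cpu_utilization = 0.50

mb_per_sec = data_mb / elapsed_sec
mb_per_cpu_per_sec = mb_per_sec / cpu_count
# Assumed interpretation: throughput per CPU if utilization reached 100%.
max_expected_per_cpu = mb_per_cpu_per_sec / avg_cpu_utilization

print(f"Data/Sec:      {mb_per_sec:.1f} MB")
print(f"Data/CPU/Sec:  {mb_per_cpu_per_sec:.1f} MB")
print(f"Max Expected:  {max_expected_per_cpu:.1f} MB/CPU")
```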
35
• Review bottlenecks to reveal areas of improvement with
additional CPU/memory/disks
• This may also result in code fixes, but performance tuning is
only a short term fix
• Value of new Informatica features e.g. using OS profiles for
more granular information and process ownership
• Consider architectural changes in the new environment such as
Enterprise Grid Option
• Start making projections based on the input available
• My current peak CPU/memory utilization is at 50% and I am expecting
20% growth per group and 2 new groups will join
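One way to start such a projection is sketched below, assuming linear scalability (utilization grows in proportion to load). The function and parameter names are hypothetical, and the share of utilization contributed by each new group is not given in the example, so the 10% used here is a placeholder to be replaced with measured or estimated figures.

```python
def project_utilization(current_util, growth_rate, years=1,
                        new_groups=0, new_group_util=0.0):
    """Project peak utilization under a linear-scalability assumption.

    current_util   -- current peak utilization as a fraction (0.50 = 50%)
    growth_rate    -- expected yearly growth per group as a fraction
    new_group_util -- utilization each new group would add today
                      (hypothetical input; not given on the slide)
    """
    growth = (1 + growth_rate) ** years
    existing = current_util * growth
    added = new_groups * new_group_util * growth
    return existing + added

# 50% today, 20% growth, 2 new groups at an assumed 10% each.
projected = project_utilization(0.50, 0.20, years=1,
                                new_groups=2, new_group_util=0.10)
print(f"Projected peak utilization: {projected:.0%}")
print("Exceeds 75% target" if projected > 0.75 else "Within 75% target")
```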
36
Questions for the Example :
• Do you need more CPU?
• Do you need more RAM?
• How much more expected capacity do you have without
extending the current load window?
• How much more capacity do you have until you no longer
meet load window?
• What could you do to ‘free up’ more capacity?
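A first pass at the capacity questions can be sketched with linear scaling. The sketch assumes run time grows in proportion to data volume and that CPU utilization grows in proportion to load; both are simplifications, and the function names are hypothetical. The inputs come from the example: 10 GB in 35 minutes, a 2-hour load window, 50% average CPU utilization, and the 75% planning target.

```python
def max_volume_in_window(current_gb, current_minutes, window_minutes):
    """Volume that fits in the window if run time scales linearly with volume."""
    return current_gb * window_minutes / current_minutes

def load_headroom_factor(current_util, target_util=0.75):
    """How far load can grow before utilization passes the target."""
    return target_util / current_util

window_capacity_gb = max_volume_in_window(10, 35, 120)
growth_factor = load_headroom_factor(0.50)

print(f"Fits in load window: about {window_capacity_gb:.1f} GB")
print(f"Load growth before 75% CPU: {growth_factor:.2f}x")
```

Peaks matter more than averages here: the 1:01 sample shows CPU 1 at 95% and RAM at 90%, so the averaged figures overstate the usable headroom at the start of the load.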
37
Pitfalls and Common Mistakes
• Apples to Apples
• “I talked to <customer> at the user group and they are moving 1,000 rows a second –
why aren’t we experiencing the same?”
• “I read an Informatica benchmark and they moved a terabyte in 38 min, which showed
4mb a second per processor – mine should be the same performance right?”
• Growth Projections
• “Every day we process 100,000 records that equal 5 MB of data, thus our warehouse is
increasing by 5 MB a day.”
• “Every year our warehouse grows by 25% so our daily capacity must be growing by
25%.”
• Adding Horsepower
• “If I add more CPU and RAM my loads will be faster.”
• “My hardware vendor promised their new CPU’s are 2x faster so my load should finish
in ½ the time.”
• Root Cause
• “My performance is poor, it must be the Informatica Platform.”
• “I’m seeing very low rows per second processed, I must have a slow server”
38
Capacity Planning Results
• Better to start low, observe the adoption rates and usage,
and then adjust upward as necessary
• Vertical – Expandable servers
• Horizontal – Grid Architecture
• Start with adding CPU and memory to the existing server
• Then increase the number of servers with Grid Architecture
• Allocate abundant storage for the infa_shared directory
• Use a flexible storage architecture, e.g. start with 4 stripes
over 4 LUNs, then grow to 4 stripes over 8 LUNs to
expand from 100 GB to 200 GB
39
Customer Examples
40
Customer Example 1
Scenario
• Planning for release to production for PowerExchange CDC
• First a benchmark test was conducted with a subset of the data
• Projected data volumes were used for the estimation
• Assumptions were documented: Projected data volumes and benchmark
results
• The disk space used for the file system during the benchmark test was
recorded
Recommendations
• For actual data volumes, session logs are expected to use about 13 GB
daily
• There will be a process to purge log files older than two days
• Based on this, 26 GB will be allocated for the session log directory
• Based on the number of lookups, the sizes of the lookup tables, and
concurrent sessions, 20 GB for Cache directory should be allocated
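The session log allocation follows from the daily log volume and the purge policy. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def log_directory_gb(daily_log_gb, retention_days):
    """Space needed to hold logs for the full retention period."""
    return daily_log_gb * retention_days

# 13 GB of session logs per day, purged after two days.
session_log_gb = log_directory_gb(daily_log_gb=13, retention_days=2)
print(f"Session log directory: {session_log_gb} GB")
```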
41
Customer Example 2
Scenario
• Provide capacity planning assistance for upgrade and server purchase
Some key questions
What is the total volume of data to move?
• Current task – Data volume is less than 2 GB per Month.
What is the largest table (bytes and rows)? Is there any key on this table that could
be used to partition load sessions, if needed?
• Existing task – Largest table 6 M rows with average record length 1000 bytes
What is the batch window available for the load?
• Existing task – Batch window is around 6 hours. Future task – 3 hours on weekdays, 10 hours on weekends
What is the expected growth?
• The percentage of data volume growth has been projected to be 25% each year.
• Currently there are 50 interfaces loading an approximate average of 200MB of data each
Recommendation
• Compute a “base size” using the key driving factors for CPU and RAM. Then, adjust this base size
according to some key attributes of the job load
• The key driving factors for calculating the base CPU size are “cpu mb per sec” (data rate) and “cpu per
session” (job load)
• Data Rate: CPU mb per sec = cpu mb per sec factor * number of GB/hour
42
Customer Example 3
Scenario
• Upgrade, Server Consolidation, and ICC Organization
Recommendation
Informatica PowerCenter Sizing Results
Component                     Details                             Ram Factor   Initial CPU   Adjusted CPU
Data Volume Rate              Method 1 Using CPU/MB sec factor    4.9          2.9           4.9
Base Size                                                         4.9          2.9           4.9
Continuous And/Or Real Time   Ranges (0,40%,60%,100%)             0%           0%            0%
Load Window Criticality       Ranges (-30%,0,50%,75%)             0%           0%            0%
Aggregates and Sorting                                            0%           0%            0%
Data Volume Growth                                                0%           0%            0%
Operating System              Unix vs NT and 32 vs 64 bit         100%         -25%          -25%
Lookup Sizing                 Ranges (Ram = -20%,0,50%)           0%           0%            0%
Application Types + Other     Subjective Factor (in %)
Total Adjustment Factor                                           100%         -25%          -25%
Final Sizing Raw                                                  9.8          2.175         3.675
Final Sizing Adjusted                                             10           2             4
Sizing Upper Range                                                15           4             6
Informatica would recommend a PowerCenter server(s) with 4 to 6 CPUs and 10 to 15 GB of RAM.
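The "Final Sizing Raw" row appears to be the base size multiplied by one plus the total adjustment factor. The sketch below reproduces that row under this assumption; the dictionary layout and function name are hypothetical, and the rounding to the "Adjusted" and "Upper Range" rows is not modeled because the slide does not state how it was done.

```python
# Base size per column, taken from the "Base Size" row.
base_size = {"ram": 4.9, "initial_cpu": 2.9, "adjusted_cpu": 4.9}

# Adjustment factors as fractions; only the operating system row is non-zero.
adjustments = {
    "continuous_real_time": {"ram": 0.0, "initial_cpu": 0.0, "adjusted_cpu": 0.0},
    "load_window_criticality": {"ram": 0.0, "initial_cpu": 0.0, "adjusted_cpu": 0.0},
    "aggregates_and_sorting": {"ram": 0.0, "initial_cpu": 0.0, "adjusted_cpu": 0.0},
    "data_volume_growth": {"ram": 0.0, "initial_cpu": 0.0, "adjusted_cpu": 0.0},
    "operating_system": {"ram": 1.00, "initial_cpu": -0.25, "adjusted_cpu": -0.25},
    "lookup_sizing": {"ram": 0.0, "initial_cpu": 0.0, "adjusted_cpu": 0.0},
}

def apply_adjustments(base, factors):
    """Multiply each base value by (1 + sum of its adjustment factors)."""
    result = {}
    for column, value in base.items():
        total = sum(row.get(column, 0.0) for row in factors.values())
        result[column] = value * (1 + total)
    return result

final_raw = apply_adjustments(base_size, adjustments)
for column, value in final_raw.items():
    print(f"{column}: {value:.3f}")
```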
43
Customer Example 4
Scenario
• Upgrade, High Business Growth, and End of Life for Servers
Recommendation
• 4 nodes, each with 2 dual-core CPUs and 32 GB of memory
44
Summary
• Capacity planning is a complicated process that requires
input from various sources
• Testing the PowerCenter loads in your environment is the
most effective way to estimate system behavior
• Choose a flexible architecture to allow incremental growth
• Validate your conclusions with Informatica Professional
Services
• Informatica HACOE at your service for reference
architecture and testing
45
Questions?
46
Thank you
47
