
Migrating Java Applications to HP-UX

August 2010
Technical white paper

Table of Contents
Introduction
Essential Tools on HP-UX for Java Applications
  HPjconfig
  Java Out-of-Box (JavaOOB)
  HPjmeter
    Collecting Profile Data for HPjmeter
    Collecting Garbage Collection Data for HPjmeter
  GlancePlus (glance or gpm)
    gpm (GUI mode)
    Glance Screen Mode
    Glance Adviser Mode
  Other Tools
Configuring Patches and Kernel Parameters for Java on HP-UX
  Running HPjconfig in GUI mode (default)
  Running HPjconfig in non-GUI mode
Key Factors Affecting Performance
  Java Heap Size and Garbage Collection Behavior
    Garbage Collection in HP's Hotspot JVM
    JVM Heap and GC Parameters
    Default GC Policies and Heap Settings on HP-UX
    Migrating from Solaris
    Migrating from IBM/AIX
    Confirm Garbage Collection Behavior using HPjmeter
  Thread Behavior and Lock Contention
    Detecting Lock Contention in Your Application
    Reducing Lock Contention in Your Application
  Deployment of Java Instances and Processor Usage
  Other Factors
    OS Scheduler
    Hyper-threading
    Other Java Options
    System Components
Memory Footprint of Migrated Java Application
  Java Process Memory Footprint
  Tools to Examine Java Process Memory Footprint
  Threads in the Java Process
  Reducing Starting Footprint
For More Information
Call to Action

Introduction
HP offers a full range of Java technology products on HP-UX 11i systems. We provide solutions to develop or deploy Java applications with the best performance on HP PA-RISC 11i v1 (11.11), 11i v2 (11.23), and 11i v3 (11.31), and on HP Itanium 11i v2 (11.23) and 11i v3 (11.31). This document provides guidance on how to easily migrate your existing Java applications from other platforms to HP-UX. The topics covered are:

- Essential Tools on HP-UX for Java Applications
- Configuring Patches and Kernel Parameters for Java on HP-UX
- Key Factors Affecting Performance
- Memory Footprint of Migrated Java Applications

Essential Tools on HP-UX for Java Applications


HP-UX provides a variety of tools to assist you in deploying Java applications on HP-UX. These tools help with initial machine setup with patches and kernel parameters, monitoring the performance of your application, and troubleshooting problems that arise. We recommend starting with these three tools:

- HPjconfig (or JavaOOB)
- HPjmeter
- HP glance/gpm

The following provides a brief description of these essential HP-UX tools and some of the features you are likely to use.

HPjconfig
HPjconfig is a tool for configuring and tuning HP-UX 11i systems for running Java workloads. It provides recommended kernel parameter settings and patch information needed for running Java on HP-UX. HPjconfig is a pure Java application. To run, it requires the following:

- Java 1.4.2.x or later
- HP-UX 11i v1 or later
- HP Integrity Itanium or HP 9000 PA-RISC system

To download HPjconfig, go to:
https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPJCONFIG

Refer to the HPjconfig release notes at: http://www.hp.com/go/hpux-java-docs

Java Out-of-Box (JavaOOB)


Java Out-of-Box (JavaOOB) is a stand-alone bundle intended for large server-side Java applications. Upon installation, it will install startup (RC) scripts, modify kernel parameters, and if necessary, rebuild the kernel and reboot the system. These modifications provide better "Out-of-the-Box" behavior for Java. JavaOOB may have already been installed on your HP-UX system, in which case, the kernel parameters will have already been updated for Java. You should still use HPjconfig to make sure your system has the correct patches for Java. To download JavaOOB, go to:
https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPUXJAVAOOB

HPjmeter
HPjmeter is a Java performance analysis tool that helps identify and diagnose performance problems in your deployed Java application. It can be used in your production environment as well as during development. HPjmeter has two modes of operation:

- Real-time monitoring of your running Java application
- Static data analysis (off-line) of data collected from your Java application using the profiling or garbage collection options

Real-Time Monitoring provides the following features:

Automatic problem detection and alerts:

- Memory leak detection alerts with leak rate
- Thread deadlock detection
- Abnormal thread termination detection
- Expected out of memory error
- Excessive method compilation
- System and process CPU utilization thresholds
- Heap usage thresholds
- Garbage collection duration
- Finalizer queue length

Dynamic real-time display of application behavior:

- Java heap size
- Garbage collection events and percentage time spent in garbage collection
- CPU usage per method for hottest methods
- Object allocation percentage by method and by object type
- Method compilation count in the JVM dynamic compiler
- Number of classes loaded by the JVM and activity in class loaders
- Thrown exception statistics
- Multi-application, multi-node monitoring from a single console
- Applications are ready to monitor: at application start, no HPjmeter options are required to monitor the application (with Java 6.0.03 or later)

Static Data Analysis (Off-line Analysis) provides the following features:

Drill down into application profile metrics:

- Graphic display of profiling data
- Threads histogram and lock contention metrics
- CPU/clock time metrics for Java methods
- Call graphs with call count, CPU, or clock time
- Per-thread display of time spent in nine different states
- Per-thread or per-process display of metrics
- Reference trees for heap analysis

In-depth garbage collection analysis:

- Graphical display of Java heap usage and garbage collector behavior over time
- Details of GC type, GC frequency, GC duration, object creation rate, cumulative memory allocation
- User-configurable graphs for presenting all heap and GC data
- Detailed summaries of GC activity and system resource allocation, along with other system and JVM runtime and version data

This document describes using some of HPjmeter's static data analysis features to analyze the performance of your migrated Java application. For more details, as well as information on HPjmeter's Real-Time capabilities, refer to the Java Troubleshooting Guide for HP-UX Systems, and the HPjmeter User's Guide. The steps for collecting the profiling and garbage collection data for static data analysis are described below. See the section Key Factors Affecting Performance for steps on viewing the data in HPjmeter to identify common Java performance problems.

Collecting Profile Data for HPjmeter


The following steps summarize how to collect and view profiling data from your application. For applications running on HP-UX, use the extended profiling option -Xeprof to capture profile data, and then view the data in HPjmeter. For applications running on non-HP-UX platforms, you can use the -Xrunhprof or -agentlib:hprof options, and use HPjmeter to view the data. The examples in this section use the -Xeprof option.

1. Change the command line of your Java application to use -Xeprof. To collect Xeprof data during the entire execution of the launched Java application, use:
$ java -Xeprof <yourApp>

You can send the Xeprof output to a specified file using the file= keyword as follows:
$ java -Xeprof:file=<yourApp>.eprof <yourApp>

The <pid> will be inserted automatically in the file name, for example,
<yourApp><pid>.eprof.

To collect Xeprof data for a specified time interval, there are two options:

a. Turn profiling on/off based on a specified time since program start:
$ java -Xeprof:time_on=<start_time>,time_slice=<length_of_collection_time> <yourApp>

b. Turn on/off profiling using signals (for example, sigusr1 and sigusr2):
$ java -Xeprof:time_on=sigusr1,time_slice=sigusr2 <yourApp>

The generated filename will include the time between the start of an application and the start of profiling. For example:

java<pid>_<t>.eprof

NOTE: If you are running JDK 1.5.0.04 or later, the command-line option is not required to capture eprof data. Instead, you can toggle eprof data gathering on and off by sending signals to the currently running Java VM. One log file is produced per sample period; the name for the log file is java<pid>_<startTime>.eprof. The SIGUSR2 signal toggles the recording of eprof data. Use the following process to gather eprof data for specific periods:

1. Send SIGUSR2 to the Java VM process. The Java VM begins recording eprof data.
2. Send SIGUSR2 to the Java VM process again. The Java VM flushes the eprof data and closes the log file.
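For example, an illustrative session might look like the following (the pid 28818 is hypothetical; substitute your own process id):

$ ps -ef | grep java        # find the Java process id, for example 28818
$ kill -USR2 28818          # first signal: the JVM begins recording eprof data
  ... exercise the application through the period of interest ...
$ kill -USR2 28818          # second signal: the JVM flushes and closes the log file

The resulting log file (java28818_<startTime>.eprof) can then be opened in the HPjmeter console as described in the following steps.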

For more information, see Profiling with Zero Preparation in the HPjmeter User's Guide.

2. Run the application to create a data file.

3. Start the HPjmeter console from a local installation on your client machine. For example, here are two different ways:
$ $JAVA_HOME/bin/java <heap_settings> -jar /opt/hpjmeter/lib/HPjmeter.jar

$ /opt/hpjmeter/bin/hpjmeter

4. Click File > Open File to browse for and open the data file.

5. A profile analysis screen opens, displaying a set of tabs containing summary and graphical metric data. The following screen shows an example:

Figure 1 HPjmeter Profile Data

Click among the tabs to view available metrics. Use the Metrics or Estimate menus to select additional metrics to view. Each metric you select opens in a new tab. Hover your mouse over each category in the cascading menu to reveal the relevant metrics for that category. The following screen shows the available metrics for the threads/locks category:

Figure 2 HPjmeter Threads/Locks Metrics

Collecting Garbage Collection Data for HPjmeter


The following steps summarize how to collect and view garbage collection data from your application.

When running Java on HP-UX:


1. Add the -Xverbosegc option to the Java command line.
$ java -Xverbosegc:file=<yourApp.vgc> ....

NOTE: If you are running JDK 1.5.0.14, JDK 6.0.06, or later versions, you can start GC data collection without using the -Xverbosegc option on the command line (zero-prep Xverbosegc). Instead, you can toggle GC data collection on and off by sending a SIGPROF signal to the currently running Java VM. The GC data will be written to a file named java_<pid>.vgc. To start and stop GC data collection with zero-prep, execute:

kill -PROF <pid>   or   kill -21 <pid>

See Collecting GC Data with Zero Preparation in the HPjmeter User's Guide for more information.

2. Run the application to create an Xverbosegc data file.

3. Start the HPjmeter console from a local installation on your client machine:
$ java <heap_settings> -jar /opt/hpjmeter/lib/HPjmeter.jar

4. Click File > Open File to browse for and open the data file.

5. A GC viewer screen opens and displays a set of tabs containing metric data. The following figure shows the Garbage Collection Summary Analysis screen:

Figure 3 HPjmeter Garbage Collection Summary Analysis Screen

When running Java on non-HP-UX platforms:


1. Collect GC data using the -Xloggc option.
$ java -Xloggc:filename ...

2. For more detailed GC output, add -XX:+PrintGCDetails or -XX:+PrintHeapAtGC:


$ java ... -XX:+PrintGCDetails -Xloggc:filename

or

$ java ... -XX:+PrintHeapAtGC -Xloggc:filename

(Using -XX:+PrintGCDetails and -XX:+PrintHeapAtGC together can cause intermixed output, which cannot be parsed by HPjmeter. We recommend not using these two options together.)

3. Run the application, and view the resulting data file in HPjmeter.

GlancePlus (glance or gpm)


GlancePlus is a system-level, real-time performance monitoring tool. It provides detailed information about CPU, memory, disk, and network activity, and also information at the process and thread level from a system perspective. With glance or gpm, you can easily examine system activities, identify and resolve performance bottlenecks, and tune your system for more efficient operation. For more information on GlancePlus, refer to the following webpage:
http://www.managementsoftware.hp.com/products/gplus/index.html

(Your OE may already come with GlancePlus installed.) This section provides brief highlights on using glance/gpm to monitor Java applications. There are 3 ways to run glance or gpm:

- gpm (GUI mode)
- glance screen mode
- glance adviser mode

gpm (GUI mode)


gpm is the graphical version of GlancePlus. It provides extensive information on all aspects of your system (CPU, memory, disk, and network activity) and detailed information about your processes and the threads within them. To run it, set your DISPLAY environment variable, and then invoke gpm as follows:
/opt/perf/bin/gpm

The following figure is an example of the Main panel: Figure 4 GlancePlus (gpm) main panel

The Main Panel presents an overview of the system with a view of CPU, Network, I/O, and Memory. The Reports menu lets you select more detailed information, both global to the system and process-specific. To examine an individual process, select Reports > Process List, then select the process you want to monitor. From the process screen, you can select more details (for example, system calls, memory regions, or the thread list).

Detailed on-line help is provided on the Main Panel. Additional examples of using gpm are provided in the section Key Factors Affecting Performance.

Glance Screen Mode


If you only have a terminal window available and cannot invoke gpm, a screen mode of glance is available. But this can be fairly cumbersome to use, and the command letters are not particularly intuitive. gpm is far preferable. To invoke glance in screen mode:
/opt/perf/bin/glance

To see a list of available commands (letters), type "?"

Glance Adviser Mode


Sometimes it is not practical to do live monitoring of a system with gpm. For example, when the application runs for a long time before the problem point occurs, it is more convenient to collect data in "batch" mode to be viewed later. Also, it is often desirable to save performance metrics from a run for future reference. The glance adviser mode allows you to collect the glance data into a file to analyze later. We have found this facility extremely useful for remote analysis of customer issues.

The javaGlanceAdviser.ksh script is provided to collect metrics that are commonly useful for Java applications. It collects global metrics and process-specific metrics:

- Examples of global metrics collected: OS version, machine name, number of CPUs, physical memory, buffer cache sizes, number of disks
- Examples of process-specific metrics collected: name, pid, CPU utilization (system and user), VSS, RSS, number of threads, number of open files, I/O rate, disk I/O rate
- For each process, the script also collects metrics for the system calls made by the process: name, count, cumulative count, call rate, cumulative call rate, and total time

The javaGlanceAdviser.ksh script is shipped with HPjmeter 4.1. You can obtain a copy from:
http://www.hp.com/go/hpjmeter

To invoke the script:

1. Start up your Java process. Get its process id.

2. Start up glance to collect data:
javaGlanceAdviser.ksh <pid> [output-file] [sampling-interval]

e.g.
javaGlanceAdviser.ksh 1613 ga.out.1613 5

This will collect data every 5 seconds for process 1613 and write output to file ga.out.1613. If the run is going to be really long, you can change this interval to be less frequent, maybe every 20 seconds.


Here is an example of partial output from running the glance adviser script:
GBLH: date time swap nfile cpu_tot cpu_sys cpu_usr run_q mem_virt mem_res pageout_rate pageout_rate_cum bufcache_hit_pct disk_io net_in_pkt_rate net_in_err_rate net_out_pkt_rate net_out_err_rate
GBL: 05/28/2010 13:33:00 16 1 50 5 46 0 7995mb 21.45 0 0 99.30 1.60 12471 0 13779 0

PH: proc_name pid cpu cpu_sys cpu_usr vss rss data_vss data_rss threads files io disk_io
P: java 28818 323.60 27.30 296.30 3534380kb 3401580kb 77660 77660 89 173 0.00 0.00

PSH: syscall_name count count_cum rate rate_cum total_time total_time_cum
PS: read 281334 550369 28417.5 18345.6 107.62 209.63
PS: write 34005 65958 3434.8 2198.6 0.09 0.17
PS: ioctl 11559 22372 1167.5 745.7 9.52 18.31
PS: poll 11337 21990 1145.1 733.0 20.02 40.03
PS: send 144508 283858 14596.7 9461.9 1.62 3.17
PS: sched_yield 447103 1151559 45161.9 38385.3 0.40 1.00
PS: ksleep 17466 31275 1764.2 1042.5 403.69 601.73
PS: kwakeup 17392 31131 1756.7 1037.7 0.04 0.06

Other Tools
If lower-level analysis is required, additional tools are available, such as HP Caliper, tusc, and gdb. The standard JDK tool set is available as well. See Java Tools for HP-UX: Quick Start and Migration Guide for a complete list.

Configuring Patches and Kernel Parameters for Java on HP-UX


When migrating a Java application from another platform to HP-UX, make sure the machine is set up with the proper patches required by Java, and that the recommended kernel parameters are set appropriately. HPjconfig is a configuration tool to help you set up patches and kernel parameters. To see the patches required for Java, refer to the following website:

http://ftp.hp.com/pub/softlib/hpuxjava-patchinfo/index.html

HPjconfig provides recommendations for the following kernel parameters:
maxusers
nproc
max_thread_proc
nkthread
nfile
maxfiles
maxfiles_lim
maxdsiz
maxdsiz_64bit
ncallout

To download HPjconfig (HPjconfig-3.2.00.tar.gz), go to:


https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPJCONFIG

HPjconfig can be run in GUI mode or non-GUI (command-line) mode. In either mode, HPjconfig generates a summary of configuration information in a log file. The default name is:
HPjconfig_<hostname>_<date>_<timestamp>.log


Running HPjconfig in GUI mode (default)


To run HPjconfig in GUI mode:

1. cd to the directory where HPjconfig is installed.
2. Set the DISPLAY variable.
3. Set the path to java.
4. To bring up the GUI with the default log file name (for example, HPjconfig_<machine_name>_<date>_<timestamp>.log):
$ java -jar ./HPjconfig.jar

To bring up the GUI with specified log file:


$ java -jar ./HPjconfig.jar -logfile <logfile_name>

5. The following four figures show the System, Application, Patches, and Tunables tabs for the HPjconfig tool in GUI mode:

Figure 5 HPjconfig System Tab


Figure 6 HPjconfig Application Tab

Figure 7 HPjconfig Patches Tab

Figure 8 HPjconfig Tunables Tab


Running HPjconfig in non-GUI mode


The following is the -help information for HPjconfig:

usage: HPjconfig [ options ] -gui
       HPjconfig [ options ] <object> <action>

objects: -patches &| -tunables
actions: -listreq | -listmis | -listpres | -apply

options:
  -patches        operate on java-specific patches
  -tunables       operate on java-specific tunables
  -listreq        list all java required patches or tunables that are applicable to this system
  -listmis        list missing java-specific patches or tunables on the system
  -listpres       list applied (installed) java-specific patches or tunables on the system
  -apply          apply (install) missing java-specific patches or tunables on the system
  -javavers s     java versions for selecting patches, e.g. 1.2, 1.3, 1.4, 5.0, 6.0
  -data s         local data file with java-specific list of patches and tunables
  -[no]gui        run in GUI mode
  -logfile s      name of log file
  -proxyhost s    HTTP proxy host name for accessing live data
  -proxyport s    HTTP proxy port for accessing live data
  -help           show help string and exit
  -version        show version string

Examples:

List options available in non-GUI mode:
$ java -jar ./HPjconfig.jar -nogui -help

List missing patches:
$ java -jar ./HPjconfig.jar -nogui -patches -listmis

List required tunables:
$ java -jar ./HPjconfig.jar -nogui -tunables -listreq

List present patches and tunables:
$ java -jar ./HPjconfig.jar -nogui -tunables -patches -listpres

List present patches and tunables and write to a specified log file:
$ java -jar ./HPjconfig.jar -logfile my.log -nogui -tunables -patches -listpres

The log file produced can be used for remote analysis of your machine.

Key Factors Affecting Performance


This section discusses some of the key factors that can affect the performance of your Java application:

- Java Heap Size and Garbage Collection Behavior
- Thread Behavior and Lock Contention
- Deployment of Java Instances and Processor Usage
- Other Factors (OS Scheduler, Hyper-threading, ForceMmapReserved, Exceptions, System Components)


Java Heap Size and Garbage Collection Behavior


The behavior of the Java heap and garbage collector is critical to achieving good application performance. Tuning the heap is easy, especially when using HPjmeter. When migrating a Java application from another platform to HP-UX, make sure that the heap parameters are comparable to your previous settings, and that you are using an equivalent or similar GC policy.

Garbage Collection in HP's Hotspot JVM


Basic Generational Garbage Collection in HP Hotspot
The following is a brief description of the basic garbage collector in the HP Hotspot JVM and the GC policies supported. The HP Hotspot JVM supports a generational garbage collector. The Java heap is divided into multiple spaces or generations:

- New Space: for newly created objects
- Old Space: for longer-lived objects
- Perm Space: for reflective data or "permanent" objects

The New Space is further divided into three regions: eden and two survivor spaces (to and from). Newly created objects get allocated into the eden space. When eden becomes full, a scavenge GC is triggered. Live objects in the eden space are identified and copied into the survivor space (to), and garbage is removed from eden. The to space then becomes the from space. The process starts again, with newly created objects going into eden and scavenge GCs getting triggered when eden is full. Live objects are then copied from eden and from into to.

After objects have survived enough scavenges, they get migrated to old space, or tenured. The tenuring threshold determines how many scavenges an object survives before getting migrated to old space. Eventually, when the old space is determined to be too full, a Full GC is performed. A Full GC collects both the new and old spaces, examining all objects in the entire heap. Scavenge GCs are usually quick, whereas Full GCs are very time-consuming.

The principle behind generational garbage collection is that many objects in an application are short-lived and will be garbage-collected while in the New Space. Therefore, overall GC time is reduced for the most common case.

GC Policies Supported in HP's Hotspot


There are multiple types of garbage collectors that operate on the different subspaces of the Java heap.

Basic "serial" collector (described in the previous section) This collector is a single-threaded, stop-the-world collector. On the new space, a fast copying collector is used to perform the scavenge GC. On the old space, a mark-compact copying collector is used to perform the Full GC. The Full GC requires scanning and processing the entire heap and is very time-consuming. Parallel scavenge of New space (high-throughput collector) This collector uses multiple threads to perform the scavenge GC on the New space, thereby reducing the total stop-the-world pause time. To enable parallel scavenge, use the JVM

15

option:
-XX:+UseParallelGC

The JVM determines the number of parallel threads using heuristics based on number of CPUs available. You can also explicitly set the number of parallel GC threads with:
-XX:ParallelGCThreads=n

- Old Generation Concurrent Mark Sweep (CMS) collector (low-pause collector)
  This collector runs mostly concurrently with the application. It attempts to clean out the Old Generation before it gets full to avoid the costly Full GC. There are still two stop-the-world pauses, but they are very short compared to a Full GC. To enable CMS, use the JVM option:
-XX:+UseConcMarkSweepGC

Turning on this option automatically enables the following on the New space:
-XX:+UseParNewGC

UseParNewGC is a parallel scavenge specifically intended to work with low-pause CMS. CMS is initiated when Old Space occupancy reaches a specified percent, determined by CMSInitiatingOccupancyFraction:
-XX:CMSInitiatingOccupancyFraction=<percent>

CMS reduces large GC pause times, but incurs some additional CPU overhead while the application is running.
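  For instance, a complete CMS invocation might look like this (the heap sizes and the occupancy value of 70 are illustrative, not recommendations):

  $ java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 <yourApp>

  This starts a CMS cycle whenever Old Space occupancy crosses 70%, leaving headroom for the application to keep allocating while the concurrent collection completes.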

- Old Generation Parallel GC (available in JDK 5.0 and later)
  This collector uses multiple GC threads to perform a Full GC when the Old space gets full, thereby reducing the very long pause times caused by a Full GC. To enable ParallelOldGC, use the JVM option:
-XX:+UseParallelOldGC

The JVM determines the number of parallel threads using heuristics based on number of CPUs available. You can also explicitly set the number of parallel GC threads with:
-XX:ParallelGCThreads=n

JVM Heap and GC Parameters


The following are the JVM parameters used to control the heap size and GC policies described above.
-Xms<size>                  Total heap size (initial), e.g. -Xms512m
-Xmx<size>                  Total heap size (maximum), e.g. -Xmx512m
-Xmn<size>                  Size of New Space, e.g. -Xmn128m
-XX:PermSize=<size>         Size of PermSpace (initial), e.g. -XX:PermSize=256m
-XX:MaxPermSize=<size>      Size of PermSpace (maximum), e.g. -XX:MaxPermSize=512m
-XX:SurvivorRatio=<num>     Ratio of Eden to one Survivor Space, e.g. -XX:SurvivorRatio=8
-XX:+UseParallelGC          Enable parallel scavenge on New space
-XX:ParallelGCThreads=n     Set number of parallel GC threads
-XX:+UseConcMarkSweepGC     Enable Concurrent Mark Sweep (CMS) collector
-XX:+UseParallelOldGC       Enable parallel Full GC
-XX:+UseAdaptiveSizePolicy  Attempt to resize subspaces to produce more optimal GC behavior
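As an illustration of how these parameters combine (all sizes here are hypothetical, not recommendations), the following command line fixes a 512m heap with a 128m New Space and parallel scavenge:

$ java -Xms512m -Xmx512m -Xmn128m -XX:PermSize=64m -XX:MaxPermSize=128m -XX:SurvivorRatio=8 -XX:+UseParallelGC <yourApp>

With -Xmn128m and -XX:SurvivorRatio=8, the New Space is split into ten equal parts: eden receives eight (about 102m) and each of the two survivor spaces receives one (about 13m). Setting -Xms equal to -Xmx avoids heap resizing at run time.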

Default GC Policies and Heap Settings on HP-UX


When migrating a Java application from another platform to HP-UX, make sure that the heap parameters are comparable to your previous settings, and that you are using an equivalent or similar GC policy. The default heap settings can differ from one platform to another, so it is better to explicitly set your heap settings rather than rely on the defaults.

The default GC policies on HP-UX are as follows:

JVM 142
- New space: single-threaded scavenge (UseParallelGC is off by default)
- Old space: single-threaded Full GC
- UseAdaptiveSizePolicy is off by default

JVM 5.0 and JVM 6.0
- New space: multi-threaded scavenge (UseParallelGC is on by default)
- Old space: single-threaded Full GC
- UseAdaptiveSizePolicy is on by default

There are a number of factors that affect the default heap settings used by the JVM. The default values for heap settings may be different between 142, 5.0, and 6.0, and on Itanium versus PA-RISC. UseParallelGC is turned on by default in 5.0, whereas it is off by default in 142; that difference causes the default heap settings to differ. The following are examples of default settings on HP-UX (e.g., on an Itanium system with 24 GB of memory):

JVM 142.25
- Initial heap size: 16m
- Maximum heap size: 64m

JVM 150.19 and JVM 6.0.06 (with default UseParallelGC on)
- Initial heap size: 384m
- Maximum heap size: 1G

JVM 150.19 and JVM 6.0.06 (-XX:-UseParallelGC)
- Initial heap size: 6.5m
- Maximum heap size: 64m

The amount of memory available on the machine also affects the default heap settings. Finally, these defaults can change with a new minor release of Java. (Note: JVM 6.0.07 and 6.0.08 included changes in default initial heap size. For details, see the JDK 6.0.08 release notes.) Because of these many factors, we recommend that you explicitly set your heap parameters.

Migrating from Solaris


The JVM on Solaris is also a Hotspot-based JVM. The GC policies will be similar to those on HP-UX, so you can map the GC policy directly.

To set your heap settings on HP-UX: if, on your Solaris box, the total heap (-Xmx), initial heap (-Xms), and New size (-Xmn) settings are explicitly set in various scripts, then run using those same settings on HP-UX. If you are unsure of the settings, or some are not set (relying on defaults), then collect the GC details from the JVM running on Solaris using:
-Xloggc:loggc.solaris -XX:+PrintHeapAtGC

On HP-UX, set the -Xmx, -Xms, and -Xmn parameters based on the above values from Solaris, and set the GC policy to be the same as on Solaris. Add the following to verify that your settings are comparable:
-Xloggc:loggc.hpux -XX:+PrintHeapAtGC
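For example, if the Solaris GC log showed the application running with a 2g total heap, a 512m new generation, and the parallel scavenger, a comparable HP-UX invocation would be (values illustrative):

$ java -Xms2g -Xmx2g -Xmn512m -XX:+UseParallelGC -Xloggc:loggc.hpux -XX:+PrintHeapAtGC <yourApp>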

The PrintHeapAtGC outputs can be viewed in HPjmeter. The following figure shows a sample screenshot of PrintHeapAtGC data in HPjmeter: Figure 9 GC Summary panel from PrintHeapAtGC output

Migrating from IBM/AIX


The IBM JVM has a different set of GC policies than Hotspot. These policies rely on similar concepts but do not map directly to the policies on Hotspot. Therefore, experimentation may be required to obtain optimal performance. The following are GC policies available in IBM SDK 5.0:
-Xgcpolicy:optthruput
  Optimizes for throughput (the default policy). Throughput is more important than short GC pauses. The application is stopped each time garbage is collected.

-Xgcpolicy:optavgpause
  Optimizes for pause time. Trades high throughput for shorter GC pauses by performing some of the garbage collection concurrently, using a concurrent mark and concurrent sweep phase. The application is paused for shorter periods.

-Xgcpolicy:gencon
  Generational and concurrent. Generational collector with a concurrent mark phase.

-Xgcpolicy:subpool
  Similar to the default policy optthruput, but employs an allocation strategy intended to perform better on machines with 16 or more processors. (Available on pSeries and zSeries.)

If, on IBM's JVM, your application is running with optthruput, then try the default basic GC policy on HP-UX. If your application is running with optavgpause on IBM, then you can try CMS (UseConcMarkSweepGC) on HP-UX. If your application is running with gencon, first try the basic GC policy on HP-UX and then compare it to using CMS. If your application is running with subpool on a large multi-processor IBM machine, consider using the strategies listed in the section "Deployment of Java Instances and Processor Usage".

To determine the heap settings used on the IBM JVM, use the options:
-verbose:gc or -Xverbosegclog:<filename>

Note that the -Xverbosegclog option on the IBM JVM produces an XML file, and is different from the -Xverbosegc option on the HP JVM. Try to set comparable heap settings on HP-UX. Because there is no direct mapping between GC policies from IBM to HP-UX, some experimentation and iteration will probably be necessary.

Confirm Garbage Collection Behavior using HPjmeter


Once you have your GC policies and GC parameters set, confirm that garbage collection is behaving well by collecting GC output (Xverbosegc) and viewing the result in HPjmeter. See the section Collecting Garbage Collection Data for HPjmeter. The following are examples of some of the screenshots provided by HPjmeter to diagnose GC problems. Figures 10 - 12 illustrate serious GC performance problems. The Summary panel shows percentage time spent in GC is 38%. Full GCs occur far too frequently, every 70 seconds on average. When they occur, they are very long, averaging 26 seconds in duration:


Figure 10 GC Summary Panel

The "Heap Usage After GC" panel shows the application running close to the maximum heap size and hence, Full GCs are invoked frequently: Figure 11 Heap Usage After GC Panel


The "Duration" panel shows the very long Full GC times. In contrast, note the scavenges GCs are extremely fast, as expected: Figure 12 GC Duration Panel

A typical remedy to the problem of Full GCs occurring too often is to increase the size of the heap, in particular the Old space. However, for the above example, increasing the heap size would have been insufficient for solving the problem. Several improvements were made to the above application including reducing the cache timeout so that objects do not live as long, and optimizing the cache implementation. This resulted in substantial performance improvement as seen in Figures 13 and 14. The Summary panel shows that the percentage time spent in GC has been reduced to 3.6%, with Full GCs occurring every 30 minutes instead of every 70 seconds. However, the Full GC durations are still 28 seconds on average.


Figure 13 GC Summary Panel

Figure 14 Heap Usage After GC Panel

The caching mechanism was eventually replaced with a more efficient implementation. This cut the GC duration roughly in half, as shown in the following figure:


Figure 15 GC Duration panel

(Note: When migrating your Java application from another platform, you should not have to make modifications to the application itself. The example above is used to illustrate HPjmeter's functionality. It is not meant to imply that application changes are required.)

Other recommendations:

- UseAdaptiveSizePolicy is intended to resize the subspaces of the Java heap while the application is running in order to improve GC behavior. Occasionally, it can cause a ping-pong effect, where every scavenge causes the subspace sizes to bounce back and forth. In such cases, it is best to disable adaptive sizing (-XX:-UseAdaptiveSizePolicy).

- When using CMS, make sure that the CMS collection is successfully cleaning out the Old space before the Old Space gets completely full. If the Old Space gets full, HPjmeter will show an "incomplete CMS" and a regular Full GC will occur. In such cases, try lowering the value at which a CMS is initiated, using -XX:CMSInitiatingOccupancyFraction=<percent>.

For further information on garbage collection in Hotspot and using HPjmeter to optimize GC performance, refer to the following:

Tuning Garbage Collection with the 1.4.2 Java Virtual Machine
http://java.sun.com/docs/hotspot/gc1.4.2/index.html

Tuning Garbage Collection with the 5.0 Java Virtual Machine


http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning


http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

Java Troubleshooting Guide for HP-UX Systems


http://www.hp.com/go/hpux-java-docs


HPjmeter 4.0 User's Guide


http://www.hp.com/go/hpux-java-docs

Thread Behavior and Lock Contention


Because Java applications are heavily multi-threaded, application performance depends on the OS thread implementation, scheduling policies, and how the JVM handles lock acquisition. Differences between platforms can cause changes in thread behavior and increased lock contention. When migrating your Java application from another platform to HP-UX, make sure that performance is not being affected by increased lock contention.

Detecting Lock Contention in Your Application


There are two easy ways to detect lock contention in your application:

- Using gpm or the glance adviser script
- Using HPjmeter

Using gpm or Glance Adviser Script


(Also see the previous section GlancePlus (glance or gpm).)

Using gpm:

1. Start up gpm.
2. Select Reports > Process List (select the process to monitor).
3. Select Reports > System Calls (brings up the Process System Calls screen).
4. Look at the SysCall Rate for the system calls: sched_yield, ksleep, kwakeup.

Using the glance adviser script:

1. Start up the glance adviser script:
javaGlanceAdviser.ksh <pid> [output-file] [sampling-interval]

2. Look at the system call output for each interval.
3. High rates of sched_yield, ksleep, or kwakeup indicate lock contention (see the 3rd column). Figure 16 shows partial glance output with some very high sched_yield rates (e.g., 85K per second):

Figure 16 Output from glance adviser script (1 record)
GBLH: date time swap nfile cpu_tot cpu_sys cpu_usr run_q mem_virt mem_res pageout_rate pageout_rate_cum bufcache_hit_pct disk_io net_in_pkt_rate net_in_err_rate net_out_pkt_rate net_out_err_rate
GBL: 05/28/2010 13:33:10 16 1 51 5 46 1 7995mb 21.46 0 0 100.00 1.80 12612 0 13959 0

PH: proc_name pid cpu cpu_sys cpu_usr vss rss data_vss data_rss threads files io disk_io
P: java 28818 353.78 29.58 324.20 3534380kb 3403056kb 77660 77660 89 173 0.00 0.00

PSH: syscall_name count count_cum rate rate_cum total_time total_time_cum
PS: read 284859 835228 28485.9 20880.7 106.44 316.07
PS: write 34113 100071 3411.3 2501.7 0.09 0.26
PS: ioctl 11630 34002 1163.0 850.0 9.23 27.54
PS: poll 11373 33363 1137.3 834.0 20.01 60.05
PS: send 147110 430968 14711.0 10774.2 1.62 4.79
PS: sched_yield 854681 2006240 85468.1 50156.0 1.41 2.41
PS: ksleep 16701 47976 1670.1 1199.4 224.94 826.68
PS: kwakeup 16636 47767 1663.6 1194.1 0.03 0.10

In addition, the glance output provides useful data regarding overall system performance. In particular, note the CPU usage of 353% (user 324% and system 29%), the process size (VSS 3.5GB and RSS 3.4GB), and the 89 running threads.

Using HPjmeter
To examine lock contention in HPjmeter:

1. Collect an Xeprof profile (see the section Collecting Profile Data for HPjmeter).
2. Open the file in HPjmeter (File > Open File).

Figure 17 shows the HPjmeter Summary screen, including the running time (that is, for how long the profile was collected), and shows that the application has 313 threads:

Figure 17 HPjmeter Summary screen

3. Click on the Threads Histogram tab.

The threads histogram shows each thread, its lifetime, and a color-coded set of states indicating how the thread is spending its time: lock contention, garbage collection, CPU, I/O, sleeping, waiting, and so forth. Red indicates lock contention.


Figure 18 shows this application has a huge amount of lock contention, with every thread showing a large amount of red:

Figure 18 Threads Histogram screen

4. Select a thread, and double-click on it. This brings up a pie chart with a breakdown of time spent in each of the states.

Figure 19 shows this thread is spending 72% of its time in lock contention and 24% in garbage collection, with no CPU time spent. In other words, the thread is not getting any real work done:

Figure 19 Threads Histogram with pie chart


5. Determine where the lock contention is occurring. Select Metrics > Threads/Locks > Lock Delay - Method Exclusive.

Figure 20 shows the highest lock delay (time spent waiting to acquire a lock) is coming from the method weblogic.utils.classloaders.ChangeAwareClassLoader.loadClass.

Figure 20 Lock Delay - Method Exclusive

6. Find where the method is called in the call graph tree (Figure 21):
   - Click on the <method name> in the list.
   - Select Edit > Mark to mark the method for finding later.
   - Select Metrics > Threads/Locks > Lock Delay - Call Graph Tree (brings up the call graph tree).
   - Select Edit > Find Immediately (finds the method in the call graph tree).


Figure 21 Lock Delay - Call Graph Tree

7. Look at lock delay at the thread level instead of the process level (Figure 22):
   - Select Scope > Thread (changes to thread scope).
   - Select Metrics > Threads/Locks > Lock Delay - Method Exclusive.

Figure 22 Thread level Lock Delay - Method Exclusive

Reducing Lock Contention in Your Application


To reduce lock contention, there are several recommended options to try:

- Tune the thread count in the application (most likely, reduce the number of threads)
- Tune the deployment of Java (see the next section, "Deployment of Java Instances and Processor Usage")
- Make application modifications to reduce lock contention: break up locks or hold locks for a shorter time (see the sketch below)
- Modify the thread scheduling policy (see the section Other Factors)
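As a minimal sketch of the "break up locks" recommendation (the class below is hypothetical, not taken from the application discussed here), compare a counter guarded by one global monitor with a lock-striped alternative from java.util.concurrent:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CounterExample {
    // Coarse-grained: every thread serializes on the same monitor,
    // which shows up as red (lock contention) in the Threads Histogram.
    private final Map<String, Integer> coarse = new HashMap<String, Integer>();

    public synchronized void recordCoarse(String key) {
        Integer old = coarse.get(key);
        coarse.put(key, old == null ? Integer.valueOf(1)
                                    : Integer.valueOf(old.intValue() + 1));
    }

    // Finer-grained: ConcurrentHashMap stripes its internal locks, so
    // threads updating different keys rarely block one another.
    private final ConcurrentHashMap<String, AtomicInteger> fine =
            new ConcurrentHashMap<String, AtomicInteger>();

    public void recordFine(String key) {
        AtomicInteger counter = fine.get(key);
        if (counter == null) {
            AtomicInteger fresh = new AtomicInteger();
            AtomicInteger raced = fine.putIfAbsent(key, fresh);
            counter = (raced == null) ? fresh : raced;
        }
        counter.incrementAndGet();
    }
}

The same idea applies to any shared structure: shrink the scope of each synchronized block, or replace a single hot lock with a structure that partitions the work.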


Figures 23 and 24 show the Threads Histogram and pie chart after reducing the thread count from 300 to 100, improving class loading, and rearchitecting parts of the application to reduce lock contention. Lock contention is substantially reduced with these improvements.

Figure 23 Threads Histogram screen (after improvement)

Figure 24 Threads Histogram screen with pie chart (after improvement)


Deployment of Java Instances and Processor Usage


Migrating Java applications to a different platform is often the result of upgrading hardware to a newer processor and/or upgrading to a larger machine with more cores. If you were previously running a single Java instance on a small box (e.g., 4-way), running that same single instance of Java on a 32-way box may not give you the increased throughput you were expecting. To get optimal performance, you may need to optimize the number of Java instances and assign these instances to processor sets to improve locality.

To determine whether you should utilize multiple Java instances and processor sets, here are some indicators to look at:

- Throughput or performance is not what you expected.
- The Java process does not appear to be scaling with an increased number of cores.
- The application is experiencing very high lock contention. Use glance and HPjmeter to analyze lock contention (see the section Thread Behavior and Lock Contention above).
- CPU is very underutilized with the addition of more cores. Use glance to observe.
- Your application is experiencing sporadic high GC latencies. Use HPjmeter to look at GC duration.

Running multiple Java instances on larger machines versus a single Java instance has several advantages:

- Lock contention is potentially reduced. Rather than all threads waiting on a single lock in a single instance, having multiple instances essentially breaks up the single lock into multiple separate locks, thereby reducing lock contention overall.

- The effects of pause times for GCs can be reduced.
  - As you try to increase the load on your system with a single Java instance, you would need to continue to increase the size of your heap to accommodate the increased object creation rate. Assuming the basic garbage collector, increasing the total heap size results in longer stop-the-world pause times when GC occurs. On the other hand, with multiple instances of Java, each instance can have a smaller heap size, since you have multiple instances handling the increased load. The stop-the-world pause time for each GC in a small instance would be less than with one huge instance.
  - With the basic garbage collector, each time a GC occurs, all threads stop and are brought to a safepoint. Then GC occurs. If you have a single instance of Java, the entire instance is stopped waiting for GC to complete. With multiple instances, one Java instance could be doing GC while the others continue to execute the actual application.

Running multiple Java instances each in its own processor set offers additional advantages of improved locality:

- Ensures Java processes use local memory (cell-local or socket-local) for faster allocation and memory access


- Reduces cache-to-cache misses by keeping accesses local and preventing the scheduler from moving processes across locality domains

For an application server such as Oracle Weblogic, we recommend assigning one Java instance to a processor set with 2-4 cores. For instructions on using processor sets, refer to the psrset man page or:
http://h71028.www7.hp.com/enterprise/us/en/os/hpux11i-prm-processor-sets.html
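For instance, one illustrative setup (the core numbers and pset id are hypothetical; psrset prints the id it actually assigns when the set is created):

$ psrset -c 2 3                                  # create a processor set from cores 2 and 3
$ psrset -e 1 java -Xms512m -Xmx512m <yourApp>   # run one Java instance inside pset 1

Repeat for each instance, giving each its own 2-4 core processor set.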

Figures 25 and 26 below illustrate an example of an application, using the Oracle Weblogic application server, which initially did not scale when moving from 2 cores to 8 cores. After changing to run multiple Java instances, each in a 2-core processor set, the application achieved the desired scaling and goal performance.

Figure 25 shows the Threads Histogram for a single instance of Weblogic running on 8 cores. The 8 weblogic.socket.Muxer threads show substantial lock contention; one of the threads is at 88% lock contention. (This is a common scenario.)

Figure 26 shows the result after running multiple Weblogic instances, each in a 2-core pset. The number of socket.Muxer threads was also reduced to 3 per instance (instead of 8). The screen shows the Threads Histogram for a single instance. Although there is still some lock contention, it has been noticeably reduced.

Figure 25 Threads Histogram (before)


Figure 26 Threads Histogram (after)

Figure 27 shows one sample glance record of single instance Java.

Figure 27 Sample Glance Record (before)


GBLH: date time swap nfile cpu_tot cpu_sys cpu_usr run_q mem_virt mem_res pageout_rate pageout_rate_cum bufcache_hit_pct disk_io net_in_pkt_rate net_in_err_rate net_out_pkt_rate net_out_err_rate
GBL: 05/28/2010 13:33:10 16 1 51 5 46 1 7995mb 21.46 0 0 100.00 1.80 12612 0 13959 0

PH: proc_name pid cpu cpu_sys cpu_usr vss rss data_vss data_rss threads files io disk_io
P: java 28818 353.78 29.58 324.20 3534380kb 3403056kb 77660 77660 89 173 0.00 0.00

PSH: syscall_name count count_cum rate rate_cum total_time total_time_cum
PS: read 284859 835228 28485.9 20880.7 106.44 316.07
PS: write 34113 100071 3411.3 2501.7 0.09 0.26
PS: ioctl 11630 34002 1163.0 850.0 9.23 27.54
PS: poll 11373 33363 1137.3 834.0 20.01 60.05
PS: send 147110 430968 14711.0 10774.2 1.62 4.79
PS: sched_yield 854681 2006240 85468.1 50156.0 1.41 2.41
PS: ksleep 16701 47976 1670.1 1199.4 224.94 826.68
PS: kwakeup 16636 47767 1663.6 1194.1 0.03 0.10

Figure 28 shows one sample glance record after multi-instance/psets. Figure 28 Sample glance record (after)
...
GBLH: date time swap nfile cpu_tot cpu_sys cpu_usr run_q mem_virt mem_res pageout_rate pageout_rate_cum bufcache_hit_pct disk_io net_in_pkt_rate net_in_err_rate net_out_pkt_rate net_out_err_rate
GBL: 06/04/2010 14:42:32 19 1 27 3 24 0 13529mb 24.53 0 0 100.00 3.00 10326 0 11498 0

PH: proc_name pid cpu cpu_sys cpu_usr vss rss data_vss data_rss threads files io disk_io
P: java 22925 178.80 16.40 162.20 3514828kb 3382284kb 73564 73564 68 172 0.20 0.20

PSH: syscall_name count count_cum rate rate_cum total_time total_time_cum
PS: read 98957 747895 20195.3 5527.6 31.15 227.78
PS: write 11274 87629 2300.8 647.6 0.03 0.23
PS: ioctl 3769 29485 769.1 217.9 4.82 225.66
PS: poll 3759 29225 767.1 216.0 8.01 131.72
PS: send 49537 379390 10109.5 2804.0 0.80 26.20
PS: sched_yield 443 3997 90.4 29.5 0.09 1.67
PS: ksleep 3990 31546 814.2 233.1 56.42 3283.72
PS: kwakeup 3980 31307 812.2 231.3 0.01 0.13
...

Focusing just on the sched_yield rates (grep sched_yield), the following screen is a snippet from the single-instance Java run. Note that the sched_yield rates (3rd column) get as high as 90K per second.

Figure 29 sched_yield rates (before)
...
PS: sched_yield 704456 704456 71157.1 35222.8 0.59 0.59
PS: sched_yield 447103 1151559 45161.9 38385.3 0.40 1.00
PS: sched_yield 854681 2006240 85468.1 50156.0 1.41 2.41
PS: sched_yield 3228 2009468 322.8 40109.1 0.01 2.42
PS: sched_yield 427631 2437099 43195.0 40618.3 0.37 2.79
PS: sched_yield 80744 2517843 8074.4 35969.1 0.10 2.90
PS: sched_yield 125310 2643153 12786.7 33080.7 0.12 3.01
PS: sched_yield 10114 2653267 1011.4 29480.7 0.02 3.04
PS: sched_yield 86729 2739996 8672.9 27372.5 0.09 3.13
PS: sched_yield 20274 2760270 2047.8 25093.3 0.05 3.18
PS: sched_yield 6122 2766392 651.2 23149.7 0.03 3.21
PS: sched_yield 4212 2770604 405.0 21312.3 0.02 3.23
PS: sched_yield 480955 3251559 48581.3 23225.4 0.43 3.65
PS: sched_yield 459844 3711403 46448.8 24759.1 0.51 4.17
PS: sched_yield 368195 4079598 36454.9 25497.4 0.34 4.50
PS: sched_yield 9701 4089299 979.8 24054.7 0.05 4.55
PS: sched_yield 107817 4197116 10890.6 23317.3 0.10 4.65
PS: sched_yield 322367 4519483 32236.7 23786.7 0.31 4.96
PS: sched_yield 918902 5438385 92818.3 27205.5 0.78 5.73
PS: sched_yield 3603 5441988 360.3 25914.2 0.02 5.76
...

The following screen is a snippet after switching to multi-instance/psets. Figure 30 sched_yield rates (after)
...
PS: sched_yield 716 2887 146.1 24.0 0.08 1.45
PS: sched_yield 475 3362 95.0 26.8 0.10 1.56
PS: sched_yield 443 3997 90.4 29.5 0.09 1.67
PS: sched_yield 4217 8214 843.4 58.5 0.11 1.78
PS: sched_yield 6101 14315 1245.1 98.5 0.06 1.85
PS: sched_yield 17678 31993 3535.6 212.8 0.02 1.87
PS: sched_yield 1230 33223 246.0 213.9 0.05 1.91
PS: sched_yield 8189 41412 1671.2 258.5 0.08 1.99
PS: sched_yield 9697 51148 1939.4 300.3 0.08 2.08
PS: sched_yield 26599 77747 5319.8 443.5 0.03 2.11
PS: sched_yield 3372 81119 688.1 450.1 0.00 2.11
PS: sched_yield 335 81575 68.3 428.8 0.06 2.18
PS: sched_yield 9028 90603 1805.6 464.1 0.13 2.31
PS: sched_yield 1821 92424 371.6 461.6 0.01 2.32
PS: sched_yield 1037 93461 207.4 455.2 0.01 2.33
PS: sched_yield 1571 95032 314.2 451.8 0.04 2.37
PS: sched_yield 16607 111639 3389.1 518.7 0.01 2.38
PS: sched_yield 734 112373 149.7 510.3 0.00 2.39
PS: sched_yield 5939 118312 1187.8 525.3 0.08 2.47
PS: sched_yield 3713 122025 757.7 530.0 0.00 2.47
...

As you can see, the sched_yield rates are substantially reduced after switching to multiple Java instances running in psets, indicating reduced lock contention.

Finally, Figures 31 and 32 compare scavenge durations before and after optimization. Even though scavenge GCs are quick and were not causing a major problem, note that the first screenshot shows occasional higher scavenge times, as high as 500ms. In the second screenshot, the scavenges are more uniformly under 150ms, with only a few hitting 250ms.


Figure 31 Scavenge Duration (before)

Figure 32 Scavenge Duration (after)


Other Factors
This section discusses other factors that can affect the performance of your Java application.

OS Scheduler
The scheduling policies in the OS can have a significant impact on heavily multi-threaded Java applications. By default, the OS scheduling policy is SCHED_HPUX (or SCHED_TIMESHARE), where the priority value of a thread is decayed over time as the thread consumes processor cycles and boosted when the thread waits for processor cycles. If your application is experiencing performance problems due to heavy lock contention among threads, you might see some improvement by changing the scheduling policy to SCHED_NOAGE. The priority value of a thread executing with the SCHED_NOAGE policy is not decayed or boosted by the operating system scheduler. To switch the scheduling policy to SCHED_NOAGE, use the following JVM option:

-XX:SchedulerPriorityRange=SCHED_NOAGE
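For example (the heap values are illustrative):

$ java -XX:SchedulerPriorityRange=SCHED_NOAGE -Xms512m -Xmx512m <yourApp>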

Hyper-threading
If you are migrating your application to the Intel(R) Itanium(R) processor 9300 series, you might see some performance gain by enabling hyper-threading. Whereas enabling hyper-threading on the older 9100 series did not yield much performance benefit, improvements to thread-switching decisions in the 9300 series make hyper-threading perform better. Whether or not you see a benefit depends on the application, so experimentation is required. To turn on hyper-threading, do the following:

1. Enable hyper-threading on your machine:
   $ setboot -m on
   $ reboot

2. Dynamically turn on the hyper-threading tunable:
   $ kctune lcpu_attr=1

For more details, refer to the setboot and lcpu_attr man pages.

Other Java Options

ForceMmapReserved

Some applications will see improved performance by using the option:

-XX:+ForceMmapReserved

This option tells the JVM to reserve the swap space for all large memory regions used by the JVM (such as the Java heap).
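For example (the heap value is illustrative):

$ java -XX:+ForceMmapReserved -Xmx1g <yourApp>

Because the swap space is reserved up front, make sure the system has sufficient swap configured before enabling this option.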

Java Exceptions

Exception handling in Java is very expensive, and processing an excessive number of exceptions will cause non-optimal application performance. When migrating your application from another platform, errors in deployment can cause exceptions to occur. For example, configuration files may require updating; if a config file points to an old IP address or a missing file, an exception will occur. Excessive exceptions can also be caused by non-optimal programming, where exceptions are used for control logic (see the sketch at the end of this subsection).

You can use HPjmeter to see the number of exceptions and where they are occurring:

1. Collect Xeprof profile data.
2. Open the file in HPjmeter (File > Open File).
3. Select Estimate > Exceptions Thrown. This pops up a box with "Exceptions Thrown" showing the number of exceptions and in what methods they occur.
4. Click on the <method name>, and click the button "Mark to Find".
5. Bring up the Call Graph Tree: select Metrics > Code/CPU > Call Graph Tree (Call Count).
6. Find the method in the Call Graph Tree: select Edit > Find Immediately.
7. Expand the method (click +) to see the type of exception being thrown.

If these exceptions are the result of an unexpected deployment error, then you can fix the error. If it is not possible to remove the source of the exception, then use the following option to suppress the filling in of the stack trace by the JVM when an exception occurs, thereby reducing some of the performance hit from exception handling:

-XX:-StackTraceInThrowable
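The following is a minimal sketch of the control-logic anti-pattern (a hypothetical class, shown only to illustrate the cost; it is not taken from any application discussed here):

import java.util.HashMap;
import java.util.Map;

public class LookupExample {
    private final Map<String, String> cache = new HashMap<String, String>();

    // Anti-pattern: every cache miss throws, and each throw fills in a
    // stack trace unless -XX:-StackTraceInThrowable is used.
    public String lookupWithException(String key) {
        try {
            return cache.get(key).trim();   // NullPointerException on a miss
        } catch (NullPointerException e) {
            return "";                      // exception used as control flow
        }
    }

    // Preferred: test for the miss explicitly; no exception machinery runs.
    public String lookupWithTest(String key) {
        String value = cache.get(key);
        return (value == null) ? "" : value.trim();
    }
}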

System Components
There are other system components that could be causing bottlenecks in overall system performance, thereby masking application server performance problems. These include the database, network, and file system. You can use HPjmeter and glance as indicators of whether these components are affecting performance. Then, use other tools to monitor the database, network, and file system, and tune them if necessary.

Memory Footprint of Migrated Java Application


When migrating your Java application to HP-UX, there will be some differences in the size or footprint of the Java process due to different memory layout and different JVM defaults. In general, we expect the size of the Java process on HP-UX to be comparable to other platforms. You might observe small differences in the size at the initial startup of your application. But, the growth in size over time and the steady-state footprint should be similar to your previous platform. For a long running server application, with a large Java heap, the small differences at startup will be insignificant compared to total size after the application reaches steady-state. Therefore, we recommend that you do not get overly concerned with small differences in starting footprint.

Java Process Memory Footprint


The Java process memory footprint is made up of the following (note: the defaults listed apply to 32-bit processes):

- Java heap (mmap region): -Xms, -Xmx (see the discussion on defaults in the Key Factors Affecting Performance section)
- Permanent space (mmap region): -XX:PermSize, -XX:MaxPermSize (defaults: 16m, 64m)
- Code cache (mmap region), containing runtime-compiled code: -XX:ReservedCodeCacheSize (IA default: 64m, PA default: 32m)
- C heap (data region), containing JVM C/C++ data structures
- Java application thread stacks (mmap region) [multiply by number of threads]: -Xss (IA default: 512k**, PA default: 512k)
  ** See the clarification in the Threads in the Java Process section below.
- Main stack (stack region)
- Text (text region)
- Shared libraries
- Internal JVM thread stacks (mmap region); see the Threads in the Java Process section below

Tools to Examine Java Process Memory Footprint


To look at the memory usage of your Java process and what is contributing to the footprint, use Xverbosegc data and glance adviser script data to give you a high-level view. The following is an example from glance adviser output:
PH: proc_name pid cpu cpu_sys cpu_usr vss rss data_vss data_rss threads files io disk_io
P: java 28818 353.78 29.58 324.20 3534380kb 3403056kb 77660 77660 89 173 0.00 0.00

Columns 6 and 7 show the VSS (3.5GB) and RSS (3.4GB) of the entire process. Columns 8 and 9 show the VSS (77MB) and RSS (77MB) of the data segment (C heap). From this data, you can observe the memory usage over time and whether the process stabilizes. You can also see how much of the memory usage is due to the C heap (JVM structures) versus the Java heap and other memory-mapped regions.

To look at the memory regions in more detail, use gpm:

1. Start up gpm.
2. Select Reports > Process List (brings up the process screen).
3. Select the java process from the process screen.
4. From the process screen, select Reports > Memory Regions.
5. Sort memory regions by VSS: select Configure > Sort Fields, click on VSS, click on the left-most column header, and click "Done".


6. The top half of the display gives a summary. The bottom half lists all memory regions. The Java heap regions will appear toward the top (if you sorted by VSS).

If you are migrating from Solaris or AIX, you may be accustomed to using the pmap (Solaris) or procmap (AIX) commands to look at the sizes of the memory regions. The pmap command is also available on HP-UX 11.31.

Threads in the Java Process


The Java process contains both Java threads (application-level) and internal JVM threads. The Java application-level thread stack size is determined by the parameter -Xss<size>:

- On PA-RISC, the default is 512k.
- On Itanium, the stack region contains two parts: the memory stack and the register stack. The default stack region is 512k, of which 256k goes to the memory stack. The -Xss parameter specifies the size of the memory stack only. When -Xss<size> is specified, the JVM will double <size>, so the actual stack region will be 2*<size>. For example, -Xss512k results in a stack region of 1M.

Internal JVM threads include the following: VM thread, compiler threads, parallel GC threads, watcher thread, and so on. To print the default sizes of the thread stacks, use the option:

-XX:+ThreadPriorityVerbose

On Java 5.0 and Java 6:
- the default VM thread stack size is 1M
- the default compiler thread stack size is 4M

In a production environment, for a long-running server application, we generally do not recommend modifying the internal JVM thread stack sizes. For testing purposes, if you want to experiment with reduced thread stack sizes, see the next section, "Reducing Starting Footprint".
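To check these defaults on your own installation without launching your full application, one quick probe is to combine the option with -version (the exact output format varies by JVM release):

$ java -XX:+ThreadPriorityVerbose -version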

Reducing Starting Footprint


On Java 6 running on HP-UX 11.31, the initial RSS of the Java process has been improved; that is, the starting footprint (RSS) is reduced. However, for Java 5 and Java 6 on 11.23, or Java 5 on 11.31, you can obtain a lower initial RSS by using the option:
-XX:+ForceMmapReserved

For a large, long-running server application, the starting footprint or memory usage is not as important as the longer-term, steady-state memory usage. However, sometimes an application deployment uses many little Java processes all running on one machine. In such cases, the starting footprint of the Java process becomes critical. To lower the footprint of your Java process on Itanium, you can experiment with these options:

- Minimize the -Xms parameter; allow -Xmx to be larger to accommodate the few processes that will require a larger heap.
- Lower PermSize to a minimum (enough to accommodate startup).
- Reduce the code cache size, for example: -XX:ReservedCodeCacheSize=32m
- Reduce the Java thread stack size to the minimum possible; watch for stack overflow: -Xss200k
- Reduce the number of threads in the application.
- Reduce the VM thread stack size to 512k: -XX:VMThreadStackSize=512
- Reduce the CompilerThread stack size to 1m: -XX:CompilerThreadStackSize=1024
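Combining these experiments, a reduced-footprint command line might look like the following (all values are illustrative starting points, not recommendations; validate with your own workload and watch for StackOverflowError):

$ java -Xms32m -Xmx512m -XX:PermSize=16m -XX:ReservedCodeCacheSize=32m \
       -Xss200k -XX:VMThreadStackSize=512 -XX:CompilerThreadStackSize=1024 <yourApp>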

As mentioned in the Key Factors Affecting Performance section, defaults on Solaris or AIX may differ from those on HP-UX. If there is a large difference in footprint when moving to HP-UX, most likely the settings of the above parameters are very different between the platforms and need to be adjusted.


For More Information


To read more about Java on HP-UX, go to www.hp.com/go/java. To read more about HPjmeter, go to www.hp.com/go/hpjmeter.

Call to Action
HP welcomes your input. Please give us comments about this white paper, or suggestions for HP-UX Java or related documentation, through our technical documentation feedback website:
http://www.hp.com/bizsupport/feedback/ww/webfeedback.html

© 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

59000594, March 2010


