Date: 08-2008
Copyright 2008, CA. All rights reserved. Wily Technology, the Wily Technology Logo, Introscope, and All Systems Green are registered trademarks of CA. Blame, Blame Game, ChangeDetector, Get Wily, Introscope BRT Adapter, Introscope ChangeDetector, Introscope Environment Performance Agent, Introscope ErrorDetector, Introscope LeakHunter, Introscope PowerPack, Introscope SNMP Adapter, Introscope SQL Agent, Introscope Transaction Tracer, SmartStor, Web Services Manager, Whole Application, Wily Customer Experience Manager, Wily Manager for CA SiteMinder, and Wily Portal Manager are trademarks of CA. Java is a trademark of Sun Microsystems in the U.S. and other countries. All other names are the property of their respective holders.

For help with Introscope or any other product from CA Wily Technology, contact Wily Technical Support at 1-888-GET-WILY ext. 1 or support@wilytech.com. If you are the registered support contact for your company, you can access the support Web site directly at http://support.wilytech.com.

We value your feedback. Please take this short online survey to help us improve the information we provide you. Link to the survey at: http://tinyurl.com/6j6ugb
US Toll Free 888 GET WILY ext. 1 US +1 630 505 6966 Fax +1 650 534 9340 Europe +44 (0)870 351 6752 Asia-Pacific +81 3 6868 2300 Japan Toll Free 0120 974 580 Latin America +55 11 5503 6167 www.wilytech.com
CA Wily Introscope

Table of Contents

Chapter 1: Introscope Sizing and Performance
    Introduction
    New and changed features in Introscope 8.0
        Agent load balancing
        Agent metric aging
        Changed Heap Capacity (%) metric
        Changed Metric Count metric
        Changed way of determining events
        Changed Number of Inserts metric
        Dynamic instrumentation
        Enterprise Manager dead metric removal
        How to detect metric explosions
        Metric clamping
        MOM hot failover
        New metric for Collector Metrics Received Per Interval
        New metric for Historical Metric Count
        New metric for Number of Historical Metrics
        New metric for Transaction Traces Dropped Per Interval
        New tab for CPU Overview
        New tab for Enterprise Manager Overview
        New tab for Metric Count
        Ping time threshold properties
        Scalability

Chapter 2
    Factors that affect the Introscope environment
    Factors that affect EM maximum capacity
    About Introscope system size
    Enterprise Manager health
    Differences between EMs and J2EE servers
    About the Enterprise Manager Overview tab
    About EM health and supportability metrics
    Harvest Duration metric
    Number of Collector Metrics
    Collector Metrics Received Per Interval metric
    Converting Spool to Data metric
    Overall Capacity (%) metric
    Heap Capacity (%) metric
    About SmartStor spooling and reperiodization
    Report generation and performance
    Concurrent historical queries and performance
    About SmartStor and flat file archiving
    MOM overview
    Collector overview
    Collector metric capacity and CPU usage
    About the CPU Overview tab
    Enterprise Manager basic requirements
    Enterprise Manager file system requirements
    EM OS disk file cache memory requirements
    Enterprise Manager heap sizing
    SmartStor requirements
    Each EM requires SmartStor on a dedicated disk or I/O subsystem
    SmartStor Duration metric limit
    Local network requirement for MOM and Collectors
    Introscope 8.0 EM settings and capacity
    SmartStor settings and capacity
    Estimating Enterprise Manager databases disk space needs
    Setting the SmartStor dedicated controller property
    Planning for SmartStor storage using SAN
    Planning for SmartStor storage using SAS controllers
    Enterprise Manager thread pool and available CPUs
    Collector and MOM settings and capacity
    MOM hardware requirements
    MOM disk subsystem sizing requirements
    MOM to Collectors connection limits
    Configuring a cluster to support 1,000,000 MOM metrics
    Agent load balancing on MOM-Collector systems
    Avoid Management Module hot deployments
    Collector applications limits
    Collector metrics limits
    Collector events limits
    Collector agent limits
    Collector hardware requirements
    Collector with metrics alerts limits
    Collector to MOM clock drift limit
    Reasons Collectors combine slices
    Increasing Collector capacity with more and faster CPUs
    Standalone EM hardware requirements example
    Running multiple Collectors on one machine

Chapter 3: Metrics Requirements and Recommendations
    Metrics background
    About metrics groupings and metric matching
    8.0 metrics setup, settings, and capacity
    Matched metrics limits
    Inactive and active metric groupings and EM performance
    SmartStor metrics limits
    Performance and metrics groupings using the wildcard (*) symbol
    Virtual agent metrics match limits
    About alerted metrics and slow Workstation startup
    Detecting metrics leaks
    Metrics leak causes
    Finding a metrics leak
    Metrics for diagnosing a metrics leak
    Detecting metric explosions
    Metric explosion causes
    Investigator metrics and tab for diagnosing metric explosions
    How Introscope prevents metric explosions
    SQL statements and metric explosions
    SQL statement normalizers
    Metric clamping
    Enterprise Manager dead metric removal
    SmartStor metadata files are uncompressed

Chapter 4: Workstation and WebView Requirements and Recommendations
    Workstation and WebView background
    8.0 Workstation and WebView requirements
    OS RAM requirements for Workstations running in parallel
    WebView and Enterprise Manager hosting requirement
    Workstation to standalone EM connection capacity
    Workstation to MOM connection capacity
    WebView server capacity
    WebView server guidelines
    8.0 Workstation and WebView setup, settings, and capacity
    Transaction Trace component clamp
    Configuring agent heuristics subsets
    Virtual agent metrics match limits
    Agents limits per Collector
    Agent heap sizing
    High agent CPU overhead from deep nested front-end transactions
    Dynamic instrumentation

Appendix A: Sample Introscope 8.0 Collector and MOM Sizing Limits by OS
    Sample Introscope 8.0 Collector sizing limits table
    Sample Introscope 8.0 MOM sizing limits table

Appendix B

Index
Chapter 1: Introscope Sizing and Performance
This document contains background, instructions, best practices, and tips for optimizing the sizing and performance of your Introscope 8.0 deployment and environment. Use it in conjunction with the following Introscope 8.0 documentation:
- Introscope Configuration and Administration Guide
- Introscope Installation and Upgrade Guide
- Introscope Java Agent Guide
- Introscope .NET Agent Guide
- Introscope Overview Guide
- Introscope WebView Guide
- Introscope Workstation User Guide
For additional information about this product, you can take the CA Wily Technology Education Services class, Introscope: Enterprise Manager (EM) Capacity Management. For more information, go to http://www.wilytech.com/services/education.html. In addition, CA Wily Technology Professional Services and Technical Support have service offerings to address specific needs in your application management environment.
Agent load balancing

With agent load balancing, the MOM assigns each connecting 8.0 agent to a Collector. The MOM also keeps the metric load balanced between Collectors by ejecting participating 8.0 agents from over-burdened Collectors. A participating agent is one that connects through the MOM. Ejected agents reconnect to the MOM and are reallocated to under-burdened Collectors. To configure agent load balancing, see the Introscope Configuration and Administration Guide. To understand how agent load balancing affects Introscope performance, see Agent load balancing on MOM-Collector systems on page 63.
Dynamic instrumentation
Introscope uses dynamic instrumentation (also called dynamic ProbeBuilding) to implement new and changed PBDs without restarting managed applications or the Introscope agent. Dynamic instrumentation affects CPU utilization, memory, and disk utilization. See Dynamic instrumentation on page 112.
Metric clamping
Several properties that limit, or clamp, the number of metrics on the agent and the Enterprise Manager help to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. See Metric clamping on page 96.
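As a sketch of how such clamps are configured, the fragment below shows two clamp properties. Both property names and values are assumptions drawn from how these clamps are documented in later Introscope releases; verify them against your version's Configuration and Administration Guide before use.

```properties
# Agent-side metric clamp, set in the agent profile
# (IntroscopeAgent.profile). Name and value are illustrative.
introscope.agent.metricClamp=5000

# Enterprise Manager-side per-agent clamp, set in
# EnterpriseManager.properties. Also illustrative.
introscope.enterprisemanager.agent.metrics.limit=50000
```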
Scalability
Introscope 8.0 includes a number of scalability improvements, which are documented across this guide:
- Each Collector Enterprise Manager can handle up to 500 K metrics (varies according to hardware), about twice the Introscope 7.x Enterprise Manager metric limit.
- Collectors can take advantage of additional CPUs to increase these limits: the number of applications per Collector, the number of agents per Collector, and the number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).
- Each MOM can connect to a five million metric cluster (10 Collectors, 500 K metrics per Collector), a five-fold increase in clustered Enterprise Manager scale.
- The MOM now requires more powerful hardware than Collectors. See MOM hardware requirements.
Important The limits may differ substantially depending on the specific platform and hardware used in your environment.
Chapter 2
This chapter provides background and specifics to help you understand how to size and tune your Enterprise Manager for good performance. In this chapter you'll find the following topics:

- Enterprise Manager overview
- Factors that affect the Introscope environment
- Factors that affect EM maximum capacity
- Differences between EMs and J2EE servers
- Enterprise Manager health
- SmartStor overview
- About EM health and supportability metrics
- About SmartStor spooling and reperiodization
- Report generation and performance
- Concurrent historical queries and performance
- About SmartStor and flat file archiving
- MOM overview
- Collector overview
- Enterprise Manager file system requirements
- EM OS disk file cache memory requirements
- SmartStor requirements
- Each EM requires SmartStor on a dedicated disk or I/O subsystem
- MOM and Collector EM requirements
- Local network requirement for MOM and Collectors
- Introscope 8.0 EM settings and capacity
- SmartStor settings and capacity
- Estimating Enterprise Manager databases disk space needs
- Setting the SmartStor dedicated controller property
- Collector and MOM settings and capacity
- MOM disk subsystem sizing requirements
- MOM to Collectors connection limits
- MOM to Workstation connection limits
- Configuring a cluster to support 1,000,000 MOM metrics
- Agent load balancing on MOM-Collector systems
- Avoid Management Module hot deployments
- Collector applications limits
- Collector metrics limits
- Collector events limits
- Collector agent limits
- Collector hardware requirements
- Collector with metrics alerts limits
- Collector to MOM clock drift limit
- Reasons Collectors combine slices
- Increasing Collector capacity with more and faster CPUs
- Standalone EM hardware requirements example
- Running multiple Collectors on one machine
In a more complex environment, as shown in the figure below, Enterprise Managers in the role of Collectors can be clustered so that their collected metrics data is compiled in a single Manager of Managers (MOM) Enterprise Manager. The MOM provides a unified view of all the metrics to the connected Workstation and WebView instances.
Note  In cases where the data is specific to a single Enterprise Manager, or where clustering makes no difference to the topic, this guide uses the generic term Enterprise Manager. However, in some cases Collectors and MOM Enterprise Managers perform different functions that require different sizing capacity guidelines or result in different performance behaviors. In those cases, the term Collector or MOM is used as appropriate. While the Collector and MOM perform very different functions within a cluster, their system requirements are quite similar, with the exception of data persistence, because the MOM persists relatively little data in its role.
In an Introscope deployment, the agent collects application and environmental metrics and relays them to the Enterprise Manager. Multiple physical agents can be configured into a single virtual agent, which enables an aggregated, logical view of the metrics reported by multiple agents. To an Introscope Enterprise Manager, an application is an agent-specific association of metrics that is derived from the Java application .war files deployed on the managed J2EE application server. In an Introscope Enterprise Manager Investigator metric tree, applications, which are agent-specific, are found under the Frontends node, as shown in the following figure. Note You can have multiple applications running within a single JVM, but you can assign only one Introscope agent per JVM to collect the performance data.
Important  On typical server configurations, the metrics limit is usually the primary limitation on the capacity of the Enterprise Manager, and it is a critical factor when sizing one. CPU performance, network bandwidth, and availability of RAM are also influential, but disk I/O seek time is typically the primary bottleneck. In Introscope 8.0, exceeding the limits found in the Sample Introscope 8.0 Collector sizing limits table on page 119 can bring the system to a state where you begin to see performance problems. The symptoms depend on which resource is exhausted:

- Overloaded disk I/O typically causes combined time slices and sluggish Workstation refresh times.
- Lack of RAM causes memory exceptions during spool file conversion when too many metrics are tracked.
- Network bandwidth problems cause slow cluster response times and, more rarely, may cause agents to be dropped.
- A lagging CPU causes performance problems, including calculators not updating and missed alerts.

As another example from the Sample Introscope 8.0 Collector sizing limits table on page 119, the recommended limit for monitored applications (maximum number of applications) for a Windows-based Enterprise Manager is about 170% of that for a Solaris machine. In the case of applications, the limit depends strongly on the performance characteristics of the CPUs available to the Enterprise Manager, because applications create alerts that must be calculated every time slice.
- The Enterprise Manager runs at greater than the 40-50% average CPU utilization range. See Collector metric capacity and CPU usage on page 45.
- The sum of all metrics behind every Top N graph viewed by every Workstation instance exceeds 100,000. See Top N graph metrics limit per Workstation on page 103.
In a J2EE application server, each incoming request is handled on its own thread, and requests are not forced through a common checkpoint for synchronization.
Therefore, in most situations, application servers scale well in throughput by adding CPUs, because each CPU can run additional worker threads to satisfy more requests. Occasionally one request might be slowed down, but whether it takes 100 milliseconds (ms) or 5 seconds, the rest of the system does not come to a halt. Only in the event of an external bottleneck, such as a database, can all threads come to a halt waiting for data. Eventually the request threads all become busy, and the application server slows to a crawl, maintaining most of its throughput while rejecting additional requests for work. When the bottleneck is relieved, the system begins to service requests again and returns to normal.

In contrast, the Enterprise Manager behaves very differently because of its architecture and the nature of the work it performs. Introscope monitors production systems in real time, and provides information, warnings, and alerts in real time. To accomplish this, the Enterprise Manager performs as a real-time system as well. The Enterprise Manager receives a continual flow of data from agents every 7.5 seconds. Once every 15 seconds, the Enterprise Manager must do all of the following:
- examine all of the metric data it has received for the interval for consistency
- perform calculations
- perform actions, such as firing alerts or sending messages
- store the data to disk
- respond to Workstation requests for live data
- handle incoming events (Transaction Traces, errors, and so on) and persist them
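The way this 15-second budget turns into combined slices can be sketched as a toy model. None of this is CA Wily code, and the real slice-combining logic is internal to the Enterprise Manager; the sketch only illustrates the dynamic: each slice carries a processing cost, overruns accumulate as debt, and once the Enterprise Manager falls a full interval behind, it combines slices.

```python
HARVEST_INTERVAL = 15.0  # seconds between harvests

def combined_slices(costs, interval=HARVEST_INTERVAL):
    """Count how many 15-second slices get combined.

    `costs` lists the seconds needed to fully process each incoming
    slice (examine, calculate, store, respond).  Overruns carry forward
    as debt; once the EM is a whole interval behind, it combines a
    slice to catch up.  Toy model only, not CA Wily's algorithm.
    """
    debt = 0.0
    combined = 0
    for cost in costs:
        debt += cost - interval   # overrun (or slack) for this interval
        if debt <= 0:
            debt = 0.0            # caught up; internal buffers drain
        elif debt >= interval:
            combined += 1         # a full interval behind: combine slices
            debt -= interval
    return combined
```

For example, slices that each take 20 seconds to process against a 15-second budget fall behind by 5 seconds per interval, forcing a combine on every third slice.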
For the most part, the Enterprise Manager can use only two threads to perform calculations and actions on the large set of agent-generated data, and only a single thread to perform the data storage. If the Enterprise Manager is unable to complete these operations within the 15-second interval, it may fall behind and never catch up, because another set of data arrives. The Enterprise Manager then continually combines data or suffers from sluggish performance as it attempts to process and write more data than it can handle. There are internal buffers to allow for bursts of activity so that the Enterprise Manager can catch up, but if the Enterprise Manager has too many metrics being reported, these buffers fill up quickly.

The Enterprise Manager is very different from a J2EE server in this regard, because the standard J2EE server does not examine data requests on a regularly scheduled basis to decide what to do with them. The Enterprise Manager's scenario is closer to the classic factory conveyor belt analogy: a continual stream of finished products (data) arrives for two workers to examine. The two workers must then transfer the product packages (metric data) to a single worker, who drives the packaged data in a truck down a single-lane road to a warehouse, where several more workers offload the packages from the truck into storage (the SmartStor database).

Because of the nature of the tasks that the Enterprise Manager performs, there are currently limitations in the number of CPUs that the Enterprise Manager can use effectively. A minimum of 2 CPUs is required for optimum performance. However, the use of 4 CPUs increases performance by allowing more of the following:
- the number of applications per Collector
- the number of agents per Collector
- the number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager)

Using more than 4 CPUs does not enhance performance further. However, CA Wily recommends faster CPUs, because each of the threads can then examine the data much faster. For the maximum limits on 4 CPU Enterprise Managers for matched metrics, see Matched metrics limits on page 79.
Another difference between J2EE servers and Enterprise Managers is in how they process data: J2EE servers largely perform batch processing, while Enterprise Managers largely perform real-time processing.

J2EE applications are batch processors. Work queues up and is handled as quickly as possible. As the machine slows down, the batch processes simply take longer and longer. In contrast, the Enterprise Manager, which has some batch processing functions (for example, responding to historical data query requests), handles most data flow in real time. The Enterprise Manager can take whatever time it needs to process incoming data, as long as it finishes within the 15-second harvest duration period. Once the Enterprise Manager takes longer than that time frame, it starts to combine data.

Sizing a real-time system can be difficult because you need to size for the maximum load, not the average load, on the machine. If you size only for the average load, then during maximum load times you will lose data.

More ways that Enterprise Managers perform additional work and have limitations that affect performance atypical of standard J2EE systems include:
- Introscope Workstations present different load characteristics than typical Web clients. Workstations allow users to view live data in real time. Depending on the feature or data requested, a Workstation can be a continual tax on the Enterprise Manager even if no user is watching the console, because the Enterprise Manager continues to serve data. In contrast, if a user stops interacting with a browser-based Web application, the data/refresh requests typically stop.
- Workstations can perform historical queries for data, which cause the Enterprise Manager to retrieve data from storage. This can interfere with the Enterprise Manager's ability to effectively process and store incoming agent data, due to disk contention. J2EE systems don't typically serve requests directly from databases or have disk contention issues.
- The Enterprise Manager periodically reorders and reperiodizes stored data. Incoming metric data is written sequentially to a spool file, which is reorganized and indexed once every hour. This reorganization is a resource-expensive (CPU- and disk I/O-intensive) operation that can interfere with the Enterprise Manager's ability to process and store incoming data. J2EE servers don't typically perform such periodic, intense housekeeping operations.
- Agents can experience metric leaks over time, without the user knowing, which causes more data to be processed by the Enterprise Manager. A metric leak occurs when the number of registered metrics being reported by agents continually increases. This means that a properly configured system can drift over time into a problem state.
- An Enterprise Manager, for all configurations, should run at most within the 40% to 50% CPU utilization range in a steady state. This provides the additional headroom necessary for periodic operations, such as SmartStor spooling, reperiodizing, and user Workstation requests (alert requests), that may saturate the CPU. Typically, J2EE systems can be run much closer to saturation because there are no hidden operations that consume CPU above and beyond steady state, and in the event the system is saturated, the J2EE server refuses incoming requests to alleviate the pressure.
- No other applications or processes should run on an Enterprise Manager machine, to avoid contention for the system resources available to the Enterprise Manager.
- Enterprise Managers (both Collectors and MOM) queue up incoming data query requests and aggregate the data as it is read in from SmartStor.
Introscope business logic handles the data collected in the monitoring operations and determines what will be done with the data. Introscope business logic operations include determining or handling the following:
- the total number of metrics groupings
- the maximum number of metrics in a metrics grouping
- the number of metrics persisted per minute
- calculators
- alerts
- Management Modules containing many dashboards, calculators, alerts, and so on
- large numbers of reports
- Top N graphs
- the Enterprise Manager Overview tab (see About the Enterprise Manager Overview tab, below)
- Enterprise Manager health and supportability metrics (see About EM health and supportability metrics on page 28)

The Enterprise Manager generates and collects metrics about itself that are useful in assessing its health and determining how well it is performing under its workload. These are sometimes referred to as supportability metrics because they help support the healthy functioning of the Enterprise Manager.
Number of Workstations
Custom Metric Host (Virtual)
 |Custom Metric Process (Virtual)
  |Custom Metric Agent (Virtual)(SuperDomain)
   |Enterprise Manager
In a clustered environment, the MOM's metrics also appear under the tree path shown above. However, in a clustered environment, Collector supportability metrics show up in the same Custom Metric Host (Virtual) and Custom Metric Process (Virtual) path location, but the last name includes (CollectorHostName@PortNumber).
The Investigator tree with the MOM and one Collector looks like this:
Custom Metric Host (Virtual)
 |Custom Metric Process (Virtual)
  |Custom Metric Agent (Virtual)(SuperDomain)
   |Enterprise Manager
  |Custom Metric Agent (Virtual)(Collector1@5001)(SuperDomain)
   |Enterprise Manager
For more information, see the Introscope Configuration and Administration Guide. When you deploy Enterprise Managers into your Introscope environment, you'll need to look at the Enterprise Manager health and supportability metrics to find out what's really happening in your monitoring solution. Harvest duration, Collector Metrics Received Per Interval, SmartStor spool file conversion, and Overall Capacity (%) are several of the more significant indicators of problems in an Enterprise Manager. For more information, see
- Harvest Duration metric on page 29
- Collector Metrics Received Per Interval metric on page 31
- Converting Spool to Data metric on page 32
- Overall Capacity (%) metric on page 33
- Additional supportability metrics on page 38
Harvest Duration metric

The Harvest Duration metric value should be less than 3,000 ms (3 seconds) and should not exceed 7,500 ms (7.5 seconds). The harvest operation usually causes CPU activity to spike for the full harvest duration, and the CPU is often almost idle for the rest of the 15 seconds. If the harvest duration is too long, investigate reducing the metric load on the overloaded Enterprise Manager by having agents report to separate Enterprise Managers, or consider moving the Enterprise Manager to a platform with faster CPUs.
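These thresholds can be captured in a small helper. This is an illustrative function, not part of Introscope; the cutoff values come directly from the guidance above.

```python
def harvest_health(duration_ms):
    """Classify a Harvest Duration sample against the guide's limits:
    under 3,000 ms is healthy, and 7,500 ms (half of the 15-second
    cycle) is the ceiling the metric should never exceed."""
    if duration_ms < 3000:
        return "ok"
    if duration_ms <= 7500:
        # reduce metric load or move the EM to faster CPUs
        return "investigate"
    return "overloaded"
```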
Collector Metrics Received Per Interval metric

A large Collector Metrics Received Per Interval metric value, coupled with degradation of the cluster, indicates that the MOM has been asked to read too much metric data from the Collectors. This overloading is the result of some combination of the following:

- too many Workstations connected
- too many queries (especially historical queries) being run
- user alerts and calculators set up to evaluate too many metrics
Although all resource loading issues combine to affect overall cluster performance, a large Collector Metrics Received Per Interval metric value, which reflects too many metric reads, is different from a metric explosion (see Detecting metric explosions on page 84), which is the result of too many metric writes by the agents. This means, in particular, that reducing the metric load on your Collectors may not solve issues on the MOM related to a high Collector Metrics Received Per Interval metric value. If your Collector Metrics Received Per Interval value seems too high, check the number of Workstations attached, and verify that most are in Live mode. If this fails to solve the issue, check that you do not have alerts set up to evaluate too many metrics in the system. You can do this by searching and sorting by value all metrics named:
If Collector Metrics Received Per Interval value continues to remain high after carrying out the suggestions above, you can also set the introscope.enterprisemanager.query.datapointlimit property in the EnterpriseManager.properties file to specify a maximum number of metric data points the Enterprise Manager will return from any single query. This read clamp ensures that user queries that accidentally match too much metric data do not negatively impact system performance. Important Clamping the Collector metrics prevents cluster degradation, but queries and alerts that are clamped do not fully evaluate all metrics they match.
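A minimal sketch of the read clamp described above. The property name is the one the text cites; the limit value is illustrative only, since the guide does not prescribe a number.

```properties
# EnterpriseManager.properties
# Maximum metric data points the Enterprise Manager returns from any
# single query.  100000 is an example value, not a recommendation.
introscope.enterprisemanager.query.datapointlimit=100000
```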
Converting Spool to Data metric

When this task is running, the metric has a value of 1. When this task is not running, it has a value of 0. If this metric stays at a value of 1 for more than 10 minutes per hour, reorganizing the SmartStor spool file is taking too long. This problem is often progressive: as the spooling time gets longer hour after hour, the Enterprise Manager usually becomes noticeably less responsive overall, because it is putting more and more effort into reorganizing the spool file.

For better performance, add more physical memory (RAM) to the machine. Adding more RAM can increase the size of the OS disk file cache and should reduce the amount of time the conversion task takes. The amount of RAM that helps varies between operating systems; however, a good general rule is to dedicate 1 GB of RAM to the OS disk cache.

In general, at full load, you should configure a Collector to use 1.5 GB of heap memory. If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size, and the machine must have at least 14 GB of physical RAM. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.
Additionally, a server host typically requires approximately 500 MB for the operating system (this varies based on hardware and OS). When SmartStor starts the re-spooling operation, the operating system starts reading the spool file into file cache memory (which belongs to the OS, not the Enterprise Manager Java virtual machine). If, for example, 200,000 metrics are being read into memory, the spool file will usually be over 1.5 GB. For optimum performance, the file cache should be large enough to accommodate the entire spool file, so the host machine should have between 3 and 4 GB of physical RAM. 32-bit Windows machines use a fixed file cache limited to approximately 1 GB, whereas UNIX systems generally have a configurable file cache limit. This must be physical memory, not virtual memory (swap space); Enterprise Manager performance degrades dramatically if the host machine starts paging to and from virtual memory. For more information about the spool-to-data conversion task, see About SmartStor spooling and reperiodization on page 40.
The Overall Capacity (%) metric is computed in part from the following metrics, which you can find at this location in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Health
CPU Capacity (%) (added into the computation in Release 8.0). See Additional
The Overall Capacity (%) metric is more valuable over a long period of time than for a specific 15-second time slice. Because the Overall Capacity metric is based on real-time metrics, you may see the Overall Capacity value spike well above 100% because, for example, the hardware's I/O subsystem is briefly overloaded. However, the Enterprise Manager tends to recover from these spikes automatically if they are not long-lasting. In general, a brief spike (for example, to 200%) is not cause for concern, but over a long period of time the Overall Capacity should ideally average about 75%. Generally, if the Overall Capacity value is 50%, you should be able to double the load (+/- 15%) to see a 100% capacity value. Note SmartStor hourly and nightly conversion times are not factored into the Overall Capacity metric; however, hourly and nightly operations do affect how much metric load the Enterprise Manager can handle. During time periods when the Overall Capacity (%) metric spikes to high values (for example, 600%), at least one of the other metrics listed above should also show a spike. Investigating and understanding the source of the secondary spike can help pinpoint the root cause of the resource issue. For example, the problem might be found by looking at the Heap Capacity (%) metric, which feeds into the Overall Capacity (%) metric. See Heap Capacity (%) metric, below.
Investigator tree. For more information see About EM health and supportability metrics on page 28.
examine the perflog.txt file.
The Investigator tree Enterprise Manager health and supportability metrics are easy to view and interpret, so this is the first place you should look to understand your Enterprise Manager's current health. The perflog.txt file is often valuable to CA Wily Support. Several examples of how you can use the perflog.txt file are provided in the topics below.
Harvest Duration
You can find the Harvest.HarvestDuration metric value in perflog.txt, as shown in the figure below. Note This figure shows perflog.txt output in verbose mode. By default, perflog.txt is generated in a compacted mode.
SmartStor Duration
You can find the Smartstor.Duration metric value in perflog.txt as shown in the figure below. Note This figure shows perflog.txt output in verbose mode. By default, perflog.txt is generated in a compacted mode.
If you want to know how many events the Enterprise Manager received from agents in an interval, add the Performance.Transactions.Num.Inserts.Per.Interval metric value to the Performance.Transactions.Num.Dropped.Per.Interval metric value. Although one would expect the values for the
If, for example, at one sample time the number of inserted events is 500, this implies that the Transaction Trace insert queue should have a positive value and you would expect to see a value of 500 as well for the Performance.Transactions.TT.Queue.Size metric. However, by the time the Transaction Trace insert queue is sampled, it can be empty and record a sample number of zero.
Description
Same as EM CPU Used (%) (see below). Duplicated so it can be easily related to the Overall Capacity (%) metric, which now takes this metric into account. The number of currently connected agents. The Enterprise Manager's perflog.txt file records and reports the number of agents actually connected in the Agent.NumberOfAgents metric value.
Number of Agents
Enterprise Manager|CPU
The percentage of the total available CPU used by running Enterprise Managers during the specified time period.
Enterprise Manager|Health
Description
The metric load on an Enterprise Manager. When an agent disconnects, this number drops. The percentage of time spent in the SmartStor write process within a 15,000 ms (15 second) time slice, where 100% is the full 15 seconds. For example, if the SmartStor write duration is 15,000 ms, this metric value is 100. The amount of the SmartStor Capacity (%) metric time (see above) spent writing metadata. If this metric value doesn't change proportionately as the SmartStor Capacity (%) metric value increases or decreases, there may be an issue with the file system.
Enterprise Manager|Health
Data Store|SmartStor|MetaData
SmartStor overview
Introscope 7.1 included significant optimizations in disk read/write synchronization that take advantage of a dedicated SmartStor disk. All performance improvements and sizing increases starting with Introscope 7.1 depend on those optimizations. SmartStor first writes to disk the data that agents send to the Enterprise Manager/Collector, and performs all other operations after that. For example, if 10 users are running large historical queries (over 1000 metrics/query) at the same time, an Enterprise Manager performs more slowly. The users experience sluggish Workstation response times because SmartStor is simultaneously writing new agent metric data, running extensive user queries, generating reports, and converting files to the faster query file format. The Workstation queries are slow (or metric data is aggregated) because the disk is overloaded.
Reperiodization is both I/O and CPU intensive, as the data archive files are read, the data is compacted by aggregating multiple time slices, and then the resulting data is written back to SmartStor. This means that the period after midnight is the busiest time for an Enterprise Manager. The entire reperiodization process should not take more than two hours. During this time, no other Enterprise Manager operation such as report generation (see Report generation and performance on page 43) or OS-level operation should be scheduled. Note If the Enterprise Manager is stopped in the middle of reperiodization, it will, upon restart, delete the partially written files and restart reperiodization after 45 minutes. This restart may not occur during the regularly scheduled reperiodization time. The 45 minute delay allows the system to register all its agents and metrics before launching the restart of this compute-intensive reperiodization task. SmartStor spooling and reperiodization can be verified in the Enterprise Manager log in verbose mode, which records that the spooling process starts at the top of the hour. Under standard conditions, within 10 minutes, a second recorded message reports that the spooling process has completed. In addition there are three SmartStor management metrics, which you can find at this location in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Data Store | SmartStor | Tasks.
As shown in the figure below, the three tasks that are monitored are:
Spool to Data Conversion Data Appending Reperiodization
These tasks have metric values that oscillate from 0 to 1 when the respective task is running. You can see when those tasks are running and how long they are taking by selecting a task in the tree, then picking an appropriate time from the Time Range drop-down list in the Viewer pane. Top-of-the-hour problems are generally related to slow SmartStor spooling. Early morning (after 6 A.M.) problems are usually due to reperiodization not being completed quickly enough, which usually implies that the Enterprise Manager is excessively loaded. For more information, see EM OS disk file cache memory requirements on page 47.
Reports that are either larger than 50 graphs or longer than 24 hours should not be scheduled during the hours when SmartStor is reperiodizing (usually midnight to 3:00 A.M.) because of high CPU activity and the large amount of disk activity.
First, avoid using SmartStor and flat file archiving at the same time. Flat file archiving duplicates some of the functionality of SmartStor. In addition, flat file archiving's compression feature (if enabled) requires noticeable CPU resources that can adversely affect the Enterprise Manager's performance when the compression feature periodically runs. If flat file archiving must be used, configure the smallest possible number of metrics to be logged. Second, do not use flat file archiving in production; readable metric values are most useful in a QA debug environment. Third, SmartStor should not be located on the same disk as a flat file archive. SmartStor should be on its own dedicated disk. For more information, see SmartStor settings and capacity on page 55.
MOM overview
MOMs are CPU intensive, in contrast to Collectors, which are I/O and CPU intensive. For more information about MOM requirements, see MOM and Collector EM requirements on page 51 and Collector and MOM settings and capacity on page 58.
Collector overview
Collectors are I/O intensive, and perform most of Introscope's difficult and intensive calculation processing work. Cluster performance is dominated by the Collectors. Given the synchronous communication model between the MOM and Collectors, the responsiveness of a MOM (in terms of data refresh to the Workstation) is related to the responsiveness of the Collectors. Any performance problem causing response problems in a Collector will be magnified by the MOM. For more information, see Collector to MOM clock drift limit on page 71. If you upgrade a Collector from 6.x to 8.0, as long as there is a dedicated disk for SmartStor and Boundary Blame is turned on, there should be enough resources left on the same host to handle the new functionality, including metric baselining (heuristics) and creating virtual agents. If you need to migrate a 6.x Enterprise Manager to become an 8.0 Collector, see the related Knowledge Base article(s):
Migrating a 6.x Enterprise Manager to an 8.0 Collector (KB 1630)
platform. More CPUs will not improve performance. An Enterprise Manager with fewer CPUs than recommended results in the system performing poorly.
All Enterprise Managers need a minimum of 3 GB OS RAM to effectively run at
for SmartStor with no other processes competing for it. After those basic requirements, system performance is determined by the speed of the CPUs, the speed of the I/O subsystems, and the file cache performance. WARNING The recommendations for maximum metrics/Enterprise Manager, agents/Enterprise Manager, physical memory, and so on, should be strictly followed. If you are seeing less CPU utilization than the recommended maximum threshold (at full metrics load), it is NOT a reason to add additional load (above CA Wily recommendations) to the Collector. In general, metrics load is highly I/O bound rather than CPU intensive, so even with CPU cycles available, the Enterprise Manager can get I/O bound on metric data and the whole system can start slowing down.
If your hardware allows it, CA Wily recommends running the OS in 64-bit mode to take advantage of a large file cache. The file cache is important to the Enterprise Manager when it performs SmartStor maintenance such as spooling and reperiodization. This cache resides in physical RAM and is dynamically adjusted by the OS at runtime based on available physical RAM; therefore, the recommendation is 4 GB of RAM. As general guidance, each Enterprise Manager should have about 1.5 GB of OS file cache available. Top-of-the-hour problems are usually related to SmartStor spooling, and are best addressed by additional physical memory, especially disk file cache. The single biggest factor influencing SmartStor spooling is the file cache size. Typically, 32-bit Windows allows a file cache of just under 1 GB, while SmartStor spooling files at full load are typically closer to 2 GB. That difference in size causes performance pressure. By providing a larger OS file cache, you allow the OS to read the entire spool file into memory, process it, and write it straight back out into the SmartStor archive as a data file.
RAM (GB)  Total Metrics Monitored  Example Enterprise Manager GC Flag Settings
2         90,000                   lax.nl.java.option.additional=-server -Xms512m -Xmx512m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m
2         210,000                  lax.nl.java.option.additional=-server -Xms800m -Xmx800m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m
3         400,000                  lax.nl.java.option.additional=-server -Xms1400m -Xmx1400m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m
3         500,000                  lax.nl.java.option.additional=-server -Xms1500m -Xmx1500m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m
If you are operating a high-performance Introscope environment, contact CA Wily Professional services for the appropriate Enterprise Manager JVM heap settings.
SmartStor requirements
Each EM requires SmartStor on a dedicated disk or I/O subsystem
In Introscope 7, significant performance improvements were made in SmartStor that freed up CPU resources for other features such as virtual agents, calculators, Transaction Tracing and sampling, and applications with associated heuristic calculations (baselining). What matters to SmartStor is concurrent I/O throughput and how many disk spindles are servicing those requests. Having SmartStor on a second, dedicated disk is required to take advantage of these enhancements. Point the SmartStor location to a dedicated disk or disk array separate from the Transaction Event database (traces.db) and the metrics baseline (heuristics) database (baselines.db). Verify that the SmartStor file persistence is actually going to that separate disk. Ensuring that the SmartStor data directory is on its own disk is the top solution to many Introscope performance issues. When SmartStor is not on its own dedicated disk, the first indication of a problem is SmartStor spooling problems. For more information, see About SmartStor spooling and reperiodization on page 40. Note For information about a spreadsheet to help you determine your SmartStor disk requirements, see the Introscope Configuration and Administration Guide.
Under standard Enterprise Manager conditions, the average SmartStor Duration value should be less than 3500 ms (3.5 sec). The SmartStor Duration value MUST be less than 15,000 ms (15 sec); a value greater than 15 seconds indicates a critically overloaded EM. For more information, see Enterprise Manager health on page 27 and the Introscope Configuration and Administration Guide.
clock (see Collector to MOM clock drift limit on page 71). For optimal Workstation responsiveness, the ping metric, which is reported by the MOM for each Collector each time slice, should be less than 500 ms. Note The Introscope ping metric monitors only the lower boundary of the round-trip response time from the MOM to each Collector. This ping time is not the same as the network ping time, which is the sending of an ICMP echo request and getting an echo response. To view the ping metric, use the Search tab to view metrics named ping in the supportability metric section of the Investigator tree. You will find a ping metric reported for each Collector. Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. Ping times over the 10 second threshold cause the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 2. You can adjust this threshold for your environment by changing the
In Introscope 8.0, there is an additional ping time threshold of 60 seconds. If the ping time exceeds this value, the MOM automatically disconnects from the Collector associated with the slow ping time. This prevents the entire cluster from hanging, which is a side effect of when one Collector in a cluster is greatly underperforming. A disconnected Collector causes the Enterprise
Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 3. You can adjust this threshold for your environment by changing the introscope.enterprisemanager.clustering.manager.slowcollectordisconnectthreshold property in the IntroscopeEnterpriseManager.properties file. For more information, see
the Introscope Configuration and Administration Guide. Tip You can set an alert on the Enterprise
Manager|MOM|Collectors|<host@port>:Connected metric value. For more information on creating and configuring alerts, see the Introscope Configuration and Administration Guide.
When a Collector disconnects from the MOM, the metric flow from that Collector to the MOM stops. This means you will see a data gap in the Workstation metric reporting. However, the Collector is still gathering and persisting the agent metrics. When the Collector reconnects to the MOM, you can run a historical query to see the metrics reported during the disconnected period.
data coming from agents. For more information, see SmartStor overview on page 40. The SmartStorSizing8.0.x.y.xls spreadsheet, which is located in the <Introscope_Home>/examples directory, can help you determine your SmartStor disk space requirements. For information about using the spreadsheet, see the Introscope Configuration and Administration Guide.
traces.db contains all Transaction Traces and events data, such as error
snapshots. This database spans multiple files. One file is created per day and this data is kept for the number of days specified in the IntroscopeEnterpriseManager.properties file. In the example file snippet below, the daily file is stored for 14 days.
introscope.enterprisemanager.transactionevents.storage.max.data.age=14
baselines.db stores all of the Introscope metrics baselining (heuristic) data in
a single file. The traces.db and baselines.db databases collect and maintain data at different rates. Therefore, to determine the database disk space needs for your Enterprise Manager you will have to perform disk space calculations for traces.db and baselines.db separately, then sum the two calculations.
This baselines.db calculation example makes the following assumptions: Note The numbers below are only examples; they are NOT provided as recommendations for any Introscope environment.
Nodes/each Overview dashboard = 100 (which is very big)
Heuristics/node = 3
Objects generated by each heuristic (in steady state) = 2 (objects/hr/
objects/agent/week. NOTE: Baselines roll over at weekly boundaries. Every baseline is stored in 30-minute increments across a week. Once you roll into the next week, the baseline data from the last week is loaded and then updated with this week's data.
# agents reporting data to the Enterprise Manager = 200
Baseline objects/agent/week = 100,000 (from the calculation above)
Bytes/baseline object = 100
MB/agent = 100,000 baseline objects/agent/week x 100 bytes/object = 10 MB/agent/week
The baselines.db file size = 10 MB/agent x 200 agents = 2 GB.
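The arithmetic above can be checked with a short script (a sketch using only the example numbers from this section, not recommendations for any environment):

```python
# baselines.db sizing sketch -- example numbers only.
objects_per_agent_per_week = 100_000  # baseline objects/agent/week
bytes_per_object = 100                # bytes/baseline object
agents = 200                          # agents reporting to the EM

# 100,000 objects x 100 bytes = 10,000,000 bytes = 10 MB per agent per week
mb_per_agent = objects_per_agent_per_week * bytes_per_object / 1_000_000

# 10 MB/agent x 200 agents = 2000 MB, i.e. about 2 GB for baselines.db
baselines_db_gb = mb_per_agent * agents / 1000

print(mb_per_agent)     # 10.0 (MB/agent/week)
print(baselines_db_gb)  # 2.0 (GB)
```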
introscope.enterprisemanager.smartstor.dedicatedcontroller=true
Providing a separate disk for each SmartStor AND setting the dedicated controller property to true affects the total number of metrics an Enterprise Manager can handle because these allow for better sharing of disk resources. This allows for a number of performance enhancements including:
Larger virtual agents can be created.
Agents can report a larger number of applications.
More calculators can be used.
More Management Module logic is possible.
Workstation responsiveness is faster.
The dedicated controller property is set to false by default. You MUST provide a dedicated disk for SmartStor in order to set this property to true; it cannot be set to true if there is only a single disk for each Collector. The reason is that with a single disk for Collector operations AND SmartStor, context switching would be performed on the disk level (rather than the software level). This could cause severe Collector and possibly OS performance problems. When the dedicated controller property is set to false, the Collector assumes that there is one disk for all Enterprise Manager operations, and therefore uses one disk-writing lock. This means that only one area at a time is written. For example, the Collector will write only to SmartStor or only to the heuristics database that supports the Investigator Overview dashboard. Performance disadvantages to having the dedicated controller property set to false are:
Only one I/O task can be running at a time.
SmartStor writes are in shorter segments.
The disk's seek pointer is invalidated after each context switch.
If there is a second disk for SmartStor, but the property is set to false, there
When the dedicated controller property is set to true, the Collector uses two locks: one lock is dedicated to SmartStor, and the second lock is for everything else. Performance advantages to setting the dedicated controller property to true include:
SmartStor I/O tasks can run concurrently with other I/O tasks, which improves
(heuristics) database (baselines.db), which stores metrics baseline data. For instructions about how to set the SmartStor dedicated controller property to true, see the Introscope Configuration and Administration Guide.
If a Redundant Array of Independent Drives/Disks (RAID) configuration is desired, CA Wily recommends RAID 0 or RAID 5. Each SmartStor database MUST reside on its own dedicated RAID setup. All the restrictions above apply to all the varied storage choices available (local disks, external storage solutions such as SAN, and so on). The SmartStor requirement for a separate disk/controller DOES NOT mean that a separate host adapter (such as a fiber channel adapter, SCSI adapter, and so on) is required. It only means that a separate, dedicated physical disk or RAID setup is used for each SmartStor database. To determine whether a machine being considered for SmartStor use has a single dedicated disk or drive, you may need to determine whether the machine has multiple controllers (that is, multiple hard drives). It is important to understand that multiple partitions on the same drive share a controller, which is not an appropriate environment for the SmartStor instance. You can use commands like du (for disk usage) on UNIX/Linux or the Windows Device Manager to determine whether two drives are logically different or physically different. It is critical that the drives are physically different.
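As a quick aid (not part of the product), a small Python check can confirm whether two paths share a filesystem. Note its limits: two partitions on one physical disk report different devices yet still share a controller, so the hardware layout must still be verified separately. The paths in the usage comment are hypothetical examples.

```python
import os

def on_same_filesystem(path_a: str, path_b: str) -> bool:
    # st_dev identifies the device/filesystem backing each path.
    # Equal values mean the paths share a filesystem. Unequal values
    # do NOT guarantee separate physical disks -- partitions on one
    # drive still share a controller, which is unsuitable for SmartStor.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

# Example (hypothetical paths):
# on_same_filesystem("/smartstor/data", "/opt/introscope/traces.db")
```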
For example, if you are running five Enterprise Managers on an 8 CPU quad core machine, each Enterprise Manager bases the size of its thread pools on the 32 available CPUs. This configuration can reduce throughput due to context switching as the threads from all five Enterprise Managers contend for the 32 available CPUs. In Introscope 8.0, the Enterprise Manager properties file (IntroscopeEnterpriseManager.properties) includes the new available processors (CPUs) property to tell the Enterprise Manager how many processors (CPUs) it can expect to have available:
introscope.enterprisemanager.availableprocessors=
See the Introscope Configuration and Administration Guide for more information about setting this property. To continue the example: with five Enterprise Managers on a host machine with 32 CPUs, you would allocate six processors to each Enterprise Manager. You would then set the available processors property to six, as shown:
introscope.enterprisemanager.availableprocessors=6
information, see Collector to MOM clock drift limit on page 71. Important You must run time server software to synchronize at regular intervals the clocks of all the machines in the cluster. Time server software synchronizes the time on a machine with either internal time servers or internet time servers.
The system may take longer to start up.
There is an increased likelihood that a misbehaving Collector affects the entire cluster.
Note In a clustered environment, a single Collector that is performing poorly can make it appear as if the entire cluster is performing poorly. For these reasons, a single MOM should be connected to a maximum of 10 Collectors.
It is important to ensure that every Collector is running smoothly because any individual nonresponsive Collector causes the entire system to lock up until the Collector either responds, drops its connection, or the MOM times it out (see Local network requirement for MOM and Collectors on page 51). This is because SmartStor data is held on the Collectors, not on the MOM. So to retrieve query or alert information, the MOM must wait for every Collector to respond with its portion of the result before sending the combined query or alert data response back to the Workstation. The Workstations, in turn, are delayed waiting for the MOM's compiled data to display. The responsiveness of a cluster is therefore the response of its slowest connected Collector. In contrast, a single standalone Enterprise Manager has no outside dependencies.
Or you could set up four Collectors to handle 200,000 metrics each (totaling 800,000 metrics) and the remaining five Collectors to handle 40,000 metrics each (totaling 200,000 metrics), for a MOM-Collector system total of 1,000,000 metrics. For Introscope to support 1,000,000 metrics on the MOM, you need to configure the MOM and meet specific JVM requirements on each clustered Collector. See Configuring a cluster to support 1,000,000 MOM metrics on page 61.
IntroscopeEnterpriseManager.properties file.
a On each Collector machine in the cluster, go to the <Introscope_Home>/ config directory and open the IntroscopeEnterpriseManager.properties file. b Add this transport property value as shown:
transport.outgoingMessageQueueSize=4000
c Save and close the IntroscopeEnterpriseManager.properties file. 3 Run each Collector in the cluster on a 32-bit JVM with 1.5 GB heap size. The required Collector configuration as well as MOM and Collector JVM sizing requirements are complete. For MOM sizing examples, see Sample Introscope 8.0 MOM sizing limits table on page 122. For Collector sizing examples, see Sample Introscope 8.0 Collector sizing limits table on page 119.
An agent is only assigned to a Collector that supports the same connection type that the agent uses to connect to the MOM. For example, if the agent connects to the MOM using HTTP, then the Collector must have enabled HTTP connections.
Configuration done in the loadbalancing.xml file
You fill out the loadbalancing.xml file to restrict agents to a specific set of Collectors, or to exclude agents from a specific set of Collectors. For more information, see the Introscope Configuration and Administration Guide.
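As an illustrative sketch only (the element names, pattern, and host values below are assumptions, not the authoritative schema; consult the Introscope Configuration and Administration Guide for the exact loadbalancing.xml format), a rule pinning a group of agents to one Collector while excluding another might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<loadbalancing>
  <!-- Hypothetical example: route agents matching the pattern to
       collector01 and keep them off collector02. -->
  <agent-collector name="billing agents">
    <agent-specifier>.*\|.*\|BillingAgent.*</agent-specifier>
    <include>
      <collector host="collector01.example.com" port="5001"/>
    </include>
    <exclude>
      <collector host="collector02.example.com" port="5001"/>
    </exclude>
  </agent-collector>
</loadbalancing>
```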
Agent connection history with a specific Collector
To prevent an explosion of SmartStor data as 8.0 agents are transferred from one Collector to another in the cluster, if an 8.0 agent has connected to a Collector previously, the MOM favors that Collector for future connections unless an alternative Collector is underloaded or the favored Collector is overloaded. Note Pre-8.0 agents do not connect to the MOM; instead, they must connect directly to a Collector.
introscope.enterprisemanager.clustering.login.em1.weight
property name, em1 is an arbitrary identifier. Each Collector has a unique identifier. Provide an appropriate identifier for your environment.
introscope.enterprisemanager.clustering.login.em1.weight property
is a positive number that controls the relative load of the Collector. If the factors affecting how the MOM assigns agents to a Collector (see Determining how MOMs assign agents to Collectors on page 64) do not dictate a different agent connection decision, then the weight of a specific Collector divided by the total weight of the cluster is the percentage of the metric load assigned to that Collector. The MOM then uses weight-adjusted metric counts when assigning agents to Collectors and when rebalancing the agent metric load. For example, suppose a MOM connects to three Collectors that all have zero metrics currently being reported. If Collector A has a weight of 150, Collector B has a weight of 100, and Collector C has a weight of 50, then the MOM assigns metrics to Collectors A, B, and C approximately in the ratio 3:2:1.
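The weight calculation can be sketched as follows (a minimal illustration of the weight-to-share arithmetic, not product code):

```python
def weight_shares(weights):
    """Fraction of metric load the MOM would assign each Collector
    when no other assignment factor overrides the weights."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Weights 150:100:50 yield shares of 50%, ~33%, ~17% -- a 3:2:1 ratio.
shares = weight_shares({"A": 150, "B": 100, "C": 50})
```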
introscope.enterprisemanager.loadbalancing.threshold property,
choose a number of metrics that prevents the MOM from constantly reallocating agents. When the MOM disconnects an agent from one Collector and assigns it to another, overhead is added to the cluster. When load rebalancing is needed, this overhead is acceptable; unnecessary rebalancing, however, adds unnecessary overhead that can diminish system performance. A slightly unbalanced cluster does not negatively affect performance, because a certain amount of flux is normal. An appropriate introscope.enterprisemanager.loadbalancing.threshold value is one at which the MOM brings agents into balance with fewer, larger adjustments, which is better for system performance than many smaller adjustments.
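For instance, the threshold is set in the IntroscopeEnterpriseManager.properties file; the value below is an illustrative assumption only, not a recommended setting:

```
# IntroscopeEnterpriseManager.properties
# Example only: do not rebalance unless a Collector is more than
# 20,000 weight-adjusted metrics out of balance.
introscope.enterprisemanager.loadbalancing.threshold=20000
```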
So when your Collector approaches the metrics limit and you see the warning messages described above, add a new Collector to your system. Note If you are running a standalone Enterprise Manager that is approaching the metrics limit, you may need to implement a clustered environment.
In Introscope 8.0, the Enterprise Manager can take advantage of additional CPUs to increase the maximum agent limit. The Enterprise Manager must be using 4 CPUs or cores to take advantage of the increased Collector agent capacity (see the Sample Introscope 8.0 Collector sizing limits table on page 119). These limits depend on the specific hardware in use. The number of currently connected agents is available as the Number of Agents Enterprise Manager health and supportability metric, which you can find in the Investigator tree at this location:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Connections | Number of Agents.
An overloaded Enterprise Manager starts to combine metrics, so once you approach the agent limit, add a new Collector. An inappropriately configured agent can create thousands of metrics in quick succession and overload the Enterprise Manager. To prevent this, the Enterprise Manager uses a metric clamp. For information about metric clamping, see Metric clamping on page 96.
Important You must run time server software to synchronize the clocks of all the machines in the cluster at regular intervals. Time server software synchronizes the time on a machine with either internal time servers or internet time servers.

Workstation sluggishness or unresponsiveness is rarely caused by a problem in the Workstation or MOM. It is usually caused by a single unresponsive Collector, which propagates to the MOM and then the Workstation, and is magnified when Collectors are clustered.

One way to determine which Collector is slowing the system down is to look at the round-trip response time from the MOM to each Collector. Each Collector has a ping metric that represents the MOM to Collector round-trip response time, which for optimal Workstation response time should be less than 500 ms on average. This is equivalent to the GetEvent metric in Introscope 7.0. The ping metric shows how quickly the Collectors are responding to messages from the MOM.

Note The Introscope ping metric monitors only the lower boundary of the round-trip response time from the MOM to each Collector. This ping time is not the same as the network ping time, which is the sending of an ICMP echo request and getting an echo response.

The ping metric is a good way to diagnose which Collector is responding slowly. Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. If the ping time is above the 10 second threshold for extended periods of time, investigate the overall health of the Collector that is reporting the slower ping time. Check for obvious signs that this Collector is overloaded, such as the Collector combining time slices or receiving very large numbers of events. For more information about Collector health, the ping time threshold, and the ping metric, see Local network requirement for MOM and Collectors on page 51.
The sizing guideline provided for any hardware configuration assumes that no other processes are running on the host. If, for example, the Sample Introscope 8.0 Collector sizing limits table on page 119 states that a Collector running on a 2 CPU Xeon can handle 500,000 metrics, that assumes there is no other server or database process running on the machine. This is true for any background process, but is especially important for processes that might impact disk I/O performance or have a large memory footprint. The Collector is sensitive to contention for its disk and memory resources, and such contention is a significant factor in many performance problems.
I/O contention between SmartStor and other processes, including the Enterprise Manager.
Very large virtual agents or poorly configured virtual agents with a lot of metrics will start to use up the CPU resources. The two biggest CPU drains are metrics baseline (heuristics) and virtual agents because of the large amount of calculation involved in both.
Large Transaction Traces are running continuously.
The process of accepting and persisting events like Transaction Traces involves deserialization and indexing, which are very CPU intensive. A very large number of Transaction Traces uses a lot of Collector CPU resources.
Number of EM instances/Server: 1
Operating System: Windows Server 2003 (running in 64-bit mode for optimum file cache size)
CPU: Two to four Intel Xeon CPUs @ 2.8 GHz
Physical RAM: 4 GB
Disk I/O Subsystem: The OS resides on a separate physical disk. RAID 0 or RAID 5 configuration. Drive speed: 10k RPM or greater
You can run multiple Collectors on one machine as long as you follow these requirements:
Run the OS in 64-bit mode to take advantage of a large file cache.
The file cache is important for the Collectors when doing SmartStor maintenance, for example spooling and reperiodization. File cache resides in the physical RAM, and is dynamically adjusted by the OS during runtime based on the available physical RAM. CA Wily recommends having 3 to 4 GB RAM per Collector.
There should not be any disk contention for SmartStor, meaning you use a separate physical disk for each SmartStor instance. If there is contention for SmartStor write operations, the whole system can start to fall behind, which can result in poor performance such as combined time slices and dropped agent connections.
The baseline.db and traces.db files from up to four Collectors can reside on a separate single disk. In other words, up to four Collectors can share the same physical disk to store all of their baseline.db and traces.db files.
CHAPTER
This chapter provides background and specifics to help you understand sizing and performance-related metrics requirements, settings, and limits for your Introscope system. In this chapter you'll find the following topics:

Metrics background
About metrics groupings and metric matching
8.0 metrics setup, settings, and capacity
Matched metrics limits
Inactive and active metric groupings and EM performance
SmartStor metrics limits
Performance and metrics groupings using the wildcard (*) symbol
Virtual agent metrics match limits
About alerted metrics and slow Workstation startup
Detecting metrics leaks
Metrics leak causes
Finding a metrics leak
Metrics for diagnosing a metrics leak
Detecting metric explosions
Metric explosion causes
Finding a metric explosion
How Introscope prevents metric explosions
SQL statements and metric explosions
SQL statement normalizers
Metric clamping
Enterprise Manager dead metric removal
Metrics background
Every 15 seconds, the metrics harvest cycle takes place on the Enterprise Manager. During this process, the two sets of metrics data reported by agents are aggregated by the Enterprise Manager. This time slice data is processed to perform calculations, check alerts, update heuristics, and update Workstation views, and is persisted to disk by SmartStor. Typically, at load levels close to the limits recommended in the Sample Introscope 8.0 Collector sizing limits table on page 119, the harvest takes no more than about 3 to 4 seconds.

The metrics limit that an individual Collector can handle is influenced by CPU speed. As discussed in EM basic requirements on page 20, CA Wily recommends two to four dedicated CPUs per Collector (depending on hardware platform). Additional dedicated, physical CPUs won't increase the number of metrics and agents a Collector can handle; however, faster CPUs may help increase the Collector's maximum capacity. Introscope business logic load depends on the following:
* total number of metrics groupings
* maximum number of metrics in a metrics grouping
* number of metrics persisted per minute
Understanding metric groupings and metric matching, then following the guidelines discussed in Matched metrics limits on page 79 can be helpful in avoiding performance problems.
1 Introscope business logic monitors, including Alerts, Dashboards, and the Workstation Investigator tree, want data from the Enterprise Manager.
2 Introscope business logic monitors request metric data using a metric group. For example, an Enterprise Manager gets the Workstation request, "Give me the data for the Servlets metric group."
3 When the data query is submitted, the Enterprise Manager scans all metrics to see which match the metric group Servlets. Those metrics are then subscribed to.
4 During every 15-second harvest cycle, the subscribed metrics have their 15-second time slice data routed to the subscribing Introscope business logic monitor.

The total number of metrics that the Collector must assess during each time slice can easily become so large that it can't process all the business logic you've defined for all your metrics within the 15-second harvest cycle. This situation can lead to performance problems. Therefore, CA Wily recommends that the total number of metrics placed in metrics groupings be no more than 15% of the metrics limit if you are running a 2 CPU Collector, and no more than 30% of the metrics limit if you are running a 4 CPU Collector. For example metrics limits, see the Sample Introscope 8.0 Collector sizing limits table on page 119.
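As a rough worked example of the 15%/30% guideline (the 500,000-metric figure is the sample 2 CPU Xeon limit cited elsewhere in this guide and is used illustratively; actual limits depend on hardware):

```python
# Hypothetical budget calculation for metrics placed in metrics groupings,
# following the 15% (2 CPU) / 30% (4 CPU) guideline above.
# Integer math avoids floating-point rounding.
def grouping_budget(metric_limit: int, cpus: int) -> int:
    percent = 30 if cpus >= 4 else 15
    return metric_limit * percent // 100

assert grouping_budget(500_000, 2) == 75_000   # 15% on a 2 CPU Collector
assert grouping_budget(500_000, 4) == 150_000  # 30% on a 4 CPU Collector
```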
It doesn't matter how many alerts you've set; what matters is how many metrics are matched. CA Wily recommends that the total number of metrics placed in metrics groupings be no more than 15% of the metric limit if the Enterprise Manager is using its minimum requirement of 2 CPUs. If an Enterprise Manager is using 4 CPUs, this limit increases to 30%. For example limits, see the Sample Introscope 8.0 Collector sizing limits table on page 119. Note If you are using standalone Enterprise Managers, you define metrics groupings on the Enterprise Manager. However, if you are using clustered Collectors, set up metrics groupings on the MOM.
A metrics leak happens when a metric produces data for a very short period of time, and then never produces data again. This happens when part of the metric name includes something transient, like a session key or a SQL parameter. Note A metric explosion happens when an agent is inadvertently set up to report more metrics than the system can handle. In this case, Introscope is bombarded with such a large number of metrics that performance gets very slow or the system cannot function at all. For more information, see Detecting metric explosions on page 84.
The SmartStor metadata save time is recorded and stored in the Enterprise Manager log, as shown in the log snippet below. In this case, it took 86209 ms (86 seconds) to save this piece of metadata. This long metadata save time is a strong indication of a metrics leak problem.
Description
Replaces the previous metadata metric and renames it to better convey that this is the number of metrics known to SmartStor.
The number of agents that the metadata knows about that have data in SmartStor.
The number of agents that the metadata knows about that have no data in SmartStor.
The number of partial metrics (metrics under an agent node in the Investigator) that the metadata knows about that have data in SmartStor.
The number of partial metrics (metrics under an agent node in the Investigator) that the metadata knows about that have no data in SmartStor.
You will not solve your metrics leak until you identify the cause of the leaking metrics and plug it. Contact CA Wily Support if you are unsure about how to proceed with fixing your metrics leak.
on page 92.
a large number of unique SQL statements. See How poorly written SQL statements can cause metric explosions on page 85.
JMX serverid
Metric explosion due to JMX serverid occurs when JMX filter strings given to WebLogic produce metric names that include serverid=<int>, where the integer is a unique number for each WebLogic run. This can result in thousands of new metrics with each server restart. In this situation, for example, after several weeks the SmartStor metadata can contain in excess of 500 K dead metrics, although the actual metric count should have been no more than 25 K. See Metrics leak causes on page 82 and Finding a metric explosion on page 85.
The JDBC URL is formatted into the SQL metric names. If the database name is not used, then every unique URL generates a different node of metrics. See Knowledgebase Article 1112.
High CPU utilization (often above 50%)
Disk usage that is not necessarily higher than usual
A very large number of agent metrics being generated
Log messages stating that a metric clamp has been reached and that no more metrics will be accepted

If you have these symptoms, chances are that you have a metric explosion situation. The SmartStor metadata save time is recorded and stored in the Enterprise Manager log, as shown in the log snippet below. In this case, it took 31701 ms (31 seconds) to save this piece of metadata. This long metadata save time is a strong indication of a metric explosion problem.
You can find this metric under the Custom Metric Agent (Virtual) node in the Investigator tree; it will look similar to this:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)| Agent | Metric Count
Note Before Introscope 8.0, the Metric Count metric node was located under the Agent Stats node.
You can also configure the EM Capacity Dashboard to access the current metric count and the metric count from the top five agents.
Note You must configure the EM Capacity dashboard before use, as it does not automatically contain links to underlying data. For information about creating and editing custom links, see the Introscope Configuration and Administration Guide.
When you hover the cursor over the data points, you can get more metric information, as shown in this Historical metrics example.
Agent metric aging results in little or no build-up of unused metrics on the agent and Enterprise Manager. See About agent metric aging on page 91.
SQL statement normalizers. See SQL statements and metric explosions on page 92.
Unused metrics are regularly removed from the Enterprise Manager. See Enterprise Manager dead metric removal.
In this case, you can update your agent metric aging properties so that they use less system overhead. Increase the value of the introscope.agent.metricAging.numberTimeslices property. In addition, avoid reporting metrics that need to be removed and then turned on again. For example, you could stop reporting a SQL statement metric that gets invoked every two hours when the associated dead metric ages out every hour.

Second, if Introscope checks too many metrics during each heartbeat, this can reduce performance. In this case, you might not see agent metrics being aged and removed; however, during each heartbeat metric review, Introscope still checks metrics for possible removal, which adds to performance overhead. Set the introscope.agent.metricAging.dataChunk property to a lower number so that Introscope checks fewer metrics for metric removal during each heartbeat metric review. You can also decrease the heartbeat frequency by increasing the value of the introscope.agent.metricAging.heartbeatInterval property, so that Introscope checks for metric removal less often. For information about configuring agent metric aging properties, see the Java Agent Guide or the .NET Agent Guide.
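As an illustration, an IntroscopeAgent.profile tuned along these lines might contain entries such as the following. The property names are the ones discussed above; the values are illustrative assumptions, not recommendations, and defaults vary by agent version:

```properties
# Check for removable metrics less often (interval in seconds; illustrative value).
introscope.agent.metricAging.heartbeatInterval=1800
# Examine fewer metrics during each heartbeat review (illustrative value).
introscope.agent.metricAging.dataChunk=500
# Require more quiet time slices before a metric ages out (illustrative value).
introscope.agent.metricAging.numberTimeslices=3000
```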
Note that the comment is part of the metric name. While the comment is useful for the database administrator to see who is executing what query, the SQL Agent does not parse the comment in the SQL statement. Therefore, for each unique user ID, the SQL Agent creates a unique metric, potentially causing a metric explosion. The database that executes the SQL statements does not see these statements as unique because it ignores the comments. This problem can be avoided by putting the SQL comment in single quotes, as shown:
"/*' John Doe, user ID=?, txn=? '*/ select * from table..."
The SQL Agent then creates a metric in which the comment no longer produces a unique metric name.
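The effect of the quoting trick can be sketched in a few lines of Python. The real SQL Agent logic is internal to the agent, so this simple single-quote replacement is only an approximation of the normalizer's behavior:

```python
import re

def normalize(sql: str) -> str:
    # Replace anything inside single quotes (including the quotes) with "?".
    return re.sub(r"'[^']*'", "?", sql)

a = normalize("/*' John Doe, user ID=42, txn=7 '*/ select * from table1")
b = normalize("/*' Jane Roe, user ID=99, txn=3 '*/ select * from table1")
assert a == b == "/*?*/ select * from table1"
```

Both statements, which differ only inside the quoted comment, normalize to the same string, so only one metric name results.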
Example 1
In looking in Investigator at this path under an agent node
Backends|{backendName}|SQL|{sqlType}|sql
you notice that temporary tables are being accessed like this:
Example 2
You have been alerted to a potential metric explosion and your investigation brings you to a review of this SQL statement:
#1 INSERT INTO COMMENTS (COMMENT_ID, CARD_ID, CMMT_TYPE_ID, CMMT_STATUS_ID, CMMT_CATEGORY_ID, LOCATION_ID, CMMT_LIST_ID, COMMENTS_DSC, USER_ID, LAST_UPDATE_TS) VALUES (?, ?, ?, ?, ?, ?, ?, "CHANGE CITY FROM CARROLTON, TO CAROLTON, _ ", ?, CURRENT)
In studying the code, you notice that "CHANGE CITY FROM CARROLTON, TO CAROLTON, _ " recurs as a dizzying array of cities.
Example 3
You have been alerted to a potential metric explosion and your investigation brings you to a review of this SQL statement:
CHANGE COUNTRY FROM US TO CA _ CHANGE EMAIL ADDRESS FROM TO BRIGGIN @ COM _ "
In studying the code, you notice CHANGE COUNTRY results in an endless list of countries. In addition, the placement of the quotes for countries results in people's e-mail addresses getting inserted into the SQL statements. Here's the source of the metric explosion as well as other negative consequences.
Introscope provides the following SQL statement normalizers:

The default normalizer, which normalizes text within single quotation marks ('xyz').
A custom normalizer extension mechanism, which allows users to add extensions for performing custom normalization.
The regular expression SQL statement normalizer (RegexSqlNormalizer), a SQL Agent extension that normalizes SQL statements based on configurable regular expressions (regex).
For more information about working with Introscope SQL statement normalization capabilities, see the Java Agent Guide or the .NET Agent Guide. The two examples below can help you understand how to implement the regular expression SQL statement normalizer.
Example 1
Here's a SQL query before regular expression SQL statement normalization:
INSERT INTO COMMENTS (COMMENT_ID, CARD_ID, CMMT_TYPE_ID, CMMT_STATUS_ID, CMMT_CATEGORY_ID, LOCATION_ID, CMMT_LIST_ID, COMMENTS_DSC, USER_ID, LAST_UPDATE_TS) VALUES (?, ?, ?, ?, ?, ?, ?, "CHANGE CITY FROM CARROLTON, TO CAROLTON, _ ", ?, CURRENT)
Here's the desired normalized SQL statement:
INSERT INTO COMMENTS (COMMENT_ID, ...) VALUES (?, ?, ?, ?, ?, ?, ?, CHANGE CITY FROM ( )
Here's the configuration needed in the IntroscopeAgent.profile file to produce the normalized SQL statement shown above: introscope.agent.sqlagent.normalizer.extension=RegexSqlNormalizer
introscope.agent.sqlagent.normalizer.regex.matchFallThrough=true introscope.agent.sqlagent.normalizer.regex.keys=key1,key2 introscope.agent.sqlagent.normalizer.regex.key1.pattern=(INSERT INTO COMMENTS \\(COMMENT_ID,)(.*)(VALUES.*)''(CHANGE CITY FROM \\().*(\\)) introscope.agent.sqlagent.normalizer.regex.key1.replaceAll=false introscope.agent.sqlagent.normalizer.regex.key1.replaceFormat=$1 ...) $3 $4 $5 introscope.agent.sqlagent.normalizer.regex.key1.caseSensitive=false introscope.agent.sqlagent.normalizer.regex.key2.pattern='[a-zA-Z1-9]+' introscope.agent.sqlagent.normalizer.regex.key2.replaceAll=true introscope.agent.sqlagent.normalizer.regex.key2.replaceFormat=? introscope.agent.sqlagent.normalizer.regex.key2.caseSensitive=false
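The mechanics of the key chain can be sketched in Python. The patterns below are simplified stand-ins for the key1/key2 configuration above (matchFallThrough=true means each key is applied in order to the running result); they are not the exact configured expressions:

```python
import re

# Simplified stand-ins for the key1/key2 regex chain; illustrative only.
RULES = [
    # key1-style rule: collapse the column list after COMMENT_ID.
    (re.compile(r"(INSERT INTO COMMENTS \(COMMENT_ID,).*?(VALUES)", re.IGNORECASE),
     r"\1 ...) \2"),
    # key2-style rule: replace single-quoted literals with "?".
    (re.compile(r"'[a-zA-Z1-9]+'"), "?"),
]

def normalize(sql: str) -> str:
    for pattern, repl in RULES:
        sql = pattern.sub(repl, sql)  # replaceAll semantics
    return sql

assert normalize("INSERT INTO COMMENTS (COMMENT_ID, CARD_ID) VALUES ('abc123')") == \
    "INSERT INTO COMMENTS (COMMENT_ID, ...) VALUES (?)"
```

Each key's pattern rewrites the statement in turn, so variable column lists and quoted literals both collapse into a single stable metric name.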
Example 2
Here's a SQL query before regular expression SQL statement normalization:
Metric clamping
Several properties that limit, or clamp, the number of metrics on the agent and the Enterprise Manager help to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. No new metrics are displayed in the Workstation after a clamp has occurred. Metric clamping is enabled through four new properties. Note The Enterprise Manager uses the default value for a property if the line for that property is commented out in the EnterpriseManager.properties file.

introscope.enterprisemanager.agent.metrics.limit
Limits the number of live and historical metrics an agent will report. The default is 50,000.

introscope.enterprisemanager.metrics.live.limit
Limits the number of live metrics reporting from agents per Enterprise Manager. The default is 500,000.

A third clamp limits the number of metrics per agent (both live and historical) per Enterprise Manager. The default is 1,200,000.

introscope.enterprisemanager.query.datapointlimit
Limits the maximum metric data points each Collector or standalone Enterprise Manager returns from any one query. The clamp is per query, not all concurrent queries. Queries to MOMs are only indirectly clamped by the data limit on each Collector. The default is 0 (no limit).
For more information about these properties, see the Introscope Configuration and Administration Guide. When the Enterprise Manager starts up, the values of these properties are logged. When an Enterprise Manager hits a clamp value based on the total number of metrics that it can process in total or when an agent hits the agent clamp, a log message appears in the Enterprise Manager log. If clamping is no longer necessary due to a change in the limits, then another log message is logged in the Enterprise Manager log. All supported agents obey these clamps, though the custom metric agent and agent clusters (virtual agents) are not subject to the clamps.
For example, suppose you set the following clamp properties:

introscope.enterprisemanager.agent.metrics.limit=10000
introscope.enterprisemanager.metrics.live.limit=800
You then start the Enterprise Manager and two agents. You'd see that the Enterprise Manager gets clamped when 800 metrics have been reported, even though the agent clamp of 10,000 metrics has not yet been reached. This means no new metrics from the agents are reported. In addition, the agent logs state that the Enterprise Manager clamp has been reached and no more metrics will be reported to the Enterprise Manager. If you increase the Enterprise Manager clamp value, you'd see that new metrics from the agents start to be reported.
CHAPTER
This chapter provides background and specifics to help you understand sizing and performance-related Workstation and WebView requirements, settings, and limits for your Introscope system. In this chapter you'll find the following topics:

Workstation and WebView background
8.0 Workstation and WebView setup, settings, and capacity
OS RAM requirements for Workstations running in parallel
WebView and Enterprise Manager hosting requirement
Workstation to standalone EM connection capacity
Workstation to MOM connection capacity
WebView server capacity
WebView server guidelines
Important Although in a MOM environment data collection is spread across a number of Collectors, Workstation performance problems can still occur in a clustered environment. This happens if all the Workstation connections involve active users, and all their queries are based on data coming from a single Collector. In that case, the users may experience sluggish performance due to the Collector's own internal limitations on simultaneous historical queries.
At all times, the sum of all metrics (metrics and metrics groupings) for every Top N graph viewed by every Workstation instance (all Workstations total) should not exceed 100,000 metrics. Use Top N sparingly: whenever a Top N request is made, all the data is provided in real time, which puts a large resource demand on your Introscope system. When Top N graphs are used, have as few viewers as possible actively viewing them.

If at a single moment in time Introscope users are actively viewing dashboards and graphs representing more than 100,000 metrics, performance problems can occur. For example, dashboards can have very slow refresh times. This can occur when a number of users log in at the same time to view a dashboard containing a Top N graph.

For example, imagine that there are ten dashboards defined in a system, and two of the ten dashboards include 10 graphs that are Top N graphs. The other eight dashboards each have 10 standard (not Top N) graphs. Let's say that each of the ten Top N graphs has a metric grouping that matches 1,000 metrics. This means a total of 10,000 metrics is requested whenever a dashboard containing the Top N graphs is displayed. Now imagine that 10 Introscope users at different machines log in and all look at one of the dashboards containing the Top N graphs at the same time. This requires the system to request and handle 10,000 metrics x 10 user instances as output to Workstations = 100,000 metrics requested at once. In this situation, it's highly likely the users would experience slow Workstation performance as they click on the dashboard elements.
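The arithmetic in the example can be checked directly; the figures below are the example's, not fixed product limits:

```python
# Back-of-envelope check of the Top N example above.
top_n_graphs_on_dashboard = 10   # Top N graphs on the viewed dashboard
metrics_per_grouping = 1_000     # metrics matched by each graph's grouping
concurrent_viewers = 10          # users opening the dashboard at once

metrics_per_view = top_n_graphs_on_dashboard * metrics_per_grouping
total_requested = metrics_per_view * concurrent_viewers
assert metrics_per_view == 10_000
assert total_requested == 100_000  # the recommended cluster-wide ceiling
```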
CHAPTER
This chapter provides background and specifics to help you understand sizing and performance-related agent requirements, settings, and limits for your Introscope system. In this chapter you'll find the following topics:

Agent background
Agent sizing setup, settings, and capacity
Agent metrics reporting limit
Transaction Trace component clamp
Configuring agent heuristics subsets
Virtual agent metrics match limits
Agents limits per Collector
Agent heap sizing
High agent CPU overhead from deep nested front-end transactions
Dynamic instrumentation
Agent background
In an Introscope deployment, the agent collects application and environmental metrics and relays them to the Enterprise Manager. Agent features that affect overhead are Boundary Blame, Transaction Trace sampling, and URL normalization. The agent allows Introscope to collect minute details about how your applications are performing. What types of data the agent collects depends on which ProbeBuilder Directives (PBD) files you choose to implement. Several standard PBDs are included when you install the Java or .NET agent, as well as specific PBDs for your application server. The instrumenting process is performed using CA Wily's ProbeBuilding technology, in which tracers, defined in ProbeBuilder Directives (.pbd) files, identify the metrics an agent will gather from applications and the Java virtual machine at run time. ProbeBuilder Directive (.pbd) files tell the ProbeBuilder how to add Probes, such as timers and counters, to the .NET or Java components that Introscope-enable the application. ProbeBuilder Directive files govern what metrics agents report to the Introscope Enterprise Manager. Custom directives can also be created to track classes and methods unique to specific applications.
Each of the virtual agents has a metric count. Sum all of these counts to determine the total number of metrics matched.
Virtual agents are a significant drain on the CPU. For example, a 1500-metric virtual agent can result in a 10% increase in CPU usage. If the recommended number of metrics matched by the virtual agents is exceeded, there is significant impact on the CPU. There is some trade-off between the total number of applications (baselined heuristics) and virtual agents, since they are both dependent exclusively on CPU resources. In general, if the total number of monitored applications is significantly less than the limit, the metric match limit for virtual agents can be increased. However, metric match limit for virtual agents should never exceed 150% of the limit set in the guidelines. A virtual agent deployed on a MOM only creates load on the Collectors, which do the aggregation and pass the result back to the MOM. Note Be aware that the Collector does most of the work in performing the calculations needed for virtual agents; the MOM is not performing the calculations.
introscope.enterprisemanager.heuristics.agentspecifier=.*
which is a regular expression that matches the agents for which heuristics are enabled. The default .* matches all agent names. Limiting this property to a subset of agents you are interested in can improve performance, largely without limiting the ability to analyze the Enterprise Manager. For more information, see the Introscope Configuration and Administration Guide.
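For example, to enable heuristics only for a subset of agents, the property could be narrowed with a regular expression. The agent names below are hypothetical, for illustration only:

```properties
# Enable heuristics only for agents whose names contain OrderApp or PaymentApp
# (hypothetical agent names used as an example).
introscope.enterprisemanager.heuristics.agentspecifier=.*(OrderApp|PaymentApp).*
```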
Notice the recurring servlet calls. In this case, a servlet keeps calling itself, resulting in a 2125 ms transaction time for this deep nested transaction.
Notice in this case a servlet continuously calling itself (recurring call). This is just one example. This can also happen when a servlet continuously calls other servlets. In either case, you may see an increase in agent CPU overhead. If the overhead is unacceptable, contact CA Wily Technical Support.
Dynamic instrumentation
Introscope uses dynamic instrumentation (also called dynamic ProbeBuilding) to implement new and changed PBDs without restarting managed applications or the Introscope agent. This is useful for making corrections to PBDs, or to temporarily change data collection levels during triage or diagnosis without interrupting application service. For more information about dynamic instrumentation, see the Java Agent Guide or the .NET Agent Guide. Dynamic instrumentation affects CPU utilization, memory, and disk utilization. This is because dynamic instrumentation includes redefining the monitored classes, which is a resource intensive process. To avoid performance problems after you enable dynamic instrumentation, CA Wily highly recommends that you:
use configuration to minimize the classes that are being redefined (see the Java Agent Guide or the .NET Agent Guide)
APPENDIX
Frequently asked questions about Introscope sizing and performance are listed in the table below. Typical answers or solutions to each question are provided, with the most common answer listed first, the second most common second, and so on. Question Most Common Answers/Solutions
General Performance Questions Can I handle the same number of metrics that I used to in 7.x versions of Introscope? What about 6.x versions?
If you are upgrading from 7.x to 8.0, the number of metrics that Introscope 8.0 can handle is double the 7.x limits. So, for example, if a given 7.x system used to handle 250 K metrics, that limit is now 500 K without requiring any changes to the hardware. For more information, see 8.0 metrics setup, settings, and capacity on page 79 and Virtual agent metrics match limits on page 80.
My Collector is at maximum recommended capacity. I'm looking at the CPU, and the system doesn't appear busy. Why can't I add more metrics or agents to this Collector?
The typical behavior of the Collector at full load is 100% CPU usage for 3-4 seconds, and then idle until the next metric harvest from the agents. This happens every 7.5 seconds, which is how CA Wily arrives at the 45% average CPU utilization recommendation. For more information, see Collector metric capacity and CPU usage on page 45.
What were the Introscope 8.0 sizing and performance improvements?
My Collector is combining time slices throughout the day and appears to respond slowly, but I'm at or below the maximum capacity limits. What could be wrong?
1 Other processes are running on the machine.
2 I/O contention with SmartStor and other processes; SmartStor is not located on a separate disk or I/O subsystem.
average. It operates within the heap footprint specified in the .lax file. For more information, see Workstation and WebView background on page 100.
OS has dedicated physical RAM for each Workstation running in parallel, above the memory required for the OS itself. For more information, see OS RAM requirements for Workstations running in parallel on page 100.
the CA Wily requirements when setting this up. For more information, see Running multiple Collectors on one machine on page 74.
I launched my MOM and logged in, but I'm not seeing any metrics in the Investigator tree for a long time. Why does the MOM take a long time to begin sending data?
A large number of alerted metrics on the Collectors will cause a great deal of overhead in the MOM, as the Collectors register these alerts with the MOM at startup. If the startup time is unacceptable, you will have to reduce the number of alerted metrics, or get a machine with faster individual CPUs. For more information, see About alerted metrics and slow Workstation startup on page 81.
Can I connect more agents to an 8.x Collector than a 6.x or 7.x Collector?
Yes. More agent connections per Collector are now available. See Sample Introscope 8.0 Collector sizing limits table on page 119 for some examples. For more information, see Agents limits per Collector on page 110.
The more Collectors a MOM connects to, the more complicated the system becomes and the greater the likelihood of instability or failure. For example, clock sync issues may be more difficult to manage, the system can take longer to start, and there is a higher likelihood that a misbehaving Collector can affect the entire cluster. For more information, see MOM to Collectors connection limits on page 59.
Excessive clock drift between the MOM and its Collectors can cause the system to appear slow and lock up, due to the synchronous mechanism the MOM uses to poll information from Collectors. For more information, see Collector to MOM clock drift limit on page 71.
Question
The requirements state that I can have X metrics in the virtual agents. Can I exceed that number?
Yes; however, this impacts CPU significantly (not I/O or memory), so you must decrease the Collector's capacity.
Question
Will additional dedicated physical CPUs increase the number of metrics and agents that my Collector can handle?
Adding dedicated physical CPUs to make a total of 4 CPUs helps increase these limits:
* number of applications per Collector
* number of agents per Collector
* number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager)
In addition, faster CPUs may help increase the Collector's maximum capacity and improve performance. For more information, see Collector hardware requirements on page 71 and the examples in Sample Introscope 8.0 Collector sizing limits table on page 119.
My system has 16 SPARC CPUs. Why can't a single Collector on this platform handle any more load than a 4 CPU Xeon machine?
Although the Collector is heavily multithreaded, certain operations require synchronization and cannot effectively leverage more than 4 CPUs. The Collector therefore does not scale well with additional CPUs beyond 4, depending on the hardware platform. Individual processor speed is the most important success factor for a Collector. For more information, see Increasing Collector capacity with more and faster CPUs on page 73.
Question
What are the main performance considerations for the MOM?
The MOM requires more powerful CPUs and
better network connections than Collectors, but does not require fast disk access (the MOM performs little disk I/O). For more information, see MOM disk subsystem sizing requirements on page 58.
I changed the virtual agent definitions in my MOM/Collector and everything came to a halt. What happened?
Note: In a clustered environment, deploy Management Modules and virtual agents only on the MOM, not on a Collector.
Hot deployment of Management Modules is very CPU intensive and can lock up the MOM for a couple of minutes, during which metrics harvesting doesn't happen. CA Wily strongly recommends not performing Management Module hot deployments on production Collectors and MOMs. For more information, see Avoid Management Module hot deployments on page 68.
Question
Do Collectors and MOM have to be on the same subnet?
When the MOM requests data from a Collector, the round-trip response must be less than 500 ms. Whenever possible, a MOM and its Collectors should be in the same data center, preferably in the same subnet. For more information, see Local network requirement for MOM and Collectors on page 51.
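One quick way to sanity-check the round trip from the MOM host is to time a TCP connect to the Collector's listener port. This is a sketch only: the host and port below are hypothetical, and it measures network latency, not Introscope's own response time:

```python
import socket
import time

def round_trip_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time a TCP connect as a rough proxy for MOM-to-Collector round-trip latency."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0

# Hypothetical Collector host and port; the 500 ms ceiling is from the answer above.
# rtt = round_trip_ms("collector01.example.com", 5001)
# print("OK for clustering" if rtt < 500 else "too slow: move the MOM closer")
```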
What is the limit for ChangeDetector events, Transaction Traces, errors, stall events, and so on? How do I determine that limit? Is there a separate limit for each type?
These are all persisted as event objects. As of Introscope 7.1, the Maximum Number of Events limit represents the total number of events a Collector can receive and persist from all agents. There is one limit for steady state event persistence and another for burst capacity. Steady state means 24/7 operation; burst capacity means the Collector can sustain that load for no more than a couple of hours. For more information, see Collector events limits on page 70.
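The steady-state versus burst distinction can be expressed as a simple check. The limit values below are placeholder assumptions; take the real numbers from the Collector events limits section and the sizing tables:

```python
# Classify an observed event rate against a Collector's two event limits:
# one it can sustain 24/7, and a higher one it can absorb for a few hours.
STEADY_LIMIT_PER_MIN = 700   # assumed steady-state (24/7) persistence limit
BURST_LIMIT_PER_MIN = 900    # assumed burst limit, a couple of hours at most

def classify_event_rate(events_per_min: int) -> str:
    if events_per_min <= STEADY_LIMIT_PER_MIN:
        return "steady state: sustainable 24/7"
    if events_per_min <= BURST_LIMIT_PER_MIN:
        return "burst: sustainable for a couple of hours only"
    return "over capacity: reduce event volume or add a Collector"

print(classify_event_rate(650))   # steady state: sustainable 24/7
print(classify_event_rate(800))   # burst: sustainable for a couple of hours only
```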
APPENDIX

Sample Introscope 8.0 Collector sizing limits

| Operating System | Physical RAM | JVM Heap | Max # Agents | Max # Metrics* | Max # Applications(?) | Matched Metrics(?) | Max # Events/minute (Steady State / Burst) | Max # Virtual Agents(?) |
| Solaris | 4 GB | 1.5 GB | 200 | 400,000 | 900 | 700 | 3500 / 3000 | 20 |
| Solaris | 4 GB | 1.5 GB | 250 | 400,000 | 1800 | 700 | 3500 / 3000 | 40 |
| Solaris | 8 GB | 1.5 GB | 200 | 400,000 | 900 | 700 | 3500 / 3000 | 20 |
| | 4 GB | 1.5 GB | 300 | 500,000 | 1500 | 1000 | 5000 / 5000 | 50 |
| AIX 5.3 | 4 GB | 1.5 GB | 250 | 400,000 | 1500 | 850 | 4000 / 3500 | 50 |
| Windows 2000/2003 | 4 GB | 1.5 GB | 300 | 500,000 | 1500 | 1000 | 5000 / 5000 | 50 |
| Windows 2000/2003 | 4 GB | 1.5 GB | 400 | 500,000 | 3000 | 1000 | 5000 / 5000 | 50 |
| Windows 2000/2003 | 8 GB | 1.5 GB | 300 | 500,000 | 1500 | 1000 | 5000 / 5000 | 50 |

Sample Introscope 8.0 MOM sizing limits

| Operating System | Hardware | Physical RAM | JVM Heap | Max # Metrics | Max # Collectors(?) |
| Solaris | | 14 GB | 12 GB | 250,000 | 20 |
| | 4 CPU Xeon or Opteron, clock speed ~ 3 GHz | 14 GB | 12 GB | 1,000,000 | 50 |
| AIX 5.3 | 4 CPU Power 5, clock speed ~ 2.2 GHz | 14 GB | 12 GB | 500,000 | 50 |
| Windows 2000/2003 | 2 CPU Xeon or Opteron, clock speed ~ 3 GHz | 14 GB | 12 GB | 500,000 | 25 |
| Windows 2000/2003 | 4 CPU Xeon or Opteron, clock speed ~ 3 GHz | 14 GB | 12 GB | 1,000,000 | 50 |
Index
Symbols
*, See wildcard symbol
C
calculators, and slow cluster start-up time 60 causes of slow start-up time, in cluster 60 clamp metrics See metrics clamp Transaction Trace component 108 clock drift, performance problems due to 71 cluster agent load balancing examples 66 applications and virtual agents 106 cause of slow start-up time 60 configuring to support 1 million metrics on MOM 61 tolerance for imbalance 65 determining when to implement if using a standalone EM 70 environment, explained 18 fault tolerance for Collectors 62 hanging prevented by MOM disconnecting under performing Collector 52 how MOM balances the metric load 63 improving performance by adjusting Collector weighting factors 64 likely cause of Workstation sluggishness in 72 location of MOM and Collector metrics 28 metric for total number of metrics currently tracked in 30 overhead 65 performance problems due to hot Management Module deployment 68 performance, based on Collectors 44 poor performance due to a single Collector 59 setting up metrics groupings in 79 slow response time due to network bandwidth problems 21 time synchronization 59, 72 when MOM drops a Collector 51 Workstation and WebView connections 102 Workstation performance problems in 102 CLW, See Command Line Workstation
A
active metrics groupings, defined 80 users and WebView servers limit 103 agents Collector connection history and future connections 64 connection architecture 70 heartbeat, defined 91 how MOM assigns to Collectors 64 increased CPU overhead from deep transactions involving multiple front-ends 111 load balancing and cluster fault tolerance for Collectors 62 configuring frequency on MOM 66 defined 63 differentiated from metric clamping 63 example scenarios 66 metric counts after weight adjusting 65 setting metric weight load 64 setting threshold for imbalance 65 memory cache, removing dead metrics 91 metric aging defined 91 performance problems related to 91 properties to configure 92 Agents with Data metric 83 Agents without Data metric 83 alerts, and slow cluster start-up time 60 applications, clustered and usefulness of virtual agents 106 asterisk, See wildcard symbol
B
baselines.db about 53 calculating disk space needed 54 burst limit defined 70 events 70
Collector cluster performance 44 CPU requirements 74 speed and disk I/O system 71 steady state usage 45 usage for high resource operations 45 viewing high usage, 45 CPU usage, described 45 diagnosing slow response to MOM using ping metric 72 effect of faster CPUs 71 file cache requirements 74 good performance in individual 60 hardware requirements 71 how agents are assigned to by MOM 64 increasing capacity with faster CPUs 73 JVM requirements in 500 K metric MOM cluster 61 limits, increasing with more CPUs 73 location of supportability metrics 28 migrating from 6.x to 8.0 44 persisting event objects 70 ping time from MOM 51 reperiodization 45 run only Introscope process 72 running multiple on one machine 74 sign of overloaded 51 sizing limits examples 119 SmartStor minimum requirement 47 synchronizing clocks with MOM 71 under performing and cluster performance 59 unresponsive 72 upgrading 44 using loadbalancing.xml to restrict agents to specific 64 Workstation/WebView connections to in clustered environment 102 Command Line Workstation, heap size needed 100 concurrent queries recommended number of historical 43 configuring agent failover when host defined in DNS 62 agent metric aging 92 how often MOM rebalances cluster agent load 66
MOM-Collector cluster 61 RAID setting 57 Workstation log-in when host defined in DNS 62 connection history, agent to Collector and future connections 64 type MOM uses to assign agents to Collectors 64 converting spool to data metric, defined 32 CPU Overview tab 46 CPUs Collector requirements 74 Enterprise Manager basic requirement 47 guidelines 25 faster and Collector capacity 71 using to increase Collector capacity 73 high usage Collector 45 increasing and Collector limits 73 large MOM overhead and alerted metrics 81 resource contention with WebView and EM 100 speed, Collector 71 usage Collector 45 Collector, described 45 Collector, steady state 45 during heuristics calculations 69 reports 43 scheduling of heavy processing 52 virtual agents using large resources 73 WebView server load 103 custom scripts, scheduling 52
D
dashboard, using EM Capacity to determine metric explosions 87 dashboards, cause of slow WebView processing 103 data, about historical queries and performance problems 25 dead metrics See metrics, dead dedicated controller property for SmartStor 55 deployments, hot, Management Module cost 81 disk drive, determining number of controllers 57 file cache size, SmartStor 48 OS file cache 32 space estimating for baselines.db 54 estimating for traces.db 54 DNS agent config for MOM failure 62 Workstation log-in config for MOM failure 62 dynamic instrumentation defined 112 performance problems related to 112 ProbeBuilding See dynamic instrumentation
Overview tab 27 processing of time slice data 78 RAM minimum requirement 47 running multiple Collectors on one machine 74 standalone connections to Workstations 101 hardware requirements example 74 supportability metrics 51 symptoms of metric explosion 85 using EM Capacity dashboard 87 when to grow from standalone to cluster 70 events determined number received 37 high volume 70 limit burst 70 maximum, defined 70 steady state 70 objects, in Collector databases 70 explosion, metrics See metrics explosion
F
failover, planning for MOM 62 failure, planning for using MOM, See failover file cache, requirements for Collector 74 system, general requirements 47 flat file archiving, using with SmartStor 44 front-ends, multiple and transaction problems 111
E
EM Capacity dashboard, using to determine metric explosions 87 em.db, See baselines.db Enterprise Manager 48 capacity and metrics limits 21 configuring for heap memory 32 CPU basic requirement 47 resources and running WebView 100 utilization guidelines 25 determining capacity 20 finding problems using specific metrics 29 heap settings 48 metrics grouping limits 79 location 28 migrating from 6.x to 8.0 44 OS disk file cache requirements 47 overloaded and combining metrics 69 time slices 69
G
GetEvent metric, See ping metric graph, Top N, defined 103 groupings, metrics, defined 78
H
hardware requirements Collector 71 MOM 58, 59, 60 harvest cycle, metrics 78 Harvest Duration metric 29, 35, 45
heap capacity (%) metric 34 Command Line Workstation size needs 100 settings, Enterprise Manager 48 size Enterprise Manager 32 Workstation 100 heartbeat, agent, defined 91 heuristics CPU usage for calculations 69 database, See baselines.db Historical mode using for viewing data in Workstation 43 historical queries and EM agent data storage 25 and MOM overloading 31 poor performance caused by 40 recommended number of concurrent 43 running 43 host defined in DNS and agent failover 62 and Workstation log-in 62 hot deployments, of Management Modules, performance problems 81
J
JVM Collector requirements in cluster with 500 K metric MOM 61 heap settings, Enterprise Manager 48
L
leaks, metrics, symptoms 81 limits Collector, examples 119 MOM, examples 122 limits, metrics, definition 20 Live mode viewing Workstation data in 43 load balancing for agent, defined 63 reducing metrics 30 loadbalancing.xml, using to restrict agents to specific Collectors 64
M
Management Module cost of hot deployments 81 hot deployment and cluster problems 68 and virtual agents 68 problems with hot deployment 68 maximum events limit, defined 70 memory, Workstation requirement 100 metadata file, about uncompressed 98 SmartStor, using to find metric explosion 85 metric aging, agent defined 91 clamp, differentiated from agent load balancing 63 count metric, defined 98 Metric Count metric 85 Metric Count tab 107 metrics Agents with Data 83 Agents without Data 83 alerts, large numbers and performance problems 81 baselining database, See baselines.db checked during agent heartbeat 91
I
I/O contention, reason for SmartStor problems 73 disk system for Collector 71 throughput, SmartStor 49 inactive metrics groupings, defined 80 instrumentation, dynamic and performance problems 112 defined 112 Introscope agent connection architecture 70 business logic 78 defined 26 monitors 79 improving slow startup time 81 metric explosion prevention 91 no other processes may run on Collector 72 workload, defined 26 Is Clamped metric, about 98
clamp about related supportability metrics 98 defined 96 properties to enable 96 scenario 98 cluster load balancing 63 combined as symptom of overloaded EM 69 converting spool to data 32 counts, weight-adjusted for agent load balancing 65 dead about 91 defined performance problems related to 91 removal 96 Enterprise Manager supportability 28 explosion and SmartStor metadata save time 85 causes 84 configuring 87 defined 82, 84 due to poorly-written SQL statements 92 how Introscope prevents 91 symptoms of 85 explosion, defined 84 groupings active, defined 80 defined 78 Enterprise Manager limits 79 inactive, defined 80 performance problems when using wildcard symbol 80 relationship to regular expression 78 groupings, setting up in a cluster 79 harvest cycle 78 Harvest Duration 29, 35, 45 Heap Capacity (%) 34 Is Clamped 98 leaks defined 82 diagnosing using SmartStor metadata save time 83, 85 symptoms 81, 82 limits and Enterprise Manager capacity 21 definition 20 MOM-Collector system 60 related to Top N graphs 104
load reducing 30 metadata about 82 problems with continuous growth 82 symptoms of metrics leaks 82 Metric Count 85, 98 Metrics with Data 83 Number of Agents 71 Number of Inserts Per Interval 70 Overall Capacity (%) 33 Partial Metrics with Data 83 Partial Metrics without Data 83 ping 51, 72 SmartStor capacity 80 management 41 SmartStor Duration 36, 50 subscribed defined 79 limits 59 MOM limits 60 supportability Is Clamped 98 Metric Count 98 using to find Enterprise Manager problems 29 weight load setting for agent load balancing 64 Metrics with Data metric 83 migrating, 6.x Enterprise Managers to 8.0 Collectors 44 MOM alerted metrics and large CPU overhead 81 configuring cluster to handle 1 million MOM metrics 61 disconnected due to ping time threshold 52 failure planning 62 hardware requirements 58, 59, 60 hardware requirements and subscribed metrics limit 59 hot failover 62 how assigns agents to Collectors 64 limits on subscribed metrics 60 location of supportability metrics 28 ping time to Collector 51 reasons for overload 31 secondary backup for hot failover 62
sizing limits examples 122 SmartStor instance, about 58 synchronizing clock with Collectors 71 to Collector connection limit 59 system metrics limit 60 WebView appears as Workstation client 100 Workstation connections allowed 102
N
network bandwidth problem, and slow cluster response times 21 Number of Agents metric 71 Number of Inserts Per Interval metric 70
O
OS disk file cache requirements, EM 47 memory requirements for Workstation 100 RAM and disk file cache 32 Overall Capacity (%) metric defined 33 spiking 34
related to agent metric aging 91 related to large numbers of metrics alerts 81 with Management Module hot deployment 68 related to MOM-Collector connections 58 sluggish in Workstation, typical cause 100 WebView slow response times, cause of 103 Workstation problems in cluster 102 ping metric 51 about 72 diagnosing a slow-responding Collector 72 time Introscope 72 network 72 threshold for Collector overload 51 threshold that disconnects MOM 52 production Collector and MOMs, Management Module hot deployments in 68
Q
queries historical See historical queries scheduling large 52
P
Partial Metrics with Data metric 83 Partial Metrics without Data metric 83 passive users defined 103 WebView servers limit 103 performance cluster, and single under performing Collector 59 dedicated controller property 55, 56 improving cluster by adjusting Collector weighting factor 64 in cluster causing MOM to drop Collectors 51 individual Collector responsiveness 60 load, WebView 101 poor due to large historical queries 40 problems due to MOM to Collector clock drift 71 due to recurring servlet calls 111 from large continuous Transaction Traces 73 in cluster due to Management Module hot deployment 68 metrics metadata continuous growth 82
R
RAID configuration recommended 57 setting 57 RAID 0 57 RAID 5 57 RAM adding to improve spooling time 32 increase OS disk file cache 32 EM minimum requirement 47 regular expression, relationship to groupings 78 reperiodization 52 about 41 Collector 45 metrics 43 SmartStor, defined 40
S
SAN using for SmartStor storage 57 SAS controllers using for SmartStor storage 57 scheduling custom scripts 52 large queries 52 reports 52 secondary backup MOM for hot failover 62 servlets performance problems from recurring calls 111 recurring calls and high agent CPU overhead 111 seen as Introscope frontends 111 sizing Collector limits examples 119 MOM limits examples 122 SmartStor about 40 Collector minimum requirement 47 dedicated controller property about 55 and performance 55, 56 default installation directory 20 determining if drives are physically different 57 flat file archiving recommendations 44 I/O throughput 49 management metrics, about 41 metadata files, about uncompressed 98 save time and metric explosion 85 metrics about metadata 82 capacity 80 metadata save time and metrics leaks 85 metadata save time related to metrics leaks 83 MOM instance, about 58
problems indications of 49 with I/O contention 73 recommended RAID configuration 57 reperiodization defined 40 verifying 41 requirements 48 setting RAID configuration 57 up 49 spooling 52 about 40 disk file cache size requirements 48 verifying 41 storage SAN guidelines 57 SAS controllers guidelines 57 SmartStor Duration metric 36, 50 metric value 36 spool to data conversion task 40 spooling SmartStor 40 time, lengthening 32 SQL Agent Introscope statement normalizers 94 showing many unique SQL metrics 92 statements causing metric explosions 92 normalizers 94 standalone Enterprise Manager hardware requirements example 74 Workstation connections allowed to 101 startup time, improving slow Introscope 81 steady-state, events limit 70 subscribed metrics See metrics, subscribed supportability metrics Is Clamped 98 Metric Count 98 related to metric clamp 98 synchronizing, clock on clustered machines 59, 72 system performance, determining general 47
T
tabs CPU Overview 46 Enterprise Manager Overview 27 Metric Count 107 threshold, ping time for Collector overload 51 that disconnects MOM 52 time server software, use to synchronize machine clocks in cluster 59, 72 time slices combined, symptom of overloaded EM 69 data processing in Enterprise Manager 78 tool tip 107 Top N graph defined 103 metrics limit 104 traces.db about 53 calculating disk space needed 54 Transaction Event database, See traces.db Transaction Trace component clamp 108 dropped events metric 36 events 36 insert queue 36 performance problems related to 73 queue size 36 transactions, deep, involving multiple frontends 111
W
WebView cause of slow client response times 103 connections in clusters 102 dashboards and slow processing 103 how MOM sees as Workstation 100 performance load 101 running on EM and CPU resource contention 100 servers CPU resource load 103 user limits 103 wildcard symbol, performance issues in metrics groupings 80 Workstation connections allowed to MOM 102 allowed to standalone Enterprise Manager 101 in clusters 102 heap footprint 100 memory requirement 100 OS memory requirements 100 performance problems in cluster 102 sluggishness cause in a cluster 72 typical cause 72, 100 viewing data in Live mode 43
U
upgrading, Collector 44 using time server software 59, 72
V
virtual agents and Management Module hot deployments 68 useful for clustered applications 106 using large CPU resources 73