
Compuware APM - Introduction

Market Trends, Business Challenges & APM

The World is Changing & the Rate of Change is Accelerating


Application visibility and optimization of the customer experience are more important than ever
Complexity Explosion Business Demands More Change, Faster
Business
I want change! I want competitive advantage! I want stability!

Development

Operations

User Expectations Continue to Rise

Data Smog and Blind Spots

Web Analytics

Virtualization

Third Parties

Java/.NET

Database

Network

Storage

Server

Market Trends, Business Challenges & APM


A Case Study for a Changed World
Verizon CEO Daniel Mead: more than 60% of iPhone sales occurred online.
That's 24,000 sales per day
That's $5-10m per day
4 internal content providers
23 external content providers:
Akamai x 4, DoubleClick x 3, HitBox, YieldManager, Google Ad Services, Atlas Advertising, Amgdgt.com, Interlick, Tribal Fusion, Turn.com

APM in 2010

APM 2010

1. End User Experience Monitoring: Captures the End User Experience of an application or service
2. Application Component Deep Dive: Captures rich statistics regarding components and component domains
3. Application model: Discovers/models application determined logical topology
4. Business Transaction Process/Flows: Traces transaction flow across the IT environment
5. PMDB (Performance Management Database): Consolidated, normalised, correlated & analysed

APM in 2015

APM 2015

1. APM 2010: EUE, deep dive, application model, trans flows, PMDB
2. Policy and Orchestration Engine: Policy setting and workflow orchestration
3. Application Behaviour Learning: Understand, analyse application patterns and spot deviations
4. Crowdsourcing and Collaboration: Distributed knowledge capture, knowledge sharing and improvements
5. Cloud Enablement: Support cloud model and end to end management off premises and on
6. Cost Allocation and Chargeback: Monitor resource usage

Introduction to APM

Introduction to APM
Application Performance Management

End-user
What is the end-user experience?

Enterprise / Business
Why Manage End-user Experience?

Operations
Typical Enterprise Requirements
The Application Performance Challenge: Problems Everywhere Along the Delivery Chain
Traditional Operational monitoring

Development
Traditional development flow

Gartner's five APM dimensions
Compuware APM product range

End-User: What is the end-user experience?

Application availability and performance for the end-user


APM for Retail Banking: http://www.youtube.com/watch?v=M7qEuLxQgOM

Button press or request

Page Load or response

The Answer: Adopt an Application Point of View That Starts with the User
Application Point of View that Starts with the End User
Data Center Cloud: Private and Public Users
ISPs Mobile carriers Browsers Devices AJAX JavaScript Mobile apps

Web Mobile App logic Database Network Mainframe Virtualization SOA CDNs Third party services

Customers

Application

Application

Employees Infrastructure

Enterprise / Business: Why manage end-user experience?


73% of performance issues are user-reported
Yet less than 5% actually complain
End-user Experience impacts business success

Slow apps reduce revenue by 9% and productivity by 64% *

Most monitoring is at component level
Not immediately actionable
Efficient enterprises accelerate fault domain isolation
80+ percent of problem resolution is misspent finding the fault, not fixing it
Why? Increasing Data Center Complexity

Cost REDUCED

Revenue IMPROVED

* Aberdeen, APM: Getting on the C-Level's agenda

Enterprise / Business: Reduce time spent on Awareness and Isolation


Revenue Impact / Cost
Business Impact

Isolate Remediation
Root Cause Resolve

Response Times

Operations: Typical Operational Requirements


- Generate alerts and notifications based on configurable transaction thresholds.
- Real Time and Historical performance data specific to app transactions
- Single view of usage, performance and availability for transactions across multiple tiers
- Real-time, detailed diagnostic data specific to users and their transactions.
- Report on business impact and relational diagnosis of faults.
- Enable multiple users at varying levels to consume and use data simultaneously
- Flexible, conditional alerting and reporting.
- Service Level Management and Operational views and workflows minus any extraneous information
- Usage, Performance and Availability monitoring for specific applications and transactions
- Reduce human hours spent isolating and analyzing performance problems
- Efficient communication between IT groups for reactive and proactive initiatives
- Integrate current monitoring investments into a strategic solution for Enterprise development

Operations: The Monitoring Challenge: Problems Everywhere Along the Delivery Chain
The Application Delivery Chain
Data Center Cloud: Private and Public
Web Mobile App logic Database Network Mainframe Virtualization SOA CDNs Third party services Inconsistent geo performance Bad performance under load Blocking content delivery Poorly performing Java or .NET methods Application Slow SQL or Web services transactions Server performance

Users Resource contention Mobile carriers Browsers ISPs Capacity issues Devices AJAX JavaScript Mobile apps Slow bursting
Customers
Poorly performing JavaScript Browser/ device incompatibility Pages too big Low cache hit rate

Network problems Bandwidth contention Improper load balancing

Network peering problems Outages

Network peering problems Bandwidth throttling Inconsistent connectivity

Infrastructure

Configuration issues Oversubscribed POP Poor routing optimization Low cache hit rate

Employees
Network resource shortage Faulty content transcoding SMS routing / latency issues

Operations: Why Traditional Monitoring Fails

APPLICATION TEAM DATA CENTER NETWORK TEAM Third-party/ Cloud Services


Network Middleware Mainframe Servers App Servers Load Balancers

INTERNET

CUSTOMERS: This application is slow!

I'm on it!

Storage DB Servers

Web Servers

SERVER TEAM Major ISP MAINFRAME TEAM

Local ISP

Content Delivery Networks

Mobile Carriers

Operations: Why Traditional Monitoring Fails

Not my Problem!
APPLICATION TEAM DATA CENTER NETWORK TEAM Third-party/ INTERNET CUSTOMERS

Not my Problem!

Storage DB Servers

Web Servers Network Load Balancers

Not my Problem! Cloud Services


SERVER TEAM Major ISP

Local ISP

This application is slow!

Middleware Mainframe Servers

App Servers

Not my Problem!
MAINFRAME TEAM Content Delivery Networks

Mobile Carriers

Operations: Why traditional monitoring fails: Datacenter Complexity


Component Level Monitoring Tools

Each tool category reports its own Response Time:
- Authentication Monitoring: Imprivata, Zimbra, ActiveIdentity, EMI Security, Juniper J-Web, Juniper
- Server Monitoring: Perfmon, Netcool, Sitescope, Solar Winds, Nimsoft, Nagios, MOM
- Network Monitoring: Netscout, Niksun, NetCool, Opnet, Fluke, Cisco Works, EMC Smarts
- Virtual Env. Monitoring: VMWare, Quest vFoglight, Opnet vMon, ZenOS, NetIQ App Manager
- Application Monitoring: Wily Introscope, Mercury Topaz, OV Transaction Analyzer, ITCAMs, dynaTrace, Optier, IBM ITCAMs
- Message Queue Monitoring: Candle, BMC Middleware Mgmt, Hyperic, Omegamon
- Database Monitoring: Quest Software, IBM Tivoli, Quest Fog Light, Precise, Oracle

Monitored components: Load Balancer, Authentication, Firewall, Virtualized Web Servers, Virtualized Application Servers, Web Services, RSA, Log File, Message Queue, App SAN 1000 GB, RSA SAN 250 GB, Database Instance

Impossible to Correlate & Troubleshoot

Operations: Why Traditional Monitoring Fails

War Room
APPLICATION TEAM DATA CENTER
. ..

All my lights are green! blah blah


blah blah

NETWORK TEAM

All my lights are green!

INTERNET
. !!!!!...

CUSTOMERS: This application is slow!

Storage DB Servers

Web Servers

Service Manager
Network

SERVER TEAM

CTO: ????????

All my lights are green!

Third-party/ Cloud Services

Local ISP

Middleware Mainframe Servers

App Servers

Load Balancers

MAINFRAME TEAM

All my lights are green!


Content Delivery Networks

Major ISP

This application is slow!

Mobile Carriers

Development: Application lifecycle


Business

(local, remote, outsourced)

Development

(local, remote, outsourced) Load testing

Test/QA

(local, remote, outsourced) Cloud load testing Monitoring

Production

Development: Problems with Application Lifecycle


Business
Business impact? Priority? Competitive info? What? Who? When? How? Code? Recreate?

Not enough business context! $$$$$$

(local, remote, outsourced)

Development

(local, remote, outsourced) Load testing

Test/QA

(local, remote, outsourced) Cloud load testing Monitoring

Production

Too much time reproducing problems!

Not engineered for performance! Too many iterations!

Too many business impacting issues!

Development: Lifecycle-Oriented APM


Which users $$ amount Conversions Abandonment Etc.

Business
Business impact $

All transactions Click-to-code All details

(local, remote, outsourced)

Development

(local, remote, outsourced) Load testing

Test/QA

(local, remote, outsourced) Cloud load testing Monitoring

Production

No need to reproduce issues

Performance from the start Fewer iterations

24x7, all transactions Fewer issues

Gartner's five APM dimensions

Real User Monitoring Synthetic Monitoring

Browser, Data Center, Mobile


Backbone, LMile, Private, Streaming, Mobile

Java/.NET Network Database Server Transaction Trace

Business Service Manager 3rd Party Adapters

dynaTrace PurePath

4 5

Portal and the CAS, ADS


Dashboards Reports

The Compuware APM Solution


Portal Reporting and Dashboards Business Service Management

On-Premises
dynaTrace Enterprise Analysis
DATA CENTER INTERNAL USERS INTERNET

SaaS
Gomez SaaS multi-tenant data store
CUSTOMERS

Storage

DB Servers

App Servers

Third-party/ Cloud Services


Network Load Balancers

Local ISP

Major ISP

Mainframe Middleware Web Servers Servers

Content Delivery Networks

Mobile Carriers
RUM Browser Mobile

Data Center RUM


EUE and NPM

dynaTrace
Java .NET

Streaming

Mobile

Backbone

Last Mile

Enterprise

Internet

The Compuware APM Solution


Optimize performance across the entire Application Delivery Chain
Agentless real user monitoring Multi-tier analysis Application component analysis Network and server monitoring

First Mile

Application monitoring

Enterprise

Monitoring Cross-browser testing Load testing

Backbone

Virtual Test Bed

Monitoring Load testing

Last Mile

Real user monitoring

Real Users

Cloud Private Public

Browsers Customers

Data Center
Virtual/Physical Environment DB App Multi-tier transactions Servers
Java/.NET analysis

Mainframe

Servers

Web Servers

All users
All apps
All trans

Load Balancers

PurePath

Private agents
Private Last Mile

3rd Party/ Cloud Services

Local ISP
150,000+ consumer-grade desktops

Browsers
500+ combos of browsers and O/S

Major ISP
150+ enterprise-grade nodes

168+ countries
2,500+ ISPs

Storage

Network
All network segments, servers and infrastructure

Web Services

Mobile Components

WAN Optimization

Employees

Data centers & cloud providers

Content Delivery Networks

5,000+ supported mobile devices

Mobile Carrier
Major mobile carriers around the globe

Devices

Employees
Mobile apps

New Product names for version 12


APM Product page: http://www.compuware.com/application-performancemanagement/
For more information please refer to the support documentation available on http://go.compuware.com
Current Name | New Name
Gomez Real User Monitoring Data Center (a.k.a. Vantage Real User Monitoring) | Data Center Real User Monitoring
Gomez Synthetic Monitoring Private Enterprise (a.k.a. Vantage Active Monitoring) | Synthetic Monitoring
Gomez Business Service Manager (a.k.a. Vantage Service Management) | Business Service Management
Gomez Java and .NET Monitoring (a.k.a. Vantage Java & .NET Monitoring) | Java & .NET Monitoring
Gomez Transaction Trace Analysis (a.k.a. Application Vantage) | Transaction Trace Analysis
Gomez Server Monitoring (a.k.a. ServerVantage) | Server Monitoring
Gomez Network Performance Monitoring (a.k.a. Vantage Network Monitoring) | Network Monitoring
Gomez Mobile Carrier Data Monitoring (a.k.a. Mobile Carrier Vantage Service Check) | (not listed)
VantageView | VantageView (no change)

DCRUM: Driven by End-User Experience


Optimize performance across the entire Application Delivery Chain

Test/monitor your app the way users access it:
- What they do: key transactions
- Where they do it: geographic locations
- How they do it: fat clients, browsers and native devices
All tiers, all transactions, all users

Prioritize & Resolve Issues:
- Measure the business impact on users
- Isolate root causes
- Deep application and transaction analysis
Browsers

Deep analysis

Application

PurePath

Mobile apps

DCRUM Capabilities
- Agentless real user monitoring
- Unifies network and application reporting
- Monitors all data center tiers in one dashboard
- Optimize EUE for web and non-web
- Diagnose root-cause application problems through dynaTrace integration

DCRUM Differentiators

EUE: all users, all transactions
- Web and non-web applications
- ERP: SAP, Oracle EBS
- Business core: IBM MQ, XML middleware, mainframe front-end

End-to-end: whole ADC
- Whole Application Delivery Chain
- Multi-vendor integration and Multi-tier view
- Network influenced monitoring captures all transactions

Actionable data
- Business impact
- Application-specific decodes (28+)
- All users, all transactions, granular

Simplicity of deployment
- No software agents to deploy or maintain
- Out of Box and bespoke reporting
- Industry's leading scale for monitoring

DCRUM Monitors All Tiers, Apps and Components


WAN

Internet

Load Balancer Authentication Virtualized Web Servers

Agentless Monitoring Device (AMD)

Firewall
Virtualized App Server Load Balancer Virtualized App Servers Web Services Message Queue

Centralized Analysis Server

Database Instance

DCRUM is Optimized for Cisco UCS


Compuware has optimized its Gomez APM On-Premises solutions for exclusive delivery on Cisco Unified Computing System (UCS) servers.
UCS is the gold standard for delivery of Compuware APM solutions, with specialized leasing terms available through Cisco Capital Leasing.

This combination delivers systems excellence and solution differentiation, providing our customers with choice and flexibility to respond to the ever-changing demands of the business.
Customers can:
- improve application performance
- increase scalability
- simplify operations.

Cisco UCS Servers

DCRUM Works With Your Environment


Applications
Custom & packaged applications across multiple tiers KEY EXAMPLES

Application Infrastructure
Virtual and physical environments KEY EXAMPLES

Process Automation
Existing solutions e.g., Service Desk and Event Management KEY EXAMPLES

Cloud Services
CDN, Cloud provider, and third parties KEY EXAMPLES

Browsers and Devices


Every commercial browser and mobile device KEY EXAMPLES

+ over 5,000 mobile devices

Complexity Demands Analytics


Multi-tier, multi-vendor data centers increase MTTR

Simple monitoring does little in complex environments


Advanced root cause analysis finds these hidden problems
Data must be collected from all applications and devices across all tiers
Root cause analysis must work to method and code level of apps

DCRUM: Industry-leading Application Analysis


Continued investment in application intelligence
Leading end-to-end application performance analysis across entire application delivery chain
Applications

All tiers of the mission-critical applications

360 View of Application Performance


Application Health Status for IT Operational Monitoring

Enterprise Operational Dashboard


Isolate the Poorly Performing Data Center Tier
Current vs. Historical Analysis
Baseline performance and availability with synthetic
Web and Non-Web Applications (e.g. SAP)

Isolated Network Impact on Performance

The new DCRUM troubleshooting workflow

Existing workflow: 3 levels, multiple choices on each level

New workflow: 3 screens, 3 clicks to the clue

Applications transactions health

One report for applications / transactions

Infrastructure and network drill down

One report for all tiers and all operations

Troubleshooting: operations, errors, locations, user activity

One report for locations and users activity

DCRUM Reporting Dashboards


Out of box reporting provides:
Enterprise Application Performance view provides up-to-date status on performance, availability, and business impact on your end users, as well as an end-to-end view of your datacenter infrastructure with 1-click access to trend information.

Data Center Analysis View provides instant visual indication of problem areas with 1-click access to detailed troubleshooting information.

DCRUM and dynaTrace integration


Reporting and events in Central Analysis server are linked directly to dynaTrace portal for deep dive diagnostics

dynaTrace: Root Cause in Seconds


Goal: get to the root cause as quickly as possible
Approach: isolate the problem domain and diagnose the root cause with an integrated solution of breadth and depth

From Problem Isolation to Root Cause

dynaTrace PurePath Provides Deep Dive Diagnostics


Production Test/QA Development

Browser / Rich-Client

Web Server

Java

.NET

Other

Database

Synthetics

End-to-End Transaction Execution Path
- Across tiers: browser - servers - database
- Remoting
- Web Services
- External services
- Code-level depth
- Heterogeneous: .NET & Java

Contextual Transaction Information
- Method arguments
- SQL bind variables

Environmental Data Memory Dumps

Thread Dumps
Monitoring data

Synchronization
Exceptions Logs

PMI, JMX, CLR Win, Unix, DB, VM Ware, ETC

dynaTrace Session

dynaTrace Platform Enables Unified Lifecycle Approach to Proactive Performance Management

Development Developers, CI

Test Test Centers

Production Production, Staging


Staging Tests, Tuning, Diagnostics 24x7 End-to-end Transaction Tracing, Monitoring, Diagnostics

Performance Engineering (Arch Validation, Profiling)

Automated Testing & Continuous Integration

Automated Testing, Tuning, Diagnostics

Integrate to Automate and Collaborate


IDE, CI, Build Integration System Management

Application Performance Management

Test Tool Integration

Development Team Edition

Test Center Edition

Production Edition

dynaTrace 4 One Platform Single Product

Need to increase test frequency and accuracy?

Automate Performance Analysis In Test & CI.


Integrate dynaTrace into your build, CI and test automation environment. Automate testing Unit, Load & Functional.


How often does the same issue resurface in production release to release? How often does the same bug reappear?
Automatically detect & Analyze Regressions
Detect performance and reliability regressions early. Compare performance and behavior of a current build to previous versions and baselines. Automate analysis to enable you to focus on features instead of debugging.


Application not scaling in production after passing QA?
Assure A Scalable, Performing Architecture
PurePath Technology provides true end-to-end tracing: Browser to Web Server to App. Server to Database. Visualize app. behavior under load for even large, complex applications to prevent scalability issues from reaching production.

What does "fast" or "slow" really mean? What does "performs well" and "it scales" really mean?
Meet Performance Goals With KPIs
Measure, track and alert against KPIs: Service level, Throughput & Response time. Compare performance relative to your competition with SpeedoftheWeb.


Debugging applications in the test environment? Firefighting in production?
Automate Collaboration & Resolution
Capture the root cause of issues when they occur so engineers can simply replay, at code level, precisely what happened.

Alerts publish captured PurePath Sessions to issue tracking systems for engineers to access immediately.

Gomez SaaS Network: The World's Most Comprehensive Performance and Testing Network

Backbone Web Performance Management 150+ locations

Last Mile Web Performance Management and Load Testing 150,000+ locations

Cloud High Volume Load Generation 6 locations

Virtual Test Bed
Cross-Browser Testing
500+ browser/OS combos
5,000+ supported devices

Your Actual Users
Real-user Monitoring
Worldwide, wherever your users are

Gomez SaaS Network: Monitoring the Cloud


Community of cloud-based companies and experts providing:
- Hands-on tools
- Cloud education
- Best practices
- Cloud services evaluation

Cloud Performance Analyzer

Global Provider View

Outside-in perspective of cloud service provider performance
Real-time data
Historic comparisons
Performance & availability bottleneck identification
Independent validation of providers' SLA claims

Future APM

Compuware Delivers

TODAY: Proactive Monitoring
Provide visibility into the performance of heterogeneous applications from the enterprise to the cloud

NEAR TERM: Predictive Management
Predict application performance issues before they occur

NEXT GENERATION: Active Management
Dynamically adjust the infrastructure to prevent application performance problems

Compuware Concepts

Compuware Concepts
Information Gathering
Protocol Analyzers
Software Services
Operations, Applications and Transactions
Reporting Hierarchy
Tiers
Locations
Metrics

Information Gathering
Application monitoring can only be as good as its definition. Therefore, gather as much information as possible about the applications to be monitored. Minimally:
- Logical application topology information
- IP address (range) supporting the services for this application
- Port number (range) supporting the services for this application

Information Gathering

End-user Detection coverage



Synthetic Auto-check

Real User Monitoring

Synthetic Transaction

Protocol Analyzers
A.k.a. decodes
A protocol analyzer monitors, parses, and analyzes a network protocol in the monitored traffic
Some analyzers perform transaction monitoring: they can recognize exchanges of information where there is a recognizable question-and-answer dialog
Licensed features
Examples: TCP, HTTP, HTTPS, XML, MSSQL and Oracle

Software Services
Services that support an application at different levels, for example on a Web, Application or Database level.

Are minimally defined by a server IP (range) and a server port (range) together with a protocol, for example:
- HTTP service on server IPs 10.10.10.1-10.10.10.3 on port 80
- SOAP service on server IP 10.10.10.4 on port 8080
- Oracle service on server IP 10.10.10.5 on ports 1521-1523

Configurable at different levels depending on the underlying protocol:
- Action identification
- Grouping
- Masking
- User identification
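The service definitions above can be sketched as data plus a matching rule. This is only an illustration, not how DC RUM stores or evaluates its configuration: the class name SoftwareService and the function classify are hypothetical, and only the IP ranges, ports and protocols come from the examples on this slide.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class SoftwareService:
    """Hypothetical model: a protocol plus a server IP range and a server port range."""
    protocol: str
    first_ip: IPv4Address
    last_ip: IPv4Address
    first_port: int
    last_port: int

    def matches(self, server_ip: str, server_port: int) -> bool:
        # A flow belongs to the service when both the IP and the port fall in range.
        ip = IPv4Address(server_ip)
        return (self.first_ip <= ip <= self.last_ip
                and self.first_port <= server_port <= self.last_port)


# The three example services from the slide.
SERVICES = [
    SoftwareService("HTTP", IPv4Address("10.10.10.1"), IPv4Address("10.10.10.3"), 80, 80),
    SoftwareService("SOAP", IPv4Address("10.10.10.4"), IPv4Address("10.10.10.4"), 8080, 8080),
    SoftwareService("Oracle", IPv4Address("10.10.10.5"), IPv4Address("10.10.10.5"), 1521, 1523),
]


def classify(server_ip: str, server_port: int) -> Optional[str]:
    """Return the protocol of the first matching service, or None if nothing matches."""
    for service in SERVICES:
        if service.matches(server_ip, server_port):
            return service.protocol
    return None
```

For example, classify("10.10.10.2", 80) falls inside the HTTP range, while traffic to an address outside all three ranges is left unclassified.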

Operations, Applications and Transactions


Logical names / groupings for TCP level actions at different levels.

Operation: Refers to an operation in the context of a particular protocol, and can mean an HTTP/HTTPS page load, database query, JOLT request on a Tuxedo server, DNS look-up etc.

Transaction (grouping mechanism for operations):
- Simple transaction: consists of a single operation, such as a Web page load.
- Complex transaction: consists of a sequence of operations that are HTTP(S), XML, SAP GUI or Cerner based.
- Unstructured transaction: a collection of unsequenced operations.

Application (grouping mechanism for transactions): A universal container that can accommodate one or more transactions, which consist of one or more Software Services.

Applications and Transactions

Transaction A

Application 1

Transaction B

Transaction C

Applications and Transactions


Medical Records

Physician Login

URL (http://10.21.79.243/physician/login.do)

Admin Login

URL (http://10.21.79.243/admin/login.do)

Patient Login

URL (http://10.21.79.243/patient/login.do)

Application Performance

Transaction Performance
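The Medical Records example can be sketched as a nested mapping from application to transactions to URLs, with load times rolled up at both levels. This is a rough illustration under stated assumptions: the functions resolve and average_times are hypothetical, and the sample timings in the usage note are invented; only the application name, transaction names and URLs come from the slide.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# Application -> transaction -> URLs (simple transactions: one operation each).
APPLICATIONS: Dict[str, Dict[str, List[str]]] = {
    "Medical Records": {
        "Physician Login": ["http://10.21.79.243/physician/login.do"],
        "Admin Login": ["http://10.21.79.243/admin/login.do"],
        "Patient Login": ["http://10.21.79.243/patient/login.do"],
    }
}


def resolve(url: str) -> Optional[Tuple[str, str]]:
    """Map an observed URL to its (application, transaction), if it is defined."""
    for app, transactions in APPLICATIONS.items():
        for transaction, urls in transactions.items():
            if url in urls:
                return app, transaction
    return None


def average_times(observations: Iterable[Tuple[str, float]]):
    """Average page load time (seconds) per transaction and per application."""
    per_transaction = defaultdict(list)
    per_application = defaultdict(list)
    for url, seconds in observations:
        hit = resolve(url)
        if hit is None:
            continue  # URL is not part of any defined transaction
        app, transaction = hit
        per_transaction[(app, transaction)].append(seconds)
        per_application[app].append(seconds)
    return (
        {key: mean(values) for key, values in per_transaction.items()},
        {key: mean(values) for key, values in per_application.items()},
    )
```

Calling average_times with a list of (URL, seconds) pairs yields transaction performance and application performance side by side; URLs outside the defined transactions are ignored.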

Reporting Hierarchy
Hierarchy levels depend on the analyzer type. The CAS can report on up to four levels for the following traffic types:
- HTTP
- SAP GUI
- Cerner
- SOAP
- Any database
Each level can be reported independently or combined with the remaining ones. If you use DMI you are able to create reports with entries from arbitrarily chosen hierarchy levels.

Reporting Hierarchy
In the current DC RUM release (12), the following hierarchy levels are supported:
- Operation: The first level in the hierarchy, for example: URL, Query, SOAP Operation type
- Task: The second level in the hierarchy, for example: Page name, Operation name, SOAP Method
- Module: The third level in the hierarchy, for example: Database name, SOAP Service
- Service: The highest level in the hierarchy, for example: SAP GUI business process
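Reporting on any chosen combination of hierarchy levels can be pictured as a group-by over measurement records. The sketch below is an assumption-laden illustration, not the CAS or DMI implementation: the function report, the record layout, and the sample values in the usage note are all hypothetical; only the four level names come from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Tuple

# The four hierarchy levels, from highest to lowest.
LEVELS = ("service", "module", "task", "operation")


def report(records: Iterable[Mapping], levels: Sequence[str]) -> Dict[Tuple, Tuple[int, float]]:
    """Group records by the chosen levels; return (count, average seconds) per group."""
    unknown = set(levels) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown hierarchy levels: {sorted(unknown)}")
    totals = defaultdict(lambda: [0, 0.0])
    for record in records:
        key = tuple(record[level] for level in levels)
        totals[key][0] += 1
        totals[key][1] += record["seconds"]
    return {key: (count, total / count) for key, (count, total) in totals.items()}
```

With records such as {"service": "Billing", "module": "Invoices", "task": "List invoices", "operation": "/invoices?page=1", "seconds": 1.0}, calling report(records, ("task",)) reports one level on its own, while report(records, ("service", "module")) combines two.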

Reporting Hierarchy

Reporting Hierarchy

Real User Monitoring - Tiers


End-Users -Internal? -External? -Internet? Load Balancers / Content Switches
Web Servers Application Servers Database Servers

Mainframe / Other Tiers

Synthetic End-User Transactions (At Key Locations)

AMD Users

Start by monitoring the initial entry point of the end user's transaction
Add additional tiers for greater Fault Domain Isolation and Visibility
Wide variety of transaction support: HTTP/S, Oracle/SQL/DB2 queries, SAPGUI, Oracle Forms, XML, MQ

CIO CTO IT Mgt Data Center Ops Monitoring Team Application Owners

CAS and ADS Report Server

Tiers

A tier is a specific layer where DC RUM collects performance data. Tiers are either pre-defined, or defined by the user in the Central Analysis Server (CAS).

Immediately after the CAS is deployed, data is reported based on the default tier configuration. If the default tier configuration does not fit your network architecture, you should configure tiers to match your topology
Tiers are configured globally. You should not create separate tiers for individual applications

Front-end Tiers
Best practice: mark as front-end the tier that is closest to the user or to a device that acts on behalf of the user. In short, the first layer the user connects to.

1st tier for example load balancer or Web Server

1st tier after Citrix or Terminal Service

Network Tiers
Client Network:
- Wide Area Network (WAN) from remote sites.
- Manually and automatically defined sites (AS and CIDR blocks), except the "All other" site
Network:
- Datacenter Local Area Network (LAN).
- The "All other" site

Data Center Tiers


DC RUM defined Tiers, which represent measurements originating from RUM DC and are based on different analyzer types, are listed in the Data center tiers section:
- Website
- Oracle Forms
- SAP GUI
- Exchange
- Middleware
- Message Queue
- Database
- Datacenter Infrastructure
- FIX
User Defined Tiers, which are based on software service definitions, are listed in the Data center tiers section with no rules assigned to them:
- VIP
- Load balancer
- Web servers
- Application servers
- Business logic
- Database servers

Locations
DC RUM refers to locations as Sites and defines them as IP address ranges. Location definitions can be made in a three-level architecture in DC RUM:
- Site: lowest level of granularity
- Area: consists of one or more sites
- Region: consists of one or more areas
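The Site / Area / Region hierarchy can be sketched with the standard ipaddress module. Everything named here is an assumption for illustration: the site, area and region names are invented, the address blocks are documentation ranges, and locate is a hypothetical helper, not a DC RUM function. Unmatched clients fall into an "All other" bucket, mirroring the "All other" site mentioned under Network Tiers.

```python
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple

# Hypothetical sites defined as IP address ranges (documentation-only blocks).
SITES = {
    "Site A": ip_network("192.0.2.0/24"),
    "Site B": ip_network("198.51.100.0/24"),
    "Site C": ip_network("203.0.113.0/24"),
}
# An area consists of one or more sites; a region consists of one or more areas.
SITE_TO_AREA = {"Site A": "Area 1", "Site B": "Area 1", "Site C": "Area 2"}
AREA_TO_REGION = {"Area 1": "Region X", "Area 2": "Region X"}


def locate(client_ip: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (site, area, region) for a client IP address."""
    address = ip_address(client_ip)
    for site, network in SITES.items():
        if address in network:
            area = SITE_TO_AREA[site]
            return site, area, AREA_TO_REGION[area]
    # No defined site covers this address.
    return "All other", None, None
```

A report grouped by area or region then only needs the site of each client, since the two upper levels follow from the mappings.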

Metrics: TCP Availability


Availability - The percentage of successful attempts, that is, the total number of attempts minus the number of failures, divided by the total number of attempts and multiplied by 100%.
Connection Establishment Timeouts - Number of TCP errors of category 'Connection establishment timeout errors'. This category of errors applies when there was no response from the server to the SYN packet(s) transmitted by the client.
Connection Refused Errors - Number of TCP errors of category 'Connection refused errors'. This category of errors applies when the server rejects a request from the client to open a TCP session. Such a situation usually happens when the server runs out of resources, either due to operating system kernel configuration or lack of memory.
Server Session Terminations - The number of Server Session Termination errors. This category of errors applies when the server detects an error on the application level and closes the TCP session with a RESET packet.
Server not Responding - The number of Server Not Responding errors. This category of errors applies when the client closes the TCP session with a RESET packet after the server has failed to respond for too long.
Idle Sessions - The number of idle TCP sessions, that is, sessions that have not been active for longer than a predefined time-out, 5 minutes by default.
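The availability definition above works out as a one-line calculation. The function name tcp_availability is hypothetical, and returning None when there were no attempts is an assumption of this sketch; the slide does not say how that case is reported.

```python
from typing import Optional


def tcp_availability(attempts: int, failures: int) -> Optional[float]:
    """Percentage of successful attempts: (attempts - failures) / attempts * 100."""
    if attempts == 0:
        return None  # assumption: availability is undefined without attempts
    return 100.0 * (attempts - failures) / attempts
```

For instance, 200 attempts with 5 failures gives 97.5%.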

Metrics: HTTP Availability


HTTP Availability - The percentage of successful HTTP hits, calculated based on the following formula:

100 * (Hits - HTTP errors) / Hits


All HTTP errors are taken into account.
HTTP Client Errors - The number of observed HTTP client errors (4xx)
HTTP Not Found Errors - The number of observed HTTP 404 Not found errors
HTTP Other Client Errors - The number of observed HTTP client errors other than 401, 404 and 407
HTTP Unauthorized Errors - The number of observed HTTP 401 Unauthorized errors
HTTP Server Errors - The number of observed HTTP server errors (5xx)
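The HTTP availability formula and the error counters can be sketched from a list of observed status codes. The function http_metrics and its dictionary keys are hypothetical names, and treating every 4xx and 5xx response as an "HTTP error" is this sketch's reading of "all HTTP errors are taken into account".

```python
from typing import Dict, Iterable, Optional


def http_metrics(status_codes: Iterable[int]) -> Dict[str, Optional[float]]:
    """Count HTTP error categories and compute 100 * (Hits - HTTP errors) / Hits."""
    codes = list(status_codes)
    hits = len(codes)
    client_errors = sum(1 for c in codes if 400 <= c <= 499)
    server_errors = sum(1 for c in codes if 500 <= c <= 599)
    errors = client_errors + server_errors
    return {
        "hits": hits,
        "client_errors": client_errors,
        "not_found": codes.count(404),
        "unauthorized": codes.count(401),
        # Client errors other than 401, 404 and 407.
        "other_client_errors": sum(
            1 for c in codes if 400 <= c <= 499 and c not in (401, 404, 407)
        ),
        "server_errors": server_errors,
        "availability": 100.0 * (hits - errors) / hits if hits else None,
    }
```

Eight hits of which four are errors (one 404, one 401, one 403, one 500) yield an availability of 50%.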

Metrics: Network Performance


Client ACK RTT - The time it takes for an ACK packet to travel from the user to the AMD and back again.
Client RTT - The time it takes for a SYN packet to travel from the user to the AMD and back again.
Client loss rate (to server) - The percentage of total packets sent by a client that were lost between the server and the AMD and needed to be retransmitted.
Server loss rate (to client) - The percentage of total packets sent by a server that were lost between the AMD and the client and needed to be retransmitted.
Server realized bandwidth - The actual transfer rate of server data when the transfer attempt occurred, taking into account factors such as loss rate (retransmissions). Thus, it is the size of an actual transfer divided by the transfer time.
Request time - The time it took the client to send the HTTP request to the server (for example, by means of an HTTP GET or HTTP POST). Note: This time includes TCP connection setup time and SSL session setup time (if any). It starts when the client starts the TCP session on the server and ends when the server receives the whole request.
Delay - Data transfer delay on a Data Center device, such as a load balancer or firewall.
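Two of these metrics reduce to simple ratios, sketched below. The function names are hypothetical, expressing bandwidth in bits per second is an assumption (the slide does not state a unit), and counting retransmitted packets as the lost packets follows the wording "lost ... and needed to be retransmitted".

```python
def loss_rate_percent(total_packets: int, retransmitted_packets: int) -> float:
    """Percentage of sent packets that were lost and had to be retransmitted."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return 100.0 * retransmitted_packets / total_packets


def realized_bandwidth_bps(transfer_bytes: int, transfer_seconds: float) -> float:
    """Size of the actual transfer divided by the transfer time, in bits per second."""
    if transfer_seconds <= 0:
        raise ValueError("transfer_seconds must be positive")
    return transfer_bytes * 8 / transfer_seconds
```

Because the transfer time already includes any retransmissions, a lossy path shows up as a lower realized bandwidth for the same transfer size.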

Metrics: Round Trip Time RTT

Metrics: Application Performance


Application Performance - For transactional protocols, this is the percentage of application transactions completed in a time shorter than the performance threshold. For generic TCP protocols, this is the percentage of monitoring intervals in which user wait per kB of data was shorter than the threshold value.
Operation Time - The time it took to complete an operation. The term "operation" refers to an operation in the context of a particular protocol, and can mean HTTP/HTTPS page loads, database queries, XML (transactional services) operations, Jolt transactions on a Tuxedo server, e-mails, DNS requests, Oracle Forms submissions, MQ operations, VoIP calls, MS Exchange operations, or SAP operations. Note that an operation can be split over several packets. For HTTP and HTTPS, operation time is the page load time, which is equal to the redirect time plus the network time plus server HTTP time plus server think time.
Person-hours lost (Performance, Errors, Availability) - In Central Analysis Server, the total monitoring time clients waited for pages to load due to bad service availability and bad application performance. In Advanced Diagnostics Server, the total time clients waited for pages to load due to bad software service performance, that is, the total monitoring time during which page load time exceeded the predefined threshold. Note that this is not a sum of whole monitoring intervals, but only those intervals' portions during which problems occurred. This metric is not calculated in PVU mode.
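The two definitions for transactional protocols can be written out directly. Both function names are hypothetical, the threshold and timings in the usage note are invented, and returning None for an empty sample is an assumption of this sketch.

```python
from typing import Optional, Sequence


def http_operation_time(redirect: float, network: float,
                        server_http: float, server_think: float) -> float:
    """HTTP(S) page load time: redirect + network + server HTTP + server think time."""
    return redirect + network + server_http + server_think


def application_performance(operation_times: Sequence[float],
                            threshold: float) -> Optional[float]:
    """Percentage of operations completed in less time than the threshold."""
    if not operation_times:
        return None  # assumption: undefined when nothing was measured
    fast = sum(1 for t in operation_times if t < threshold)
    return 100.0 * fast / len(operation_times)
```

With a 4-second threshold, operations taking 1, 2, 5 and 3 seconds give an application performance of 75%.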

Metrics: Operation Time

Metrics: Application Performance


Zero window size events - The client sets a zero window size in the TCP header when it wants the other side to slow down data transmission because it cannot keep up with the transmission speed. This indicates that the receiving machine is busy with other tasks.
Network time - The time the network (between the user and the server) takes to deliver requests to the server and to deliver page information back to the user. In other words, network time is the portion of the overall time that is due to the delivery time on the network.
Redirect time - The average amount of time that was spent between the time when a user went to a particular URL and the time this user was redirected to another URL and issued a request to that new URL. The difference between Redirect Time and HTTP Redirect Time is that the former counts all operations, while the latter refers only to those operations for which redirection actually took place.
Server Time - The time it took the server to produce a response to a given request.

Server operation size - The size of a server operation. In HTTP and HTTPS (decrypted and non-decrypted), server operation size equals the page size.

Components and Relations

DCRUM Components
Enterprise Portal
Dashboards
Operational reports

Central Security Server


LDAP, users DB

Business Service Manager


Service Management

Central Analysis Server (CAS)


Data Mining Interface (DMI) Performance Management Database

3rd-party Integration

Service Model

RUM configuration Console


Configuration database

Synthetic Monitoring

dynaTrace DTM

Agentless Monitoring Device (AMD)

DCRUM Components - Enterprise portal


Role of the Enterprise portal
Adds new report workflow: AHS, DCA
Optional component

CAS reports remain as before


Portal workflow drills down to CAS reports for details
Seamless from the user perspective
[Diagram labels: CAS, AMD, ADS, Enterprise Portal]

DCRUM Components
Central Analysis Server (CAS)
o The main reporting component for dynaTrace Data Center Real-User Monitoring
o Combines measurements from the Agentless Monitoring Device (AMD) using different contexts
o CAS pulls its data from the AMDs in the form of zdata sample files
o Stores its results in an MS SQL Server database
o Results can be viewed real time or historically
Agentless Monitoring Device (AMD)
o Network probes that analyze network traffic
Console Client
o Used for configuring devices and application monitoring
Console Server
o Stores the configuration in a flat file database

DCRUM Components
Compuware Security Server (CSS)
o New in the 12.0 release
o Provides central authentication and user management for the Central Analysis Server, Console, Advanced Diagnostics Server, Enterprise Portal and BSM
o Users can be defined locally in a CSS database, or the customer can use their own corporate user management system, such as the LDAP-based Active Directory or Apache DS
Advanced Diagnostics Server (ADS)
o A separate report server that is integrated with CAS at the reporting and configuration level
o Provides a more detailed, troubleshooting-oriented analysis (e.g. element level for HTTP instead of page level on CAS)
o Supports applications based on HTTP(S), XML over HTTP(S)/MQ, SAPGUI, DB2, MSSQL, Sybase, Informix, Oracle and Oracle Forms

DCRUM Components
Advanced Diagnostics Server (ADS), continued
o ADS pulls its data from the AMDs in the form of vdata sample files
o Stores its results in an MS SQL Server database
o Results can be viewed real time or historically
Enterprise Portal (EP)
o Helps speed the isolation of the fault domain and reduces the cost of troubleshooting issues, while restoring service as quickly as possible
o Contains robust data mining and report building tools for creating new and customized reports quickly and easily
o Contains dashboards which display graphs, geographic views, and tabular data regarding service and application quality, fault domain isolation, business impact, and infrastructure health
o Consolidates reporting, security, and configuration functionality into a single component

Analysis Modules
Transaction decode (analysis modules) include:
HTTP/HTTPS
SAP
SOAP/XML
Databases: MS SQL, Oracle, DB2, Sybase, Informix
Oracle Forms
IBM MQ
MS Exchange
Thin Client (Citrix/Terminal Services)

Analysers
Multi-purpose and Expandable Product Family
CAS (Web)
Oracle EBS HTTP(S) Siebel Fault Isolation Detailed HTTP MS Exchange Oracle Forms

Tuxedo/JOLT

Bus Trans

SAP GUI

SQL\ DB

TCP/IP

SOAP

Information Database

Central Analysis Server

Advanced Diagnostics Server

AMD

Network Vantage Probe


Passive traffic analysis (since v 10.1)

Flow Collector
Netflow data analysis (since v 10.2)

Collection and Measurement

Passive traffic analysis

SMTP

Citrix

XML

DNS

MQ

Analysis and Reporting

CAS (Ent)

ADS

Analysis Modules

Enterprise Portal Dashboards

Industry-leading breadth of analysis


1. Real-time and historical trending views of application, user, network and overall data center performance.
2. Supports web and non-web applications such as SAP.
3. Quickly identify poorly performing data center tiers.
4. Isolate network performance impact on applications and users.
5. Monitor baseline performance and availability with synthetic monitoring.

Optimize end-user experience

View overall status of applications and end-user performance through a single dashboard that includes quick drill down views into performance, availability, operation time and usage for individual applications and users.

Multi-tier Data Center Monitoring

Caption: Drill down from Application Health Status for a focused analysis of performance by data center tier. Isolating application performance problems in the multi-tier environments of today's application and data center architectures is a daunting task for IT, yet the business demands rapid problem isolation to reduce business impact. The new Data Center Analysis View provides instant visual indication of problem areas with 1-click access to detailed troubleshooting information. Isolate tier, server, time period, slow web pages, middleware messages, and database queries in a single interactive view that accelerates fault domain isolation.

Multi-tier Data Center Monitoring (contd)


1. Data Center Analysis provides real-time views of application performance, operations, availability and usage along with requests broken down by the supporting tier of infrastructure.
2. Historic detail of performance of tiers is displayed with mouse-over detail of how user and application performance is affected by the corresponding infrastructure tier.
3. Individual application operations are displayed in context of overall application performance, network health and end-user experience.
4. End-user performance is displayed for any infrastructure tier and can be sorted by user group, individual users or client types.

One click to deep-dive application analysis


1. DCRUM provides a broad view across infrastructure to triage performance of services, servers, operations and websites.
2. Reports on affected users, transaction times and availability quickly surface hot spots in application performance.
3. From DCRUM dashboards, a direct drill down into dynaTrace reporting provides method call and code-level analysis of application performance issues.

Optimize end-user experience (contd)


1. Drill down from the affected users heat map to view individual user performance.
2. Identify the application(s) responsible for poor end-user performance.
3. For specific users, identify the offending application operation with a breakdown of slow, fast and aborted requests.

Central Security Server

CSS Consolidated User Management


The Compuware Security Server (CSS) is a new consolidated authentication and user management system in 12.0
Components served: CAS, RUM Console, ADS, Enterprise Portal, BSM
User sources: locally defined users, corporate LDAP (Active Directory, Apache DS)

CSS

CSS Features / Value


Users have one account/password to access DC RUM and BSM
Seamless pass-through from Enterprise Portal to CAS / ADS
Enterprise Portal connects to 12.0 CAS / BSM without login
Administrator usernames / passwords: one vs. three
Manage users in one location
Common roles across components
Audit logging online and exportable
Consistent LDAP and LDAPS access
Consistent password policies

Central Analysis Server

Central Analysis Server (CAS) Report Server


CAS is the main report server and repository for real user monitoring
Metrics are aggregated at interval level for each unique client + operation + server
o An operation is a web page load, database query, web service call, etc.
Other features of the CAS
o Custom reporting (DMI)
o Alerts
o Baselines
CAS has two personalities:
Transactional Monitoring (web analysis)
o Focused on specific applications: web, SQL, SOAP, etc.
o NOTE: Not just Web analysis
Enterprise Monitoring
o General network traffic monitoring
CAS can also store/report on metrics from synthetic transactions and J2EE & .NET agents
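The interval-level aggregation per unique client + operation + server can be pictured as a group-by on a four-part key. This is a minimal sketch, not the CAS implementation: the 5-minute interval, the tuple layout of a sample, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate(samples, interval_seconds=300):
    """Aggregate (timestamp, client, operation, server, operation_time) samples.

    Returns one row per (interval_start, client, operation, server) key
    holding the operation count and the average operation time.
    """
    buckets = defaultdict(lambda: {"count": 0, "total_time": 0.0})
    for ts, client, operation, server, op_time in samples:
        interval_start = int(ts // interval_seconds) * interval_seconds
        key = (interval_start, client, operation, server)
        buckets[key]["count"] += 1
        buckets[key]["total_time"] += op_time
    return {
        key: {"count": b["count"], "avg_time": b["total_time"] / b["count"]}
        for key, b in buckets.items()
    }


samples = [
    (10, "10.0.0.1", "/login", "srv1", 1.0),
    (20, "10.0.0.1", "/login", "srv1", 3.0),
    (310, "10.0.0.1", "/login", "srv1", 2.0),  # falls into the next interval
]
for key, row in aggregate(samples).items():
    print(key, row)
```

The number of stored rows grows with the number of distinct keys, which is why the sizing slides later stress user aggregation and limiting the number of individual clients.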

CAS - Data Mining Interface (DMI)


The DMI is the custom reporting tool for DC RUM
No need to write custom SQL queries
100% web based
Create reports (tabular, charts) from any DC RUM data source: real user monitoring metrics, Java & .NET agent metrics, etc.
Reports can be scheduled (for example, daily summary reports sent by email every night)
Reports can be linked together to create a customized drilldown workflow
Data can be exported
Report definitions can be imported/exported (for reuse at another client)
Metric names can be aliased to match customer terminology
Intimidating to use at first glance, but it's easy to master


CAS - Alarm System Overview


The alarm mechanism enables you to be proactive rather than reactive
Fixed thresholds vs. baselines

Alarms can be sent to a specified e-mail address, or can be sent via an SNMP trap.
There are also alarms that are generated even if they have no subscribers assigned. Such alarm notifications are recorded in the alarm logs, which store records of all alarms generated.

Modify the existing alarm or define new alarms.
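The difference between a fixed-threshold alarm and a baseline alarm can be sketched as follows. The source does not say how DC RUM computes its baselines, so the historical mean and the 50% tolerance used here are assumptions chosen purely for illustration.

```python
from statistics import mean


def fixed_threshold_alarm(value: float, threshold: float) -> bool:
    """Raise when the measured value exceeds a fixed, operator-chosen limit."""
    return value > threshold


def baseline_alarm(value: float, history, tolerance: float = 0.5) -> bool:
    """Raise when the value exceeds the historical mean by more than `tolerance`.

    Assumption: the baseline is the plain mean of past values for the same metric.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return value > mean(history) * (1 + tolerance)


history = [2.0, 2.0, 2.0]                 # typical operation times in seconds
print(fixed_threshold_alarm(3.5, 4.0))    # below the fixed limit
print(baseline_alarm(3.5, history))       # but well above normal behaviour
```

A baseline alarm can catch a deviation from normal behaviour that is still under the fixed limit, which is what makes it useful for proactive alerting.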

CAS - Types of Alarms


Alarms based on SQL detectors
Using SQL queries, these alarms perform queries on the traffic monitoring database. The benefit of using these alarms is that there are no constraints to the complexity of the queries and any event that can be expressed as an SQL query can be detected.
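As an illustration of the idea that any event expressible as an SQL query can be detected, the sketch below runs a detector query against a stand-in database. Everything about it is hypothetical: the product stores data in MS SQL Server with its own schema, whereas this example uses an in-memory SQLite database and an invented `operations` table.

```python
import sqlite3

# Stand-in for the traffic monitoring database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE operations (server TEXT, operation TEXT, op_time REAL)")
conn.executemany(
    "INSERT INTO operations VALUES (?, ?, ?)",
    [
        ("db1", "q1", 0.5), ("db1", "q2", 0.7),
        ("web1", "/a", 5.0), ("web1", "/b", 7.0),
        ("web2", "/a", 9.0),
    ],
)

# Detector: servers whose average operation time is too high,
# ignoring servers with too few samples to be meaningful.
DETECTOR_SQL = """
SELECT server, AVG(op_time) AS avg_time, COUNT(*) AS n
FROM operations
GROUP BY server
HAVING AVG(op_time) > ? AND COUNT(*) >= ?
ORDER BY server
"""


def run_detector(connection, max_avg_time, min_samples):
    """Return one row per server that violates the condition."""
    return connection.execute(DETECTOR_SQL, (max_avg_time, min_samples)).fetchall()


for row in run_detector(conn, max_avg_time=4.0, min_samples=2):
    print("ALARM:", row)
```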

Alarms based on Java/.NET Monitoring measurements


VAMETRIC_ALM - for alarms performing queries on measurements related to entry points
VAMETHODMETRIC_ALM - for alarms performing queries on measurements related to object methods or SQL queries
PAT_VIO_4_AS_RES - for alarms performing queries on measurements related to JMX/WMI metrics

Metric alarms
These alarms provide a simple and fast mechanism for performing complex queries on a set of pre-defined metrics. The advantages of these alarms are ease of use and modification, as well as performance. To define metric alarms, you do not need to know the structure of the database or how to program in SQL. However, not all conditions can be expressed as metric alarms.

Network alarms
These alarms are similar in design and function to the metric alarms above, though they view the monitored traffic in the same way as the Network View report.

Link alarms
These are fast-executing alarms designed to monitor link utilization as presented on the Link View report.

Other alarms
A few other alarms are available that were designed for very specific purposes. They can be modified in only limited ways and do not allow user access to the detector code.

RUM Console

Components
RUM Console consists of two components:
RUM Console Server - A back-end server application that maintains configuration images and device information, runs tasks related to configuration management, and provides a Web services API for RUM Console to manage configurations. The server is a Windows-based service that can be installed on a machine running Windows 2003 Server or Windows 2008 Server R2 with a network connection to all of the managed devices within the Compuware APM infrastructure.
RUM Console - A GUI application for configuring report servers and data collectors. With the console, you can create and edit configurations for Compuware APM devices and propagate such configurations to other Compuware APM devices.

RUM Console

Guided configuration: first time users, easy configuration first steps

Wizard configuration
Tracing ability
Entire configuration: experienced user
All same options
Health reports
Sequence transactions

Guided Configuration

Device information

Agentless Monitoring Device

AMD
The Agentless Monitoring Device (AMD) is a completely passive device, placing no additional load on the network. The AMD can be connected to the network in two ways:
Spanning the switch - In today's switched environments, most switches can mirror multiple ports and/or multiple VLANs to a single monitoring port. This lets the AMD passively monitor traffic from a number of different perspectives, so it can see traffic in front of and behind load balancers, as well as all the tiers in between. Where the switch cannot accommodate more spans, regeneration taps can be a good alternative. Cisco switches may also use VLAN Access Lists (VACLs) to bridge routed traffic to an outgoing port in much the same way as port mirroring.
Passive taps - In certain cases, the use of span ports may not be viable. Passive taps may then be used to capture the application traffic to be monitored. This method requires multiple tap points to fully see all tiers within the application.

AMD
The AMD's job is to sniff traffic for the purpose of performance monitoring
AMD processes perform initial processing of the data
Data is organized into files to be retrieved by report servers at configured time intervals
Runs on Red Hat Enterprise Linux 5+ and 6+
Hardware slots are filled with additional network interfaces for monitoring
o Monitoring NICs are passive
o Can be copper or fiber or mixed
SSL decryption is performed on the AMD
o RSA private key needed
o SSL decryption card (Nitrox Cryptoswift): decryption processing is offloaded from the main CPU; RSA keys are guarded and are not stored on disk or in main memory
o Software only: OpenSSL
The AMD does not store/keep packet traces. It inspects packets to see the URL, the userid, etc.
o The exception is HTTP header request/response and POST data when using the ADS report server (optional)
o Sensitive data can be masked
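To illustrate what masking sensitive POST data means in practice, here is a small sketch. It is not how the AMD is configured or implemented; the field names, the mask string, and the assumption that the body is URL-encoded form data are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical list of form fields whose values must never be stored.
SENSITIVE_FIELDS = {"password", "card_number"}


def mask_post_data(body: str, sensitive=SENSITIVE_FIELDS, mask: str = "XXXX") -> str:
    """Replace the values of sensitive fields in a URL-encoded POST body."""
    pairs = parse_qsl(body, keep_blank_values=True)
    masked = [(k, mask if k.lower() in sensitive else v) for k, v in pairs]
    return urlencode(masked)


print(mask_post_data("user=alice&password=s3cret&card_number=4111"))
```

Masking before storage means the report server can still show that a field was submitted without ever holding its value.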

Advanced Diagnostic Server

Advanced Diagnostics Server (ADS) Report Server


ADS is the deep-dive report server and repository for real user monitoring on web and SQL applications
Operations are not aggregated, unlike in CAS. Every monitored transaction can be reviewed in detail

Breaks down the page load time by individual web page element (images, css, javascript, etc.)
Can be used to drill into the transaction to see the input submitted by the user (POSTed data).

Supports monitoring of business transactions


Stores data for only 3-4 business days
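The element-level breakdown of a page load can be pictured as grouping the page's elements by type. The sketch below assumes each element is available as a (url, type, load time) record, which is an invented layout. It simply sums load times per type, so it overstates wall-clock impact when elements download in parallel.

```python
from collections import defaultdict


def breakdown_by_element_type(elements):
    """Sum load time per element type, slowest type first.

    `elements` is an iterable of (url, element_type, load_seconds) tuples.
    """
    totals = defaultdict(float)
    for _url, element_type, load_seconds in elements:
        totals[element_type] += load_seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


page_elements = [
    ("/index.html", "html", 0.5),
    ("/logo.png", "image", 0.25),
    ("/banner.png", "image", 0.5),
    ("/site.css", "css", 0.125),
]
for element_type, seconds in breakdown_by_element_type(page_elements):
    print(f"{element_type}: {seconds:.3f} s")
```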

ADS Report Server Example


Component Scaling

What causes sizing problems


Incorrect product positioning
o Deep-dive bottom-up troubleshooting approach instead of top-down Application Performance and EUE Monitoring
Using short-term POV parameters in longer term post-sales implementation
All-traffic without any filters limiting IP addresses
Too many individual clients
o No user aggregation
o User ID recognition generates too many identifiers
"Monitor specific page" defines URL parameter with too many values (such as phone number, etc.)
Too many regular expressions
HTTP application error tracking in high-end environment
Storage period is too long without justification
ADS in high-traffic trying to handle same page volume as VAS
Setting unrealistic expectations with the customer

RECOMMENDED ARCHITECTURES

Report Server Integration and Aggregation


When to integrate multiple report servers?
When one Central Analysis Server is not enough to store all monitoring data from all Agentless Monitoring Devices.
When AMDs are geographically dispersed (for example, in different data centers).
When you need to use Advanced Diagnostics Servers to broaden your monitoring perspective and add in-depth vision alongside CAS reports.
When you need failover and backup operations to provide high availability of reports.

When you want all of the reporting in one place.

RECOMMENDED ARCHITECTURES

Scalability: HTTP decode multi-threading


[Chart: Heavy HTTP analysis, traffic analyzed by the AMD (Mbps)*; vertical axis 0 to 1600. Bars: 11.1 32-bit; 11.7 64-bit; 11.7 64-bit with multi-threading, 32 GB RAM; 11.7 64-bit with multi-threading, 64 GB RAM]
* All HTTP analysis features are enabled; user recognition and operation recognition use process-intensive regular expressions.

Scalability: Each version brings more optimized traffic decoding; the version 12 numbers are again a bit better than those of version 11.7.

Monitoring Component Capacity Guidelines


The CAS database should not contain more than 2 million sessions.
ADS offers two modes:
o Small website: per-hit mode can handle 3M page loads (approximately 10M hits) per day.
o Large website: per-page mode can handle 13M page loads per day.

For the AMD, capacity differs per traffic profile. A few examples are shown below:


Distributed data storage benefit


Reduce number of users maintained in the SQL database
This reduces the number of CAS sessions
Note: CAS client location structure must be well-defined

Practical data reduction levels will vary
Theoretical benefit: 3x to 7x reduction in number of sessions
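A back-of-the-envelope check can combine the 2 million session guideline for a CAS database with the theoretical 3x to 7x reduction quoted above. The function names are illustrative, and the reduction factors are the slide's theoretical figures, not measured values.

```python
import math

CAS_MAX_SESSIONS = 2_000_000  # guideline quoted in the capacity slide


def sessions_after_aggregation(sessions: int, reduction_factor: float) -> int:
    """Estimated session count once clients are aggregated by location."""
    return math.ceil(sessions / reduction_factor)


def sizing_report(sessions: int, factors=(1, 3, 7)) -> dict:
    """For each reduction factor, does the estimate fit in one CAS database?"""
    return {
        factor: sessions_after_aggregation(sessions, factor) <= CAS_MAX_SESSIONS
        for factor in factors
    }


# 9 million raw sessions: too many as-is and at 3x, but within the guideline at 7x.
print(sizing_report(9_000_000))
```

Since practical reduction levels vary, a result that only fits at the 7x end of the range is a signal to plan for additional CAS servers rather than rely on aggregation alone.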

Central Analysis Server - Scalability

RECOMMENDED ARCHITECTURES

AMD scaling
Passive in-line tap or splitter
AMD in load-balancing mode
Intelligent switch (e.g. Gigamon, Anue)
Each AMD analyzes one or part of one application
SPAN

Tap

AMD

AMD

AMD


RECOMMENDED ARCHITECTURES

CAS scaling
Add more CAS servers and distribute data per monitored server IP
Designate one CAS as master

AMD

All DMI reports will use all servers as data source
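Distributing data per monitored server IP means every measurement for a given server always lands on the same CAS. The source does not describe how the assignment is made, so the modulo scheme and server names below are hypothetical; in a real deployment this could just as well be a static mapping maintained by an administrator.

```python
import ipaddress


def assign_cas(server_ip: str, cas_servers: list) -> str:
    """Deterministically map a monitored server IP to one CAS.

    Hypothetical scheme: integer value of the IP modulo the number of CAS servers.
    """
    if not cas_servers:
        raise ValueError("at least one CAS server is required")
    index = int(ipaddress.ip_address(server_ip)) % len(cas_servers)
    return cas_servers[index]


cas_cluster = ["cas-a", "cas-b", "cas-c"]
for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    print(ip, "->", assign_cas(ip, cas_cluster))
```

Keeping all data for one server on one CAS keeps per-server reports local, while reports spanning servers rely on DMI using all CAS servers as data sources.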

CAS

CAS

CAS


DCRUM Components - CAS master/slave


CAS, ADS network of master and slaves is seen as ONE by the portal

Enterprise Portal

CAS AMD ADS

CAS master-slave network


One of the CASes is designated as the master
Monitoring functionality of this CAS is similar to all other CASes
Meta-data for consolidated reports is served from slave servers to the master
Master builds a consolidated report for the user

Enterprise Portal

DMI front-end AMD Probe DMI back-end

ADS always acts as a slave server
There are no performance reasons to set up a separate master CAS
Just designate one of the CASes in the cluster
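The master's role of building one consolidated report from several servers can be sketched as a merge of partial results. The source only says that slaves serve report data to the master, so the row format here (operation mapped to a count and a total time) is an assumption made for illustration.

```python
def consolidated_report(master_rows: dict, slave_rows_list: list) -> dict:
    """Merge per-server rows {operation: (count, total_time)} into one report.

    Counts and totals are summed first, and the average is computed last,
    so servers with more traffic weigh more in the consolidated average.
    """
    merged = {}
    for rows in [master_rows, *slave_rows_list]:
        for operation, (count, total_time) in rows.items():
            c, t = merged.get(operation, (0, 0.0))
            merged[operation] = (c + count, t + total_time)
    return {
        operation: {"count": c, "avg_time": t / c}
        for operation, (c, t) in merged.items()
        if c  # skip operations with no samples
    }


master = {"/login": (2, 4.0)}
slaves = [{"/login": (2, 8.0), "/cart": (1, 0.5)}]
print(consolidated_report(master, slaves))
```

Note that the master contributes its own rows like any other server, which matches the point that its monitoring functionality is the same as that of the other CASes.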

Central Analysis Server

Central Analysis Server


AMD Probe

DMI back-end

Central Analysis Server


AMD Probe

DMI back-end

RECOMMENDED ARCHITECTURES

CAS scaling SQL offload


SQL database on separate hardware
Makes sense only if I/O of the SQL server is faster than I/O of the CAS h/w

AMD

Shared SQL servers not recommended for high loads

CAS

CAS

ADS

SQL

Additional Analysis tools

Transaction Trace Analysis

Complex application and network interaction can demand more than real-time monitoring. DCRUM includes a Transaction Trace feature that provides deep root cause analysis needed to quickly remedy complex network problems

Transaction Trace Analysis


1. Dig deeper into server processing delays with Thread Analysis visibility into popular protocols such as HTTP/S, SQL, SAP, WebSphere MQ, RMI/IIOP, CIFS, and more
2. Pinpoint the source of application performance problems by identifying the impact of the network on transaction response time
3. Roll out applications that perform well from the start by predicting and tuning response time before deployment

Other decodes including Citrix and WAN Optimisation


Monitoring Citrix
VTCAM software is installed on presentation server (Citrix or MS Terminal Server)
Runs a Windows service
Collects CPU & memory utilization stats of Citrix host
Maps back-end application traffic to the responsible end-user (session mapping data)

CAS reports
CAS

Gomez User

Monitoring Citrix
[Diagram: Citrix Remote Users, Citrix Server Farm (TCAM), Corporate Network, Database Servers, Web Applications, Other Applications; AMD with TCP level analysis and appropriate analysis modules; CAS + Enterprise Analysis]

Thin Client Analysis Module (TCAM)

TCAM Vantage Thin Client Analysis Module


Target Environments
Citrix and WTS enabled applications
Deployment Considerations

A lightweight component is placed on the server to correlate user logins and back end Citrix conversations.
The agent uses Citrix API and Microsoft Windows API to obtain information on which user is opening which TCP sessions from the Citrix/WTS server. Agent communicates in real-time with AMD and provides mappings from TCP session IDs to Citrix user login names. This information is used by AMD to tag measurements taken on the Citrix<->application server path with actual user login names.
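The tagging step described above, where session-to-user mappings from the agent are applied to measurements taken on the Citrix-to-application-server path, can be sketched as a simple lookup. The record layout and the use of a (source IP, source port, destination IP, destination port) tuple as the TCP session ID are assumptions for illustration.

```python
def tag_measurements(measurements, session_to_user, default_user="unknown"):
    """Attach a Citrix login name to each measurement by its TCP session ID.

    `session_to_user` stands for the mapping the agent reports to the AMD.
    Measurements for sessions without a known mapping get `default_user`.
    """
    return [
        {**m, "user": session_to_user.get(m["tcp_session"], default_user)}
        for m in measurements
    ]


# Hypothetical session IDs: (source IP, source port, destination IP, destination port).
mappings = {("10.1.1.5", 50123, "10.2.2.9", 1521): "jsmith"}
measurements = [
    {"tcp_session": ("10.1.1.5", 50123, "10.2.2.9", 1521), "op_time": 0.8},
    {"tcp_session": ("10.1.1.5", 50999, "10.2.2.9", 1521), "op_time": 1.2},
]
for row in tag_measurements(measurements, mappings):
    print(row["user"], row["op_time"])
```

Without this mapping, every back-end connection would appear to come from the Citrix server's own address, hiding which end user caused the traffic.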

Thin Client Analysis Module (TCAM)

Target Environments
Citrix and WTS enabled applications
Deployment Considerations
Additional statistics on resource utilization of the Citrix server are also available (CPU, HDD, RAM, TCP, number of Terminal Services sessions and number of active Terminal Services sessions).
One AMD can monitor multiple Citrix/WTS machines (different servers, different protocols)
One CAS can gather data from multiple AMDs and provide a single view of service delivery

Monitoring Citrix

Monitoring Citrix

Monitoring WAN Optimization

WAN Optimization Controllers (WOCs) are installed at branch office and data center locations
The AMD adds a SPAN or TAP on the optimized side of the data center WOC

Monitoring WAN Optimization
