
Application Monitoring

Jeremy Kalsow

The Northwestern Mutual Life Insurance Company Milwaukee, WI

Why Application Monitoring

Majority of all corporations
Northwestern Mutual
Total 1,000+ servers
Team is 6 people

Team uses 16 servers

Average 50 applications per server

Need a way to know status fast

What is it?
The ability to monitor performance and availability

Gather metrics
Show trends

Pretty pictures for management

Trends predict future problems
Solve application issues faster
Uptime relates directly to profit for many companies

View all applications, servers, databases and other items being monitored with a single dashboard.

Types of Monitoring
Fault
Performance
Configuration
Security


Detects major errors
Easy to implement
Examples: network loss, database connectivity

Very Important
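
A minimal sketch of a fault check for the two examples above (network loss and database connectivity), assuming both can be approximated by "can a TCP connection be opened". The function names, target names, hosts and ports are illustrative assumptions, not part of the original slides.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def run_fault_checks(targets: dict) -> dict:
    """Map each named target (name -> (host, port)) to 'up' or 'down'."""
    return {
        name: "up" if check_tcp(host, port) else "down"
        for name, (host, port) in targets.items()
    }
```

A call such as `run_fault_checks({"orders-db": ("db.example.internal", 5432)})` (hypothetical host) would report whether the database port is reachable. It says nothing about whether queries succeed.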

Type of Monitoring      What to Monitor        When to Monitor
Hardware
  CPU utilization       CPU load               Load > 99% for x minutes
  Memory utilization    Memory load            Load > 99% for x minutes
  Storage system        Available space        System out of space
Application available   Application working    Working or error
Application logs        Error log monitoring   If error occurred
Databases
  Database online       Database is online     Database is up/down
Network
  Latency               Latency                Latency > acceptable range
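
The "Load > 99% for x minutes" rule can be sketched as a small state machine that fires only when the load stays above the limit for the whole window, so a brief spike does not raise an alert. This is an illustrative sketch. The class name, field names and the choice of seconds as the time unit are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SustainedThreshold:
    """Fire only when a metric stays above `limit` for `window_s` seconds."""
    limit: float      # e.g. 99.0 for "Load > 99%"
    window_s: float   # the "x minutes" of the rule, expressed in seconds
    _breach_start: Optional[float] = field(default=None, init=False)

    def update(self, timestamp: float, value: float) -> bool:
        """Record one sample and return True if the alert condition holds."""
        if value <= self.limit:
            self._breach_start = None  # load recovered, so reset the clock
            return False
        if self._breach_start is None:
            self._breach_start = timestamp  # first sample above the limit
        return timestamp - self._breach_start >= self.window_s
```

With `SustainedThreshold(limit=99.0, window_s=300)`, samples above 99% at 0 s and 120 s do not alert, while a third one at 300 s does.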

Slow performance
Service Level Agreements
Metrics
Old and new metrics

Visual Display


http://www.ibm.com/developerworks/websphere/library/techarticles/0304_polozoff/polozoff.html

Configuration variables
Connectivity
Speed
Performance

Proactive Servers and Applications

Why would the configuration change?
Hardware
Storage
Service packs

Hot fixes
Windows Updates

Attempts to access the system
Open ports
Inventories
Firewall

System events

Blocked Exploits

Monitors usage
Generally used for fees
Profit/Loss

Electric company
Northwestern Mutual

Types of Monitoring Recap

Fault
Performance
Configuration
Security


Types of Monitoring Recap

Historical data
Baseline test
Current test
Performance disagreements
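
One way to surface disagreements between a baseline test and a current test is to compare the two sets of metrics and report anything that got worse by more than a tolerance. The sketch below assumes metrics where higher is worse (for example response times in milliseconds). The function name, metric names and the 10% default tolerance are illustrative assumptions.

```python
def compare_to_baseline(baseline: dict, current: dict, tolerance: float = 0.10) -> dict:
    """Return metrics whose current value exceeds the baseline by more than `tolerance`.

    The result maps metric name to relative change (0.25 means 25% worse).
    Metrics missing from the current run, or with a non-positive baseline,
    are skipped.
    """
    regressions = {}
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue
        change = (current[name] - base) / base
        if change > tolerance:
            regressions[name] = change
    return regressions
```

For a baseline of `{"login_ms": 200.0, "search_ms": 400.0}` and a current run of `{"login_ms": 210.0, "search_ms": 500.0}`, only `search_ms` is reported, with a change of 0.25.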

Types of Monitoring Recap

Allows for trends to be seen
Modifications can be made
Trends over multiple releases
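
As an illustration of how a trend can point at a future problem, the sketch below fits a straight line (ordinary least squares) to daily samples of a metric, such as disk usage in percent, and estimates how many days remain until the line reaches a limit. The function name and the assumption of one evenly spaced sample per day are illustrative. Real usage data is rarely this linear.

```python
from typing import Optional, Sequence


def days_until_limit(samples: Sequence[float], limit: float = 100.0) -> Optional[float]:
    """Estimate days from the last sample until the fitted trend reaches `limit`.

    `samples` holds one value per day, oldest first. Returns None when there
    are fewer than two samples or the trend is flat or decreasing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope  # day index where the line hits the limit
    return max(0.0, day_at_limit - (n - 1))
```

For samples of 70, 72, 74 and 76 percent, the fitted slope is 2 points per day, so the estimate is 12 days until 100 percent.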

Types of Monitoring Recap

Monitoring is important
Not enough time is given
Implemented after discovery of an issue
Monitoring only in areas of known problems

Adding monitoring requires time and money

Challenges of application monitoring

Various types of systems
Shared
Clustered
Virtualized

Production logging

Shared Systems
1 server / Multiple applications
System resources are shared
Tracking individual usage is difficult
Many applications may be impacted

Server without access (production)

Clustered Systems
Applications on more than one server
Avoid single point of failure
May be hard to target the issue

Production Logging
Generally limited
Most errors repeated in test
Application downtime
Use of company resources

Implement Application Monitoring

Plan Early
Monitor Proactively
Create a Recovery Plan
Create and use SLAs

Plan Early
Planning stage
Add monitoring during development
Late additions cover known issues

Monitor Proactively
Harder to implement
Issues are dealt with before end user knows

Monitor Proactively
Tools-based approach
Easy and relatively fast setup
No code
Multiple applications

Monitor Proactively
Logging is directly in the code
Less efficient
More specific
Developers have less time
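
A minimal sketch of the code-based approach, where the monitoring calls sit directly inside the application code. The `orders` logger name, the `process_order` function and the one-second slowness threshold are hypothetical. Only Python's standard `logging` and `time` modules are used.

```python
import logging
import time

logger = logging.getLogger("orders")  # hypothetical application name


def process_order(order_id: int, slow_after_s: float = 1.0) -> bool:
    """Hypothetical business function with monitoring written into the code."""
    start = time.monotonic()
    try:
        if order_id < 0:
            raise ValueError(f"invalid order id {order_id}")
        # ... the real work would go here ...
        return True
    except ValueError:
        # Logs at ERROR level and includes the traceback.
        logger.exception("order %s failed", order_id)
        return False
    finally:
        elapsed = time.monotonic() - start
        if elapsed > slow_after_s:
            logger.warning("order %s took %.2fs", order_id, elapsed)
```

The log lines are specific to this one function, which is the "more specific" benefit. The cost is that every function needs similar code written and maintained by developers.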

Create a Recovery Plan

Fast resolution
Knowledge management

Recovery Plan Template

Service Level Agreements

What percentage of time that the services will be up (uptime)
How many people can use the application at once without performance issues
Performance metrics and benchmarks to be used with performance monitoring alerts
The rules for notification announcements
What statistics will be monitored and when and where they will be available
Acceptable response time
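
To make the uptime percentage concrete, it helps to convert it into a downtime budget. The sketch below does that arithmetic. The function names and the 30-day default period are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime target over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


def meets_sla(downtime_minutes: float, uptime_percent: float, period_days: int = 30) -> bool:
    """True if the observed downtime stays within the budget."""
    return downtime_minutes <= allowed_downtime_minutes(uptime_percent, period_days)
```

A 99.9% target over 30 days (43,200 minutes) leaves a budget of roughly 43 minutes, so 30 minutes of downtime meets the target and 60 minutes does not.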

Service Level Agreements

Using the Statistics

Visual display
Alerts
Tickets

Visual (Dashboard)
Easily view statistics
Comparison results
Trend comparison
Cross Platform

Auto-generated management reports


Alerts and Tickets

Auto-generated alerts
Tickets for queue system
Vital information in each

Alerts and Tickets

Most common: Email
Text, popup, printout, recording and more
Tickets: auto-generated
Knowledge databases

Common fixes and resolutions
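
A sketch of an auto-generated ticket that carries the vital information and looks up a common fix in a knowledge database. The `Ticket` fields, the `KNOWN_FIXES` table and its single entry are hypothetical stand-ins. A real queue system would define its own schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Ticket:
    component: str
    severity: str
    summary: str
    opened_at: str
    suggested_fix: str


# Hypothetical knowledge database mapping known problems to common fixes.
KNOWN_FIXES = {
    "database offline": "Restart the database service and check the error log.",
}


def make_ticket(component: str, problem: str, severity: str = "high",
                now: Optional[datetime] = None) -> Ticket:
    """Build a ticket for a failed component, attaching a known fix if one exists."""
    now = now or datetime.now(timezone.utc)
    return Ticket(
        component=component,
        severity=severity,
        summary=f"{component}: {problem}",
        opened_at=now.isoformat(),
        suggested_fix=KNOWN_FIXES.get(problem, "No known fix recorded."),
    )
```

The same `Ticket` object could feed an email alert or a queue entry, which keeps the information in both channels consistent.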

Application Monitoring
Maximize application uptime
Higher end user satisfaction
Higher Profit

Polozoff, A. (2003, April 9). Proactive Application Monitoring. IBM - United States. Retrieved October 20, 2011, from http://www.ibm.com/developerworks/websphere/library/techarticles/0304_polozoff/polozoff.html
Choice. (2009, December 20). Application Monitoring. Adminschoice - Unix Made Easy. Retrieved October 31, 2011, from http://adminschoice.com/application-monitoring
Application Monitoring Software - uptime software. (n.d.). Server Monitoring Software - IT Systems Management, Capacity Planning, Application and Server Monitoring Tool by uptime software. Retrieved October 31, 2011, from http://www.uptimesoftware.com/application-monitoring.php
Marko, K. (2005, December 30). Proactive Application Monitoring. Processor.com: Data Center IT Equipment at Processor, Routers, Storage, Rackmount Servers, Computer Room Cabling and Flooring. Retrieved October 29, 2011, from http://www.processor.com/editorial/article.asp?article=articles%2Fp2752%2F43p52%2F43p52.asp
IT Service Level Agreement Templates | ContinuityPlanTemplates. (n.d.). ContinuityPlanTemplates | Free Business Continuity Plan (BCP) Templates. Retrieved October 30, 2011, from http://www.continuityplantemplates.com/it-service-level-agreement-templates


Upcoming events with Dashboard

Ability to display visualized graphs and other pertinent information

Ability to click a failed component and have the system auto generate a ticket
Ability to alert others of the issue found
Performance monitoring as well as fault