Occam's Razor - An Introduction To Holistic Troubleshooting

ID902 Occam's Razor: An Introduction to Holistic Troubleshooting
Wes Morgan, Senior Software Engineer
© 2011 IBM Corporation
Agenda
- Why are we here?
  - Increasingly Complex Architectures
  - Specialization within IT/IS
  - Command and Control Issues
  - Consequences of "Fix it NOW!"
- Preparation
  - Understanding Your Deployment
  - Knowing Your Routine
  - Knowing Your Limits
- Execution
  - Ask Your Neighbors
  - Identify/Refine Your Target
  - Problem vs. Routine
  - Client, Server or Both?
  - Recent Changes
  - Lather, Rinse, Repeat...
- The Holistic Approach: Occam's Razor
- Questions & Answers

Why Are We Here? Complex Architectures

- Fault Tolerance/Redundancy
- Load Balancers
- Firewalls
- Intranet/Extranet
- Virtualization

Why Are We Here? IT/IS Specialization

- "We don't handle that. Different team."
- Communication often rare and/or difficult
- Simple questions answered slowly
- No one really sees the big picture

Why Are We Here? Command and Control

- "We can't do that until the next window."
- Change Control != everyone informed
- Software integration demands team integration as well
- Multiple vendors/contractors may be involved

Why Are We Here? "Fix It NOW!" Consequences

- Panic mode
- Time-to-resolution subject to (sometimes arbitrary) limits
- All hands on deck
- Overall technical guidance lacking
- Troubleshooting becomes scattershot

The Holistic Approach: Occam's Razor
"Pluralitas non est ponenda sine necessitate."
"Plurality should not be posited without necessity."
William of Ockham, c. 1285-1349
Close relatives:

- When two theories explain the same phenomenon, choose the simpler one
- "...admit no more causes ... than such as are both true and sufficient..." (Newton)
- KISS: Keep It Simple, Stupid

Why Use Occam's Razor?

- Multiple independent failures at the same time are highly unlikely
- Far more likely that one root failure triggered additional problems
- Playing "it could be..." introduces complexity and (probably) politics
- Don't chase rabbits!

Preparation: Understand Your Deployment

It's far more than just your stuff:
- Hardware (or lack thereof!)
- Operating system
- Network (within the data center)
- Network (long haul/extranet/VPN)
- Dependencies (directory, SAN)
- Special-purpose devices (firewalls/proxies/reverse proxies)
- Network appliances

KNOW YOUR DATA PATH!

Preparation: Know Your Routine
- Profile your systems!
  - perfpmr (AIX), perfmon (Windows), iostat/vmstat (Linux)
- Understand what normal looks like
  - Be sure to profile peak time too!
  - Logins/sessions per day
  - User patterns (e.g., Accounting end-of-month)
- Domino platform statistics can be VERY useful
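To make "understand what normal looks like" concrete, the numbers collected by perfpmr, perfmon, or iostat/vmstat can be reduced to a small baseline per metric, keeping peak hours separate from off-peak. This is a minimal sketch that assumes the tool output has already been parsed into `(hour_of_day, value)` pairs; `summarize`, `build_baseline`, and the 09:00-17:00 peak window are hypothetical choices, not features of any of those tools.

```python
from statistics import mean, pstdev, quantiles

def summarize(values):
    """Reduce one metric's samples to mean, population stdev, 95th percentile and max."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(values, n=20, method="inclusive")[-1]
    return {"mean": mean(values), "stdev": pstdev(values), "p95": p95, "max": max(values)}

def build_baseline(samples, peak_hours=range(9, 17)):
    """samples: iterable of (hour_of_day, value) pairs for a single metric.

    Peak and off-peak samples are summarized separately, so a busy morning
    is not mistaken for a problem (or a quiet night for normal)."""
    samples = list(samples)  # we iterate twice, so materialize generators
    peak = [v for hour, v in samples if hour in peak_hours]
    off_peak = [v for hour, v in samples if hour not in peak_hours]
    return {"peak": summarize(peak), "off_peak": summarize(off_peak)}
```

Each group needs at least two samples. Keeping the result from a normal week gives you the "routine" side of any later problem-vs-routine comparison.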
Preparation: Knowing Your Limits
- Compare your routine use to:
  - Vendor benchmarks
  - Third-party testing/whitepapers
  - Software specifications (CPU utilization, RAM consumption)
- This is ESPECIALLY important in virtual environments
- Know how much wiggle room you have
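Once you have a routine peak and a limit for each resource (from vendor benchmarks, software specifications, or simply what the VM was allocated), "wiggle room" is simple arithmetic. A sketch with hypothetical function names; the 20% threshold is chosen purely for illustration.

```python
def headroom(peak_use, limit):
    """Fraction of the limit still free at routine peak; negative means already over it."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (limit - peak_use) / limit

def tight_resources(peaks, limits, threshold=0.20):
    """peaks/limits: resource name -> value in the same unit (e.g. % CPU, GB of RAM).

    Returns the resources whose headroom at routine peak is below `threshold`,
    i.e. the ones most likely to give out first."""
    return sorted(name for name, peak in peaks.items()
                  if headroom(peak, limits[name]) < threshold)
```

For example, a routine CPU peak of 85% against a 100% ceiling leaves 15% headroom and is reported, while 10 GB of RAM used out of 16 GB (37.5% free) is not.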

Execution: Ask Your Neighbors
- Many deployments in your environment share potential points of failure:
  - Load balancers
  - SAN
- A quick check with peers may identify a common problem quickly
  - Formalize this process if you can (weekly outage reports?)
  - May also be indicative of general network issues
- Allows you to handle some issues without vendor involvement
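If you keep even a rough inventory of which shared components each deployment depends on, the "ask your neighbors" check can start with a simple count: the component used by the most deployments currently reporting trouble is the first one to ask about. The inventory format and names below are hypothetical.

```python
from collections import Counter

def rank_shared_dependencies(inventory, troubled):
    """inventory: deployment name -> iterable of shared components it depends on
    (load balancers, SAN, directory, ...).
    troubled: names of deployments currently reporting problems.

    Returns (component, number_of_troubled_deployments_using_it) pairs,
    most widely shared first."""
    counts = Counter(dep for name in troubled for dep in set(inventory[name]))
    return counts.most_common()
```

A component shared by every troubled deployment is the Occam's Razor candidate: one root failure that explains all of the reports.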
Execution: Identify/Refine the Target

- Most missed aspect of troubleshooting
- Identify scope/range of affected users
- Identify scope/range of affected servers
- LOOK FOR COMMON FACTORS!
  - Third-party applications
  - Same location
  - Same release
  - Time of day
- Check for customizations
- Follow the data flow!
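Looking for common factors can be done mechanically once you record a few attributes per user (location, client release, third-party applications, and so on): keep only the attribute values that every affected user shares, then check how common each one is among unaffected users. A value shared by all affected users and rare among everyone else is the strongest lead. The function name and data layout are assumptions for illustration.

```python
def common_factors(users, affected):
    """users: user name -> dict of attribute -> value (location, release, ...).
    affected: collection of affected user names.

    Returns (attribute, value, share_of_unaffected_users_with_it) for every
    attribute/value pair that ALL affected users share, rarest among the
    unaffected first. Values must be hashable (strings, numbers, tuples)."""
    affected = set(affected)
    profiles = [users[u] for u in affected]
    if not profiles:
        return []
    shared = set(profiles[0].items())
    for profile in profiles[1:]:
        shared &= set(profile.items())
    others = [attrs for name, attrs in users.items() if name not in affected]
    result = []
    for attr, value in shared:
        hits = sum(1 for attrs in others if attrs.get(attr) == value)
        share = hits / len(others) if others else 0.0
        result.append((attr, value, share))
    # Rarest among unaffected users first; attribute name breaks ties.
    return sorted(result, key=lambda r: (r[2], r[0]))
```

A factor with a share of 0.0 (every affected user has it, no unaffected user does) is where to look first.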
Execution: Problem vs. Routine

- Take a snapshot of the problem
- Compare it to routine data
  - May identify particular areas of concern
  - May allow the vendor to focus their efforts better/faster
- Examples:
  - Domino NSD: NAMElookup activity
  - Perfmon/perfpmr/iostat: disk queuing
- Pay particular attention to the period just BEFORE the problem (the last 10 minutes)
- Be prepared to be pointed in a different direction!
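Comparing a problem snapshot with routine data, with the focus on the last few minutes before the problem, can be sketched as a z-score check: for each metric, how many routine standard deviations away was its average in the window just before the problem started? The 10-minute window comes from the slide; the 3-sigma threshold, the function name, and the sample format are assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

def pre_problem_anomalies(samples, baseline, problem_start,
                          window=timedelta(minutes=10), z_limit=3.0):
    """samples: (timestamp, metric, value) tuples captured around the problem.
    baseline: metric -> list of routine values (at least two each).

    Returns {metric: z_score} for metrics whose mean in the `window` before
    `problem_start` is more than `z_limit` routine standard deviations from normal."""
    flagged = {}
    for metric, routine in baseline.items():
        recent = [v for t, m, v in samples
                  if m == metric and problem_start - window <= t < problem_start]
        if not recent or len(routine) < 2:
            continue  # nothing to compare
        mu, sigma = mean(routine), pstdev(routine)
        if sigma == 0:
            # A metric that never varies in routine data: any change is notable.
            z = 0.0 if mean(recent) == mu else float("inf")
        else:
            z = (mean(recent) - mu) / sigma
        if abs(z) > z_limit:
            flagged[metric] = z
    return flagged

if __name__ == "__main__":
    start = datetime(2011, 1, 20, 14, 0)
    routine = {"disk_queue": [1, 2, 1, 2, 1, 2]}
    snapshot = [(start - timedelta(minutes=5), "disk_queue", 6),
                (start - timedelta(minutes=3), "disk_queue", 8)]
    print(pre_problem_anomalies(snapshot, routine, start))  # {'disk_queue': 11.0}
```

A flagged metric is a pointer, not a verdict; as the slide says, be prepared for it to send you in a different direction.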
Execution: Client, Server or Both?

- DON'T GO AFTER A FLY WITH A SLEDGEHAMMER!
- Resist the urge to turn on all the debug
  - Overly ambitious debug can carry its own performance cost, for example:
    - DEBUG_TCP_ALL in IBM Lotus Domino
    - VP_TRACE_ALL in IBM Lotus Sametime
    - debug=FINEST in Java
- It's worth a round of data gathering to target server debug more specifically
- High-level client-side debug correlates well with trace logs:
  - Live HTTP Headers (Firefox add-on)
  - Firebug (Firefox add-on)
  - Fiddler (MSIE proxy)
- Again, gather twice (routine and problem) when possible
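Correlating client-side captures (for example, request times shown by Live HTTP Headers or Fiddler) with server trace logs is largely a matter of lining up timestamps. A minimal sketch, assuming both sides have already been parsed into `(datetime, text)` pairs and that client and server clocks are synchronized (e.g. via NTP); the function name and the 2-second tolerance are illustrative.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def correlate(client_events, server_entries, tolerance=timedelta(seconds=2)):
    """client_events: (timestamp, description) pairs captured on the client.
    server_entries: (timestamp, log_line) pairs from the server trace, in any order.

    Returns a list of (description, [server log lines within +/- tolerance]),
    one per client event, in client order."""
    server_entries = sorted(server_entries)  # sort by timestamp so we can bisect
    times = [t for t, _ in server_entries]
    result = []
    for t, description in client_events:
        lo = bisect_left(times, t - tolerance)
        hi = bisect_right(times, t + tolerance)
        result.append((description, [line for _, line in server_entries[lo:hi]]))
    return result

if __name__ == "__main__":
    t = datetime(2011, 1, 20, 14, 0, 0)
    client = [(t, "GET /mail/jdoe.nsf")]
    server = [(t + timedelta(seconds=1), "HTTP open jdoe.nsf"),
              (t - timedelta(minutes=1), "unrelated entry")]
    print(correlate(client, server))  # [('GET /mail/jdoe.nsf', ['HTTP open jdoe.nsf'])]
```

Running it on both the routine and the problem capture shows which server activity appears only when users are suffering.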
Execution: Recent Changes

- Back to Change Control
- Look for ANY changes close to the start of the problem
- Don't forget to check for OS patches/updates
- Look for new stuff too...
- Check all along the data flow
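If change records can be exported from your change-control system, narrowing them to "close to the start of the problem" and "along the data flow" is a small filter. The record format (`when`, `what`, `where` keys), the 7-day lookback, and the function name are hypothetical.

```python
from datetime import datetime, timedelta

def recent_changes(changes, problem_start, lookback=timedelta(days=7), data_path=None):
    """changes: dicts with 'when' (datetime), 'what' and 'where' keys
    (hypothetical export format from a change-control system).
    data_path: optional set of component names on the affected data path.

    Returns changes made within `lookback` before the problem started
    (restricted to the data path if one is given), closest to the start first."""
    hits = [c for c in changes
            if problem_start - lookback <= c["when"] <= problem_start
            and (data_path is None or c["where"] in data_path)]
    return sorted(hits, key=lambda c: problem_start - c["when"])

if __name__ == "__main__":
    start = datetime(2011, 1, 20, 9, 0)
    log = [{"when": datetime(2011, 1, 19, 22, 0), "what": "OS patch", "where": "mail01"},
           {"when": datetime(2010, 12, 1, 8, 0), "what": "LB firmware", "where": "lb-a"}]
    print([c["what"] for c in recent_changes(log, start)])  # ['OS patch']
```

Passing the components from your data path (see "KNOW YOUR DATA PATH!") keeps unrelated changes out of the list without hiding ones on devices you don't own.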
Lather, Rinse, Repeat...

- Be prepared to cycle through this process several times
- Apply the same principles to each area of troubleshooting
- Example:
  - Identify/Refine shows only particular users suffering
  - Logs show directory issues
  - Now, users not experiencing problems are your "routine"
  - Troubleshoot the directory by comparing problem users against routine users (e.g., get LDIF dumps for both)
Only go where the evidence takes you!
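The directory example (comparing problem users against routine users) can be partly automated: parse the LDIF dumps for both groups and list the attribute values that every problem user has but no routine user has. The parser is a deliberately minimal sketch: it unfolds continuation lines, decodes base64 `::` values, and skips comments, `version:` lines and `<` URL values, but it does not handle LDIF change records. Both function names are hypothetical.

```python
import base64

def parse_ldif(text):
    """Return one set of (attribute, value) pairs per LDIF entry (minimal sketch)."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]  # folded line: continue the previous one
        else:
            lines.append(raw)
    entries, current = [], set()
    for line in lines + [""]:  # trailing "" flushes the last entry
        if not line.strip():
            if current:
                entries.append(current)
                current = set()
            continue
        if line.startswith("#") or ":" not in line:
            continue
        attr, value = line.split(":", 1)
        attr = attr.strip().lower()  # attribute names are case-insensitive
        if attr == "version" or value.startswith("<"):
            continue
        if value.startswith(":"):  # "attr:: base64value"
            value = base64.b64decode(value[1:].strip()).decode("utf-8", errors="replace")
        current.add((attr, value.strip()))
    return entries

def distinguishing_pairs(problem, routine):
    """(attribute, value) pairs found in every problem entry and in no routine entry."""
    if not problem:
        return set()
    shared = set.intersection(*problem)
    return shared - set().union(*routine)
```

Calling `distinguishing_pairs(routine, problem)` gives the opposite view: what routine users have that problem users lack.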
QUESTIONS & ANSWERS
Please complete a session evaluation!
More questions? Find me in the Lotus Solutions Development Lab!

THANKS FOR BEING HERE!
