BSM Guide

The Definitive Guide To Business Service Management
Greg Shields
Introduction
Introduction to Realtimepublishers
by Don Jones, Series Editor
For several years now, Realtime has produced dozens and dozens of high-quality books that just
happen to be delivered in electronic format, at no cost to you, the reader. We've made this
unique publishing model work through the generous support and cooperation of our sponsors,
who agree to bear each book's production expenses for the benefit of our readers.
Although we've always offered our publications to you for free, don't think for a moment that
quality is anything less than our top priority. My job is to make sure that our books are as good
as, and in most cases better than, any printed book that would cost you $40 or more. Our
electronic publishing model offers several advantages over printed books: You receive chapters
literally as fast as our authors produce them (hence the "realtime" aspect of our model), and we
can update chapters to reflect the latest changes in technology.
I want to point out that our books are by no means paid advertisements or white papers. We're an
independent publishing company, and an important aspect of my job is to make sure that our
authors are free to voice their expertise and opinions without reservation or restriction. We
maintain complete editorial control of our publications, and I'm proud that we've produced so
many quality books over the past years.
I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if
you've received this publication from a friend or colleague. We have a wide variety of additional
books on a range of topics, and you're sure to find something that's of interest to you, and it
won't cost you a thing. We hope you'll continue to come to Realtime for your educational needs
far into the future.
Until then, enjoy.
Don Jones
Table of Contents
Introduction to Realtimepublishers
Chapter 1: The Power of Business Service Management
  The Intent of this Guide
  Business Service Management: More than a Framework
    The Chasm Between IT and the Business
    What Is a Business Service?
    Example Business Services
    Managing Business Services
    Dashboards and Service Visibility
  Elements of BSM
    Alignment of IT and the Business
    The Evolution of IT Service Management
    Implementing BSM
    End User Experience Monitoring
    Achieving Management Value
    Achieving Operational Value
    Achieving IT Value
    ITIL and Six Sigma
  Important Definitions
    Business Impact Management
    Service Level Management
    Real-Time Service Visualization
      Operational Metrics
      Service/Asset Metrics
      Business Metrics
      Executive Views
      Fault Trees
      Impact Trees
      Business Calendar
    Process Integration
      Workflow
      Six Sigma
      ITIL
  BSM Empowers Decision Makers
Chapter 2: The Alignment of IT and Business
  The Chasm Between IT and the Business
    Responsibilities and Priorities
    Early IT
      Users Become Customers
    Proactive IT
  Alignment Inhibitors
    No Common Dialog
    Mismatched Expectations
    Technology-Focused Metrics
    Siloing
    Reactive Mode IT
  The Gartner IT Maturity Curve
    Chaotic
    Reactive
    Proactive
    Service
    Value
    BSM's Impact at the Various Maturity Levels
  IT Focus Is Changing
    Revenue Impact
    Competitive Advantage
    Agility
    Reactive to Proactive IT
  Why Invest in BSM?
    Where it Works
      The Dashboard Audience
      Technicians and Administrators
      Managers
      Executives
    Where It Doesn't Work
    Low Risk Implementation
    Cost Containment Aspects
    Governance and Compliance Aspects
  The Value of Alignment
Chapter 3: IT Service Management Evolution
  Maturity Impacts IT Goals
  What Is an IT Service?
    Service Management
  The Timeline of Management and Monitoring
    Early Management
    Proprietary Agents
    Native/Agentless
    Focus on Value
  The Evolution of Service Management Targeting
    Network Availability and Utilization
    Server Performance
    Troubleshooting and Predictive Analysis
    End User Experience
      J2EE & .NET Application Performance
    Business Service Management
  An Example
    Network Availability and Utilization
    Server Performance
    Troubleshooting and Predictive Analysis
    End User Experience
    Business Service Management
  Moving Along the Evolutionary Curve
    Speeds Troubleshooting
    Improves Performance
    Fills Out Systems Vision
    Enables Proactive Management
  Summary
Chapter 4: Implementing BSM
  BSM Provides a Business Focus to IT Operations
  Three Reasons to Implement BSM
    Understand the Critical to Quality Services
    Manage Daily Risk and Improve Business Decision Making
    Initiate Service Improvement Activities
  The Seven Steps of a BSM Implementation
  Step 0: Preparation
    Identify Key Project Members
    Identify Stakeholders and Build the Project Plan
  Step 1: Selection
    Identify Critical and Measurable Business Services
    Assess Services
    Assess Cost to the Business
  Step 2: Definition
    Define Services
    Define Service Requirements
    Define Problems and Opportunities
    Define Critical Success Factors
  Step 3: Modeling
    Model Defined Services and Dependencies
    Model Associated Metrics
    Build the Service Model
  Step 4: Measurement
    Implement Data Collection
    Measure Services & Gaps
  Step 5: Data Analysis
    Analyze Returned Monitoring Data
    Validate Measurements & Costing Assumptions
    Build Fault Tree Analyses
    Build Impact Analyses
  Step 6: Improvement
    Locate Problem Domains
    Identify and Resolve Gap
    Revise the Service Model
  Step 7: Reporting
    Implement Dashboards
    Implement Notification
    Hand-off to Operations
  A Carefully Planned Implementation Is a Successful Implementation
Chapter 5: End User Experience Monitoring
  System Counters Alone Cannot Fully Represent the End User's Experience
    Looking at the Wrong Set of Data
    The Egg Timer Problem
  System Counters Are Critical to the System's Administrators and End User Experience Is Critical to the System's Users
    Agent-Based Monitoring
    Agentless Monitoring
      Understanding the CNS Spread
    Watching How Users Interact with the System
  An Example
    Visibility
    Prioritization
    Resolution
    Improvement
  Impacted Technologies
    Web Front End
    Packaged Applications
    Thin Client
    Middleware
    Databases
  Importance to IT Goals
    Problem Identification
    Prioritization
    Pre-Failure Warnings
    Finger Pointing Prevention
    Clear Problem Communication
    Vendor Accountability
    Customer Satisfaction
  EUE Ties into BSM
    Necessary for a Complete Picture of BSM
    Importance of Using Both Methods for Monitoring
    Proactive Awareness
  EUE Drives BSM's ROI
Chapter 6: Achieving Management Value
  Obtaining and Maintaining Value in a BSM Implementation
    Obtaining Value
    Maintaining Value
    Calculating ROI
      Cost to Implement
      Cost Savings Associated with Implementation
      Revenue Benefits
  Management Visibility
    Visibility & Dashboards
    What to Display
    What Not to Display
    Access Control
    Trend & Reaction Lines
  Management Control
    Control Dashboards
    What to Display
    What Not to Display
    Management Impact on Operations
    SLA Measurement & Fulfillment
    Purchase / Upgrade Decisions
    Process Integration
    Fitting BSM into the Overall Operational Scheme
  End User Visibility & Control
    System Status
  Outsourcers & Service Providers
    Cost & Risk Reduction
    Contract Compliance
  Enterprise IT
    Cost & Risk Reduction
    Customer Satisfaction
  BSM Enables an Ongoing Measurement of Management Value
Chapter 7: Achieving Operational Value
  Post-Implementation Operational Achievement
  BSM Correlates and Consolidates to Make Sense of the Data
    Unifying Management Controls
    Operational Visibility
  BSM as an Extensible Visualization Tool
    For Management
    For IT
    For Customers
  Example Visualization Data Blocks
    Availability Charts
    Control Charts
    Dial Charts
    Metrics Charts
    Pareto Charts
    6 Sigma Charts
    Outage Impact Charts
    Service Statistics Charts
    Stoplight Charts
    Heat Charts
    Business Calendars
    Service Quality (Real Time) Charts
    Service Quality (History) Charts
    Image Maps
    Drill-Down Reports
    Service Trees
  BSM and its Visualizations Provide Return through OPEX Reduction
Copyright Statement
© 2008 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that
have been created, developed, or commissioned by, and published with the permission
of, Realtimepublishers.com, Inc. (the "Materials") and this site and any such Materials are
protected by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtimepublishers.com, Inc. or its web
site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be
held liable for technical or editorial errors or omissions contained in the Materials,
including without limitation, for any direct, indirect, incidental, special, exemplary or
consequential damages whatsoever resulting from the use of any information contained
in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal, noncommercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, service marks and logos that are the property of
third parties. You are not permitted to use these trademarks, service marks or logos
without prior written consent of such third parties.
Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent
& Trademark Office. All other product or service names are the property of their
respective owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtimepublishers.com, please contact us via e-mail at
info@realtimepublishers.com.
Chapter 1
[Editor's Note: This eBook was downloaded from Realtime Nexus: The Digital Library. All
leading technology guides from Realtimepublishers can be found at
http://nexus.realtimepublishers.com.]
Chapter 1: The Power of Business Service Management

It's 2:15am on Monday morning, and First Class Glass (FCG), a global distributor of high-end
automotive and industrial glass products, is about to experience a server outage of a minor IT
system at its Denver data center. But this is no standard outage.
With domestic data centers in Denver and Baltimore as well as international operations in
Geneva, Switzerland, and Osaka, Japan, IT Operations for FCG has hundreds of interconnected
systems and network devices under management. With rare exception, each of these systems is
monitored under the watchful eye of FCG's monitoring and notification system.
What's different about today's outage is that the alert for this minor IT system in Denver got
lost in the daily shuffle of alerts and notifications raised by FCG's network management
system. The alert was categorized with a very low priority and was missed by FCG's monitoring
help desk. So, this outage will go unnoticed for nearly 6 hours, until IT staff in the Denver
office begin arriving for the Monday morning grind.
When the IT personnel in charge of the system begin arriving at around 8:00am, however, they do
notice the outage of this minor IT system, identify the problem with the hosting server, and
resolve it with little effort. In its chart of system priority, FCG's IT recognizes this system
as having a Tier III criticality level. Within the Service Level Agreement (SLA) it has agreed
upon with the business, FCG's Tier III systems must be brought back online within 8 hours.
Because the problem began at 2:15am and was resolved before 10:15am (the end of that 8-hour
window), IT sees the problem as within the scope of its SLA and writes it off as fixed to
agreed-upon standards.
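The SLA arithmetic above is easy to mechanize. The sketch below assumes a simple tier table; only the 8-hour Tier III window comes from the FCG example, while the Tier I and Tier II windows, the `within_sla` function, and the calendar date are hypothetical placeholders.

```python
from datetime import datetime, timedelta

# Hypothetical time-to-restore windows per criticality tier.
# Only the Tier III value (8 hours) comes from the FCG example.
RESTORE_WINDOW = {
    "Tier I": timedelta(hours=1),
    "Tier II": timedelta(hours=4),
    "Tier III": timedelta(hours=8),
}

def within_sla(tier: str, outage_start: datetime, restored_at: datetime) -> bool:
    """True if the outage was resolved inside the tier's restore window."""
    return restored_at - outage_start <= RESTORE_WINDOW[tier]

# Illustrative Monday date; the 2:15am-8:15am outage window is from the example.
start = datetime(2008, 1, 14, 2, 15)
restored = datetime(2008, 1, 14, 8, 15)

print(within_sla("Tier III", start, restored))  # True: no breach as IT sees it
print(within_sla("Tier I", start, restored))    # False under the placeholder Tier I window
```

Measured against its own Tier III window, the outage passes. Measured against the (placeholder) window of the Tier I system it actually fed, the same 6 hours is a breach, and that gap is what the rest of this chapter is about.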
These sorts of problems happen every day in IT. With the complexity of today's environments (the
number of interconnected devices, virtualization, SOA, and so on), outages and degradation of
service are a regular occurrence on networks both large and small. And over the years, IT
organizations have developed a finely tuned sixth sense for finding and fixing problems within
their networks.
But the problem with IT, and with the way IT identifies and resolves those problems, lies in its
infrastructure-centric focus. IT, by definition and by charter, concerns itself with the
functioning and non-functioning of infrastructure components. However, the business services
that run on this infrastructure are what ultimately make the business function. Let's look a
little deeper into today's problem at FCG.
[Figure 1.1 labels, top to bottom: Mission Critical B2B Web System; Core Web Site; Sub-System; Moderate Data Processing System; Minor IT System]
Figure 1.1: Monday morning's problem with a minor IT system actually drove a problem into a much larger
enterprise-wide system.
It turns out that the minor IT system wasn't so minor after all. In its end-of-month metrics back
to the business, FCG's IT reports no SLA violations for the month. Monday morning's problem
didn't rise to the level requiring identification to the business, so IT's breach report didn't
include the outage. However, what IT never realized is that its minor IT system is actually a
small component of a much larger enterprise-wide system.
The business actually felt the problem quite a bit more than IT did, because that minor IT
system was a low-level component of a system thread linking all the way to FCG's Tier I
business-to-business (B2B) Web system. This B2B Web system handles all the purchasing,
returns, and delivery information for FCG's glass purchases worldwide. The role of the minor IT
system is to feed special-order delivery routing information to a moderate data processing system
that in turn feeds into the site's returns subsystem.
FCG's 6-hour outage between 2:15am and 8:15am caused all incoming special orders to crash at
the most delicate step in the process: after the order had been taken and charged but before it
was completed with routing and delivery information. Because of that error, any special orders
received within that 6-hour period were charged to customer accounts, but no delivery or routing
information was captured.
The end result of today's problem is that the ordering department must undertake a
time-consuming, manual process to identify the failed orders and work with each customer
individually to populate their delivery and routing data. This manual process costs the business
money, at a cost level typically associated with a Tier I outage.
You can see in this example the dissonance between IT's priorities and those of the business.
What IT sees as a minor IT system actually affects the business in a highly critical way. IT is
not necessarily at fault for mislabeling the system; its staff are doing their job the best way
they can. The fault lies in the essential translation between IT's priorities and those of the
entire business.
Adding to today's problem is the global nature of FCG's business. Although the outage ran from
2:15am to 8:15am (MST) in the United States, when virtually no B2B business is being transacted
there, the business day was already under way in Europe, the Middle East, and Africa (EMEA); in
Geneva, 8 hours ahead of Denver, it was 10:15am. In Osaka, 16 hours ahead, it was 6:15pm, and
Japan and the rest of Asia-Pac were just finishing their day. Understanding the business calendar
and business periods means recognizing how time shifts affect a globalized company.
Figure 1.2: Tracking FCG's problem along a graph of time shows the skew created by time zones. An impact
that occurs at 2:15am in one part of the world affects another part of the world in the middle of the workday.
The Intent of this Guide

The intent of this guide is to discuss in detail the issues and solutions associated with truly
managing business systems. This guide will also help IT translate its monitoring into a
framework that makes sense from a business perspective. Part of making sense of monitoring data
means translating it into terms that non-technical business leaders can understand.
We'll talk about the dissonance between IT's perception of monitoring and how that perception
sometimes misses the mark of what truly interests a business. We'll discuss how a top-down
approach to monitoring business systems can provide a solution set to overcome that dissonance.
We'll go into the operational aspects of measuring your infrastructure's availability and
performance, not only from the perspective of the IT employee but also from the perspective of
the customer. That measurement will include metrics that validate or invalidate the success of the
network in providing and hosting the services that ultimately run the business.
All these conversations relate to a way of thinking about systems management that turns the
traditional approach on its head. This collection of tools, techniques, and technologies is called
Business Service Management (BSM), and it enables the business to understand how it is
impacted by IT. BSM provides a framework of real-world solutions for monitoring, reporting, and
notification that make sense and provide value to those outside the IT organization: its business
leaders.
By the conclusion of this guide, you will be comfortable with the concepts and the
implementation of BSM, what it impacts and what it doesn't. You'll understand the managerial,
operational, and technical value of implementing BSM in your environment. And you'll know
exactly what you need to do to properly plan, design, and incorporate BSM methodologies into
your daily workflow.
Business Service Management: More than a Framework

Let's start off with a formal definition of BSM, courtesy of Gartner, and then throughout the rest
of this chapter we'll deconstruct that definition into its individual elements: "BSM is a category of
IT operations management software products that link the availability and performance status of
IT infrastructure components to business-oriented IT services that enable business processes."
BSM is effectively a mechanism by which the goals of the business are applied to the technology of
IT. With BSM, IT reconfigures the way it considers technology: in addition to the traditional
device-based approach, it begins embracing a service-based approach to monitoring. Viewing
technology in terms of business services means that individual outages are treated as
profit-and-loss events for the business. The loss of a subsystem that supports a business service
feeds into the total quality of that service. A reduction in the performance of a system reduces
its quality. And, most importantly, an increase in response time for a customer-facing system
reduces its service quality.
The idea of a framework signals to the reader that a concept incorporates a suite of ideas.
Those ideas encapsulate an understanding of the concept, its inputs and outputs, and the
computations that it performs upon its data. Unfortunately, all too often, concepts never get out
of the framework phase; they don't turn into usable products.
BSM is more than just a framework. It is a fully defined category of software and
implementation guidelines. It ingests availability and performance data and outputs
quality-related metrics to the business on the health of the network's business services. BSM
applies a dollar value to the reduction in quality for each identified service and serves up that
information on dashboards viewable and understandable to both IT and business leaders. Taking it
one step further, BSM represents the combination of Monitoring + Money.
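As a rough illustration of "Monitoring + Money," the sketch below attaches a dollar figure to a drop in service quality. It assumes quality is a fraction between 0 and 1 and that lost revenue scales linearly with lost quality; both the linear model and the `revenue_per_hour` figure are hypothetical, and a real deployment would use a cost model agreed upon with the business.

```python
from dataclasses import dataclass

@dataclass
class BusinessService:
    name: str
    revenue_per_hour: float  # hypothetical figure supplied by the business

def quality_cost(service: BusinessService, quality: float, hours: float) -> float:
    """Estimated dollars lost running `service` at `quality` (0.0 to 1.0) for `hours`.

    Assumes, for illustration only, that lost revenue is proportional
    to the reduction in quality.
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be between 0.0 and 1.0")
    return service.revenue_per_hour * (1.0 - quality) * hours

b2b = BusinessService("B2B Web System", revenue_per_hour=50_000.0)
print(quality_cost(b2b, quality=0.75, hours=6))  # 75000.0
```

Keeping the cost model in one small function makes it easy to swap the linear assumption for whatever mapping the business actually agrees to.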
The Chasm Between IT and the Business
Chapter 2 will provide a much more detailed discussion of the alignment between IT and the
business. That chapter will discuss how IT is maturing past the days of pure firefighting and the
break/fix mentality, and it will examine the differences in vocabulary and prioritization that
can inhibit an organization from attaining a high level of maturity. But let's take just a minute
here to talk about the dissonance between the mindset of the IT guys in the basement and that of
the executives on the top floor.
Since the early years of computers and networking, IT organizations have used rich tools to
monitor the status of computer hardware and software. Using technologies such as SNMP for
network and UNIX devices, Windows Management Instrumentation (WMI) for Windows devices,
Java Message Service (JMS) for Java applications, or any of the many Web Services protocols,
systems administrators have for years been able to query network-attached devices for status,
inventory information, performance metrics, and active configurations. Connecting these
technologies to a centralized Network Management System enables the IT department to build a
single-screen view into the network.
Figure 1.3: Mature IT organizations have for years incorporated Network Management Systems to monitor
and notify when network-enabled devices incur problems.
That single-screen view helps to enlighten IT as to the health of the network and of the devices
and applications that make up that network. If a device goes down, the Network Management
System notifies the administrator that the system has gone offline through a pop-up alert, an
email, or a page to a mobile device. Help desks everywhere have installed heads-up displays where
green lights go red when bad things happen.
When criteria for performance are preconfigured into the system, the same Network
Management System can notify administrators when performance dips below preset thresholds.
Highly mature IT organizations even define auto-remediation actions to occur when
preconfigured events occur. A mature IT shop has probably been proactively monitoring such
elements for years.
The chasm occurs in the definition of what's important. IT tends to deem the status of each
individual device important: if a device is up, the light stays green. Business leaders have
different priorities, though. For a business leader, importance is best measured by customer
satisfaction, external service availability, and the capability to meet customer needs. If a
customer completes a Web site transaction and is satisfied with the results, the business leader's
light stays green.
But who really owns the service and is responsible for its quality? Is it the business leader who
pays for and relies upon the service? Or is it the IT organization that watches it, manages it, and
ensures it remains up and operational? According to BSM, it is a combination of the two. With
BSM, and the tools that feed its framework, each half of the ownership is provided with the
information it needs to make the best decisions within its universe of control. Table 1.1
highlights some examples of this idea.
Elements Needed by IT            | Elements Needed by Business Leadership
Device availabilities            | Service quality
System performance metrics       | Customer wait time
System performance thresholds    | Customer drop rate
Network latency percentages      | End user experience metrics
Table 1.1: With BSM, information about a system and the services that reside on that system is broken down
into elements useful to each of its stakeholders.
What Is a Business Service?

Before we can go too far into this explanation of the information needed by each stakeholder in
the business, we need to talk a little about the definition of a business service. Obviously, not all
services and applications on the network can be categorized as front-line services, or even as
services that feed front-line services.
Identifying network services requires, in part, building a catalog of services. This process
identifies the network-based services and applications provided by your business that in some
way impact its profits and losses. Those services can be externally facing ones used by your
customer base. They can be external B2B services. They can be internal services used only by
employees in the operation and processing of customer accounts. Or they can be internal
services used for purely internal purposes but whose outage could prevent the completion of
daily activities in the line of business.
For the purposes of BSM, a business service in its most basic form is one whose operation can be
quantified in terms of dollars and cents. If a service can be measured by some amount of cash
that moves during its processing (and that is therefore missing when it fails to be processed),
then it is a good candidate for a business service.
The complexity in defining such services comes from finding the lines of demarcation between
individual services. This process of breaking down a business service into its disparate
components is the next step in the BSM process and is really the most critical activity. Some
sample questions to consider:
- Do we consider our order processing system a single business service? Or can we break
  down the system into an order entry service, an order processing service, and an order
  notification service?
- Is our external Web site a business service? Does it provide quantifiable cash flow?
- Is our internal Web site a business service? Will the loss of it impact our ability to
  complete the daily flow of business, and if so, by how much?
- Is the functioning of our internal Windows domain a business service? What functions
  rely on its faithful operation?
- Which of our network devices and applications are most critical to operations, and which
  power only tangential operations?
You can see here that breaking down these services into ever-smaller subcomponents can be a
daunting task. But it is the interrelation of these interconnected service subcomponents that
eventually builds what BSM calls a service model. If each business service subcomponent is akin
to a city on a map, the service model is the complete map, including all the roads that connect
those cities.
BSM's service model lies at the core of its processing power. It is within the BSM service model
that dependencies between services are described and where individual service subcomponents
are logically interconnected.
Figure 1.4: In BSM's service model, services and service subcomponents are atomized and interconnected to
show dependencies. Upper-level services rely on lower-level services for functionality. The Quality of Service
(QoS) of lower-level services drives the QoS for those above them.
Figure 1.4 shows an example of six business services. Each of these business services has a
quantifiable point of demarcation. Services at the bottom provide data processing of some form
useful to the services above them, and services higher in the model require their subordinate
services to function properly.
The preceding model diagram shows a single core service. Generically, that service could be The
Company eCommerce Site. This business service is the ultimate endpoint at which the business
interacts with the customer; essentially, it is the portion of the internal network that the
customer sees. But that service relies upon a set of dependencies to function properly. Perhaps
the two second-level services are The Ordering Database and The Customer Database. Each of
these, in turn, relies on other subordinate services. These third-level services could be
abstractions for real network constructs such as The External Network, The Customer
Authentication System, and The Data Encryption System.
What is critical in determining the points of demarcation between such services is that they are
not necessarily aligned to network objects or individual applications. We do not define our
service model as The Network Switch that connects to The Network Database Server that itself
feeds The eCommerce Server Cluster. Instead, BSM requires that for most services, you add a
level of abstraction between the physical network object or application and the business
processes that it enables.
Figure 1.5: Filling in the blanks from Figure 1.4, you see how business services interrelate.
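One way to picture how lower-level QoS "drives" the QoS above it is a weakest-link roll-up over the dependency graph. The sketch below is a hypothetical encoding of the Figure 1.4 model: the service names follow the generic example, but which third-level service sits under which database is an assumption, as is the rule that a service's effective QoS is the minimum of its own measured QoS and that of everything it depends on. A real BSM tool may apply richer rules, such as weighting or redundancy.

```python
from typing import Dict, List, Optional

# Hypothetical encoding of Figure 1.4: each service lists the services it depends on.
# The model is assumed to be acyclic.
DEPENDS_ON: Dict[str, List[str]] = {
    "eCommerce Site": ["Ordering Database", "Customer Database"],
    "Ordering Database": ["External Network", "Data Encryption System"],
    "Customer Database": ["Customer Authentication System", "Data Encryption System"],
    "External Network": [],
    "Customer Authentication System": [],
    "Data Encryption System": [],
}

def effective_qos(service: str, measured: Dict[str, float],
                  memo: Optional[Dict[str, float]] = None) -> float:
    """Weakest-link roll-up: a service is no better than its worst dependency.

    `measured` holds each service's own QoS (0.0 to 1.0); services missing
    from it are assumed healthy (1.0).
    """
    if memo is None:
        memo = {}
    if service not in memo:
        dependency_qos = [effective_qos(dep, measured, memo) for dep in DEPENDS_ON[service]]
        memo[service] = min([measured.get(service, 1.0), *dependency_qos])
    return memo[service]

measured = {"Data Encryption System": 0.6, "Customer Database": 0.9}
for name in DEPENDS_ON:
    print(f"{name}: {effective_qos(name, measured):.2f}")
```

Here a degraded Data Encryption System (0.6) pulls both databases and the eCommerce Site down to 0.6, while the External Network stays at 1.0.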
Example Business Services

All this said, to help clarify the role of the business service in building the BSM service model,
let's take a look at some example business services. We'll continue with the example that started
off the chapter. In the example of the global glass company, the service that died overnight was
a low-level service residing on a forgotten server in the Denver data center. But the service
provided by the computer that faulted tied into a much greater system, unbeknownst to the IT
organization. The total quality of FCG's mission-critical B2B Web system was reduced by the
loss of this low-level system, without registering sufficiently in the IT group's management
system. Let's flesh out the diagram from Figure 1.1 and remove the icons that show individual
systems to get a better picture of how example business services might interconnect to feed a
B2B Web system.
[Figure 1.6 labels: Mission-Critical B2B Web System; Customer Account Auth. System; Inventory Processing System; Order Processing System; Customer Account Database; Inventory Database; Credit Card Auth System; External Credit Service Proxy; B2B Extranet; Credit Card Extranet]
Figure 1.6: A more detailed service model outlines the discrete functions of FCG's mission-critical B2B Web
systems and the dependencies between them.
Figure 1.6 shows the example business system broken down into its various logical components.
Each of these components resides on one or more physical devices within one or more data
centers. But, more importantly, each of these components is critical in some measurement of the
operation of the B2B Web system.
Immediately related to the core system itself are its Customer Account Authorization System, the
Inventory Processing System, and the Order Processing System. The Customer Account
Authorization System is used to store customer credentials using its dependent Customer Account
Database. It also allows for the authentication of customers using the B2B Web site. Its related
database stores account personalization information as well as logins and passwords.
The Inventory Processing System is used to manage the workflow associated with recognizing
inventory levels and acting on the information it gets from its dependency, the Inventory
Database. The Inventory Processing System also updates customer personalization information
within the Customer Account Database to note previous orders and to suggest potential future
orders.
That same Inventory Database also works with the Order Processing System. The Order
Processing System's responsibility is to ensure that a customer transaction is completed and
logged correctly when requesting a unit of inventory. Because all orders in the example system
are paid for via credit card, the Order Processing System depends on the Credit Card
Authentication System to process credit card information. An External Credit Card Proxy is used
to complete those transactions. Two separate networks are relied upon for the functionality of the
entire system: the B2B Extranet and the separate Credit Card Extranet that is FCG's
connection to the credit card provider. You should immediately see the complexities involved in
deconstructing what seems a simple business service into its component elements.
Chapter 4 on Implementing BSM discusses this complex process in more detail.
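The dependency chain in Figure 1.6 can be encoded as a graph and walked "upward" to answer a basic BSM question: if one component fails, which business services feel it? The sketch below is a hypothetical encoding; the service names follow the figure, but the edges to the two extranets are an assumed reading of the text, and a real service model would be loaded from the BSM tool rather than hard-coded.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical encoding of Figure 1.6: service -> services it depends on.
# The edges to the two extranets are an assumed reading of the text.
FCG_MODEL: Dict[str, List[str]] = {
    "B2B Web System": ["Customer Account Auth System", "Inventory Processing System",
                       "Order Processing System", "B2B Extranet"],
    "Customer Account Auth System": ["Customer Account Database"],
    "Inventory Processing System": ["Inventory Database", "Customer Account Database"],
    "Order Processing System": ["Inventory Database", "Credit Card Auth System"],
    "Credit Card Auth System": ["External Credit Service Proxy"],
    "External Credit Service Proxy": ["Credit Card Extranet"],
    "Customer Account Database": [],
    "Inventory Database": [],
    "B2B Extranet": [],
    "Credit Card Extranet": [],
}

def impacted_by(failed: str) -> Set[str]:
    """Every service that depends, directly or indirectly, on `failed`."""
    # Invert the edges so we can walk upward from the failed component.
    used_by: Dict[str, List[str]] = {name: [] for name in FCG_MODEL}
    for service, deps in FCG_MODEL.items():
        for dep in deps:
            used_by[dep].append(service)
    impacted: Set[str] = set()
    queue = deque([failed])
    while queue:
        for parent in used_by[queue.popleft()]:
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted

print(sorted(impacted_by("Credit Card Extranet")))
# ['B2B Web System', 'Credit Card Auth System', 'External Credit Service Proxy', 'Order Processing System']
```

The breadth-first walk visits each affected service once, even where dependencies converge, as they do on the Inventory Database and the Customer Account Database.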
Managing Business Services

As you can see in the preceding example, the core service relies upon numerous services to
function. This is one of the major tenets of BSM. From the customer's perspective, there are only
three possible outcomes when they attempt to connect to the B2B Web server:
1. Acceptable Service: The customer logs into the system and receives a successful logon
   within an acceptable amount of time. They navigate through the system to find the items
   of interest, also within an acceptable period of time. Once they are ready to complete their
   transaction, the purchase is completed using a payment method of their choosing that
   responds quickly and without exposing error, delay, or security compromise to the
   customer.
2. Unacceptable Service: At any point during the customer's shopping or browsing
   experience, they may experience a noticeable reduction in performance, which they may
   consider unacceptable service quality. When this reaches the point where the customer
   feels the service cannot handle their requests, they may stop using the service.
3. Unavailable Service: The customer may be able to log in, receive a successful
   authentication, and begin shopping for items. But because multiple subsystems must be
   present and functioning to complete the transaction, they may not be able to complete
   their entire desired task (in this case, finding the items of interest and purchasing them).
   Or they may be unable to log in or even browse the Web site owing to a component failure.
With BSM, the monitoring and notification tools present in the suite need to provide both IT
and business leaders with information that actively validates the state of the service in real time.
Is the B2B Web site currently in State 1, or has it degraded to State 2 or even State 3? And, if the
service has degraded to an unacceptable state, what impact in whole dollars does the company
experience per period of time?
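The three states can be expressed as a simple classification of each end-user observation. The sketch below assumes a single observation records whether the transaction completed and how long it took, and uses a hypothetical 3-second threshold; in practice the business would set the threshold for each service, and a BSM tool would aggregate many observations rather than judging one in isolation.

```python
from enum import Enum

class ServiceState(Enum):
    ACCEPTABLE = 1    # State 1
    UNACCEPTABLE = 2  # State 2
    UNAVAILABLE = 3   # State 3

def classify(transaction_completed: bool, response_seconds: float,
             acceptable_seconds: float = 3.0) -> ServiceState:
    """Map one end-user observation onto the three service states.

    `acceptable_seconds` is a hypothetical threshold; the business, not IT,
    would normally decide what counts as acceptable for each service.
    """
    if not transaction_completed:
        return ServiceState.UNAVAILABLE
    if response_seconds > acceptable_seconds:
        return ServiceState.UNACCEPTABLE
    return ServiceState.ACCEPTABLE

print(classify(True, 1.2))   # ServiceState.ACCEPTABLE
print(classify(True, 8.0))   # ServiceState.UNACCEPTABLE
print(classify(False, 0.0))  # ServiceState.UNAVAILABLE
```

Checking for an incomplete transaction first means a failed purchase is reported as unavailable no matter how quickly the failure came back.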
Another of BSM's key benefits is its ability to better troubleshoot service degradation as it occurs.
When the service model is built with a high level of granularity and end user experience metrics are
configured into the system, BSM provides an excellent mechanism for drilling down into specific
problem sets. Because finding where the problem lies is often 80 percent of troubleshooting, this
feature speeds problem resolution. Chapter 5 will discuss the benefits and key components of end
user experience monitoring.
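The drill-down idea can be sketched as a top-down walk of the service tree: starting from a degraded service, follow unhealthy children until reaching components whose own children are all healthy, since those are the likeliest places to start troubleshooting. The tree below loosely follows Figure 1.1, with a hypothetical healthy "Catalog Subsystem" added so the walk has a branch to skip; the health snapshot is invented for illustration.

```python
from typing import Dict, List

# Hypothetical service tree loosely following Figure 1.1 (parent -> children).
CHILDREN: Dict[str, List[str]] = {
    "B2B Web System": ["Core Web Site"],
    "Core Web Site": ["Returns Subsystem", "Catalog Subsystem"],
    "Returns Subsystem": ["Moderate Data Processing System"],
    "Catalog Subsystem": [],
    "Moderate Data Processing System": ["Minor IT System"],
    "Minor IT System": [],
}

# Invented status snapshot: True means the service is healthy.
HEALTHY: Dict[str, bool] = {name: True for name in CHILDREN}
for name in ("B2B Web System", "Core Web Site", "Returns Subsystem",
             "Moderate Data Processing System", "Minor IT System"):
    HEALTHY[name] = False

def root_causes(service: str) -> List[str]:
    """Drill down from `service` to the lowest-level unhealthy components.

    A component is reported when it is unhealthy but none of its children
    are, suggesting the problem originates there rather than further down.
    """
    if HEALTHY[service]:
        return []
    causes: List[str] = []
    for child in CHILDREN[service]:
        causes.extend(root_causes(child))
    return causes or [service]

print(root_causes("B2B Web System"))  # ['Minor IT System']
```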
To help you further understand the role of BSM in defining these states and the notifications that
occur when state changes happen, let's look at Gartner's further definition of BSM and what is
needed for a software package to qualify as a BSM tool:
To qualify for the BSM category, a product must support the definition, storage, and
visualization of IT service topology or dependency maps. It must gather real-time
operational status data from underlying applications and IT infrastructure components.
And it must process status data against the object model to communicate real-time IT
service status.
Thus, managing business services means ingesting real-time status data from the physical
systems that make up business services and translating that data into the abstracted service
model. That data can come from any number of places: system-based, application-based, or even
code-based sources, such as Java, SAP, or CMDB APIs.
Once the data arrives, it is the job of the BSM system to apply predefined logic to that data to
determine the quality of each system. All this information is pushed in real time to the
communication mechanisms (alert notifications, dashboards, and reports) defined by the
administrator. I'll talk about those mechanisms in a minute.
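The ingest, evaluate, and push flow described above can be sketched as a small publish/subscribe hub. Everything here is hypothetical: the host names, the component-to-service mapping, the rule that a service is "DEGRADED" if any of its components is down, and the callback-style channels standing in for alerts and dashboards. It is not modeled on any particular BSM product's API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping from monitored components (as named by an existing
# monitoring tool) to the business service each one supports.
COMPONENT_TO_SERVICE: Dict[str, str] = {
    "den-app-07": "Special-Order Routing",
    "den-app-08": "Special-Order Routing",
    "bal-db-02": "Order Processing",
}

Channel = Callable[[str, str], None]  # receives (service, status)

class StatusHub:
    """Ingest component status, evaluate the affected service, push the result."""

    def __init__(self) -> None:
        self.component_up: Dict[str, bool] = {}
        self.channels: List[Channel] = []

    def subscribe(self, channel: Channel) -> None:
        self.channels.append(channel)

    def ingest(self, component: str, up: bool) -> None:
        self.component_up[component] = up
        service = COMPONENT_TO_SERVICE.get(component)
        if service is None:
            return  # not part of any modeled business service
        members = [c for c, s in COMPONENT_TO_SERVICE.items() if s == service]
        # Assumed rule: a service is degraded if any of its components is down;
        # components not yet reported are treated as up.
        healthy = all(self.component_up.get(c, True) for c in members)
        status = "OK" if healthy else "DEGRADED"
        for channel in self.channels:
            channel(service, status)

dashboard_feed: List[Tuple[str, str]] = []
hub = StatusHub()
hub.subscribe(lambda service, status: dashboard_feed.append((service, status)))
hub.subscribe(lambda service, status: print(f"ALERT {service}: {status}"))
hub.ingest("den-app-07", up=False)  # prints: ALERT Special-Order Routing: DEGRADED
```

Because the hub only consumes status reports, it mirrors the point made next: the BSM layer does not have to generate monitoring data itself.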
I haven't yet talked about the underlying applications and IT infrastructure components that a BSM
system relies upon for its monitoring data. Chapter 5 will do so, and Chapter 8 will discuss them in
greater detail. However, it is worth mentioning that a BSM tool need not be the tool that
creates monitoring data. A BSM tool need only be capable of ingesting monitoring data and acting on
that data using the notification concepts that make up the BSM framework. Furthermore, BSM is not
intended to be a service catalog itself. Nor is it, on its own, a business activity monitoring or
business process automation tool.
Before leaving the topic of managing business services, it is important to take a quick look at
what business services are not. Because the focus is so heavily on abstracting physical constructs
into business processes and tracking the health of those processes, mirroring business services
on already-defined business processes is an effective way to encapsulate them.
[Figure 1.7 labels: External B2B Web Cluster; Kerberos Auth. System; Java-based Inventory System; ERP System; LDAP Database; Oracle Database; 3rd Party Credit Proxy System; B2B Extranet Router; Credit Card Extranet Router]
Figure 1.7: Although BSM can incorporate elements of physical infrastructure into the service model, it is not
intended to be an IT-centric view of the overall system. This image is therefore an incorrect abstraction for
the example B2B service.
Conversely, the incorrect way to abstract a business service is through a purely IT-centric or
device-centric approach. Modeling services in this way is no different from standard IT service
management. It serves only to give the viewer a device-centric view of the health of the
business service and complicates efforts to understand how the service impacts customer
satisfaction.
Chapter 4 will talk in greater detail about this process of building the service model.
Dashboards and Service Visibility

Once the service model is developed, you need some mechanism for reporting status and
experience information back to interested stakeholders in real time. BSM delivers this status and
experience data through dashboards, which are customized for the interests of their viewers. In
other words, the same monitoring data that informs the IT department about an outage in a minor
IT system must be refactored to be useful to a business executive.
Taking the explanation one step further, let's look at what data is of interest to each party. The IT
department is interested in the name of the server that went down. Even better for them is when
the monitoring system can inform IT why that server went down and the event log error that
immediately preceded the outage. This information is consumed by IT in its activity to restore
the server to health and aids the troubleshooting process involved in pursuing that activity.
For a business executive, knowing the name of the system that went down or the event log error
that preceded the system crash is irrelevant. The business executive is not likely to care about
any of this. What is of concern, however, is the impact that a server's outage has on the ultimate
profitability of the business as a whole and on the business's ability to serve its customers.
One of BSM's central tenets involves the digestibility of the information provided. If the BSM
system can provide only downed server names and the log data surrounding an event, an
executive is not likely to pay attention to it. If the system can provide digestible information to
the executive (the name of the business service affected, the number of whole dollars associated
with the reduction in QoS, and how that service's outage affects other services), the executive
will be empowered to make educated decisions.
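The sketch below renders one outage event for the two audiences just described. The event fields, host name, log message, and dollar rate are all invented for illustration; the point is that both views come from the same underlying record.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class OutageEvent:
    server: str                 # of interest to IT
    log_error: str              # of interest to IT
    service: str                # of interest to the business
    dollars_per_hour: float     # hypothetical cost rate for the service
    affected_upstream: List[str] = field(default_factory=list)

def it_view(event: OutageEvent) -> str:
    return f"{event.server} DOWN: {event.log_error}"

def executive_view(event: OutageEvent, hours: float) -> str:
    cost = event.dollars_per_hour * hours
    upstream = ", ".join(event.affected_upstream) or "no other services"
    return (f"{event.service} degraded; estimated impact ${cost:,.0f}; "
            f"also affects {upstream}")

event = OutageEvent(
    server="den-app-07",
    log_error="Service terminated unexpectedly",
    service="Special-Order Routing",
    dollars_per_hour=12_500.0,
    affected_upstream=["Returns Subsystem", "B2B Web System"],
)
print(it_view(event))
print(executive_view(event, hours=6))
```

The IT view answers "what broke?"; the executive view answers "what does it cost, and what else does it touch?"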
The tenet of digestibility for visualization tools is crucial. Dashboards must be customized for each
class of user. It's worth saying that dashboards are not purely an internal function. With BSM, the
creation of dashboards for external entities is valuable as well. Picture the last time your
customer-facing Web site experienced an outage. Did your customers see a cryptic HTTP 404 error, or
did they see real-time status information about the problem and an anticipated time-to-fix?
Both Chapter 6 and Chapter 7 will discuss this topic in greater detail.
Elements of BSM
This guide is broken into 10 chapters, each of which discusses one facet of BSM. Intended to
function as interdependent building blocks, each chapter draws on its predecessors to flesh out
the BSM picture. As you can see in Figure 1.8, these building blocks start with a description of
the as-is situation in most IT cultures. We'll sidestep into BSM's value proposition as it relates to
many networks' existing deficiencies and move through the implementation activity, with a side
conversation on the experience of the end user. The guide will then branch into three ways to add
management, operational, and IT value, and conclude with a discussion linking BSM with other
management frameworks such as ITIL and Six Sigma.
Figure 1.8: The chapters of this guide incorporate building blocks to guide the conversation towards a full
understanding of BSM as a viable and effective solution.
Alignment of IT and the Business

The relationship between IT and the business continues to improve. But as with personal
relationships, there are always some differences in the language that the two parties use. Techies
are genuinely interested in the underlying technology that drives an infrastructure, and that
interest can be skewed from, or even inverted relative to, what the business needs from that
same network. Chapter 2 will talk about the alignment between the priorities of IT and the
business and how the priorities of IT sometimes don't match those of the business.
When misalignment is present and IT doesn't understand the true needs of the business, one
result is that IT may pursue solutions unrelated to a problem's true criticality. Conversely,
when the two are aligned, IT is better able to understand where and how to apply its
efforts. The same goes for purchasing decisions. With BSM, business services are displayed
much more overtly, which leads to better recognition of where limited funding should be
applied. Chapter 2 will discuss how BSM can be a motivating force in the maturation of IT.
The Evolution of IT Service Management
Once you begin that process of identifying alignment issues, you can work towards a resolution
by identifying a framework for that effort. Chapter 3 will discuss views on the state and future of
IT organizations. It will talk about the evolution of service management, from its roots in server
performance through predictive analysis and end user experience monitoring and on up to BSM.
With this understanding of the evolutionary process, the chapter enters into a discussion about
how BSM best enhances the total operational posture of the network and the services that lie on top
of it. It will conclude with examples of service management at each point in the evolutionary
process and link them to service level expectations and reporting needs.
Implementing BSM
Closing out the introductory material on the status of IT and its need for mature tools like BSM,
Chapter 4 will begin the process of explaining the design, installation, and configuration tasks
required to stand up a BSM instance in your environment.
The chapter will discuss the eight phases of a BSM implementation, starting with design tasks and
continuing through implementation and constant improvement:
We begin with the Preparation Phase where project plans are outlined and project teams
are identified.
In the Selection Phase, you assess critical and measurable business services using the
criteria discussed earlier in the chapter and analyze each service's cost to the business.
The Definition Phase takes the input from the Selection Phase and makes key decisions
on which services to bring under management immediately, which to delay, and which to
remove from the project scope.
The Modeling Phase begins the process of data collection. Here, you tie identified
services into existing or new monitoring tools for data gathering and begin the process of
building the service model.
Once the initial service model is created and data gathering is complete, you continue
into the Measurement Phase. This phase covers the measurement of services over time,
the identification of gaps in monitoring, and the validation of costing assumptions.
The Data Analysis Phase ingests the data gathered in the Measurement Phase and
completes more rigorous analysis on that data to begin building fault and impact analysis
models.
Once the IT organization understands the complexities of the business services as
identified by data analysis, the Improvement Phase helps to determine where remediation
actions can improve the environment. This phase offers the potential for an excellent return on
the BSM investment by identifying and ultimately fixing performance or other issues that are
causing service quality reductions.
Lastly, in the key Reporting Phase, dashboards and other visualization tools are deployed
for key stakeholders to use.
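The Selection Phase feeds the Definition Phase. As a rough illustration of that handoff, the Python sketch below ranks candidate services by an estimated hourly outage cost. It then sorts them into the three Definition Phase outcomes: manage now, defer, or remove from scope. The `CandidateService` type, the cost figures, and both thresholds are hypothetical assumptions. A real project would apply the selection criteria its own team agreed on.

```python
from dataclasses import dataclass


@dataclass
class CandidateService:
    # Hypothetical record produced by the Selection Phase.
    name: str
    hourly_outage_cost: float  # estimated cost to the business per hour of outage
    measurable: bool           # can the service's quality be measured with today's tools?


def define_scope(candidates, manage_now_cost=10_000.0, defer_cost=1_000.0):
    """Sort candidate services into the three Definition Phase buckets.

    The two cost thresholds are illustrative assumptions, not criteria from this guide.
    """
    scope = {"manage_now": [], "defer": [], "out_of_scope": []}
    # Consider the most expensive (most critical) services first.
    for svc in sorted(candidates, key=lambda s: s.hourly_outage_cost, reverse=True):
        if svc.hourly_outage_cost < defer_cost:
            scope["out_of_scope"].append(svc.name)  # too cheap to justify the effort
        elif not svc.measurable:
            scope["defer"].append(svc.name)         # revisit once monitoring exists
        elif svc.hourly_outage_cost >= manage_now_cost:
            scope["manage_now"].append(svc.name)
        else:
            scope["defer"].append(svc.name)
    return scope
```

The sketch defers unmeasurable services rather than dropping them. This reflects the Selection Phase's emphasis on services that are both critical and measurable: such a service can come back into scope once monitoring exists.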
End User Experience Monitoring

One of the major components of BSM is its ability to enhance the administrator's view into the
experience of the user. Administrators and business leaders alike rarely have the time to
manually check and verify the functionality and usability of customer-facing business
systems. Thus, automation components that can constantly repeat this verification and alert when
out-of-bounds conditions occur are critical to ensuring QoS and maintaining customer
satisfaction.
Chapter 5 will discuss the needs and the enabling technologies that provide for end user
experience monitoring on systems that impact the company's bottom line. The chapter will talk
about how BSM can link with Web front-ends, packaged applications, thin clients, middleware,
and even databases to find problems deep down in the software code. We'll discuss the link
between end user experience monitoring and BSM and show how experience monitoring directly
affects the ability of the business to service its customers well.
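The automated verification described above is often implemented as a synthetic transaction. This is a scripted user action that is replayed on a schedule, timed, and compared against thresholds.

The minimal Python sketch below assumes the probe is passed in as a function that returns True on success, so it can run without a network. In practice, the probe might drive a login or checkout page. `run_synthetic_checks` and its parameters are hypothetical names, not part of any BSM product.

```python
import time
from typing import Callable, List


def run_synthetic_checks(probe: Callable[[], bool], runs: int, max_seconds: float,
                         alert: Callable[[str], None]) -> List[float]:
    """Run a synthetic transaction `runs` times and return each run's duration.

    `probe` performs one scripted user transaction and returns True on success.
    `alert` is called with a message whenever a run fails or is too slow.
    """
    timings = []
    for i in range(runs):
        start = time.perf_counter()
        try:
            ok = probe()
        except Exception:
            ok = False  # a transaction that raises counts as a failure
        elapsed = time.perf_counter() - start
        timings.append(elapsed)
        if not ok:
            alert(f"run {i}: transaction failed")
        elif elapsed > max_seconds:
            alert(f"run {i}: slow response ({elapsed:.2f}s > {max_seconds:.2f}s)")
    return timings
```

A production monitor would also pause between runs, or hand scheduling to a job scheduler. It would feed the timings into the BSM service model rather than simply returning them.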
Achieving Management Value
Chapter 6, the first of three chapters about getting value out of a BSM solution, will discuss the
components associated with management. That means management in two senses: the managers
themselves, who gain added vision into highly technical environments, and the management
control that such visibility empowers them to exercise. Too often in IT, the technical
people are unable or unwilling to communicate with management on the true status of their
systems.
The chapter will discuss how management can use BSM to enable such a check-and-balance. It
will also discuss how BSM, and specifically the visualization components of a fully realized
BSM infrastructure, can help right-size contracts for outsourcing, service providers, and internal
IT organizations within the scope of their SLAs. Through the use of quantitative visualization,
SLA compliance can be positively verified. In cases in which contracts are tightly bound by those
SLAs, this ability to prove what's right strengthens the business's position in negotiations.
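Verifying an availability SLA usually comes down to arithmetic over outage records. The sketch below is a minimal example that assumes outages are recorded as (start, end) datetime pairs. It merges overlapping outages so that downtime is not double-counted, clips them to the reporting period, and compares the result against a target. For reference, a 99.9 percent target over a 30-day month allows about 43.2 minutes of downtime. The function names and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def availability(period_start, period_end, outages):
    """Percent of the period during which the service was up.

    `outages` is a list of (start, end) datetimes. Each outage is clipped to the
    period, and overlapping outages are merged so downtime is not counted twice.
    """
    total = (period_end - period_start).total_seconds()
    clipped = sorted((max(s, period_start), min(e, period_end))
                     for s, e in outages if e > period_start and s < period_end)
    down = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this outage: close out the previous merged interval.
            if cur_end is not None:
                down += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping outage: extend the interval
    if cur_end is not None:
        down += (cur_end - cur_start).total_seconds()
    return 100.0 * (total - down) / total


def meets_sla(availability_pct, target_pct):
    """True when measured availability meets or beats the contracted target."""
    return availability_pct >= target_pct


if __name__ == "__main__":
    month_start = datetime(2024, 6, 1)
    month_end = month_start + timedelta(days=30)
    # Two overlapping outages that amount to 50 minutes of actual downtime.
    outages = [
        (month_start + timedelta(hours=10), month_start + timedelta(hours=10, minutes=30)),
        (month_start + timedelta(hours=10, minutes=15), month_start + timedelta(hours=10, minutes=50)),
    ]
    pct = availability(month_start, month_end, outages)
    print(f"{pct:.3f}% available; meets 99.9% SLA: {meets_sla(pct, 99.9)}")
```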
Achieving Operational Value
Chapter 7 offers a conversation on the operational benefits of a BSM system. Through a well-designed
BSM system, a business can reduce operational expenditures through better planning
and forecasting. BSM can also serve as a management umbrella under which unified
controls such as management tools, notification, automated and partially automated remediation
tools, scripting, and reporting engines can reside.
The chapter talks about how BSM enhances the operational visibility of services including
enhanced situational awareness, better outage planning within global operations due to business
calendar processing, and problem resolution prioritization based on customer need. The chapter
will go into great detail about the processes and best practices associated with building effective
dashboards. We'll explore the key needs for business management, for IT, and for customers.
Each of those groups of people has different needs and different data of interest, so designing
dashboards to meet their needs is one of the most critical components in any best-in-class BSM
infrastructure. The chapter will conclude with a review of sample dashboard elements that bring
graphical impact when visualizing monitoring data.
Achieving IT Value
No conversation about the value proposition of an IT system is complete without discussing how
that system provides value to IT itself. Chapter 8 does just that. IT's needs for management and
monitoring are well established. However, BSM provides heretofore unrecognized additional
value through its unique way of looking at data. The chapter will discuss the business, service
desk, configuration, response time, and infrastructure metrics data available to IT within a fully
realized BSM implementation.
It will then dig deep into the IT technology that BSM implementers must understand to link the
BSM system into other systems on the network. We'll explore management protocols and interfaces
such as SNMP, WMI, and WS-Management, as well as enterprise messaging and the Java Message
Service (JMS), and how these tools are necessary for BSM to link into system data. The chapter
will conclude with a review of the data collection capabilities of a best-in-class BSM system and
how these external data sources connect to BSM.
ITIL and Six Sigma
BSM is a top-down, phased approach that first considers what's most critical to the business. Its
framework for deployment is based on industry-standard practices. Two of these industry
practices, ITIL and Six Sigma, complement BSM to provide tangible return on investment.
Combining ITIL, Six Sigma, and BSM provides rich capabilities for continual quality
improvement with a focus on the business.
Chapter 9 will discuss the ties that connect ITIL and Six Sigma with BSM. It will talk about these
practices, how they interrelate, and how a business can use built-in BSM tools to populate Six
Sigma thought-driving and planning discussions. The chapter will also discuss ITIL and Six
Sigma best-practice metrics that can be imported into a BSM infrastructure to immediately gain
the benefits of these complementary ideas.
Important Definitions
The next nine chapters will begin the process of educating you on the needs, processes, and
benefits involved in building BSM into your network. But before concluding this chapter's
review of high-level topics, let's take a few minutes to discuss key concepts that
you'll encounter again and again throughout this guide. This section introduces concepts specific
to BSM and BSM implementations that will help you understand the necessary underlying
technology and processes associated with BSM.
Business Impact Management
Business Impact Management (BIM) is an approach to network management that monitors the status
of IT devices, but not necessarily from a device-centric perspective. BIM tools track QoS across
multiple devices but report on a service as a single entity that relies upon those devices.
Where BIM tools differ from traditional management and monitoring tools is in correlating
performance and event data across multiple IT facets for a roll-up view on business system
health. As an example, a traditional management tool may be able to notify administrators when
the network is slow or inoperable. But a BIM tool can wrap this performance shortfall
information with data from the application itself to get a holistic view of the entire system
performance.
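The roll-up view that distinguishes BIM can be sketched as a function that takes the status of each underlying device or application and reports one status for the service. The Python below assumes a hypothetical three-level status scale and a per-component `redundant` flag. Losing a redundant component only degrades the service, while losing a non-redundant one takes it down. Real BIM tools weigh far more context than this.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical three-level health scale; higher values are worse.
    OK = 0
    DEGRADED = 1
    DOWN = 2


def service_status(components):
    """Roll component statuses up into a single service-level status.

    `components` maps a component name to a (status, redundant) pair. A DOWN
    component that has redundancy only degrades the service; a DOWN component
    without redundancy takes the whole service down.
    """
    worst = Status.OK
    for status, redundant in components.values():
        if status == Status.DOWN and redundant:
            status = Status.DEGRADED
        worst = max(worst, status)  # the service is only as healthy as its worst part
    return worst
```

Consider two failures that both look simply "down" at the device level. With this model, a downed web server behind a load balancer reports the B2B service as degraded, while a failed single database reports it as down.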
Service Level Management
Service Level Management is an ITIL construct that defines the process of constructing,
negotiating with stakeholders, implementing, and documenting an agreed-upon level of service
for a particular IT system or subsystem, as well as the management of the customer relationship.
The following list highlights examples of Service Level Management:
Service Level Management can occur between an IT organization and the business to
outline the specific and quantitative expectations of service quality to be provided by
IT.
It can occur between the business and its customers, contractually outlining expectations
for service levels from the business to its hosted customers.
It can be contracted between a business and its resource providers. This might seek to
ensure that the business obtains the QoS it requires to provide services to its customers in
turn. It can also provide a basis for contractual remediation when the business does not
receive the contracted level of service.
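It should also include Operational Level Agreements (OLAs), the internal agreements between
groups within the IT organization that support the delivery of the agreed service levels.
Penalty avoidance (for providers) and customer satisfaction are among the reasons
organizations put SLM in place.
Service Level Management typically deals with an organization's service catalog and the
performance metrics associated with those services.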
Real-Time Service Visualization
A proper definition of Real-Time Service Visualization requires the term to be broken down into
its two halves and defined piecemeal:
Service Visualization is the idea of providing a graphical abstraction of a business service
and the quality associated with that business service. Service Visualization is used to
encapsulate the concept of the service itself into a single-picture view that explains the
service, its current operation, and any issues associated with that service.
Real Time means simply that the data involved with a system is not snapshot-based but is
instead delivered to relevant visualization tools as it arrives into the system. Real time is
best contrasted with traditional report-based data, which arrives to the consumer only after
collection and preparation.
Real-Time Service Visualization is the idea of providing a graphical abstraction of a business
service and the health and quality of that service. Key to the generation of that data is that the
information provided to its consumer within the abstraction is an instantaneous representation of
momentary status.
Operational Metrics
Operational Metrics are those metrics used to represent the day-to-day health and quality of a
particular business service. Operational Metrics are typically measurements of status and
performance over time based on the behavior of a particular business system. These metrics are
concerned with the availability of a business system, its throughput and observed performance,
and its response time. Operational Metrics are used most often to understand the technical
quality of a system.
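The three kinds of Operational Metrics named above (availability, throughput, and response time) can all be computed from the same set of transaction samples. The sketch assumes that each sample is a hypothetical (succeeded, response_ms) pair collected over one measurement window. It uses the nearest-rank method for the 95th-percentile response time. Other percentile definitions give slightly different values.

```python
import math


def operational_metrics(samples, window_seconds):
    """Availability, throughput, and p95 response time for one measurement window.

    `samples` is a list of (succeeded, response_ms) pairs; only successful
    transactions count toward throughput and response time.
    """
    if not samples:
        raise ValueError("no samples in window")
    ok_times = sorted(ms for ok, ms in samples if ok)
    availability_pct = 100.0 * len(ok_times) / len(samples)
    throughput = len(ok_times) / window_seconds  # successful transactions per second
    p95 = None
    if ok_times:
        rank = math.ceil(95 * len(ok_times) / 100)  # nearest-rank percentile
        p95 = ok_times[rank - 1]
    return {"availability_pct": availability_pct,
            "throughput_per_s": throughput,
            "p95_ms": p95}
```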
Service/Asset Metrics
Service and Asset Metrics are those used to identify, inventory, and generally understand the
physical characteristics of a particular service or asset. These metrics can be used to understand
the characteristics and effectiveness of individual services or assets and potentially drive
decisions as to their utility, efficacy, necessity, and reusability.
Business Metrics
Business Metrics are those that relate the function and processing of an item, a process, or an
activity to how it impacts the financial position of the business. For items, business metrics can
relate to their age, their utility, and various elements of financial return on the item. For
processes and activities, these metrics can relate to the efficacy of the process to produce value
and/or the quantification of any value provided by the process.
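Business Metrics often reduce to putting a currency figure on a reduction in QoS. The sketch below is a deliberately simple, hypothetical cost model. It adds the revenue lost while the service was degraded to the labor cost of cleaning up afterward. Every input is an estimate that the business would supply. The formula is an illustration, not a standard.

```python
def outage_cost(hours_degraded, revenue_per_hour, fraction_of_revenue_lost,
                recovery_labor_hours, labor_rate):
    """Rough business cost of a reduction in QoS, rounded to whole currency units.

    All inputs are business-supplied estimates (hypothetical model).
    """
    lost_revenue = hours_degraded * revenue_per_hour * fraction_of_revenue_lost
    recovery = recovery_labor_hours * labor_rate  # e.g. overtime spent cleaning up
    return round(lost_revenue + recovery)
```

For example, consider two hours at a quarter of a $10,000-per-hour revenue stream, plus 80 hours of cleanup at $60 per hour. The model gives `outage_cost(2, 10_000, 0.25, 80, 60)`, or 9,800.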
Executive Views
Executive Views are constructs within dashboard views that are specifically tailored for
consumption by non-technical business leaders. Executive views are critical components in a
mature BSM solution because they empower executives with the knowledge they need to
validate the health and quality of a business system. The BSM tenet of digestibility emphasizes
the ability for executives to understand, or digest, the information contained within their
visualization tool.
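The sketch below makes the tenet of digestibility concrete. It condenses technical status into the three facts an executive view typically leads with: which business service is affected, roughly what it is costing, and which other services it touches. The function name and message format are hypothetical. A real dashboard would render these facts graphically rather than as a sentence.

```python
def executive_summary(service, status, hourly_cost, hours, affected_services):
    """One-sentence, non-technical digest of a service problem (hypothetical format)."""
    cost = round(hourly_cost * hours)  # whole dollars, as executives read them
    others = ", ".join(affected_services) if affected_services else "no other services"
    return f"{service} is {status}. Estimated cost so far: ${cost:,}. Also affects: {others}."
```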
Fault Trees
A Fault Tree is a visualization tool used in a Fault Tree Analysis. In these diagrams, an undesired
effect is listed as the root of a logic tree. Each potential situation that could contribute to that
undesired effect is listed as a branch leading to the root, and situations that contribute to those
causes are in turn connected below them. Fault Trees are useful in the identification of root cause
for a particular problem and help visualize current and potential future situations so that problems
affecting a system can be identified and tracked.
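The description above does not name them, but fault trees are conventionally built from logic gates. An OR gate means that any one contributing cause is enough. An AND gate means that all of its causes must occur together.

The minimal Python sketch below assumes only those two gate types. It evaluates whether the root (undesired) effect occurs, given a set of basic events that have happened. The event names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: List["Node"] = field(default_factory=list)


Node = Union[Gate, str]  # a plain string is a basic event


def occurs(node: Node, happened: Set[str]) -> bool:
    """True if `node` (an event or gate) occurs given the basic events in `happened`."""
    if isinstance(node, str):
        return node in happened
    results = [occurs(child, happened) for child in node.inputs]
    if node.kind == "AND":
        return all(results)  # every contributing cause must occur
    if node.kind == "OR":
        return any(results)  # any single contributing cause is enough
    raise ValueError(f"unknown gate type {node.kind!r}")
```

For instance, the root "B2B orders cannot be placed" might be `Gate("OR", ["web tier down", Gate("AND", ["primary db down", "replica db down"])])`. In that case, losing one database alone does not trigger the root effect.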
Impact Trees
Impact Trees are used as a visualization tool in identifying what connected systems could be
impacted by a fault within a particular system or system subcomponent. The element at the
bottom of the tree is typically the faulted item and all objects connected upwards from that item
are recognized to be in a faulted or partially faulted state.
One of the added benefits associated with the creation of the service model is the built-in
knowledge of how services impact each other. Thus, an Impact Tree can be created easily by
utilizing the service model's interconnections.
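The service model already records which items each service relies on, so an Impact Tree can be derived by walking those links in reverse. The sketch below assumes a simple, hypothetical dictionary model that maps each service or component to its dependencies. Starting from the faulted item, it uses a breadth-first search to collect everything connected upward.

```python
from collections import deque


def impacted_by(faulted, depends_on):
    """Return every service that directly or indirectly relies on `faulted`.

    `depends_on` maps each service/component to the items it relies on
    (the service model's interconnections).
    """
    # Invert the model: for each item, record who relies on it.
    used_by = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(svc)
    impacted, queue = set(), deque([faulted])
    while queue:
        item = queue.popleft()
        for parent in used_by.get(item, ()):
            if parent not in impacted:  # avoid revisiting shared dependents
                impacted.add(parent)
                queue.append(parent)
    return impacted
```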
Business Calendar
When an organization expands to global operations, that organization inherits the intrinsic time
skew that occurs across numerous and far-flung time zones. Because of this time skew, the time
frames for activity on network devices and applications change drastically. Because employees
or customers may reside in significantly separated time zones, activities on the network can
impact different geographical regions at different times of day.
The Business Calendar defines not only the operational periods of a service, but also takes into
account scheduled downtimes, as well as the importance of the various schedule periods, such as
peak and off-peak. The business calendar is time-zone aware, so truly global services can be
modeled and supported. The business calendar functionality can also automatically derive the
calendars of the supporting infrastructure from those of the business systems.
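A time-zone-aware business calendar can be sketched as a lookup. It converts a UTC instant into each region's local time before deciding whether that instant falls in peak, off-peak, or scheduled downtime. The regions, hours, and maintenance windows below are hypothetical. To stay self-contained, the sketch uses fixed UTC offsets and ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical business calendar. Each region has a fixed UTC offset (DST ignored),
# local peak hours, and local scheduled maintenance windows as (start_hour, end_hour).
CALENDAR = {
    "Denver": {"utc_offset_h": -7, "peak_hours": range(8, 18), "maintenance": [(2, 4)]},
    "Frankfurt": {"utc_offset_h": 1, "peak_hours": range(8, 18), "maintenance": []},
}


def period_for(region: str, when_utc: datetime) -> str:
    """Classify an aware UTC datetime as peak, off-peak, or scheduled downtime for a region."""
    cal = CALENDAR[region]
    local = when_utc.astimezone(timezone(timedelta(hours=cal["utc_offset_h"])))
    if any(start <= local.hour < end for start, end in cal["maintenance"]):
        return "scheduled downtime"
    return "peak" if local.hour in cal["peak_hours"] else "off-peak"
```

The same instant can be peak in one region and scheduled downtime in another. This lets the calendar, rather than the clock in the data center, drive outage planning and alert priority. A fuller implementation would use `zoneinfo.ZoneInfo` so that daylight saving transitions are handled.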
Process Integration
Process Integration encapsulates the idea of combining the processes from two separate entities
into a single, cohesive business activity. Process integration between disparate elements of a
system or disparate systems can involve the integration of the individual actions or code of those
systems. Across multiple business partners or between partner and customer, process integration
can involve data manipulation and activity manipulation to ensure that the outward data flows
from one organization correctly meet the inward data flows of another. Use of industry-standardized
processes helps to alleviate the cost associated with integration, as both
organizations or system elements will utilize equal or equivalent mechanisms for ingest,
processing, and output of process data.
Workflow
Workflow is the sequence of steps necessary to complete an action while following the business
and technical rules of the acting organization. Workflow for a particular process can entail the
positioning of data, its processing, approval for that processing, the completion of tasks
associated with the data, and the logging of the activity's completion, as well as other steps in the
process.
Workflow includes the processes intended to guide data from its creation, through its use and
storage, and until its destruction. Integrating workflow rules with BSM means that elements
brought to operator attention can be adjudicated according to predefined rules and stored for later
referral.
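Adjudicating an operator-attention item according to predefined rules can be modeled as a small state machine. The allowed transitions are the rules, and the recorded history is the log kept for later referral. The states and transitions below are hypothetical examples, not a workflow prescribed by BSM or ITIL.

```python
from datetime import datetime, timezone

# Hypothetical workflow rules: the states each state is allowed to move to.
TRANSITIONS = {
    "new": {"acknowledged"},
    "acknowledged": {"awaiting approval", "resolved"},
    "awaiting approval": {"acknowledged", "resolved"},
    "resolved": {"closed"},
    "closed": set(),
}


class WorkItem:
    """An item brought to operator attention, moved through the workflow rules."""

    def __init__(self, title):
        self.title = title
        self.state = "new"
        # Audit log of (state, UTC timestamp) pairs, kept for later referral.
        self.history = [("new", datetime.now(timezone.utc))]

    def advance(self, new_state):
        """Move to `new_state` if the rules allow it; otherwise raise ValueError."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state!r} -> {new_state!r} is not allowed")
        self.state = new_state
        self.history.append((new_state, datetime.now(timezone.utc)))
```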
Six Sigma
Six Sigma provides a quantitative methodology for continuous process improvement and cost
reduction, achieved by reducing the amount of variation in process outcomes to a level suitable for
the given organization. It pursues data-driven, fact-based decision-making in which decisions are
tied to corporate objectives. It implements a measurement-based strategy that
focuses on process improvement and variation reduction (Source: Six Sigma for IT
Management, Sven den Boer et al., June 2006, page 15).
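Six Sigma commonly expresses process quality as defects per million opportunities (DPMO). It converts DPMO to a sigma level using the standard normal distribution, conventionally adding a 1.5-sigma shift so that 3.4 DPMO corresponds to six sigma.

The sketch below uses Python's `statistics.NormalDist`. The 1.5 shift is the widely used convention rather than anything stated in the source, and the example figures are hypothetical.

```python
from statistics import NormalDist


def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities (DPMO)."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def sigma_level(dpmo_value, shift=1.5):
    """Convert DPMO to a sigma level.

    Takes the standard normal quantile of the defect-free fraction and adds the
    conventional 1.5-sigma long-term shift (pass shift=0 to omit it).
    """
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + shift
```

For example, 17 failed transactions out of 5,000,000, with one opportunity each, is 3.4 DPMO. That works out to approximately six sigma.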
ITIL
ITIL is a framework of best practices that can be used to assist organizations in developing their
IT Service Management process-driven approaches. ITIL recognizes five principal elements that
give guidance on the provisioning of quality IT services and the processes and facilities needed
to support them: Service Strategy, Service Design, Service Transition, Service Operation, and
Continual Service Improvement.
BSM Empowers Decision Makers

This chapter's introduction of the concepts of BSM is intended to drive home the idea that BSM is an
enabler for decision makers. By laying BSM atop an existing network, leveraging existing
monitoring and management frameworks, and using a standardized implementation process,
business leaders and other non-technical decision makers become empowered to better
understand their network environment. That empowerment assists them in better aligning IT
goals with those of the core business.
BSM's real-time and historical visualization components improve decision making and aid in
forecasting for future purchase decisions. It speeds the troubleshooting process by quickly
identifying root causes. And, most importantly, it serves to expand the vision of all involved to
ensure that all-important customer satisfaction is kept at the highest levels possible.
Chapter 2
Chapter 2: The Alignment of IT and Business

It's 8:45 a.m., two weeks after the outage of our minor IT system at First Class Glass's Denver
data center, and IT Manager John Brown is finishing up his monthly metrics presentation to
FCG's business leaders. Amped on too much coffee and not enough sleep, John reminds himself
why he hates these end-of-the-month presentations.
"There're just too many manual steps in putting together these reports," he thinks to himself for
what seems the hundredth time since late last night. "Every month, the FCG executive leadership
wants all this statistical and trend line data so that they can see that our systems remain as
operational as they always have," he thinks. "Yeah, I know we sometimes run out of disk space on
some of the servers, and, rarely, our critical applications may slow down or go down for a short
period of time. But we've thrown so much money at this application and network, adding
clustering and redundancy, disaster recovery, and everything else, that we rarely have major
problems anymore.
"What I really wish we had was a way to pull these metrics automatically. Then, once a month my
senior staff and I wouldn't have to spend the night here pulling together statistics for this
marginally useful 30-minute presentation."
John adds the finishing touches to his PowerPoint slide deck, entering in some last-minute
figures he received from his Help desk manager on work order closing metrics. He saves the
completed file to the network and heads off to the boardroom for his presentation.
If you've ever pulled together these kinds of metrics for your company's monthly executive
update meeting, you've probably had the same series of thoughts at the 11th hour. Figuring out
what kinds of metrics the leadership wants to know is only part of the trouble. Compiling those
metrics from dozens of separate and unautomated systems can have the effect of shutting down
the department for a day as people scramble to gather the end-of-the-month statistics. Often, the
presentation goes by with nary a question because senior leadership doesn't understand the
statistics you're providing them.
Contrary to what the IT people in the trenches often believe, this data is critical to the smooth
and continued operations of the business network. Executive leadership needs these statistics so
that they can prove to themselves that the money they've thrown at the network is actually
providing value and measurable return.
Business leaders by nature have to look at the world through statistics, trend lines, and return on
investment (ROI). IT by nature tends to look at the world in terms of technology. Merging these two
schools of thought into a common language (or at least a common interface) is one of the tenets of
Business Service Management (BSM).
Where the difficulty often presents itself in these situations is in translating what is important to
IT into information that is digestible to executive leadership. If business leaders can't understand
the kind of data they're being presented at the monthly metrics meeting, they can't make good
decisions on what to do with that data. This chapter talks about the dissonance between what IT
believes is important and what the business leaders want to see.
When these two groups speak the same language and share the same priorities, we say that they
have achieved alignment. This chapter will discuss the alignment issue as well as common
failures in alignment. We'll talk about why misalignment occurs and what IT can do to develop
itself both culturally and technologically to resolve the problem. Throughout the discussion,
we'll incorporate what we've learned so far from Chapter 1 to show how the implementation of
BSM into the operating environment helps enable the alignment of IT and the business.
The Chasm Between IT and the Business

Remember our minor IT outage from Chapter 1? It was not considered all that problematic an
event by the IT department. The team of server administrators resolved the issue within a
reasonable amount of time, and the system was listed as Tier III, putting it in the same category
as low-priority backup and systems management systems. But that system wasn't as low priority
as the IT department thought: its failure caused only a slight outage to a major system but incurred
a large cost to the company. Let's return to the story to see how our monthly metrics meeting plays out:
"...and as you can see in this slide, IT met their Service Level Agreement (SLA) for this month,
showing no appreciable network outages for the month."
"John, can I stop you there?" interrupts Dan Bishop, First Class Glass's COO. "Why does this
report not include the problem we had with the B2B site two weeks ago? That outage has
been keeping our accounting people here late nights for the last two weeks cleaning up the mess.
The cost in overtime alone is killing our Q3 budgetary numbers."
John nervously continues, "Well, sir, that system is categorized as a Tier III Priority system. Our
SLA for Tier III systems states that we have eight hours to get these systems back online, which
we did. In any case, that system is only a minor player in the B2B system. And the
problem happened at 2:15 a.m., when many of our business customers weren't even awake."
What you should see in this example is a conversation you may have had before in your
business. Both parties are perfectly within their rights to believe they're correct. IT brought the
system back online within its agreed-upon timeframe, but the business has recognized a
higher-than-expected cost. When it comes to the total impact, what we have here is an example of the
chasm between IT and the business.
Responsibilities and Priorities

In many businesses, the IT department is run by the techies. Purchase decisions are based on
easing the management responsibilities of those in charge of server care and feeding. Descriptions
of service outages are constrained by the devices taking part in the outage. Those individual
devices (database, network, application, and infrastructure servers) and the services hosted
by those devices are reported up the chain in the parlance of the engineer responsible for the
service. Let's start our analysis by taking a look at the responsibility matrix for a typical business
network.
[Figure 2.1: IT organizational chart]
Above the dotted line (Overall Business Strategy):
  CEO
    CIO
Below the dotted line (Daily Operations):
      IT Director
        Database Manager: Database Administrator
        Network Manager: Network Administrator
        Server Team Manager: UNIX Server Administrator, Windows Server Administrator
        Applications Manager: Applications Administrator
        Field Tech Manager: Field Technician
        Help Desk Manager: Help Desk Employee
Figure 2.1: An example org chart for an IT organization. Those individuals above the dotted line are typically
responsible for the overall business strategy, while those below the dotted line typically deal with daily
operational issues.
Figure 2.1 shows a typical organizational chart for an IT department. This chart shows the IT
Director reporting to a CIO, who ultimately reports to the CEO. The IT Director's direct reports
are the managers responsible for their portions of the network. Co-equals at this fourth tier in the
organization, each of these managers leads a team of administrators. Those
administrators are typically identified as responsible engineers for specific portions of the
network: Bill the Windows Administrator is ultimately responsible for the functionality of
Windows Active Directory (AD). Jane the Network Administrator specifically manages the
network gear in the company's DMZ to the Internet. Bob in Applications really only manages
the B2B sales application.
Also shown in Figure 2.1 is a dotted line representing the line of demarcation between those
individuals typically responsible for overall business strategy and those who handle daily
operations. Notice the bottom-heavy nature of the org chart in relation to that dotted line. Because
the dotted line sits so high, it is here that we see the biggest chasm between the
goals of IT and those of the business.
Unlike individuals in sales and marketing who work with business-level goals as part of their daily
operational tasking, individuals in IT are often insulated from business decisions. This insulation,
together with the vocabulary created and used by IT, is a large contributor to our chasm.
Why this chasm? Principally, it is due to an individual's scope of responsibility. As an individual's
position in the organization gets further and further away from the money-making decisions in a
business, his or her priorities, both formal and intrinsic, grow less aligned with the overall goals of
the business. Adding to that situation is the fact that the responsible engineer for systems monitoring
is often an administrator in the Applications or the Network group.
The applications or network administrator is given the task of installing, configuring, and
maintaining the systems monitoring tool for the network. He or she is a techie. Thus, the
decisions they make in terms of what to monitor and when to alert are based on their priorities
and experience as a techie. Their limited, fifth-tier view of the overall business strategy inhibits
them from enabling notifications that align with the needs of the business. Let's deconstruct our
discussion on alignment a little further and talk about how the culture of the IT organization as a
whole affects its alignment with the business.
Early IT
In the beginning, there were just a few computers. Then some very intelligent people figured out
ways to connect those computers. Since then, those computers' interconnectedness has had a
tendency to beget ever more computers. Early IT organizations tend to react to the business need
for additional services and the hardware to host those services rather than plan for and expect
their arrival.
This reactive approach to service expansion is characteristic of Early IT
organizations. Later, this chapter will discuss the IT Maturity Model in great detail, but for this
section, it is important to note that Early IT organizations tend to operate more or less
independently from the rest of the business.
Early IT Goals                     | Business Goals
Availability                       | Profitability
Managing change                    | Managing risk
Supporting existing infrastructure | Expanding the business
Break-fix                          | Customer Service
Table 2.1: The goals of Early IT are often the least aligned with those of the business. As the business
attempts to grow itself, IT finds itself struggling to manage the existing infrastructure.
This independence predominantly occurs because early networks can be quickly thrown together
to support the needs of the burgeoning business. The details of redundancy, service resiliency,
and manageability are swept to the side as the priority is to simply get the service operational.
The initial network hypergrowth combined with the hypergrowth of the early business often
means plenty of firefighting for IT.
It's worth stating here that firefighting and otherwise reactive modes associated with Early IT should
not be considered a black mark on IT itself. Reactive-mode IT is a necessary evil of any new
business endeavor. This firefighting is more an indication of the sheer magnitude of effort necessary
to build and manage a modern business network.
Users Become Customers

As the business matures, the network tends to mature with it. The initial network that was
quickly thrown together eventually gets a much-needed facelift with an eye towards resiliency
and single point of failure elimination. At some point through this period, the IT department
begins to see improved mean time between failures (MTBF) for services. This reduction in firefighting
for core services earns IT the ability to stand back from the network and begin looking at the
network as an integrated unit rather than the summation of its individual parts.
The change in mentality associated with greater building and less fixing also tends to drive a
change in how IT sees the employees who use the network. At some point, the IT department
begins to see the users of its network less as the people causing the problems and more as the
people using the services. The negative connotation of users disappears, and the network
residents become in effect the customers of IT. It is this mindset change that signals the
movement of IT away from its initial firefighting days.
Proactive IT
Once that change begins, the move to Proactive IT occurs rather rapidly. Proactive IT
incorporates solid change control into network configurations and incorporates necessary
systems documentation into every task. Most importantly, those change control mechanisms are
agreed upon by all members of IT, are followed, and align with the needs of the business.
Be wary of moving to formalized change control too early. Formalized change control rapidly matures
IT, but at the same time tends to reduce its agility. If the business is still in a startup mode, formalized
change control can inhibit that business agility as well.
IT that begins to operate proactively begins to see the forest for the trees. They begin
understanding the metrics necessary to identify the health of the network. They start to recognize
the natural cycle of IT purchasing and plan more carefully for those purchases. And they begin to
recognize their role in the greater business, providing calculated support for the necessary
business services as they're needed. The best Proactive IT teams align seamlessly with the
business and its goals.
Figure 2.2: As IT grows, key indicators as to its maturity become apparent.
Let's take a look now at a more formalized model of IT maturity, presented by Gartner. This
model expands upon what we've discussed in this section to describe the key indicators
associated with the maturity level of an IT organization. The intent here is to highlight how an
organization's maturity parallels its alignment with the business. Throughout, we'll discuss how
the tenets of BSM are a catalyst for solidifying that alignment.
Alignment Inhibitors
No conversation about alignment is complete without a discussion of the behaviors that tend to
prevent that alignment from occurring. We've already touched on alignment inhibitors throughout
the earlier text, but for completeness, let's discuss each in turn. As we consider the roadblocks to
getting IT and the business on the same page, think about where these elements are present in
your organization. Is your IT team a cohesive part of the business strategy, or does it sit in its
own part of the building, walled off from the rest of the employee base? As you read the following
items, keep in mind that alignment principally means that IT and the business know
each other and say hi in the hallways as they pass by.
No Common Dialog
Our hallway conversation analogy rings truest in terms of
vocabulary. If IT and the business are utterly incapable of communicating with a common
dialog, IT will forever be relegated to the left half of our maturity curve. Furthermore, when IT
and the business are incapable of talking, the business itself suffers. Competitors who figure out the
role of their own IT infrastructures don't get left behind in terms of competitive advantage.
Two things must happen for the common dialog to occur. First, IT needs to find
mechanisms for reducing the technical complexity in its communication. As any good
college speech communication class teaches, IT must learn to tune the conversation to the listener. From
a metrics standpoint, understanding the financial value realized by each business process is a
key component. For a BSM implementation, this component is critical, as one of the major steps
in BSM is the identification of granular business processes and the assignment of dollar values to
each. Thus, it can be argued that the common dialog is really the first step in incorporating BSM.
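To make the idea of assigning dollar values concrete, here is a minimal Python sketch. The process names, hourly values, and mapping of IT services to processes are hypothetical figures invented for illustration, and the roll-up rule (a failed IT service halts every process that depends on it) is a simplifying assumption rather than a prescribed BSM method.

```python
# Hypothetical hourly value (in dollars) of each granular business process.
PROCESS_VALUE_PER_HOUR = {
    "online_ordering": 12_000.0,
    "b2b_order_intake": 8_000.0,
    "invoicing": 1_500.0,
}

# Hypothetical service model: which business processes rely on each IT service.
SERVICE_TO_PROCESSES = {
    "web_storefront": ["online_ordering"],
    "b2b_gateway": ["b2b_order_intake"],
    "erp_database": ["online_ordering", "b2b_order_intake", "invoicing"],
}


def value_at_risk_per_hour(service: str) -> float:
    """Hourly business value exposed if `service` fails.

    Simplifying assumption: an outage of the service halts every dependent
    process completely.
    """
    return sum(PROCESS_VALUE_PER_HOUR[p] for p in SERVICE_TO_PROCESSES[service])


# Rank IT services by the business value they put at risk each hour.
for svc in sorted(SERVICE_TO_PROCESSES, key=value_at_risk_per_hour, reverse=True):
    print(f"{svc}: ${value_at_risk_per_hour(svc):,.0f} per hour at risk")
```

The useful output is the ranking: a shared database that looks like "just another server" in a technology view carries the largest business exposure once process values are attached.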
The second element that must occur, though business leaders may not agree, is the need for
them to understand IT. If the business strictly sees IT as a cost center full of techies, alignment
can never occur. Business leaders need to see the value in IT-branded metrics as well. It is as
difficult for IT to derive service quality metrics based solely on business metrics as it is to derive them
solely from technology-based ones. The commonality of quantification goes both ways.
Mismatched Expectations
As was explained in the chapter example, the availability expectations of FCG's IT department
didn't match what really happened. Knowing of a service outage and responding to it needs to be
augmented with the knowledge of how that outage affects the business. This is the central tenet
of BSM. Once each service is broken down into granular components and analyzed with an eye towards business impact,
those expectations begin to align. Interestingly enough, though the lack of a common dialog is
arguably the biggest inhibitor to alignment, mismatched expectations often cause
the most damage to the business.
Technology-Focused Metrics
The idea of mismatched expectations feeds directly into the issues surrounding technology-focused metrics. When metrics are technology-centric, they fail to capture the full reality of
the customer's experience with the environment.
Consider our example critical B2B system. FCG's technology-focused metrics identified only a
very small outage to the environment, although that very small outage actually caused, from the
users' perspective, a huge problem. The loss of a single system may sound inconsequential on
paper, but the total impact to the business is large.
Only by flipping the metrics 180 degrees can we illustrate the system from the perspective of the
system's users. Ultimately, those users are the reason for the system's creation in the first place,
so it is only logical that the system's measure of success be based on the
satisfaction of its users.
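A small calculation shows how far apart the two perspectives can be. This Python sketch is a hypothetical illustration, not FCG's actual data: it assumes ten components, one of which (the server behind the B2B system) is down for four hours in a 720-hour month, and it assumes the service needs every component with no overlapping outages.

```python
HOURS_IN_MONTH = 720.0  # assumed 30-day reporting period

# Hypothetical outage hours for ten components during the month: nine were
# never down, and the one server behind the B2B system was down for 4 hours.
downtime_hours = {f"server{i}": 0.0 for i in range(1, 10)}
downtime_hours["b2b_app_server"] = 4.0


def device_availability(downtime: dict[str, float]) -> float:
    """Technology-focused metric: average availability of the individual devices."""
    per_device = [1 - hours / HOURS_IN_MONTH for hours in downtime.values()]
    return sum(per_device) / len(per_device)


def service_availability(downtime: dict[str, float]) -> float:
    """User-focused metric: share of the month in which the whole service worked.

    Assumes the service needs every component (a serial dependency) and that
    outages on different components never overlap, so their downtimes add up.
    """
    return 1 - sum(downtime.values()) / HOURS_IN_MONTH


print(f"Device view:  {device_availability(downtime_hours):.3%}")
print(f"Service view: {service_availability(downtime_hours):.3%}")
```

With these assumed figures, the device view reports roughly 99.94% availability while the service view reports roughly 99.44%, ten times as much unavailability from the users' perspective. Weighting the outage hours by when they occurred and how many users were affected would widen the gap further.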
Chapter 3 will incorporate a detailed discussion about the maturation of monitoring tools and
techniques that ultimately culminates in BSM. Through BSM's framework and tools, we can turn
our traditional ways of thinking about monitoring data upside-down and enable metrics that more
closely align with the needs of the business.
Siloing
Siloing is the concept of individual toolsets or teams working in insulated environments where
their activities may not necessarily be communicated elsewhere. In siloed environments, the
activities can be unnecessarily replicated elsewhere in the environment, wasting resources on the
duplication.
From the perspective of misalignment, siloing can also represent the lack of communication and
mismatched goals between departments in an organization (Source:
http://en.wikipedia.org/wiki/Silo_effect). When IT-internal goals are siloed away from the rest of
the business as a whole, they lose the necessary collaboration that drives alignment.
Exacerbating this issue is the fact that there are significant silos within IT as well. The network,
the servers, and the applications are all managed within separate silos, leading to metrics that are
technology-focused rather than business-focused. IT goals must be a component of business
goals and worked on in collaboration with the business if the two are to effectively merge.
Reactive Mode IT
Our last inhibitor is one we've discussed at length in this chapter. When IT is operating at or
above 100% capacity with just the daily care and feeding of the network, it is impossible for
strategic thinking to occur. Though the process involves an initial cost, in order to elevate IT out
of strictly reactive mode, some team members must be permanently or quasi-permanently set
aside for the tasks of strategic thinking and long-term planning.
There's an old IT joke that goes something like, "I don't have the time to automate this process. I'm
too busy doing it manually!"
The Gartner IT Maturity Curve

The Gartner model for IT maturity expands on the concepts we discussed in the last section. This
model identifies five categories of IT culture. As an IT organization moves from infancy to full
maturity, it exhibits a series of identifying characteristics and enjoys certain benefits associated
with that level of maturity.
Figure 2.3: In the Gartner model, Proactive IT is only the third step towards maturity. Gartner's model also
adds the trigger points and associated benefits enjoyed by organizations that achieve each level in the
model.
What drives the rightward movement of an IT culture is its acceptance of planning and
automation components. You'll see right off that Gartner identifies Proactive IT as only the third
step in the maturity process. With Gartner's model, once an organization gets to the Proactive
stage, it is only halfway to full maturity. The reason is that the later stages add
service-oriented thinking to the automation and planning components associated with the
Proactive stage. Service-oriented thinking aligns with the business service model upon which BSM
rests. Let's take a look at the characteristics associated with each of the stages in the
Gartner model with an eye towards how each stage integrates with the tenets of BSM.
Chaotic
Earlier in this chapter, we discussed Early IT and the associated mindset. That mindset maps
well onto the Chaotic stage of the Gartner model. In the Chaotic stage, Gartner identifies a few
key characteristics (Source: These and all characteristics to follow are from the Gartner IT
Management Process Maturity Model, Transforming IT Operations into IT Service Management,
Data Center 2003, Deb Curtis and Donna Scott):
Ad-hoc
Undocumented
Unpredictable
Multiple Help desks
Minimal IT operations
User call notification
These six characteristics identify an IT environment that is highly unoptimized, so much so, in
fact, that the IT department cannot even understand the underlying environment
itself. A problem that occurs in this environment is likely not recognized until a user notifies IT that
the problem has occurred. Automatic notification capabilities are not established. No
documentation of services is available to track linkages between services and service
dependencies. The lack of control renders the environment highly unpredictable.
In the Chaotic environment, tools are purchased for tactical reasons, and freeware and open
source tools are often chosen over enterprise-level tools due to initial cost barriers. However,
it's important to note that some IT organizations in the Chaotic state have actually over-invested
in tools on the premise that purchasing a tool equates to improving service. And, because
they don't have mature processes, every department has its own tool, duplicating cost and
effort. Toolsets are siloed, as are the personnel who use those toolsets. The culture in Chaotic
environments can involve organizational infighting, as lines of demarcation are not well-established.
Relating the Chaotic environment to our discussion of BSM, it is very difficult, if not
impossible, to bootstrap a BSM implementation into an environment that lacks even a modicum
of definition. The organization will likely require a shift to the right before it can consider a BSM
solution.
Reactive
That first shift to the right for a Chaotic organization is a move to the Reactive stage. Without
repeating what we've already discussed about Reactive IT, let's look at a few of the
characteristics Gartner identifies with Reactive organizations:
Best effort
Fight fires
Inventory
Initiate problem management process
Alert and event management
Monitor availability
As you can see, the first shift adds a host of benefits to the IT organization. Although service
availability is still at a best-effort stage and SLAs are likely not yet implemented, the
organization is at this point beginning to understand its environment through the
inventory and availability monitoring process. Inventory and monitoring data, gathered
predominantly through up/down monitoring, is feeding management databases even if that data is not
being acted upon.
IT within most companies resides in this stage of maturity. And interestingly enough, many IT-minded professionals prefer to work in environments at this stage. At this stage, IT is still a bit of
the wild west, but without the complete adhocracy associated with the Chaotic stage. Change
control measures are voluntary, and unplanned service outages may still occur due to
miscommunication between the various components of IT.
With an eye towards BSM, environments in the Reactive mode are fully capable of
implementing a BSM solution. However, the implementation of that solution and its associated
service model will involve heavy documentation and formalizing of the environment, taking the
"wild" out of the "west," if you will. Implementing a BSM solution at this stage will organically
shift the organization's maturity another notch to the right. Because of the nature of IT culture at
this stage, this shift can be painful for the employees within the organization.
As stated earlier, it is not necessarily bad that an IT organization lies in the Reactive stage. It means only that
the organization has not invested the time and material into elevating key personnel out of firefighting
and into analysis and automation activities. The slow and steady incorporation of automation activities
has the tendency to organically drive this move. It need not be a dramatic change from Reactive to
Proactive.
Proactive
Our previous discussion ended here with the Proactive stage, but Gartner uses this as a stepping
stone to the higher levels of IT service orientation. Proactive stage IT enjoys some very useful
benefits, and only here do those benefits begin the process of aligning IT goals with those of the
business itself:
Monitor performance
Analyze trends
Set thresholds
Predict problems
Automation
Mature problem, asset, and change management processes
The key determinant in identifying a Proactive stage IT organization is the use and analysis of
performance data and how that data relates to the end user experience. Less-mature organizations
still don't have a good answer to the questions "Why is the server slow today?", "Who is
impacted?", and "How long has this been a problem?"
With Proactive stage IT, the organization begins actually using and acting upon
the data collected by the inventory and monitoring solutions implemented in previous stages and
adds the crucial component of monitoring performance from the end user's perspective. Here, IT
begins to understand the underpinnings of the network infrastructure and
how they affect service availability. The maturity of internal processes at this stage gives IT the
ability to truly fulfill SLA guidelines, because the actual capability of the
network is now understood.
End-user experience monitoring will be discussed in detail in Chapter 5.
Very important here is that one major issue still lingers in the Proactive stage: SLA guidelines
and upward flow of metrics remain IT-focused and not business-focused. This lack of business
focus is the single issue that keeps IT from reaching the next stage of maturity.
Organizations that implement management and monitoring at this phase are making good use of
the data, but that use is still IT-centric. Implementing a BSM methodology and solution at this
phase will actually provide the greatest return for the business. The general service model is
relatively well understood, if not used, at this point. It is here that the IT organization has the maturity
to understand the cause-and-effect relationship between the availability of the service model and
business operations. A BSM implementation at this phase will fairly quickly move
IT that critical additional step to the right and will do so with the greatest return on the initial
investment.
In Chapter 1, we said that BSM really is the combination of Monitoring + Money. It is this ability to
relate monitoring data to monetary business impact that shifts an IT organization to the Service and
Value stages.
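The "Monitoring + Money" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any BSM product's API: the OutageEvent type, the service names, the dates, and the hourly values are hypothetical, and business impact is assumed to scale linearly with outage duration.

```python
from dataclasses import dataclass
from datetime import datetime

# The "money" half: hypothetical hourly business value of each service.
VALUE_PER_HOUR = {"b2b_portal": 8_000.0, "email": 500.0}


@dataclass
class OutageEvent:
    """The "monitoring" half: a service was down from `start` to `end`."""
    service: str
    start: datetime
    end: datetime


def business_impact(events: list[OutageEvent]) -> dict[str, float]:
    """Convert outage durations into estimated dollars lost per service."""
    impact: dict[str, float] = {}
    for e in events:
        hours = (e.end - e.start).total_seconds() / 3600
        impact[e.service] = impact.get(e.service, 0.0) + hours * VALUE_PER_HOUR[e.service]
    return impact


events = [
    OutageEvent("b2b_portal", datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 45)),
    OutageEvent("email", datetime(2024, 3, 5, 14, 0), datetime(2024, 3, 5, 16, 0)),
]
print(business_impact(events))
```

In this made-up month, the shorter outage costs more: 45 minutes of the B2B portal outweighs two hours of e-mail, which is exactly the distinction a technology-only report would miss.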
Service
Few organizations mature on their own to the Service stage. Here, the IT organization truly
understands its role in the daily operations and long-term viability of the business. In Chapter 1,
we talked about how BSM helps illuminate the quality of a business service. Here within the
Service stage is where the dollars and cents figures associated with that quality of service are
actually identified and acted upon. That financial description of a service is the main component
of ensuring a successful BSM implementation.
Here in the Service stage, IT enjoys the following benefits:
Defined services, classes, and pricing
Understand costs
Set quality goals
Guarantee SLAs
Monitor and report on services
Capacity planning
This concept of the dollar value associated with service delivery is crucial to the Service stage.
Unlike in previous stages where IT dollars are typically thrown at products, the Service stage
enables the organization to plan the outlay of money towards enhancing the holistic service
model. This understanding of the linkage between the business service and its impact on business
operations is a key characteristic of the Service stage.
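Guaranteeing SLAs for defined service classes starts with knowing how much downtime each class can absorb. The Python sketch below assumes hypothetical "gold," "silver," and "bronze" classes and a 30-day reporting period; it converts an availability target into a downtime budget and checks measured downtime against it.

```python
MINUTES_PER_DAY = 24 * 60

# Hypothetical service classes and the availability each one guarantees.
SERVICE_CLASSES = {"gold": 0.9999, "silver": 0.999, "bronze": 0.99}


def allowed_downtime_minutes(sla_target: float, period_days: float = 30) -> float:
    """Downtime a service may accumulate in the period and still meet its SLA."""
    return (1 - sla_target) * period_days * MINUTES_PER_DAY


def sla_met(sla_target: float, downtime_minutes: float, period_days: float = 30) -> bool:
    """True if the measured downtime stays within the SLA's budget."""
    return downtime_minutes <= allowed_downtime_minutes(sla_target, period_days)


for name, target in SERVICE_CLASSES.items():
    print(f"{name}: {target:.2%} allows {allowed_downtime_minutes(target):.1f} min per 30 days")
```

Pairing each class's downtime budget with the dollar value of the service it covers is one way to tie pricing of a service class to what an outage actually costs.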
Value
Our final stage is the Value stage. Here, the concepts of utility computing and IT on-demand
surface as business services arrive just-in-time for their need. Value computing embodies the
ability for IT to utilize its metrics gathering and actioning tools to predict and react to real time
changes in business demands. Characteristics and benefits of IT organizations in this stage are:
IT business metric linkage
IT improves business processes
Real-time infrastructure
Business planning
It is generally accepted that no organization has really reached this stage yet, though a fully
realized BSM implementation has every capability of launching a willing organization into this
stage. Whereas the Service stage is the embodiment of BSM's central tenets, the incorporation
of BSM's data into automated business decision-making and real-time service control and
management can enable a tightly defined organization to reach this stage.
BSM's Impact at the Various Maturity Levels

As an organization moves up the maturity chart, there is a logical progression that must be made
from basic monitoring of infrastructure metrics to deploying end user experience monitoring to a
full implementation of a BSM solution. The benefits derived will differ depending on the
capability of the organization to identify and define the service model and its linkages. The
maturity of the organization will also have a bearing on the maturity of its developed service
model. Table 2.2 discusses each of the levels in the Gartner maturity model and some of the
benefits that can be realized by incorporating a BSM solution at that level.
Maturity Stage | Key Steps Toward BSM
Chaotic | Notification; Environment documentation; Network (in)stability awareness
Reactive | Management data integration; Disintegration of IT data silos; Operational dashboards enable IT data views; Beginning service-level notification; Further environment documentation; Longitudinal (cross-device) analysis; Pre-failure warning capabilities
Proactive | Real-time service dashboards; Real-time executive dashboards; End-user experience awareness; Alignment of services to business processes; SLA right-sizing and fulfillment recognition; Monitoring data analysis capabilities; Auto-remediation capability; Automatic traffic reroute
Service | Service desk integration; IT purchase and expansion planning; Business impact quantification; Brown-out awareness and adjudication; Service impact to budgetary impact analysis; Pre-failure auto-purchase capabilities; Predictive traffic reroute
Value | Real-time service augmentation; IT prediction of business needs; Environment predictive auto-configuration
Table 2.2: At each level along the maturity model, an organization can move towards IT and business
alignment.
IT Focus Is Changing
Whether your organization chaotically fixes problems as they break or predictively recognizes
pre-failure warnings and auto-reconfigures resources to suit, IT in all types of organizations is
slowly maturing across the board. As the science of IT continues to formalize with business
process frameworks like ITIL and others, computing environments need not necessarily always
begin at the Chaotic stage.
Moving through the stages of maturity in an initial IT implementation is getting easier as
common processes and best practice approaches become more generally available to the public.
Along with that automatic maturity comes a slow change in the focus of IT away from the
device-centric and product-centric behaviors of the past and towards a service-centric focus.
This industry-wide change means that organizations are spending less time in the Chaotic
and Reactive stages. The formalization of IT business processes also makes it much easier to
incorporate technologies such as BSM that augment those processes with data and automation.
Four major topics should immediately come to mind when considering how IT's focus is
maturing towards an integrated component of business: its impact on total business revenue, how
a well-oiled IT infrastructure is a competitive advantage to business, the ability of agile IT to
enhance the agility of business, and how the movement to Proactive IT directly benefits the
business bottom line.
IT's Old Focus | IT's New Focus
Device Availability | Service Availability
Revenue Negative / Cost Containment | Revenue Neutral / Revenue Positive
Technology Focus | Business Impact Focus
Support the Business | Part of the Business
Figure 2.4: For organizations at every point in the maturity curve, maturity naturally occurs due to industry-wide effects. IT's old focus on cost containment and supporting the business is slowly transforming to a co-equal relationship.
Revenue Impact
Traditionally, the business has viewed the IT organization as a loss center. A necessary evil of
all modern businesses, even those not recognized as technology companies, IT traditionally
centers its budgetary activities around cost containment. If IT can trim costs through process
standardization and automation, the resulting giveback to the organization at year's end can be
maximized.
This "least worst" way of budgeting for IT expenditures has had the effect over time of reducing
IT's ability to serve its customers. Organizations that historically incentivized IT upper
management through cost containment goals often found themselves hurting in the long run as
expensive technology investments aged and new mechanisms for access became available.
Exacerbating this problem is the inability of many IT organizations to find effective IT-based
metrics that tie into business goals. IT organizations that lack metrics for quantifying the value of
IT to the business have the most difficulty justifying new projects and
initiatives. Often in these organizations, getting new technology in the door is a measure of the
"coolness factor" or the "this-product-is-no-longer-supported factor" rather than any quantified
financial benefit.
Contrast this set of behaviors with those in companies who have determined a best-fit set of
metrics relating IT value back to the business bottom line. In these organizations, metrics are
readily available to justify IT expenditures. Rather than relying on the budgetary handouts of
executive leadership, IT serves as co-equal with the business in identifying and exploiting
business opportunities.
In many companies, this changeover to a revenue-neutral or revenue-positive environment goes
a long way toward rapidly maturing IT and aligning its goals with those of the business. Relating this
conversation back to BSM, the technologies and frameworks that comprise BSM assist with the
value quantification process. Along with BSM's concern for the quality of a service come ready-made
metrics for identifying its real business value.
Competitive Advantage
Along with the changeover from a revenue-negative to a revenue-positive organization, IT gains
the ability to drive competitive advantage for the business. As an example, consider a widely spread
organization with employees working outside the traditional brick-and-mortar office. IT's
incorporation of rich, process-aligned remote access tools for non-traditional workers
automatically provides a competitive advantage to the business as a whole. Competitors who
require field workers to work through inadequate interfaces incur a time cost per field worker, per
transaction, per day, for every piece of data those field workers need to input. That time cost
translates into a business advantage for the organization that avoids it.
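The "time cost per field worker per transaction per day" adds up quickly, which a back-of-the-envelope calculation makes visible. Every figure in this Python sketch (200 field workers, 12 transactions per day, 3 extra minutes per transaction, $40 per hour, 250 workdays) is hypothetical and chosen only for illustration.

```python
def annual_time_cost(workers: int, transactions_per_day: int,
                     extra_minutes_per_transaction: float,
                     hourly_labor_cost: float, workdays_per_year: int = 250) -> float:
    """Dollar cost of the extra time an inadequate interface adds, summed over a year."""
    extra_hours_per_day = workers * transactions_per_day * extra_minutes_per_transaction / 60
    return extra_hours_per_day * workdays_per_year * hourly_labor_cost


# Hypothetical competitor: 200 field workers lose 3 minutes on each of 12 daily transactions.
cost = annual_time_cost(workers=200, transactions_per_day=12,
                        extra_minutes_per_transaction=3, hourly_labor_cost=40.0)
print(f"Annual time cost: ${cost:,.0f}")
```

Under these assumptions the competitor gives up 30,000 working hours a year, which is the kind of quantified figure that lets IT argue for a remote access project in business terms.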
In our example, mature IT organizations that are able to recognize the business value of correctly
implemented remote access solutions often get the approval to implement them. Then, after
implementation, correctly defined IT metrics ensure that such a system continues to be valued by
the business.
When IT matures to the level at which it is considered a co-equal part of the business that not
only enables business but also drives business, the business as a whole will enjoy an advantage
over competitors. This elevation doesn't come easily; it is through organizational maturity that the
necessary common language and business relationships develop.
Agility
Closely related to competitive advantage is the mature IT organization's ability to
rapidly reconfigure as necessary to maximize the functionality of the business. Businesses,
especially SMB and mid-market businesses, require rapid shifts of resources as the market and
economy change. Mature IT will understand the requirements of constantly shifting business
and implement technologies and processes that can operate in today's business environments.
IT organizations that constantly find themselves catching up as the needs of the business
change are not incorporating the tools and technologies necessary for automation and rapid
service deployment. The instrumentation data and related logic associated with BSM and its
frameworks help identify where automation provides the best value to business. As not all forms
of automation provide good return, a mature IT organization will use its monitoring, inventory
data, and trend lines to quickly determine the best fit for new tools and services.
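Because not every automation pays off, one simple way to use monitoring and ticket data is to rank candidate automations by payback period. This Python sketch is an illustration under stated assumptions: the task names and figures are hypothetical, and payback is measured only in staff hours, ignoring licensing and maintenance costs.

```python
from dataclasses import dataclass


@dataclass
class AutomationCandidate:
    """A manual task that might be automated (all figures are hypothetical)."""
    name: str
    runs_per_month: int            # how often the task occurs, e.g. from ticket or monitoring data
    manual_minutes_per_run: float  # hands-on time each occurrence costs today
    build_hours: float             # one-time effort to automate it

    def hours_saved_per_month(self) -> float:
        return self.runs_per_month * self.manual_minutes_per_run / 60

    def payback_months(self) -> float:
        """Months until the hours saved repay the hours spent building the automation."""
        saved = self.hours_saved_per_month()
        return float("inf") if saved == 0 else self.build_hours / saved


candidates = [
    AutomationCandidate("server_patching", 20, 45, 60),
    AutomationCandidate("monthly_capacity_report", 1, 240, 80),
    AutomationCandidate("password_resets", 300, 5, 40),
]

# Fastest payback first: these are the automations most likely to be worth doing now.
ranked = sorted(candidates, key=lambda c: c.payback_months())
for c in ranked:
    print(f"{c.name}: pays back in {c.payback_months():.1f} months")
```

A frequent, tedious task such as password resets repays its build effort within a couple of months, while an infrequent report may take well over a year, which is the old "too busy doing it manually" joke expressed as a number.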
Agility is a key indicator of an IT organization's stage of maturity.
Reactive to Proactive IT
Lastly, there is the key element of long-term planning. Immature IT organizations tend to wade through the
daily tasks of system care and feeding with little look towards the future. These organizations
often find themselves overwhelmed when a major service upgrade is forced upon them by
vendors. IT organizations that find themselves paying extra to use yesterday's technologies are
likely not performing the necessary planning. As the focus in IT changes and IT continues to
develop a common dialog with the business, these planning issues fall to the side as the business
budget becomes entangled with the planning activities of IT.
Figure 2.5: Five common behaviors can be inhibitors to alignment.
Why Invest in BSM?

Throughout this chapter, we've talked about the need for alignment between IT and the business.
We've discussed why that alignment makes the business stronger and more competitive. And
we've talked at length about how misalignment can eventually hurt a business and its ability to
be agile in the marketplace. Throughout our conversation, we've touched on the tenets of BSM
as a catalyst for enabling this alignment to occur.
But how exactly does BSM enable that alignment? We're just talking about heads-up
dashboards for management types, aren't we? In a way, yes. The end result of a successful BSM
solution is a heads-up display customized for its consumer. Business leadership sees real-time
data on business services and their impact on the bottom line. IT leadership sees asset-centric
statistics related to SLA measurements. Even the systems administrators benefit, using the
fully realized service model as a sort of map to quickly guide their troubleshooting processes
through fault tree and impact tree analyses.
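To show how a service model doubles as a troubleshooting map, here is a minimal Python sketch of the two analyses. The model and its node names (including a shared dns node) are hypothetical, and real BSM tools store and traverse far richer models; the sketch only shows the graph walks involved. Impact analysis walks up from a failed component to everything that depends on it; fault analysis walks down from a troubled service to the failed components beneath it.

```python
# Hypothetical service model: each node lists the nodes it directly depends on.
DEPENDS_ON = {
    "order_to_cash": ["b2b_portal", "erp"],
    "b2b_portal": ["web_server", "dns"],
    "erp": ["db_server", "dns"],
    "web_server": [],
    "db_server": [],
    "dns": [],
}


def dependencies_of(node: str) -> set[str]:
    """Everything `node` relies on, directly or indirectly (walk down the model)."""
    found: set[str] = set()
    stack = list(DEPENDS_ON[node])
    while stack:
        dep = stack.pop()
        if dep not in found:
            found.add(dep)
            stack.extend(DEPENDS_ON[dep])
    return found


def impacted_by(failed: str) -> set[str]:
    """Impact tree: every node whose dependencies include the failed component."""
    return {n for n in DEPENDS_ON if failed in dependencies_of(n)}


def candidate_causes(service: str, down: set[str]) -> set[str]:
    """Fault tree: the components currently down somewhere beneath `service`."""
    return dependencies_of(service) & down


print(sorted(impacted_by("dns")))
print(sorted(candidate_causes("order_to_cash", down={"db_server"})))
```

The first query answers "what does this failure affect?" and the second answers "what could be causing this service's trouble?"; both come from the same model, which is why building it once pays off for several audiences.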
For these benefits alone, a business may want to consider the incorporation of BSM into their
existing management and monitoring suite to help them understand the pain and pleasure their
customers are feeling when using their systems. Depending on the tool selected, the BSM
framework should augment the existing infrastructure without a rip-and-replace of existing
tools.
Beyond each of these benefits, the mere process of implementing BSM and its
service model in an existing network environment can go far in moving an existing IT culture
to the right in terms of maturity. Developing the BSM service model is no trivial task, but the
knowledge accumulated through developing its granular composition will go far in
helping IT understand its own network, and in helping the business better understand IT. And as we
said when we started this chapter, when these two groups understand each other, we have
achieved alignment.
[Figure 2.6 layers, top to bottom: Executive Views; Business Metric Views; Service & Asset Metric Views; Operational Metric Views. Questions answered across the layers: What is the business impact? What is the problem's effect? What is the problem's cause?]
Figure 2.6: The data visualizations or views created by BSM build upon each other. Each representative
consumer is given real-time insight into the answers that make sense to their responsibilities.
Where it Works
There are some obvious places where BSM works best within an organization. Business services
as defined by BSM are:
Revenue generating or revenue/cost impacting
Critical to the business
Supported by the infrastructure of IT
Integrated with business processes
Provided by a service organization, whether internal or external
Understanding that, BSM really only works when quantitative revenue metrics can be created for
the service in question. To fully implement BSM, the service must have some impact on the
revenue bottom line and that impact must be able to be quantified into terms of dollars and cents.
So what about infrastructure servers like Domain Name System (DNS) and Windows Active Directory (AD)
servers? Though their outage may not necessarily directly impact a business service, the potential
exists that their loss can result in some loss of service capability.
Most computers rely on DNS for name resolution functionality. So losing that service can have an
impact on how your servers talk to each other.
In any case, BSM implementations are easiest when the business already has an underlying
understanding of its internal business processes. The BSM service model should relate more to
those processes than to the underlying hardware infrastructure. For organizations whose processes
are not well-defined, however, the service model creation process often results in a better understanding of
the componentization of those processes.
Relating back to our chapter example, in which John Brown and his IT team spend hours
each month gathering, crunching, and reporting statistics, the real-time component of BSM's
data gathering goes a long way toward reducing that process's overall recurring cost. As an example, let's
assume that FCG completes a full BSM implementation that includes the service
model and all attached metrics required by the business. In this example, John Brown's process
of gathering month-end statistics is reduced from a multi-person, multi-day task involving the
compilation of data from numerous systems in multiple data formats to simply printing off the
desired dashboard from the BSM system.
Obviously this is an extreme example, because our calculations on return don't include the work
involved with setting up the dashboards and configuring data connectors to each disparate
system. But what's important here is the concept that the automation components intrinsic to the
BSM toolset enable this data to be gathered. Once BSM is implemented, organizations that use its
tools can organically come to understand their data points of interest and incorporate them into
the proper visualization as they see fit.
The Dashboard Audience
As you can see in Figure 2.6, the typical organization of our resulting BSM visualization layers data on top of each other. Each consumer of data in the business is presented with the information relevant to their level of responsibility. What is critical to understand here is that BSM centralizes visualization data from numerous otherwise segregated systems into a single view. Rather than looking through two, three, or four separate views in separate tools to understand the network environment, BSM's data consolidation and business logic rules provide a one-screen picture of the network.
Technicians and Administrators
Following on with our description of Figure 2.6, the operational metrics typically received from device monitoring systems are aligned with inventory system data used for asset management. Together, these items help IT staff in the trenches understand the health, location, and composition of devices on the network, giving them the information they need to track problems and identify trouble spots. This helps them answer the question: "What is the problem's cause?"
Managers
Managers, both operational and strategic, have access to the same data provided to systems administrators. Added to that data are asset metrics that help them determine the best fit for existing asset classes, as well as planning data to help with expansions and future purchases. With BSM's quality-based logic in the mix, managers can be alerted, and can react, when preconfigured management thresholds are crossed. Because managers, both in IT and elsewhere, are typically responsible for the level of customer interaction and customer service associated with the business service under management, they need the answer to the question: "What is the problem's effect?"
Executives
At the executive level, the desire for device-based information is typically low. An executive who cares about an individual router's failure in a remote data center is rare. What the typical executive does care about, however, is how that router's failure affects the company's ability to do business. BSM's built-in logic can parse data from numerous management systems to build a holistic understanding of the network's health.
Where BSM excels over traditional monitoring systems is in that logic's ability to link an outage to a dollars-and-cents view of its cost. As in our chapter example, the FCG leadership would have had much more information to work with had they known that the minor IT systems outage was linked so directly to the functionality of their critical B2B system. Some examples of Key Performance Indicators (KPIs) that may be of value to the FCG executives are Web site drop rate, the number of orders placed or shipped, and customer satisfaction.
With that information readily in hand, the executive can act while the outage is still under way to maintain continuity of business and business relationships. As the earlier figure shows, the executive wants the answer to the question: "What is the business impact?"
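To illustrate how one event can serve all three audiences, the sketch below renders a single outage three ways. It is hypothetical: the event fields, the device and service names, and the per-minute revenue figure are assumptions made for illustration, not features of any particular BSM product.

```python
from dataclasses import dataclass


@dataclass
class OutageEvent:
    device: str                # the failed element, e.g. a router
    symptom: str               # what monitoring observed
    service: str               # the business service the device supports
    minutes_down: int          # outage duration so far
    revenue_per_minute: float  # hypothetical revenue the service earns per minute


def technician_view(e: OutageEvent) -> str:
    # Answers "What is the problem's cause?"
    return f"{e.device}: {e.symptom}"


def manager_view(e: OutageEvent) -> str:
    # Answers "What is the problem's effect?"
    return f"{e.service} degraded for {e.minutes_down} min"


def executive_view(e: OutageEvent) -> str:
    # Answers "What is the business impact?" in dollars and cents.
    loss = e.minutes_down * e.revenue_per_minute
    return f"{e.service}: estimated ${loss:,.2f} at risk"


event = OutageEvent("router-01", "failed ping response", "B2B ordering", 30, 125.0)
for view in (technician_view, manager_view, executive_view):
    print(view(event))
```

The design point is that all three views read the same underlying record; only the business logic applied to it changes with the audience.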
Where It Doesn't Work
We can't talk about where BSM works without also talking about where it won't work, though this half of the discussion is really the inverse of our list of potential business services. If your organization uses its network solely for internal workflow processing, BSM may not be the best fit for you. As stated numerous times, for BSM to be valuable, a dollar figure must be placed on a business service (and, therefore, on its absence). If the business network is used strictly for difficult-to-quantify internal workflow, BSM will not provide the same level of value as it does for a network that affects the corporate bottom line.
Low Risk Implementation

Because BSM implementations traditionally pull data from disparate management systems rather than attempt to gather data on their own, the risk to daily operations associated with a rollout is relatively low. Though some BSM instrumentation can be implemented as part of end-user experience monitoring (a topic we'll discuss in detail in Chapter 5), much of BSM's instrumentation occurs outside the BSM system.
By separating the data collection functions (handled by traditional systems management and monitoring tools) from the data calculation functions (handled by the BSM toolset), there is no need for a "rip and replace" of existing toolsets within the environment. Typical BSM toolsets include sets of connectors that tie into other networked systems to gather data. The highest-risk component of a BSM implementation is usually the implementation of these data connectors.
Chapter 8 will include a discussion of some typical connectors and the best practices associated with
their use.
[Figure 2.7 diagram: the BSM System receives instrumentation data from a Systems Management System, a Systems Monitoring System, an Application Monitoring System, and an Asset Management System.]
Figure 2.7: BSM's data typically arrives from other preexisting management systems' data gathering efforts.
Connectors are configured to pull instrumentation data from those systems into BSM for further calculation.
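The connector arrangement in Figure 2.7 can be sketched in a few lines. This is a hypothetical illustration: the connector classes, record layouts, and field names are invented, and each connector simply returns canned records where a real one would query its source system. The BSM side only normalizes and merges what the connectors return, so nothing in the source systems is replaced.

```python
from typing import Iterable, Protocol


class Connector(Protocol):
    """Anything that can pull records from a preexisting management system."""

    def pull(self) -> Iterable[dict]: ...


class MonitoringConnector:
    # Hypothetical monitoring system that reports host/state pairs.
    def pull(self) -> Iterable[dict]:
        return [{"host": "web01", "state": "DOWN"}, {"host": "db01", "state": "UP"}]


class AssetConnector:
    # Hypothetical asset system that uses its own field names.
    def pull(self) -> Iterable[dict]:
        return [{"asset_tag": "web01", "location": "Denver"}]


def collect(connectors: list[Connector]) -> dict[str, dict]:
    """Merge records from every connector into one view keyed by host name."""
    merged: dict[str, dict] = {}
    for connector in connectors:
        for record in connector.pull():
            # Normalize the differing key names into a single host identifier.
            host = record.get("host") or record.get("asset_tag")
            fields = {k: v for k, v in record.items() if k not in ("host", "asset_tag")}
            merged.setdefault(host, {}).update(fields)
    return merged


print(collect([MonitoringConnector(), AssetConnector()]))
```

Here the monitoring state and the asset location for `web01` end up in one record, which is the kind of consolidated view a BSM dashboard draws on.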
Cost Containment Aspects
BSM has further benefits associated with cost containment activities. As we've learned in our
chapter example, the cost of even a single server outage can be high. Thus, increasing our total
effective uptime by even a small percentage has far-reaching budgetary implications.
As an example, if we assume that a highly critical customer-facing system has a desired uptime of 99.5%, that equates to slightly more than 3.5 hours of downtime per month. If BSM's service model and the service dependencies that link its elements help us arrive at a solution faster when an outage occurs, we can translate that into higher uptime. Increasing our uptime percentage from 99.5% to 99.6% buys the organization roughly 43 to 44 additional minutes of uptime per month. With highly competitive, high burn-rate organizations measuring loss at many hundreds of dollars per user per month, this gain of roughly three-quarters of an hour translates into thousands of dollars of recurring cost containment, merely by having a better visual representation of the problem domain.
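The uptime arithmetic is easy to check. The sketch below assumes an average month of 730 hours (8,760 hours per year divided by 12); using a 30-day month instead gives slightly smaller figures (3.6 hours and a gain of about 43 minutes).

```python
HOURS_PER_MONTH = 8760 / 12  # average month length in hours (assumption)


def downtime_minutes(uptime_pct: float) -> float:
    """Allowed downtime per month, in minutes, for a given uptime percentage."""
    return (1 - uptime_pct / 100) * HOURS_PER_MONTH * 60


before = downtime_minutes(99.5)
after = downtime_minutes(99.6)
print(f"99.5%: {before / 60:.2f} h/month")
print(f"99.6%: {after / 60:.2f} h/month")
print(f"gain: {before - after:.1f} min/month")
```

With these assumptions, 99.5% allows about 3.65 hours of downtime per month, 99.6% about 2.92 hours, and the difference is about 43.8 minutes.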
Governance and Compliance Aspects
Lastly, the additional costs of maintaining governmental and industry compliance records often
overwhelm unprepared IT organizations. The reality of many compliance regulations is that
some auditable technical control must be in place to manage and monitor systems for compliance
violations. Depending on the compliance regulation, those controls may have different objectives
such as preventing the release of Personally Identifiable Information or protecting data of a
financial nature. Incorporation of BSM helps aid the regulators in recognizing that due diligence
has been done on the part of the organization to understand its security posture and rapidly notify
when configurations go out of baseline or security triggers are flagged and noticed by the
organization. The historical reports and real-time dashboards of a BSM implementation easily
shows the security officer and auditors that those due diligence controls are in place and
operational.
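At its simplest, a baseline check of the kind auditors look for compares current settings against an approved baseline and flags every difference. This is a minimal sketch: the setting names and values are hypothetical, and in practice both sides would be pulled from configuration management tools through connectors rather than written inline.

```python
def config_drift(baseline: dict[str, str], current: dict[str, str]) -> dict[str, tuple]:
    """Return settings whose current value differs from the approved baseline.

    Each entry maps a setting name to (expected, actual); None marks a
    setting that is missing on one side.
    """
    drift = {}
    for key in baseline.keys() | current.keys():
        expected, actual = baseline.get(key), current.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical approved baseline and observed configuration for one server.
baseline = {"ssh_root_login": "no", "password_min_length": "12", "audit_logging": "on"}
current = {"ssh_root_login": "yes", "password_min_length": "12", "telnet": "enabled"}

for setting, (expected, actual) in sorted(config_drift(baseline, current).items()):
    print(f"ALERT {setting}: expected {expected}, found {actual}")
```

Keeping each alert, with its timestamp, over time is what turns these checks into the historical record an auditor can review.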
The Value of Alignment

As we've shown throughout this chapter, there is value in aligning the goals of IT and the business. Once we've bridged the chasm of vocabulary, mismatched expectations, and technocentric metrics, the business gains a competitive advantage not enjoyed by those who view IT merely as a cost center. A BSM solution can link critical business processes to IT services while bringing IT goals back to the business, creating that critical alignment.
Chapter 3: IT Service Management Evolution

"Beep, beep."
"Oh, not again," thinks First Class Glass COO Dan Bishop as he rolls over in bed for what seems like the third time this month. Picking up his mobile device, he reads the message on its screen. "Another one? Why do these things always seem to happen in the middle of the night?"
DEN-RTR-02B-H, Failed Ping Response, 10/13, 4:23a,
Expected down. TKT 104328 assigned to NET_OPS.
"What the heck is a DEN RTR," he thinks as he starts to dial his IT Director, John Brown, "and why do I care if it failed a ping response? These sorts of middle-of-the-night mobile device beeps must be commonplace for the IT guys. But I'm too busy and too near retirement to get blasted out of bed like this once a week. How do these guys do this all the time?"
John answers the phone in an equally bleary voice. "What's a D-E-N-R-T-R, and what happens when it fails a ping response?" Dan asks his IT top gun.
John responds groggily, "G'mornin', Dan. That means one of our backup routers in the customer DMZ couldn't be reached by the monitoring system. Lack of a ping response tells us that it's not talking on the network."
"Is this bad?"
"Not really," John explains. "We've got another router on that network that load balances with the router that went down. When either one of them goes down, the other has enough bandwidth to handle the load until we get it back up. It's nothing to really worry about. One of my guys will fix it when they get in later this morning. That router's been giving us trouble anyway. It's probably time to replace it with a newer model."
"Nothing to worry about, eh? That's what you said last month when we lost that minor IT system," Dan thinks to himself as he hangs up the phone and tries to catch one more hour of sleep.
You see, a little over a month ago that minor systems outage caused FCG a six-figure cost overrun in their accounting department. And since then, that's all that's been on Dan's mind. Outages. All the time, it seems.
Because of that outage and IT's mischaracterization of it, Dan has asked to have his mobile device paged by the notification system whenever anything shows a problem. But when he asked, he never realized just how often things went down. What's worse is that they always seem to go down in the middle of the night, and always right before a big customer presentation the next day. He's beginning to think that asking for this level of detail was a mistake, but he doesn't want to back down now.
You see, it was Dan's job on the line when he got called into the CEO's office to explain the situation and the budgetary hit last month. He doesn't want to go through that experience again.
"He needs a new router, then?" he thinks as he falls back asleep. "We'll discuss that in the morning."
Maturity Impacts IT Goals

In our last chapter, we talked about the need for alignment between IT and the business. We discussed how IT organizations everywhere are going through a maturation process, through which they come to understand their priorities and responsibilities in the business. As it matures over time, IT moves away from a reactive mode of operations toward one in which it understands its coupling with business financials and operational processes.
In this chapter, we'll move away from the cultural aspects of IT's maturity and into the technical concepts associated with IT's monitoring of its systems. Here we'll discuss the role of IT services and the management of those services. As IT becomes more mature, it experiences an evolution in the way it approaches the management and monitoring of its business systems. Along with that, it begins to develop the linkages between traditional Network & Systems Management and the Process Management that drives the rest of the business.
As you can see from our story earlier, FCG is still going through that maturity process. Early attempts at aligning IT with the needs of FCG's business failed, and so a single IT systems outage is causing pain both in dollars and in lost sleep for at least one executive. This raises the following questions: Does Dan really need to get paged out of his sleep once a week for every router's or server's missed ping packet? Or is there a more digestible way for him to keep up with the operations of the network on which his business runs?
These questions, among others, are what this chapter, this guide, and indeed the tenets of BSM attempt to answer. Dan's job is to understand and act on the needs of FCG's customers. His job is not to manage the status of his company's network devices and applications. But as COO, he does have responsibility for that network's operations. Properly reconciling the requirements of that responsibility with the needs of his job is one of the goals BSM attempts to fulfill.
In order to do so, we'll start this chapter by defining an IT Service and continue by discussing
how that definition has evolved over the years from its beginnings in terms of strict network
availability to today's understanding of BSM service quality.
What Is an IT Service?
According to the IT Infrastructure Library (ITIL), an IT Service is defined as:
A service provided to one or more customers by an IT service provider. An IT service is
based on the use of information technology and supports the customer's business
processes. An IT service is made up from a combination of people, processes and
technology and should be defined in a service level agreement.
Taking this one step further, we should also define a business service. A business service is an
IT service that directly supports a business process, as opposed to an infrastructure service,
which is used internally by the IT service provider. The term is also used to mean a service that
is delivered to business customers by business units. Successful delivery of business services
often depends on one or more IT services.
Deconstructing this statement in relation to traditional IT managed devices and applications, an IT Service can encapsulate the processing of data; the transmission of that data through processing elements; the visualization, manipulation, and administration of that data by users and administrators; and the data itself.
So in layman's terms, an IT Service is something ultimately recognizable by that service's consumer. If the consumer can see the forest for the trees in our description of the service, then we have done a good job of defining the service in categorizable terms.
There is one major difference here between the ITIL definition and how a service is internally
defined by many IT organizations. Those organizations not at higher levels of maturity often
leave off the most important component of the definition: the business and workflow processes
that enable the successful use and management of that data as it proceeds from creator to
consumer. In effect, what many lack is the reason for the data's existence.
Figure 3.1: Numerous configuration items (CIs) make up an IT Service. Many IT organizations lack the organizational maturity to
include business process as a manageable CI.
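One way to picture Figure 3.1 is as a record listing a service's CIs, with business process treated as one more CI. The sketch below is hypothetical: the CI kinds loosely follow the chapter's deconstruction (processing, transmission, data, and the business process), but the class names, field names, and example entries are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationItem:
    name: str
    kind: str  # e.g. "server", "network", "data", "business process"


@dataclass
class ITService:
    name: str
    cis: list[ConfigurationItem] = field(default_factory=list)

    def is_business_aligned(self) -> bool:
        """True only if at least one CI captures the business process itself."""
        return any(ci.kind == "business process" for ci in self.cis)


# A techno-centric definition: hardware, network, and data, but no process.
techno_centric = ITService("Order entry", [
    ConfigurationItem("ord-app01", "server"),
    ConfigurationItem("core-rtr01", "network"),
    ConfigurationItem("orders-db", "data"),
])
# The same service with the business process included as a manageable CI.
aligned = ITService("Order entry", techno_centric.cis + [
    ConfigurationItem("Customer order-to-invoice workflow", "business process"),
])

print(techno_centric.is_business_aligned(), aligned.is_business_aligned())
```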
This removal of business relevance from service definitions, especially when dealing with the monitoring needs of an organization, leaves an incomplete picture. Leaving out those processes and procedures leads to a techno-centric definition of IT Services, which inhibits their alignment with the business processes that rely on them.
We've discussed at length the need for business-centric definitions in any mature understanding of a business service. Throughout the rest of this chapter, we'll justify this need by filling in the historical gaps in how that definition has evolved over time.
Service Management
Identifying services is one thing, but ultimately managing those services is yet another. One
definition of IT Service Management is the implementation of a strategy that defines, controls,
maintains, and enhances the IT Services for the enterprise. It embodies people, processes, and
technology in order to provide quality of service (QoS) for business objectives and operational
goals.
As we discussed in Chapter 1, one of the major components of either a BSM installation or really any Network & Systems Management (NSM) rollout is the identification, classification, and granularization of IT Services into their atomic components. Setting BSM aside completely, think about the steps you take in setting up monitoring within a traditional NSM environment:
Endpoint Identification: The first phase is to identify the manageable endpoints within a network environment. Those endpoints may be network devices, individual servers and applications, managed environmental control components, and intra-service triggers of interest.
Element Classification: Once an inventory of manageable endpoints is identified, those elements are classified into various categories. Often, this classification is based first on endpoint type and then on the level of notification desired for that endpoint.
Notification: Once identified and grouped, those groups are assigned notification rules based on the need for IT to recognize state changes for elements within the group. Element groups of high criticality are given more stringent notification rules than those with less impact on operations.
Iterative Granularization: After notification rules are set into place, a follow-on improvement period for those rules and classifications is completed. Typically, this improvement phase adds alerts and notifications that were omitted in the first phase but later found to be critical.
Remediation Assignment: Once the model becomes relatively static and the organization's comfort level increases, automatic remediation actions can be assigned to regular events. Obviously the addition of automatic remediation components carries an element of risk, so this step in the process rarely sees use in full production.
This process describes how a traditional NSM environment incorporates IT Service Management. What we will find as we go through this chapter and into the next one is that BSM augments these steps with a few additional ones, namely the steps associated with Service Quality Assignment and Business Relevance and Metric Linkage.
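The identification, classification, and notification steps above can be sketched as a small rule table. This is a hypothetical illustration: the endpoint names, criticality classes, and rule values are assumptions, and a real NSM product would express them in its own configuration format.

```python
from dataclasses import dataclass

# Notification step: rules per criticality class (hypothetical values).
NOTIFY_RULES = {
    "critical": {"page": True, "email": True, "repeat_minutes": 5},
    "standard": {"page": False, "email": True, "repeat_minutes": 60},
    "low": {"page": False, "email": False, "repeat_minutes": None},
}


@dataclass
class Endpoint:
    name: str
    kind: str          # identification step: what type of endpoint this is
    criticality: str   # classification step: which rule group it belongs to


def notification_for(endpoint: Endpoint, new_state: str) -> dict:
    """Apply the endpoint's class rules to a state change."""
    rule = NOTIFY_RULES[endpoint.criticality]
    if new_state == "UP":
        # In this sketch, recoveries are never paged, only recorded.
        return {**rule, "page": False}
    return rule


inventory = [
    Endpoint("core-rtr01", "router", "critical"),
    Endpoint("backup-rtr02", "router", "standard"),
    Endpoint("lab-printer", "printer", "low"),
]
for ep in inventory:
    print(ep.name, notification_for(ep, "DOWN"))
```

Iterative granularization then amounts to editing `NOTIFY_RULES` and the classifications as experience shows which alerts were missing or too noisy; a redundant backup router, for instance, might be placed in a class that never pages anyone.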
The Timeline of Management and Monitoring

Let's take a few steps back from our definition of IT Service Monitoring and take a hard look at how this process came into being. Doing so will help us understand the relevance of BSM's tenet of service quality and the metrics that align with that tenet. Once we understand how management and monitoring have evolved over the years, growing ever closer toward alignment with the business, we will better recognize their drive toward a focus on value.
Since the beginning of modern computing, there has been an ever-growing need for systems
management and monitoring. As the number of computers that make up a network has increased,
the number of configurations and manageable endpoints has geometrically increased as well.
Adding even a few more systems into a network means adding dozens or hundreds of additional
management touch points to that network environment.
Over time, the process of managing those touch points has evolved, starting with early attempts at network management and moving through proprietary agent-based tools to rest now, for many organizations, on a new focus on agentless management and end user experience monitoring. What we anticipate seeing next along this timeline is a refocusing on value by organizations that have progressed up the maturity curve. Let's take a deep look at each of these steps in the timeline to help us understand how yesterday's technology drives the needs of today.
[Figure 3.2 diagram: a timeline of four stages. Early Management: Network Management, Open Systems Management, SNMP-based Management. Proprietary Agents: On-System Code, Vendor-specific Agents, Little Inter-Software Integration. Native/Agentless: Open Standards/APIs, Common Collection & Transport, System-based Protocols. Value Focus: End User Experience, Data Quality Trumps Quantity, Improvement Trumps Reaction, Business Value of IT.]
Figure 3.2: Four discrete timeframes in the evolution of systems management and monitoring.
Early Management
In the beginning, there was the Simple Network Management Protocol. SNMP was originally a network-focused protocol designed to provide a common framework for devices to report state changes to a centralized Network Management System (NMS). SNMP describes both the protocol that transports state change information around the network and the framework that defines each network device's remote monitoring and configuration capabilities.
SNMP's original goal was to provide the administrator with information about the status of network devices. Originally limited to those devices, the protocol and central NMS could notify an administrator when a network device such as a router, switch, or firewall dropped off the network, began losing an excessive number of packets, or otherwise entered an undesired state. Early NMSs were, and in many cases continue to be, completely device-centric. Typical heads-up interfaces display network maps with stoplight charts that show the status and health of network devices.
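The device-centric stoplight view of these early NMSs can be sketched as a mapping from each device's most recent reported state to a color. The sketch is hypothetical: it uses plain tuples in place of SNMP notifications and does not use any SNMP library, and the device names, states, and packet-loss threshold are assumptions.

```python
# Each tuple stands in for a state-change notification: (device, state, packet_loss_pct).
notifications = [
    ("core-rtr01", "up", 0.1),
    ("dmz-fw01", "up", 7.5),
    ("backup-rtr02", "up", 0.0),
    ("backup-rtr02", "down", 100.0),
]

LOSS_WARNING_PCT = 5.0  # hypothetical threshold for "losing an excess of packets"


def stoplight(notifications):
    """Return the stoplight color for each device based on its latest notification."""
    colors = {}
    for device, state, loss in notifications:  # later entries overwrite earlier ones
        if state == "down":
            colors[device] = "red"
        elif loss > LOSS_WARNING_PCT:
            colors[device] = "yellow"
        else:
            colors[device] = "green"
    return colors


for device, color in stoplight(notifications).items():
    print(f"{device:14} {color}")
```

Note that the view says nothing about which business service each device supports, which is exactly the device-centric limitation described above.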
As time progressed, additional components were added to SNMP to manage elements other than network devices. SNMP Management Information Bases (MIBs) were extended into the server and application space as well as to environmental control devices within the data center. SNMP's device-agnostic architecture meant that virtually any device that retains on-board state information could push that information to a central NMS through the SNMP transport protocol. In fact, SNMP's capabilities were designed so generically that it continues to this day as a major network management tool for many devices across the network.
Proprietary Agents
Over time, some of SNMP's weaknesses came to light as administrators attempted to use it to manage configuration changes on network devices. Whereas SNMP is an excellent tool for reading device information, using it to write configuration was found to have security implications. Also, while many devices eventually gained the capability of pushing information through SNMP, few extended that capability to updating on-board information. Moreover, even as servers and applications started down the road of providing SNMP-capable interfaces for monitoring state change, not all server and application functionality was exposed.
Into this vacuum of remote management capabilities came a set of proprietary software solutions that could provide the necessary management. Although SNMP was embraced by virtually all vendors of network devices, those who built systems and applications found themselves moving toward third-party tools, with their on-board agents, to provide better access to system and application information.
It's important for us to stop here to talk a little about those early attempts at systems and application management. The early attempts, many of which are still used today, relied on proprietary agents: pieces of code that needed to be installed directly onto every system under management. The agent would regularly inventory the system for configuration and state information, wrap the result into a transmittable package, and send that package across the network to the NMS. The use of installable agents for systems management had both its pros and cons (see Table 3.1).
Pros | Cons
Remote management capabilities | Time delay associated with agent poll cycle
Little network impact | Requires network support of agent protocol
Rich capability support | High cost of agent management
New capabilities easily added to agents | New capabilities = agent upgrade
Table 3.1: The pros and cons of proprietary agent-based systems management and monitoring.
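The agent cycle described earlier (inventory the system, wrap the result into a package, send it to the NMS) might look roughly like the sketch below. It is a hypothetical illustration using only the Python standard library: the payload fields and the JSON packaging are assumptions, and `send_to_nms` is a stub that returns the bytes that would go over the wire rather than implementing any vendor's agent protocol.

```python
import json
import platform
import socket
import time


def inventory() -> dict:
    """Gather a small sample of configuration and state information."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "python": platform.python_version(),
        "collected_at": int(time.time()),
    }


def package(data: dict, agent_version: str = "1.0") -> bytes:
    """Wrap the inventory into a transmittable package (JSON, by assumption)."""
    return json.dumps({"agent_version": agent_version, "payload": data}).encode("utf-8")


def send_to_nms(packet: bytes) -> bytes:
    # Stub: a real agent would transmit this over its vendor's agent protocol.
    return packet


def poll_cycle(cycles: int, interval_s: float) -> list[bytes]:
    """Run a fixed number of poll cycles, waiting interval_s between them."""
    sent = []
    for i in range(cycles):
        sent.append(send_to_nms(package(inventory())))
        if i < cycles - 1:
            time.sleep(interval_s)
    return sent


packets = poll_cycle(cycles=2, interval_s=0.1)
print(json.loads(packets[-1])["payload"]["hostname"])
```

The wait between cycles is the source of the "time delay associated with agent poll cycle" listed in Table 3.1, and any new field added to `inventory()` would require redeploying the agent, which is the "new capabilities = agent upgrade" trade-off.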
It should be noted here that agent-based utilities are not necessarily a bad thing. In fact, many
modern management and monitoring tools continue to make excellent use of agents as a system
component. Agent-based utilities have the ability to provide more management capabilities to the
administrator because additional capabilities can always be coded into the agent. And, all things
being equal, agent updates are often much easier than system updates.
The biggest downside of agent-based tools is the non-value-added cost of managing agent installations. This can be a time-intensive process, and many tools don't include mechanisms for locating rogue hosts on the network. The presence of these non-managed endpoints can skew statistical reporting for agent-based systems.
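Locating rogue hosts is essentially a set difference between what is seen on the network and what has a managed agent. A minimal sketch, assuming the discovered addresses come from some form of network scan and the managed list from the agent console; both lists here are invented.

```python
def find_unmanaged(discovered: set[str], managed: set[str]) -> set[str]:
    """Hosts seen on the network that have no managed agent."""
    return discovered - managed


def agent_coverage(discovered: set[str], managed: set[str]) -> float:
    """Share of discovered hosts that are actually reporting through an agent."""
    if not discovered:
        return 1.0
    return len(discovered & managed) / len(discovered)


discovered = {"10.0.0.5", "10.0.0.6", "10.0.0.7", "10.0.0.8"}
managed = {"10.0.0.5", "10.0.0.6", "10.0.0.9"}  # .9 has an agent but was not seen

print(sorted(find_unmanaged(discovered, managed)))
print(f"coverage: {agent_coverage(discovered, managed):.0%}")
```

Here only half of the hosts actually on the network report through an agent, so any statistic computed from agent data alone describes a different population than the real network: the skew the paragraph warns about.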
Native/Agentless
As OS vendors began catching up to the needs of administrators for centralized manageability,
they began recognizing the value of including some of the agent's function within the native OS
code. By providing agent-like code within the system and leveraging open standards for the
categorization, function, and transport of that agent-like data, it grew more and more possible to
manage systems purely from within OS-internal APIs.
Four components of native or agentless systems were required for the proper identification,
storage, and transport of configuration and state information to independent NMS systems.
Those components, pictured in Figure 3.3, are the state collection component, the storage
component, the API, and the transport component.
[Figure 3.3 diagram: an Agentless System Under Management contains an On-Board Collection Component, an On-Board State Database, and an On-Board API; data travels across the Network over an Open Standards Transport Protocol to the Management & Monitoring System.]
Figure 3.3: The four components of agentless monitoring.
Your BSM implementation will likely include connections to agent-based as well as agentless
management components.
One major requirement (and arguably the requirement that delayed the large-scale incorporation of agentless systems) is the need for industry agreement on the API and transport component. Think about the receiving NMS. For that NMS to work properly with numerous agentless systems of different vendors and classifications, common APIs and transport protocols are necessary to achieve economies of scale. Otherwise, the NMS vendor would need to code individual interfaces for each type of system and each type of on-board API, something they are unlikely to do because of the cost. So until that industry agreement was reached, there were few entries into agentless NMS.
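The cost argument is easy to see in code. In this hypothetical sketch, an NMS written against one shared collection interface can poll systems from any vendor that implements it, whereas without a standard it would need a separate adapter for every vendor and API combination. The interface, class names, and metric values are invented for illustration and do not represent any particular standard.

```python
from abc import ABC, abstractmethod


class StandardAgentlessAPI(ABC):
    """Stand-in for an industry-agreed on-board API and transport."""

    @abstractmethod
    def query(self, metric: str) -> float: ...


class VendorAServer(StandardAgentlessAPI):
    def query(self, metric: str) -> float:
        return {"cpu_pct": 42.0, "disk_free_pct": 71.0}[metric]


class VendorBAppliance(StandardAgentlessAPI):
    def query(self, metric: str) -> float:
        return {"cpu_pct": 12.5, "disk_free_pct": 8.0}[metric]


def poll_all(systems: dict[str, StandardAgentlessAPI], metric: str) -> dict[str, float]:
    """One NMS code path works for every vendor that implements the shared API."""
    return {name: system.query(metric) for name, system in systems.items()}


def adapters_needed(vendors: int, apis_per_vendor: int, standardized: bool) -> int:
    """Interfaces the NMS vendor must write with and without industry agreement."""
    return 1 if standardized else vendors * apis_per_vendor


systems = {"web01": VendorAServer(), "san01": VendorBAppliance()}
print(poll_all(systems, "disk_free_pct"))
print(adapters_needed(20, 3, standardized=False), adapters_needed(20, 3, standardized=True))
```

With the made-up numbers of 20 vendors exposing 3 APIs each, the NMS vendor would face 60 interfaces without agreement and one with it.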
Agentless systems also include both pros and cons as components of their architecture (see Table
3.2).
Pros | Cons
Remote management capabilities (similar to proprietary systems) | Greater processing requirements at the NMS
Agent protocol is often network agnostic | Higher network utilization
Little to no agent management | Not all capabilities supported
Agent capabilities come as part of O/S upgrades | O/S upgrades required to gain new agent functions
Table 3.2: The pros and cons of agentless systems management and monitoring.
As you can see from Table 3.2, the agentless architecture is not necessarily a panacea for the problems associated with proprietary agent systems. Agentless systems introduce problems of their own as they solve others. Their major limitation lies in their inability to rapidly add new capabilities; they rely on those built into the system to provide the brunt of the functionality. OS vendors also typically update their code less frequently than an agent vendor would. Often, an OS vendor will update its APIs only at major milestone releases. Thus, additional functionality can take an extended period of time to reach the market.
Focus on Value
Our last phase in the timeline actually moves away from evolving the management components on the individual endpoints. This phase concerns itself with adding logic for things like service contract fulfillment, service-based solutions, framework fulfillment, event integration, service monitors, and end user response time evaluation, among others. This phase in the timeline of management and monitoring is concerned less with data collection and more with improving the quality of the collected data, so its integration is decoupled from individual endpoint management. Adding this logic requires few, if any, changes to the individual endpoints.
[Figure 3.4 diagram: On-Board Service Level Logic Processing and Visualization Processing components appear both within the Management & Monitoring System and within a separate Service Level Valuation System, alongside a Database. Data reaches them across the Network from a Proprietary Agent API over a Proprietary Transport Protocol and from an On-Board Agentless API over an Open Standards Transport Protocol.]
Figure 3.4: Valuation in NMS arrives as a Service Level Logic Processing component either within the
traditional NMS or separate from it but leveraging its data collection capabilities.
In fact, this addition can operate as a function segregated from the NMS itself, operating as a tool
that leverages NMS-collected data for further processing. The lesson here is that wrapping a
value system on top of existing NMS systems allows for the continued use of those systems
while non-destructively adding the value recognition system into the operating environment.
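A minimal sketch of that separation, under assumed data: the NMS export below is a hypothetical snapshot of element states, and the valuation layer only reads it, applying service-level logic. The rule used here (a service is degraded when a redundant tier has lost an element and unavailable when a required tier has lost all of them) and the service model itself are assumptions for illustration. Nothing is changed on the endpoints or in the NMS.

```python
# Hypothetical snapshot exported by an existing NMS: element -> state.
nms_snapshot = {
    "rtr-a": "down",
    "rtr-b": "up",
    "web01": "up",
    "web02": "up",
    "db01": "up",
}

# Service model (assumption): each tier lists redundant elements; every tier is required.
SERVICE_TIERS = {
    "customer B2B portal": [["rtr-a", "rtr-b"], ["web01", "web02"], ["db01"]],
}


def service_status(tiers: list[list[str]], snapshot: dict[str, str]) -> str:
    """Derive a service-level status from element-level states, read-only."""
    status = "available"
    for tier in tiers:
        up = [e for e in tier if snapshot.get(e) == "up"]
        if not up:
            return "unavailable"   # a required tier has no working element
        if len(up) < len(tier):
            status = "degraded"    # still working, but redundancy is lost
    return status


for service, tiers in SERVICE_TIERS.items():
    print(service, "->", service_status(tiers, nms_snapshot))
```

A failed backup router would show up here as a degraded service rather than a paging event, which is the kind of business-level interpretation the device-level NMS data lacks on its own.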
Let's take a look at each of these example capabilities in turn:
Service contract fulfillment: Many organizations have been using Service Level Agreements (SLAs) either between IT and the business or between IT and its external service providers. By incorporating higher-quality, data-centric and user-centric information into the measurement of those contracts, more accurate fulfillment is realized.
Service-based solutions: Performance metrics can be introduced into existing IT Services to validate their efficacy for their consumers.
Framework population: IT frameworks are only valuable once they are populated with data. Introducing management and monitoring data into frameworks brings value to their end results. Many IT frameworks assist with the troubleshooting and resolution process but may be cumbersome to use during times of critical outage. When data is automatically populated into these frameworks, their use during outage incidents is likely to provide more value.
Event integration: Disparate components typically provide eventing information to their consumers, but the segregated nature of these components complicates system-wide troubleshooting. By integrating the event information from all the components of a system, a longitudinal, or time-based, analysis of system health is much easier to obtain.
Service monitors: By combining eventing and performance information from multiple systems into a central repository and applying logic to that data, the administrator can get a comprehensive view of total system health.
End user response time evaluation: Compiling response time information for components across entire systems, when combined with the right programming logic, enables metrics to represent total system response times. But this comes only from crunching performance counters for all elements in a system and then relating those numbers to end user or consumer experiences. Once this effort is completed, notifications can be issued whenever such response times exceed preset thresholds or violate SLA requirements.
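Event integration, as described in the list above, amounts to merging per-component event streams into a single timeline. The sketch assumes each component's events are already sorted by time (as logs usually are) and uses the standard library's `heapq.merge` to interleave them; the component names, times, and messages are invented for the example.

```python
import heapq
from datetime import time

# Hypothetical per-component event logs, each already in time order.
router_events = [
    (time(4, 23), "router", "failed ping response"),
    (time(4, 40), "router", "interface reset"),
]
web_events = [
    (time(4, 24), "web", "response time above 3 s"),
]
db_events = [
    (time(4, 22), "database", "replication lag warning"),
]


def integrate(*streams):
    """Merge sorted event streams into one time-ordered timeline."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


for when, component, message in integrate(router_events, web_events, db_events):
    print(when.strftime("%H:%M"), f"{component:9}", message)
```

Read in one timeline, the database warning at 04:22 now precedes the router alarm, a relationship that is hard to spot when each tool's log is viewed separately.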
We'll talk more about these capabilities and how they enhance the operational value of a system in
Chapter 7.
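End user response time evaluation, the last capability in the list, can be sketched as summing per-component timings along one transaction's path and comparing the total with an SLA threshold. This is a simplification that assumes the components are called one after another, so their times add; the component timings and the 2-second threshold are hypothetical. Tallied over a month of transactions, the same check yields the kind of service contract fulfillment figure described in the first item of the list.

```python
SLA_THRESHOLD_MS = 2000  # hypothetical end-to-end response time target


def end_to_end_ms(component_ms: dict[str, float]) -> float:
    """Total response time, assuming components are invoked sequentially."""
    return sum(component_ms.values())


def sla_compliance(transactions: list[dict[str, float]], threshold_ms: float) -> float:
    """Fraction of transactions whose end-to-end time met the threshold."""
    if not transactions:
        return 1.0
    met = sum(1 for t in transactions if end_to_end_ms(t) <= threshold_ms)
    return met / len(transactions)


# Hypothetical per-component timings (ms) for four sampled transactions.
transactions = [
    {"dns": 20, "web": 300, "app": 650, "db": 400},
    {"dns": 25, "web": 310, "app": 700, "db": 1200},
    {"dns": 18, "web": 290, "app": 600, "db": 380},
    {"dns": 900, "web": 320, "app": 640, "db": 410},
]

for t in transactions:
    total = end_to_end_ms(t)
    if total > SLA_THRESHOLD_MS:
        slowest = max(t, key=t.get)
        print(f"ALERT: {total:.0f} ms exceeds SLA (slowest component: {slowest})")
print(f"SLA compliance: {sla_compliance(transactions, SLA_THRESHOLD_MS):.0%}")
```

Naming the slowest component in each alert connects the end-user symptom back to a specific element, which is where the performance counters mentioned in the text come in.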
BSM is one tool that enables each of these components to be added into an existing NSM environment. BSM implementations typically do not involve a rip-and-replace of existing tools, so their incorporation is a low-risk activity. As we'll discuss in Chapter 8, BSM software typically arrives with sets of data collection tools and aggregation functions to pull data from existing NSM tools, both those that incorporate proprietary agents and newer agentless tools. Because BSM can ingest data through multiple disparate toolsets, it enjoys the pros of each toolset while pushing responsibility for each toolset's cons down onto that toolset's data collection system. With this understanding of the timeline of management and monitoring, let's continue our historical analysis to discuss how the targeting of service management has evolved over the years toward the need for BSM's concept of service quality.
The Evolution of Service Management Targeting

Over time, businesses have come to recognize that more and more of their internal services, and the components that make up those services, need to be brought under formal management. This evolution started early on with the need for a stable network underpinning. As system elements grew more reliant on each other and on elements higher up in the stack, those components too became critical for the successful operation of the overall system. Figure 3.5 depicts the elements of Service Management as it grows in maturity. More mature organizations focus their attention closer to the middle and view the outer three elements as data feeds to End User Experience, Service Level Management, and BSM. What you'll see in this section is an organic movement towards higher-level system components and, eventually, layers of abstraction over those systems.
As you read through this section, you might want to compare the components managed here with those currently under management within your own network. The following subsections intertwine with the levels of the IT maturity model discussed in the previous chapter. Thus, the more monitoring needs your organization recognizes within its network, the higher the level of process maturity within your IT organization.
[Figure 3.5 layers: Network Availability & Utilization; Server Performance; Troubleshooting & Predictive Analysis; End User Experience; BSM]
Figure 3.5: The elements in Service Management targeting center towards BSM's service valuation.
Network Availability and Utilization
The network is the common intermediary that all computer systems rely upon. Adding to this is the multi-server nature of most business systems today. In today's networks, with rare exception, any form of data processing requires the cooperation of numerous systems connected by the network for proper functionality.
Thus, it only makes sense that this common touch point was usually the first component to come under management and monitoring for most networks. When the network goes down, virtually all data processing comes to a halt. Because today's businesses require constant network uptime to complete their daily tasks, any network outage quickly becomes critical.
Going beyond simple availability is the need for proper management of network utilization and performance. As data moves from system to system, its processing can require varying levels of network bandwidth, and managing that bandwidth and its use is highly critical. Highly immature organizations typically have no measure of their network utilization. As a result, it is common for new services to be added to the network until users notice a change in performance.
Like the old saying "Once you're thirsty, it's long past the time you should have taken a drink," it is critically important for the stability and continued operation of the network that its managers understand what kind of data traverses it and at what volume. Understanding these calculations goes a long way toward ensuring that network augmentation activities occur before they're needed.
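As a minimal sketch of the underlying calculation, assume utilization is sampled from an interface's cumulative octet counter (for example the SNMP IF-MIB ifInOctets counter, which is 32 bits wide and wraps). Percentage utilization over an interval is then the bits transferred divided by what the link could have carried in that time. The function name and the sample values below are hypothetical.

```python
def link_utilization(octets_start: int, octets_end: int, interval_s: float,
                     link_bps: float, counter_bits: int = 32) -> float:
    """Return percent utilization of a link between two octet-counter samples."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Modular arithmetic absorbs a single counter wrap between the two samples.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    bits = delta_octets * 8
    return 100.0 * bits / (interval_s * link_bps)


# Hypothetical 5-minute sample on a 100 Mbit/s link.
print(link_utilization(1_000_000_000, 2_875_000_000, 300, 100_000_000))  # 50.0
```

Trending this figure over weeks, rather than reacting only when users complain, is what lets augmentation be planned before it is needed. On fast links a 32-bit counter can wrap more than once between polls (at 100 Mbit/s, 2^32 octets take roughly 5.7 minutes), which is why the 64-bit ifHCInOctets counter is generally preferred.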
Server Performance
As the business grows more reliant on its network, each of the individual servers that make up a data processing thread grows more important. Early businesses leveraged centralized computing for most data processing, which meant fewer endpoints to monitor for problems. But as the interdependence of servers in a path became more complex, the computing model shifted to a heavy focus on decentralization, and the number of endpoints grew geometrically. Moreover, data processing requirements can differ drastically based on the needs of their consumers:
Resource-intensive operations can occur at times that conflict with other needs.
An overabundance of users performing similar pipelined activities can flood a system thread.
The scheduling of network infrastructure activities, which can be highly resource-intensive, can conflict with or interrupt business data processing.
We've already talked about the complexities associated with the business calendar. Once a business begins engaging in e-commerce or worldwide operations, the issues associated with multiple time zones complicate the scheduling of business processing. This scheduling involves complex mathematics (and therefore software and processes) to ensure conflicts do not occur.
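The core of such a scheduling check can be sketched as an interval-overlap test performed in UTC. This minimal Python example assumes fixed UTC offsets for brevity; production scheduling would use IANA time zones (for example through Python's zoneinfo module) so that daylight-saving changes are handled. The backup job and the business-hours windows are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

Interval = Tuple[datetime, datetime]


def utc_window(start_local: datetime, hours: float) -> Interval:
    """Convert a timezone-aware local start time and a duration to a UTC interval."""
    start = start_local.astimezone(timezone.utc)
    return start, start + timedelta(hours=hours)


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


# Fixed offsets (an assumption for this sketch; they ignore daylight-saving time).
EASTERN = timezone(timedelta(hours=-5))
SINGAPORE = timezone(timedelta(hours=8))
LONDON = timezone.utc  # Greenwich Mean Time in winter

# Hypothetical 3-hour backup starting at 01:00 in a US Eastern data center.
backup = utc_window(datetime(2024, 1, 16, 1, 0, tzinfo=EASTERN), 3)
singapore_day = utc_window(datetime(2024, 1, 16, 9, 0, tzinfo=SINGAPORE), 9)
london_day = utc_window(datetime(2024, 1, 16, 9, 0, tzinfo=LONDON), 8)

print(overlaps(backup, singapore_day))  # True: 06:00-09:00 UTC falls inside 01:00-10:00 UTC
print(overlaps(backup, london_day))     # False: the backup ends exactly at 09:00 UTC
```

The backup that looks harmless at 1:00 a.m. in New York lands in the middle of the Singapore business day, which is the kind of conflict the scheduling software exists to catch.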
Because the business increasingly needs always-on processing, the management and monitoring of these activities and their impact on server performance grow as the business's reliance on its infrastructure grows. Incorporating performance management and monitoring on individual servers makes resource use visible in a way that administrators can act upon and that IT and business analysts can plan around.
Relating this need to BSM and our discussion of IT maturity, the problem with server performance is usually not recognizing that it needs to be monitored. Rather, the problem often lies in what to monitor. With thousands of potential counters on each system, and counters that differ by system or device type and installed applications, failing to separate the wheat from the chaff in terms of counter efficacy inhibits good server performance management.
Troubleshooting and Predictive Analysis
Only once an organization understands its network and the servers that reside on that network, and has tuned incoming data so that monitors are watching for valuable data, can that data be used for troubleshooting and predictive analysis. If you fully understand the data that you want to receive, you can relate it to the data you're actually receiving.
By logically subtracting one from the other, we can begin analyzing the quality of the services under management:
When performance data goes above thresholds, we can recognize and resolve a resource
overuse situation.
When availability data consistently goes under thresholds, we can predict when to
implement compensating mechanisms for load balancing or failover states.
As we recognize new consumers of data on-system, we can add processing resources to handle the load.
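A minimal sketch of threshold evaluation follows. The rule structure, metric names, threshold values, and actions are hypothetical; the point is that each rule declares whether crossing above or below its threshold is the problem, mirroring the overuse and shortfall cases listed above. For brevity it checks a single sample, whereas "consistently" under a threshold would in practice mean evaluating over a window.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThresholdRule:
    metric: str
    threshold: float
    direction: str  # "above" flags overuse; "below" flags a shortfall
    action: str


def evaluate(sample: Dict[str, float], rules: List[ThresholdRule]) -> List[str]:
    """Return the actions triggered by one sample of metric readings."""
    triggered = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue  # metric not collected in this sample
        crossed = value > rule.threshold if rule.direction == "above" else value < rule.threshold
        if crossed:
            triggered.append(f"{rule.metric}={value}: {rule.action}")
    return triggered


# Hypothetical rules and readings.
rules = [
    ThresholdRule("cpu_percent", 85.0, "above", "investigate resource overuse"),
    ThresholdRule("availability_percent", 99.5, "below", "plan load balancing or failover"),
]
print(evaluate({"cpu_percent": 92.0, "availability_percent": 99.9}, rules))
```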
It is within this phase that we can begin assigning thresholds to individual system states. Later, we'll augment those thresholds with an assignment of quality. That quality assignment eventually becomes the dollars-and-cents valuation needed to implement BSM.
End User Experience

Troubleshooting system problems only gets you so far. Sometimes an application residing on a server has problems that prevent users from completing their necessary actions, yet those problems don't necessarily show up in server-level counters. Nor may the network, as an aggregate, expose enough visibility to indicate that there is indeed a problem.
In these cases, End User Experience monitoring capabilities are necessary to express the user's experience in quantitative terms. They are also critical before committing to formal SLAs and to the ongoing management of service levels. These tools combine agentless monitoring, for visibility into all users all the time, with synthetic transactions, for consistent, proactive management. They probe deep into attached applications to determine the speed of transactions, page refreshes, and ultimately the user's experience using the application.
These tools are particularly useful when the business begins using e-commerce applications on the Internet. The typical Internet user is willing to wait an average of 4 seconds between clicking to accomplish an action and receiving the result of that action. E-commerce sites often sell commodities, so response times that exceed users' tolerance often lead to lost business. If a Web site consistently experiences extended delays or downtime, users lose trust in the site's ability to complete their business transactions. Businesses that find themselves with excessive customer drop rates (customers who begin a transaction but don't complete it) often find that End User Experience monitoring enhances their ability to determine exactly where and why those customers leave the site.
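A synthetic transaction can be sketched as a scripted sequence of steps, each timed and compared with a user-tolerance threshold. In this minimal Python example the steps are passed in as functions so the sketch runs without a network; in practice each would be a scripted page request or checkout action. The 4-second default reflects the figure quoted above, and the step names are hypothetical.

```python
import time
from typing import Callable, Dict


def run_synthetic_check(steps: Dict[str, Callable[[], None]],
                        threshold_s: float = 4.0) -> Dict[str, object]:
    """Time each scripted step of a synthetic transaction and flag slow ones."""
    timings: Dict[str, float] = {}
    for name, step in steps.items():
        start = time.perf_counter()
        step()  # in practice: request a page, submit a form, and so on
        timings[name] = time.perf_counter() - start
    slow = [name for name, elapsed in timings.items() if elapsed > threshold_s]
    return {"timings_s": timings, "slow_steps": slow, "ok": not slow}


# Hypothetical steps standing in for real page interactions.
def load_catalog() -> None:
    time.sleep(0.05)


def submit_checkout() -> None:
    time.sleep(0.10)


report = run_synthetic_check({"catalog": load_catalog, "checkout": submit_checkout})
print(report["ok"], report["slow_steps"])
```

Because the same script runs on a schedule whether or not real customers are active, a slowdown at a particular step (the checkout, say) shows up before customers start abandoning their carts there.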
Moreover, End User Experience monitoring shortens the time to resolution for critical performance and outage issues on customer-facing sites. As the business grows more reliant on e-commerce as a line of revenue, the functionality of these tools becomes more critical to operations.
We'll discuss End User Experience monitoring tools in detail in Chapter 5.
J2EE & .NET Application Performance

An offshoot of the Server Performance, Troubleshooting & Predictive Analysis, and End User Experience stages is the set of specific counters and activities associated with application performance, specifically for applications residing on J2EE or .NET platforms. Because these platforms run managed code, they incorporate pluggable APIs through which specially coded monitoring applications can measure state information and performance data.
[Figure 3.6 layers: Network Availability & Utilization; Server Performance; Troubleshooting & Predictive Analysis; J2EE & .NET Application Performance; End User Experience; BSM]
Figure 3.6: J2EE and .NET application performance bridges the layers of Server Performance, Troubleshooting & Predictive Analysis, and End User Experience.
These types of applications are specifically called out because of their pluggable monitoring interfaces as well as their custom nature. Many businesses have incorporated custom-built applications on these platforms for business-specific data processing. Often, these applications are coupled with Web front-ends and customer-facing interfaces, making them highly critical in the eyes of a business's customers.
Unlike Commercial Off The Shelf (COTS) software, these home-grown applications are built in-house and may not contain the needed management interfaces within their proprietary code. Therefore, the best way to manage these applications is often through the APIs of the code frameworks on which they reside. The more of these home-grown applications an organization runs on pluggable frameworks, the more its critical operations rely on them. Mature organizations use tools such as these to monitor and manage those applications from the End User Experience perspective.
Service Level Management
As our management and monitoring tools evolve to the modern day, we begin considering the value of IT service levels as a component of business processes. Once we set aside our techno-centric valuation of IT metrics in favor of relating those metrics to growing and supporting the business as a whole, we elevate our vision of IT systems toward Service Level Management and eventually BSM.
Here within Service Level Management, we begin to focus on how IT services underpin business
objectives. The data generated through Service Level Management tools elevates SLAs beyond
individual device availability and towards a holistic view of the system.
Relating back to the earlier example, if an individual network router goes down, the executives of the business likely won't care about the outage if the total capabilities of the system are not negatively affected. In that example, because the router was installed in a redundant, high-availability configuration, the outage of a single router did not affect total system health. Thus, in a fully realized Service Level Management scenario, the router's state change should never have been escalated to the executives' level of attention.
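The router example can be expressed as a simple health rollup: a redundant group counts as up while at least one member is up, and the service is up only if every required group is up. This minimal Python sketch uses hypothetical group and device names; commercial Service Level Management tools model these dependencies in their own ways.

```python
from typing import Dict, List


def group_up(members: Dict[str, bool], min_up: int = 1) -> bool:
    """A redundant group is healthy while at least min_up members are up."""
    return sum(members.values()) >= min_up


def service_up(groups: Dict[str, Dict[str, bool]]) -> bool:
    """The service needs every group (each a redundant set of devices) to be healthy."""
    return all(group_up(members) for members in groups.values())


def executive_alerts(groups: Dict[str, Dict[str, bool]]) -> List[str]:
    """Escalate only when the service itself is down, not on every device outage."""
    return [] if service_up(groups) else ["Customer-facing service is DOWN"]


# Hypothetical topology: one of two redundant routers has failed.
topology = {
    "edge_routers": {"router-a": False, "router-b": True},
    "web_servers": {"web-1": True},
}
print(service_up(topology), executive_alerts(topology))  # True []
```

The failed router would still raise a device-level alert for the administrators who must replace it; it simply never reaches the executive view.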
Service Level Management feeds SLAs the data that allows them to be managed and valued both in real time and historically through a data-driven approach. Knowing immediately when SLA breaches occur, and tying that information into the Troubleshooting & Predictive Analysis components of the earlier stages, means that an SLA breach can be quickly remedied through automated troubleshooting tools. Additionally, data gathered through End User Experience tools can augment traditional SLA metrics such as server and network uptime with key metrics like service response time and component availability. These new metrics help the business relate its service quality to the value of its user experience.
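As an illustration of managing an SLA from collected data, the minimal Python sketch below computes availability for a reporting period from recorded outage minutes, compares it with a target, and reports how much of the allowed downtime has been used. The 99.9% target and the outage durations are hypothetical.

```python
from typing import List


def availability_percent(period_minutes: int, outage_minutes: List[float]) -> float:
    """Availability = (period - total downtime) / period, as a percentage."""
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes


def sla_status(period_minutes: int, outage_minutes: List[float], target_percent: float) -> str:
    """Summarize a period against a (hypothetical) availability target."""
    actual = availability_percent(period_minutes, outage_minutes)
    # Downtime the SLA allows in this period, useful for "budget remaining" dashboards.
    allowed = period_minutes * (100.0 - target_percent) / 100.0
    used = sum(outage_minutes)
    verdict = "BREACH" if actual < target_percent else "met"
    return f"{actual:.3f}% ({verdict}); downtime used {used:.0f} of {allowed:.1f} allowed minutes"


# Hypothetical 30-day month (43,200 minutes) with two outages.
print(sla_status(30 * 24 * 60, [25, 30], target_percent=99.9))
```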
Also beginning to appear at this stage are advanced visualizations, commonly realized as dashboards customized to the level of their consumer. These visualizations are designed to inform each reader about the system state information of interest to that reader. Dashboards and dashboard design are two major components of BSM as well. Before incorporating BSM's tenets into our dashboard conversation, we need to recognize that at this step, the business is willing to provide information regarding system availability and health to multiple classes of consumers.
Business Service Management

BSM builds on top of Service Level Management by fully realizing the tie between the business bottom line and IT Service states. By augmenting the visualizations initially created by Service Level Management tools with quantitative information about service quality, BSM in effect becomes the link between IT Service state and the corporate bottom line. A fully realized BSM service model contains the linkages between individual IT elements, their relationships with each other, the business services they support, and, most specifically, the dollars-and-cents valuation associated with a reduction in the quality of the business service.
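To make the dollars-and-cents idea concrete, here is a deliberately simple valuation model, offered as an assumption rather than as how any particular BSM product works: the cost of an incident is the revenue the service normally earns per minute, times the fraction of that capacity lost, times the incident's duration, plus fixed costs such as overtime. Every figure in the example is hypothetical.

```python
def degradation_cost(revenue_per_minute: float, capacity_lost: float,
                     duration_minutes: float, fixed_costs: float = 0.0) -> float:
    """Estimate the cost of reduced service quality over an incident (illustrative model)."""
    if not 0.0 <= capacity_lost <= 1.0:
        raise ValueError("capacity_lost must be a fraction between 0 and 1")
    return revenue_per_minute * capacity_lost * duration_minutes + fixed_costs


# Hypothetical: a site earning $500/minute loses 25% of its capacity for 2 hours,
# and the response incurs $3,000 of overtime.
print(degradation_cost(500.0, 0.25, 120, fixed_costs=3_000.0))  # 18000.0
```

A real service model would pull revenue rates and cost factors from business systems and would distribute the valuation across the IT elements that contribute to the service.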
Because the ties between individual services are many and complex, the mathematics involved in calculating service quality's impact on budgets is also highly complex. This complexity, combined with the necessary state logic, explains why BSM tools arrived late to the game compared with the other tools described in this section.
As you'll discover in Chapter 4, depending on the maturity of the monitoring systems you presently have in place, breaking services down to the granularity required for a full BSM implementation may or may not involve a lot of work. Combining what you've learned in this chapter with our discussion of IT maturity in the previous chapter should give you some idea of the complexity of the full design.
An Example
Let's now take what we've learned and apply it to a real-world example of a business system, exploring how the management and monitoring of that business system can evolve from network availability through each of the layers discussed earlier, ending with a BSM realization. In this example, we'll look at a simple Web-based customer-facing system, hereafter referred to as the system. The system comprises the components identified in Figure 3.7.
Figure 3.7: Our example system includes twelve components. Each component's role is critical to the operations of the system.
As you can see in Figure 3.7, the system encompasses twelve individual components:
The external firewall between the Internet and the DMZ
The internal firewall between the DMZ and the intranet
The Web Server in the DMZ
A directory server
An e-commerce server
A database server in the intranet
And the six network connections that interconnect them to each other and to the Internet
Each of these components must work in concert for an outside customer to properly connect to the Web server, locate the item they want to purchase, create and use an account, and purchase the item. The outage of any of these components will degrade the total service quality delivered to this system's customers.
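Because every one of the twelve components must be working for a purchase to succeed, the system behaves like components in series. Assuming failures are independent, its availability is the product of the component availabilities: A_system = A_1 × A_2 × ... × A_12. The minimal Python sketch below uses hypothetical availability figures for each component.

```python
from math import prod
from typing import Iterable

# Hypothetical annual availability for each of the twelve components.
components = {
    "external firewall": 0.999,
    "internal firewall": 0.999,
    "web server": 0.995,
    "directory server": 0.998,
    "e-commerce server": 0.995,
    "database server": 0.997,
}
# Six network connections, each assumed to be 99.95% available.
components.update({f"link-{i}": 0.9995 for i in range(1, 7)})


def series_availability(availabilities: Iterable[float]) -> float:
    """All components are required, so availabilities multiply (independent failures assumed)."""
    return prod(availabilities)


system = series_availability(components.values())
print(f"{len(components)} components, system availability {system:.4%}")
print(f"Expected downtime: {(1 - system) * 365 * 24:.0f} hours per year")
```

Even though each component here is individually at least 99.5% available, the product comes out lower than that of any single component, which is the quantitative version of the statement that the outage of any one component degrades the whole service.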
Network Availability and Utilization
What should be immediately obvious, even to the inexperienced observer, are the linkages between each of the components in this system. Because of the single connections between components shown in the figure, the loss of any single component negatively impacts the system's ability to service its customers. If the system's Internet connection goes down in the middle of the night, the system won't be able to service customers attempting to purchase items.
Additionally, should this business release a new product or product update that causes customers to purchase it in large numbers, the network links between components could become oversaturated. A notification during such an event that the network is oversaturated helps identify that there is a run on the product.
Server Performance
Going one step further into the obvious components of our system, the system's owners will likely also want to know how customer use of the system affects its total performance. As we discussed earlier in this chapter, at this phase, we're merely looking at overall component performance as a measure of the system's capability to service its customers. For example, if the resource metric % Processor Time rises above 90% for the E-Commerce component, there is a reasonable expectation that the system may be having trouble keeping up with the demands of its incoming customers.
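A single 90% reading can be a momentary spike, so a monitor will usually require the condition to persist before alerting. The minimal Python sketch below flags the E-Commerce server only when % Processor Time stays above 90% for a configurable number of consecutive samples; the window size and the readings are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sustained_breaches(samples: Iterable[float], threshold: float = 90.0,
                       consecutive: int = 3) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) each time `consecutive` samples in a row exceed threshold."""
    window: Deque[float] = deque(maxlen=consecutive)
    for index, value in enumerate(samples):
        window.append(value)
        if len(window) == consecutive and all(v > threshold for v in window):
            yield index, value


# Hypothetical one-minute % Processor Time readings for the E-Commerce server.
cpu = [72, 95, 88, 91, 93, 96, 97, 64]
print(list(sustained_breaches(cpu)))  # [(5, 96), (6, 97)]
```

The isolated 95% reading at the second sample is ignored; only the run of three readings above 90% triggers.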
Troubleshooting and Predictive Analysis

Once we begin understanding the nature of the system and its incoming customers, we can begin comparing historical data against a point-in-time analysis to determine the efficacy of the system. As an example of predictive analysis, if we have recognized that % Processor Time for our E-Commerce system tends to operate within 60 to 70% utilization during heavy periods of use, we can predict that an extended period of 90% utilization may reduce the E-Commerce system's ability to perform its function by 25%. We know that when the E-Commerce system's utilization goes above those thresholds, approval of our users' credit cards takes 11 seconds rather than the typical 6 seconds.
Moreover, we can add troubleshooting breakpoints into the system. For the situation in which the E-Commerce server goes above our desired thresholds, we can add a secondary processing component (perhaps one that is offsite and contracted for use in burst situations only) into the management of our system. Our management system can watch for those performance thresholds and reconfigure the Web server to use the backup system when the conditions occur.
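The threshold-driven switch to a burst processor can be sketched as a small controller with hysteresis: it fails over above a high-water mark and fails back only below a lower one, so the Web server isn't reconfigured back and forth on every fluctuation. The 90% and 70% marks and the backend names are hypothetical, and the actual reconfiguration of the Web server is deliberately left out of this minimal Python sketch.

```python
class BurstFailover:
    """Choose a processing backend from CPU readings, with hysteresis."""

    def __init__(self, high: float = 90.0, low: float = 70.0) -> None:
        if low >= high:
            raise ValueError("low must be below high to provide hysteresis")
        self.high = high
        self.low = low
        self.using_backup = False

    def update(self, cpu_percent: float) -> str:
        if not self.using_backup and cpu_percent > self.high:
            self.using_backup = True   # the point where the Web server would be repointed
        elif self.using_backup and cpu_percent < self.low:
            self.using_backup = False  # load has eased; return to the primary
        return "offsite-burst" if self.using_backup else "primary"


controller = BurstFailover()
readings = [65, 92, 85, 75, 68, 80]
print([controller.update(r) for r in readings])
```

After failing over at 92%, the controller stays on the burst system through readings of 85% and 75% and returns to the primary only once load drops to 68%; the later 80% reading does not trigger a second failover.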
End User Experience
Believing that 90% processor utilization for the E-Commerce system is a bad thing may not necessarily give us the data we need to recognize added pain on the part of our users. Once we've evolved our monitoring capabilities to a certain threshold point and/or added the necessary technology to the mix, we gain the ability to feel our users' pain by illuminating the experience from their perspective.
But how is this different from simple Server Performance? In quite a few ways. Server Performance metrics provide a whole-system understanding of the resource use on a system, but they will not tell us how many seconds a user currently needs to complete a page refresh when switching from item to item. Nor can they tell us how long a credit card takes to process. If 90% processor utilization on the E-Commerce system actually occurs from time to time due to pipelined credit card processing operations, that condition may do nothing to increase the delay users experience. Our expansion to an expensive secondary processing component based on whole-server metrics may then be costing us money to fix a problem our users aren't worried about.
Managing our Web site from an End User Experience level means that we gain the perspective of the user without needing to sit in front of the Web site. It gives us the notification capabilities to recognize the problems users are seeing and to react to them when users perceive that a reaction is necessary.
Service Level Management
Once we've elevated ourselves to Service Level Management, it can be argued that the majority of the data collection needs for our system have been realized. Now we need to tie that data into goals that allow us to measure the system's success. Our organization can begin creating SLAs between the business and the systems administrators (typically IT) that set goals, and measure progress toward them, for keeping the system operational.
But this is only one way to use our collected data in support of our system. Our system relies on components outside the control of even our internal IT organization: its Internet connection and the backup E-Commerce system we contracted for in our Troubleshooting and Predictive Analysis step. Even though the organizations that provide these services typically give us their own SLAs defining their quality of service, it is often up to us to police those agreements.
Adding the logic associated with those agreements into the data collected by our systems gives us the real-time capability to recognize failures by our suppliers and take action based upon those failures. Additionally, pairing business relevance with the technical data we've collected to this point enables heads-up dashboards that begin to make sense to individuals outside the IT organization.
Business Service Management
Once we've reached the level of successful Service Level Management, incorporating BSM into our reporting structure enhances our data visualization with real-time and historical recognition of the revenue or costs associated with the system. Dollar values are applied to each of the twelve components in the system, and our dashboards gain financial relevance to the rest of the business.
What we also gain by adding business goals to our monitoring data are hard statistics that help us make better-informed purchases for system augmentation. Once our service model for this system is complete and filled with the appropriate business data, we can determine where the chokepoints in our system are and compare the cost-to-upgrade metric with the cost-of-poor-quality metric. This lets us make informed augmentation decisions and justify additional components and features within the system. Every decision made here is based on data gathered and analyzed in each of the earlier elements.
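A minimal sketch of that comparison follows: given an estimated annual cost of poor quality at a chokepoint, and the one-time and recurring costs of an upgrade expected to remove some fraction of it, compute the annual savings, the first-year net, and a simple payback period. The data structure, the option, and all figures are hypothetical, and a simple payback calculation ignores the time value of money.

```python
from dataclasses import dataclass


@dataclass
class UpgradeOption:
    name: str
    upfront_cost: float          # one-time purchase and implementation
    annual_running_cost: float   # added support, licensing, power, and so on
    copq_reduction: float        # fraction of the cost of poor quality removed (0 to 1)


def evaluate_upgrade(annual_copq: float, option: UpgradeOption) -> dict:
    """Compare the cost to upgrade with the cost of poor quality it removes."""
    annual_savings = annual_copq * option.copq_reduction - option.annual_running_cost
    payback_years = option.upfront_cost / annual_savings if annual_savings > 0 else float("inf")
    return {
        "option": option.name,
        "annual_savings": annual_savings,
        "first_year_net": annual_savings - option.upfront_cost,
        "payback_years": payback_years,
    }


# Hypothetical chokepoint: the E-Commerce server costs $120,000 a year in poor quality.
option = UpgradeOption("second e-commerce server", 60_000, 10_000, 0.75)
print(evaluate_upgrade(120_000, option))
```

An option whose running costs exceed the poor-quality cost it removes reports an infinite payback period, which is the signal that the upgrade cannot be justified on these numbers alone.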
[Figure 3.8 label: Cost of Poor Quality]
Figure 3.8: The data in BSM's visualizations can correlate a measure of poor quality with that measure's cost of poor quality and superimpose the cost to upgrade to help make better management decisions.
Moving Along the Evolutionary Curve

Obviously, moving along the evolutionary curve that this chapter describes must provide some benefit back to the company for the company to engage in each subsequent activity. As a company recognizes the positive value in evolving from state to state, it invests the time and money necessary to incorporate the tenets of that evolutionary state into its production systems. Let's look at four ways in which moving along the evolutionary curve directly affects the bottom line.
Speeds Troubleshooting
More data does not necessarily help with troubleshooting. In fact, more data often inhibits troubleshooting as administrators find themselves awash in a sea of raw information. It is higher-quality, business-centric, and user-centric data that goes far in speeding troubleshooting. That higher-quality data arrives with each evolutionary jump. Reaching the higher stages means that the data feeds into more refined models, and those models are better understood by their consumers.
Improves Performance
Knowing how a system performs means you can effect change in that performance. Starting with the network and working up to whole-system performance, the administrator immediately gains a rough insight into the inner workings of that machine. But trying to use whole-system data to track down a deeply application-based problem is like cutting your grass with a chainsaw. The solution works, but it's not suited to the problem.
Moving along the evolutionary curve produces data that is more tightly focused on the problem domain. Adding to this is the nature of monitoring data itself. As monitoring data and the tools that visualize it grow in quality, they gain drill-down capabilities that narrowly define the issues affecting performance.
Fills Out Systems Vision
Computer systems are unlike physical systems in that it is not possible without tools to see the electricity as it goes by on the wire. When building a physical system, it is easy to visualize the machinations of the system because they're right in front of its operator. But computer systems require specialized tools just to see into the system.
Incorporating the correct suite of tools alongside the right process framework with which to use
those tools helps the operator better see into the inner workings of the system. Adding advanced
dashboards provides for better management decisions by non-technical users as well.
Enables Proactive Management
As stated in the previous section, it's difficult for non-technical users to fully manage a system when they don't have visibility into the workings of that system. Leveraging instrumentation, and processing that instrumentation data effectively, lets managers manage at their best. Computer systems are meant to ease processes in the physical world, not complicate them. Thus, making those systems as easy as possible to use for the individuals who manage their interface with the physical world enables IT to make better, more proactive management decisions.
Summary
In this chapter, we've added a technical understanding to what we learned in the previous chapter about the culture of IT organizations. Here, we've discussed BSM's relation to the bigger picture of service management and monitoring and provided a historical basis for how that vision came into being. We've talked about the evolution of service management and the targeting of service management to elements within the business system. Along with that, we've taken what we've learned and provided an extended example detailing how movement along that evolutionary curve improves the quality of information coming into the business about the health and quality of its critical systems. Throughout all of this, we've talked about how each element in the history of service management eventually makes its way to BSM.
In the next chapter, we'll move away from our introductory conversations on BSM and dive straight into the process of implementing it in a production environment. That discussion will involve the eight steps of a BSM implementation: Preparation, Selection, Definition, Modeling, Measurement, Data Analysis, Improvement, and Reporting. For each step, we'll discuss the tasks and elements necessary to ensure a successful implementation.
Chapter 4: Implementing BSM

It's 11:15am the very next day, and John and Dan find themselves sitting across the desk from each other in Dan's office, each thoroughly annoyed with the other.
"If you don't want the alerts, we can take you off the notification list," suggests John. "Or even better yet, we can configure the notification not to alert you in the middle of the night. You'll still get all of the accumulated alerts at 8:00am."
Dan realizes that John just isn't getting his point. "That's not where I'm going with this. There are two problems here. First, we shouldn't be getting these alerts in the middle of the night anyway. If you need a new router to replace the aging one, we'll get you a new one. But know that we should have a better capability to plan and budget for when these things occur. Second, if we take me off the notification list, we're back to where we were before the outage a month ago. Something goes pop in the middle of the night and I don't hear about it."
"But if we get an outage on a redundant system, we're still operational! The site is still up. Do you really care about every single device outage?" John queries. He's similarly irritated by this conversation. For the past few weeks, every time his pager beeps, a phone call from Dan invariably follows. He himself wouldn't mind a week's worth of uninterrupted sleep.
Dan fires back, "I just don't want to take that risk right now. The outage a month ago cost us so much in overtime pay and customer givebacks that we blew our numbers for the quarter. We're thinking now it might impact the annuals too. I don't want to end up in McWilliams' office any more than you do," he says, referring to FCG's CEO Mike McWilliams, "but I will agree with you that all these alarms are interfering with the other work I need to do. You know, the COO work."
Dan and John stare at the wall opposite each other for a minute, both unable to think of what to say next. The problem here is evident in the minds of both men. About 2 years ago, FCG recognized the need to know when systems went down. That year, they spent a not insubstantial amount of capital to implement a monitoring system intended to provide them with just the data they're getting paged on now.
What they are both realizing in Dan's office that day is that monitoring data is just that: data. Alerting on the outage of every device is doing more to deluge them with notifications and less to help them understand the needs of their customer-facing systems.
Dan sits back in his chair. "What we need here is some way to turn that data into usable product. You need to know when the router goes down. I need to know when we're not servicing our customers. If one of our major customers can't get their supplies, it's my head that ends up in Mike's office and not yours. I want to know when their buyers are cursing our online system instead of loving it.
"Heck, here's what I'd love. Get me another little monitor I can sit right here on my desk that just shows me how our systems are doing: what our customers are feeling when they're doing business with our Web site. Something that'll give me the warm fuzzy that our systems are up, we're still meeting our numbers, and our customers are happy. Can you get me one of those?"
John groans silently to himself: Performance data? Now he wants systems performance data too? How am I going to get that to him as usable product?
Dan sees John's concerns with his line of thinking, but he also recognizes the need for both John and his IT department to start thinking strategically. Maybe he can turn this challenge into an advantage for FCG. "Here's what I'm going to do. It's just about time we start setting performance goals for next year. I'm going to set a goal for you for next year to figure out the answer to this problem. I'll take care of finding the funding and any business analysts you need. You just figure out the technology."
John stands up to leave the office, wondering how he's going to figure this one out. Dan stops him with a grin. "Oh, and John. Do it fast. My wife hasn't been too happy either with your 4am D-E-N-R-T-Rs."
BSM Provides a Business Focus to IT Operations

As you can see, our continuing story of First Class Glass (FCG) gives us a glimpse into the maturation of their organization. That maturity aligns with their need for ever-better data quality. Two years ago, FCG found that by implementing a monitoring system, they would be notified when systems went down rather than waiting for customers to tell them about it. After 2 years of middle-of-the-night pages, they have gotten better at parsing the data provided by that notification system. But one thing is still missing from that data: business relevance.
This is borne out by the way notifications are arriving in Dan's mobile inbox. He is receiving data that doesn't make sense to him. Dan embodies the business side of the business, while John embodies the technology side. John recognizes FCG's need for a new router, but his focus on the tight IT budget and cost avoidance has led him to keep that device running in production well past its useful life.
Throughout this guide, we've discussed how a well-defined BSM service model with links into the correct business systems can add value to monitoring data. BSM provides a quantitative measure of the quality of a service by measuring it against financial rules specific to the business. Chapter 2 discussed how IT organizations must go through a process of maturation before they recognize the need for data quality. Chapter 3 analyzed how that organizational maturity links to the evolution of IT Service Management and service management targeting. Our historical look there helped us better understand the gap filled by BSM.
In this chapter, we begin the process of implementing our BSM solution and its surrounding framework. We will assume that the implementing organization has the will and the way to incorporate the necessary software and processes to successfully complete the installation. The evolution of the implementing organization's service management has brought it to recognize the need for BSM's data quality measurements. All that is left is laying down the structure. At the conclusion of this chapter, you should be fully aware of the tasks and activities necessary to implement the BSM solution that is right for your environment.
Three Reasons to Implement BSM

Before we get into the technical discussion, let's frame this chapter around three critical reasons why an organization might want to implement BSM. We've talked about these reasons in generalities up to this point, but it is important to restate them here so that we recognize the underlying reasons BSM is important to our implementing organization.
Understand the Critical to Quality Services
A BSM implementation provides data to an organization's decision makers on how individual elements affect the whole. In our chapter example above, this is shown by the single server whose outage impacts the total operational standing of the customer-facing system. We call these elements Critical To Quality, meaning that our overall ability to service our customers suffers when their service is reduced. As we'll learn later on, this is an important distinction that helps us properly scope which services should and should not be a component of our BSM service model.
Manage Daily Risk and Improve Business Decision Making
Also in our chapter example is the struggle of the business executive to make sense of an unnecessarily complicated system as it is presented to him. There, Dan was unable to make the best business decisions about servicing FCG's customers because the system wasn't presented in a way that made sense to him. By providing digestible data to business leaders, BSM relieves them of tactical decision making and frees them to focus on forward-looking, strategic initiatives. Conversely, John's daily operational focus on the IT infrastructure requires data that helps him shift resources as necessary to manage systems based on business impact and to solve problems. He requires a different set of data to help him reduce the daily risk to operations. BSM provides that data to him and his administrators.
Initiate Service Improvement Activities
Lastly, as the problem with the new router purchase illustrates, a well-designed BSM service model assists with the budgetary and planning process. That forecasting process, when fully realized and populated with pertinent data, will facilitate better decisions about improving existing services. BSM is not intended to be a static implementation, but a rolling cycle of constant data review and remediation. The data provided through the BSM system directs expenditures to where that money can best be allocated.
The Seven Steps of a BSM Implementation

Throughout the rest of this chapter we'll be discussing the seven steps involved with a BSM implementation. You'll note that we also include an eighth step at the very beginning, titled Step 0 (Preparation), which is needed to set up the teams, stakeholders, and project plan associated with the project. As you'll see in the sections below, the processes necessary to implement a BSM framework in an organization are non-trivial. Incorporating BSM into existing business processes is not an install-the-software-and-go procedure. In fact, installation of the enabling software doesn't even occur until Step 4 (Measurement). Let's begin with setting up the necessary teams and project plans associated with the preparatory step in the process.
There are two important cautions regarding the information in the rest of this chapter. First, here we're attempting to show the process involved with the incorporation of a non-specific BSM system. There are multiple enabling software packages available on the market. Some incorporate the steps below and some may use different steps. The steps used here are included as an example of one way to set up BSM within your environment.
Second, and related to the first point, the information below is in no way a comprehensive treatment of the entire process. As you'll soon see, this process will likely take an extended period of time to complete and in some ways is never truly complete. You will find tools and techniques that work well in your organization that may not work in others. So be aware that these processes are malleable and should be customized to fit your particular organization.
Step 0 Preparation
Before beginning any project, identifying team members and stakeholders is critical for dividing responsibility within the project. This step covers a few key points necessary to ensure that the project begins down the correct path.
Identify Key Project Members
First, as was discussed in Chapter 1, one of the most critical components of assembling a project team is ensuring that it is not technocentric. Although at first blush a BSM implementation can appear to involve heavy participation from the IT organization, BSM in and of itself is a process-centric tool. Incorporating too much technical input into the project team at the outset can turn a BSM implementation into little more than an IT Service Management implementation (i.e., one with an inappropriate focus on IT elements).
That being said, a BSM project team should include the following members:
Executive Sponsor: The role of the Executive Sponsor is to fund the project and ensure that the project stays within scope, within budget, and relevant to its needs within the organization. The Executive Sponsor will likely not be a regular contributing member of the team, other than to provide overall guidance.
Business Service Manager: Generally also the project manager, the Business Service Manager is tasked with ensuring the overall success of the project as well as reporting its status upward to executive management. From a technical standpoint, the Business Service Manager is responsible for defining the business services of relevance and assisting with the development of their requirements.
Business Service Analyst(s): In conjunction with the Business Service Manager, any Business Service Analysts assigned to the project team are responsible for identifying and isolating individual business services, their requirements, and the linkages between business services. Their job here is to create the business service model and populate that model with the necessary risks, linkages, and controls. Once the service model is built and implemented and Step 5 (Data Analysis) has begun, the role of the Business Service Analyst is to monitor and interpret the data being generated vis-à-vis the model. This individual need not have a technical background, but rather a deep understanding of the underlying business processes.
IT Manager: Similar to the Executive Sponsor, this individual's responsibility lies with overall guidance, as well as serving as the liaison between the IT Specialists below and the Business Service Analysts above.
IT Specialist(s): Once the service model is identified, that model must be connected into data gathering and service monitoring tools. This function may be part of the BSM system itself or, more likely, may be provided by components of existing monitoring and management tools. The role of IT Specialists is to facilitate the proper connection of those tools into the BSM system.
Identify Stakeholders and Build the Project Plan

Also important in this first step is the identification of the ultimate stakeholders for the project. Often, when executive leadership is driving the implementation of the system, they become the stakeholders. Because BSM tends to layer on top of existing monitoring systems, it tends to add greater value to leadership than to IT itself (which may already have the necessary device monitoring tools in place). Looking at BSM's tenet of service quality, one very obvious silent stakeholder in the project is the customer of the systems under management and, ultimately, the business.
Once stakeholders are identified, the project leaders must create a project plan that outlines each phase in the project. The next seven steps will assist with creating that project plan.
Step 1 Selection
Step 1 embodies the identification of services that will ultimately become part of the service model. Here the analysts on the team will analyze business services from a process focus and identify the lines of demarcation between individual services. It is important to note that Step 1 is merely an inventory and identification function. We are not yet defining services and their representation. Here, we are merely getting our hands around which services are in scope and of value to us, and which are out of scope for this iteration of the project.
The lead-in to this section mentioned that a BSM implementation can be a process that is never truly complete. The project team must be very careful at the outset of the project, indeed at this phase, to keep the initial scope limited to services that are low-hanging fruit.
You need not identify all the services and processes in your organization during your first pass through the seven steps. Running through the steps more often with fewer elements in the model is more likely to succeed than the inverse. Iterating through the steps with a smaller model, especially during initial adoption, provides early wins for the implementation.
Identify Critical and Measurable Business Services

The team must identify those services that are critical to quality. These will be core business services that provide measurable value to the company: measurable in the sense that components within the service can be quantified and a dollar value can be assigned to each component's functionality. That assignment is not yet done at this phase. However, services that are capable of such assignment are inventoried here in Step 1.
Figure 4.1 shows a copy of the two service models we originally looked at back in Chapter 1. Here we see examples of a correct and an incorrect categorization of services, which is what must be accomplished during this phase. Recognize that the services inventoried at this phase should align with a function of a business process rather than a function of the IT department. Later, the team will link these services to the resources that provide their functionality. A critical point here is not to get too technically deep during this activity.
Figure 4.1: Copied here from Chapter 1 are our representations of a good service model breakdown on the
left based on the interrelation of business processes. On the right is an incorrect model breakdown focused
on individual devices.
Assess Services
While inventorying the services that the business provides, the team also assesses how easily each service can be categorized and quantified. Some business services are only tangentially related to the established Key Performance Indicators of the business, and so will be more difficult to quantify during early passes through the seven steps. The idea for our first pass is to find those services that are most critical to the business and yet are easy to incorporate into the service model.
Priority one here is to pick services that are most central and most critical to the business. Priority two is to choose those easiest to work with. The reason is that quantifying the easy services within an organization iteratively reveals new touch points for quantifying the hard services in later cycles.
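The two priorities above amount to a two-key sort: criticality first, ease of incorporation second. The following is a minimal Python sketch, assuming the team has scored each candidate on hypothetical 1-to-5 scales for criticality and quantification effort; the service names and scores are illustrative and are not drawn from FCG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateService:
    """A business service under consideration for the first pass (hypothetical scoring)."""
    name: str
    criticality: int  # 1 (peripheral) .. 5 (central to the business); assumed scale
    effort: int       # 1 (easy to quantify) .. 5 (hard to quantify); assumed scale


def prioritize(candidates: list[CandidateService]) -> list[CandidateService]:
    """Order candidates by criticality (highest first), then by effort (lowest first)."""
    return sorted(candidates, key=lambda s: (-s.criticality, s.effort))


def first_pass(candidates: list[CandidateService], limit: int) -> list[str]:
    """Keep the initial scope small: return only the top `limit` service names."""
    return [s.name for s in prioritize(candidates)[:limit]]


if __name__ == "__main__":
    services = [
        CandidateService("Online Ordering", criticality=5, effort=2),
        CandidateService("Payroll", criticality=3, effort=1),
        CandidateService("Supplier Portal", criticality=5, effort=4),
        CandidateService("Internal Wiki", criticality=1, effort=1),
    ]
    print(first_pass(services, limit=2))  # ['Online Ordering', 'Supplier Portal']
```

Sorting on a tuple keeps the rule explicit: effort only breaks ties between equally critical services, so an easy but peripheral service never displaces a critical one.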
Assess Cost to the Business

Once the rough inventory of gross business services is completed, the next task is to rank those services by business dependency. Here, the team will rank each business service by the level of impact to the business that an outage, failure, or reduction of the service could cause. This identifies that service's Cost of Poor Quality.
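One simple way to put a number on this ranking is an annualized estimate: the hourly cost of the service being unavailable multiplied by the outage hours expected per year. The sketch below assumes both figures are supplied by the business (they are hypothetical inputs here) and ignores partial degradation, which Step 2 addresses separately; it is one possible approximation of Cost of Poor Quality, not a formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceImpact:
    """Business-supplied impact estimates for one inventoried service (hypothetical inputs)."""
    name: str
    hourly_outage_cost: float     # dollars lost per hour the service is unavailable
    expected_outage_hours: float  # outage hours expected per year, from history or forecast


def cost_of_poor_quality(service: ServiceImpact) -> float:
    """Annualized Cost of Poor Quality: hourly impact times expected outage hours."""
    return service.hourly_outage_cost * service.expected_outage_hours


def rank_by_copq(services: list[ServiceImpact]) -> list[tuple[str, float]]:
    """Return (name, annual CoPQ) pairs, highest business impact first."""
    ranked = sorted(services, key=cost_of_poor_quality, reverse=True)
    return [(s.name, cost_of_poor_quality(s)) for s in ranked]
```

For example, `rank_by_copq([ServiceImpact("Online Ordering", 12000.0, 6.0), ServiceImpact("Payroll", 2000.0, 10.0)])` returns `[("Online Ordering", 72000.0), ("Payroll", 20000.0)]`: the service that is down less often still ranks first because each hour of its outage costs far more.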
Step 2 Definition
Once the inventory of services is complete and the candidate services for initial inclusion into the model have been selected, those services need to be defined in terms relevant to BSM. Within Step 2 (Definition), the team will identify and solidify the boundaries of the services of interest.
Define Services
The first task here is to gain as much knowledge as possible about the structure, behavior, necessity, and relevance of the service. The service might have ties into other services unknown to the project team, or it may have elements that make it more or less difficult to define as later steps begin deconstructing its dependencies. By defining each service as comprehensively as possible in this initial step, the team learns much about its inputs and outputs.
One mechanism for documenting service characteristics is a spreadsheet. Identify categorizations of interest about the service that will assist in later plugging this service into the BSM model. Some of those categorizations could relate to those in Table 4.1 below.
When creating this spreadsheet, ensure that each cell within the spreadsheet is atomic. Combining several pieces of data in an individual cell means that the category has not been broken down to the elemental level necessary.
Another handy tool for visualizing individual services is to draw up use cases and the data flow or process flow associated with each use case. As an example, for a purchasing system, the use case might include each of the components of a purchase, from browsing, to inventory validation, to shopping cart population, to checkout, to item delivery.
Categorization: Description / Utility

Service Name and Description: A unique identifier for the service, specifically one that can be easily identified both within the BSM system and in external documentation. Also, a short descriptor associated with the service and its functionality.

Business Purpose: What is the reason for the existence of this service? What is it intended to do?

Users: Who are the individuals who use this service? This information will help identify the business impact later on.

Service Hours: The hours of operation of the service. Does this service run only during business hours, or must it be continuously operational? This information will help build out the business calendar. Also, what are the hours of support for this service? Is staff on-site continuously to support it? Are people on pager notification? This will help identify resolution time metrics.

Location of Service: Also useful for the business calendar, this information identifies the time zone and location of operation for the service.

Code Ownership: Is this a home-built service or one purchased from another vendor? For customized services, what is the underlying code that drives it (Java, CORBA, C++, .NET, etc.)? This information is useful for pluggable end-point monitoring devices.

Outage Impact: If this service goes down, who is impacted, why are they impacted, and what are they unable to accomplish? Are there factors (such as redundancy or a lack thereof) that either exacerbate or alleviate an outage? This information later helps quantify the cost of the outage.

Abnormal Operation Impact: Much more difficult to measure, but what is the impact to the business if the service experiences non-nominal operation? What if it is slow? What if occasional hiccups cause the service to operate in an unpredicted manner yet without a failure? This information further quantifies the impact to the business when these situations occur. Part of this category is also the identification of which abnormal operations are to be measured.

RTO / RPO: This category identifies the service's Recovery Time Objective and Recovery Point Objective, or how soon it must be brought back online and how much data can be lost as part of an outage. This information helps feed into disaster recovery metrics as well as individual service data loss metrics.

SLAs and OLAs: Have any Service Level Agreements been assigned to elements that make up the service? Or have any Operational Level Agreements been assigned across platforms or teams to legislate expectations?

Dependencies: Lastly, what other services does this service rely upon? This information will be heavily used in creating the service model.

Table 4.1: The table above provides a list of possible characteristics that could be used to identify a service in the model.
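To keep each spreadsheet cell atomic, it can help to model one row of the table above as a typed record in which multi-valued categories, such as dependencies, are lists rather than comma-joined strings. The Python sketch below covers only a subset of the categories; the field names, the fixed UTC offset standing in for the service location's time zone, and the example values are assumptions for illustration. It also shows how the Service Hours and Location of Service categories can feed a simple business-calendar check.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta, timezone


@dataclass
class ServiceDefinition:
    """One row of the service-definition spreadsheet, with one value per field (hypothetical schema)."""
    name: str
    business_purpose: str
    service_start: time           # start of required service hours, local time
    service_end: time             # end of required service hours, local time
    service_days: frozenset[int]  # weekdays the service must run (Monday=0 .. Sunday=6)
    utc_offset_hours: int         # location of service, simplified to a fixed UTC offset
    rto_minutes: int              # Recovery Time Objective
    rpo_minutes: int              # Recovery Point Objective
    dependencies: list[str] = field(default_factory=list)  # one dependency per entry

    def in_service_hours(self, moment: datetime) -> bool:
        """True if a timezone-aware datetime falls inside this service's business calendar."""
        local = moment.astimezone(timezone(timedelta(hours=self.utc_offset_hours)))
        return (local.weekday() in self.service_days
                and self.service_start <= local.time() < self.service_end)


def dependency_rows(services: list[ServiceDefinition]) -> str:
    """Export dependencies as one (service, dependency) pair per CSV row, keeping cells atomic."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["service", "depends_on"])
    for service in services:
        for dependency in service.dependencies:
            writer.writerow([service.name, dependency])
    return buffer.getvalue()
```

A production calendar would use real time zone rules (for daylight saving) and holiday lists; the fixed offset simply keeps the sketch self-contained.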
Define Service Requirements
Now that we've come to understand the nature of the service in narrative form, we need to translate the requirements of that service into quantifiable metrics we can use to measure its quality. This may be one of the most important activities for each individual service, as it identifies the numbers that the BSM system's mathematical logic uses to translate a loss of service quality into a numerical result. At minimum, three elements must be identified and assigned values:
Availability: The easiest of the three to quantify. During what hours of the day/week/month must this service be available for consumption by its users? Is it a 7x24x365 service, or is it required only between the hours of 8:00a and 5:00p? For some systems, like those that perform report generation or occasional data manipulation, this number may be merely a few minutes or hours per day, though at certain highly specified times. Metrics to use here include hours of operation and days and times of operation.
Reliability: Slightly more difficult is scoping how often this service can become inoperative or undergo a loss in service performance. Some services have a greater tolerance for outages. Some services include redundancy features that limit the scope of an individual element outage. Depending on how the service was categorized, those redundancy features may or may not be included in this calculation. Be aware that this information will be used heavily in identifying and measuring the quality of the service as compared with the desired level of service. Metrics to use here include acceptable mean time between failures and acceptable mean time to repair.
Performance: The most difficult to identify, much of this quantification involves a qualitative look at the service's ability to perform to the needs of its consumers. Performance metrics identify the bars against which the operational service is measured. If reliability identifies how often the service can go down, then performance measures how poorly the service can operate and still remain within desired specifications. Metrics to use here include per-action or per-transaction measurements of time delay, and acceptable and unacceptable wait times.
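The three requirement types above can be turned into numbers from an incident log and a set of response-time samples. The sketch below assumes outages are recorded as (start, end) pairs in hours, with both the outages and the measurement window counted only within the service's required hours. It computes mean time between failures (MTBF), mean time to repair (MTTR), and availability using the standard steady-state relation MTBF / (MTBF + MTTR), and it checks a nearest-rank 95th-percentile response time against a target. All thresholds are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityReport:
    mtbf_hours: float    # mean time between failures
    mttr_hours: float    # mean time to repair
    availability: float  # fraction of the window the service was up


def reliability(outages: list[tuple[float, float]], window_hours: float) -> ReliabilityReport:
    """Derive MTBF, MTTR, and availability from non-overlapping (start, end) outages in hours."""
    if not outages:
        return ReliabilityReport(math.inf, 0.0, 1.0)
    downtime = sum(end - start for start, end in outages)
    failures = len(outages)
    mtbf = (window_hours - downtime) / failures
    mttr = downtime / failures
    return ReliabilityReport(mtbf, mttr, mtbf / (mtbf + mttr))


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_requirements(report: ReliabilityReport, response_times_s: list[float],
                       min_availability: float, min_mtbf_hours: float,
                       max_mttr_hours: float, p95_target_s: float) -> dict[str, bool]:
    """Compare measured values with the availability, reliability, and performance targets."""
    return {
        "availability": report.availability >= min_availability,
        "reliability": report.mtbf_hours >= min_mtbf_hours and report.mttr_hours <= max_mttr_hours,
        "performance": percentile(response_times_s, 95) <= p95_target_s,
    }
```

For a 720-hour window with outages at (100, 102) and (400, 401), `reliability` reports an MTBF of 358.5 hours, an MTTR of 1.5 hours, and availability of about 0.9958 (717 of 720 hours).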
Define Problems and Opportunities

Related to the metrics identified above, this service was chosen for incorporation into BSM for a reason: some component of its operation causes pain to the organization. That reason should be identified in order to populate the project plan with data about the potential future success of the project in solving a business problem.
Some potential questions to ask to help populate this information: Does the outage of this service cause an unexpected increase in cost to resolve (similar to the unexpected problems caused by the outage in our chapter example)? Is the outage of this service a locus for the outages of other services? Does this service's outage cause pain for customers who may go elsewhere for their business? The problems and opportunities associated with this service, quantified in specific financial terms, help further identify the service and its characteristics when built into the service model.
Define Critical Success Factors
The last task in this step is identifying what we hope to achieve by augmenting the monitoring of this service with BSM. Here, we want to provide metrics that document the improvements to service quality or availability we hope to achieve through this activity. As we discussed in Step 1, our first run through the model will likely be with those services that involve the most pain in our environment and thus the most potential for return. The information here will be used later on in Steps 5 and 6 to help us improve the process and recognize benefit from the activity.
Step 3 Modeling
Once an inventory of the desired business services has been collected, connecting those services can begin. Looking back at Table 4.1, each service should have a list of dependencies. Those dependencies will go far in helping the team identify the connections between services. The resulting service model will be a top-down decomposition of the business service into its constituent components. One artifact of this process will be the creation of increasingly detailed hierarchical diagrams identifying business processes in relation to the processes and resources that support them.
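Structurally, a top-down service model is a directed dependency graph: each element lists the elements it relies on. The Python sketch below stores the model as a dictionary and walks it upward to find every element affected when one component fails. Apart from Mission-Critical B2B Web System, Inventory Database, and B2B Extranet, which appear in this chapter, the element names are hypothetical; a real model would be built from the Dependencies category gathered in Step 2. The sketch also treats every dependency as mandatory, so redundancy (the concern of CFIA) would need extra logic.

```python
from collections import deque

# Each element maps to the elements it depends on (hypothetical model for illustration).
SERVICE_MODEL: dict[str, list[str]] = {
    "Mission-Critical B2B Web System": ["Online Ordering", "B2B Extranet"],
    "Online Ordering": ["Inventory Database", "Web Server Farm"],
    "B2B Extranet": ["Edge Router", "Authentication Service"],
    "Authentication Service": ["Directory Server"],
    "Inventory Database": [],
    "Web Server Farm": [],
    "Edge Router": [],
    "Directory Server": [],
}


def dependents_index(model: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the model: for each element, list the elements that depend on it."""
    index: dict[str, list[str]] = {name: [] for name in model}
    for parent, children in model.items():
        for child in children:
            index.setdefault(child, []).append(parent)
    return index


def impacted_by(failed: str, model: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk upward from a failed element to every element that relies on it."""
    index = dependents_index(model)
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in index.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted
```

Here, `impacted_by("Directory Server", SERVICE_MODEL)` yields Authentication Service, B2B Extranet, and Mission-Critical B2B Web System: a low-level component failure surfaces as a loss at the top-level business service.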
Model Defined Services and Dependencies

In Figure 4.1, the image on the left shows a series of 10 disparate system components that make up the top-level Mission-Critical B2B Web System. Each of these elements feeds components above it, while each also requires information, processing, or resources from elements below it. In generating this model, we create a hierarchy of dependent components that (along with accompanying metrics) will eventually be plugged into the BSM system for logical and mathematical processing. To create this top-down diagram, four formalized tools are often used to assist with the visualization:
Failure Mode and Effects Analysis (FMEA): This is a tool used to identify and categorize the risks associated with potential failures within a system or a process. This tool identifies the possible failures that can occur within a system and prioritizes these failures by the seriousness of their potential consequences, their frequency of occurrence, and the ease of detecting them. FMEA is most often used as a bottom-up approach to failure analysis, which complements the top-down approach to generating our BSM model. Here, the FMEA tool assists with identifying how the failure of a dependent component can impact the quality of the top-level service.
Component Failure Impact Analysis (CFIA): This tool, a component of ITIL, identifies individual ITIL Configuration Items and their potential for failure. Many of the same characteristics as in FMEA are examined through its analysis. The difference here is that CFIA specifically analyzes the potential for backups and redundancy between elements in an attempt to find flaws in a system design.
Fault Tree Analysis (FTA): A more top-down approach is to complete a Fault Tree Analysis of the system. This thorough method for documenting the probability of faults among various logically linked conditions helps in categorizing the risk of a system and where that risk may manifest. FTA is handy for adding numerical values into the BSM system.
CCTA Risk Analysis and Management Method (CRAMM): CRAMM is yet another formalized mechanism, specifically suited to identifying security issues within a system. CRAMM breaks the analysis down into three stages: asset identification and valuation, threat and vulnerability assessment, and countermeasure selection and recommendation. Complementing the fault-based nature of the other tools, CRAMM assists the team with identifying where security-based issues may impact the system.
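Two of these tools reduce to simple arithmetic that can feed numbers into the model. In FMEA, each failure mode is commonly scored for severity, occurrence, and detection (often on 1-to-10 scales) and ranked by the Risk Priority Number, the product of the three. In FTA, assuming independent basic events, an AND gate's probability is the product of its inputs, and an OR gate's is one minus the product of the complements. The Python sketch below shows both; the failure modes, scores, and probabilities are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    description: str
    severity: int    # 1-10, higher = more serious consequence
    occurrence: int  # 1-10, higher = more frequent
    detection: int   # 1-10, higher = harder to detect

    @property
    def rpn(self) -> int:
        """Risk Priority Number used to rank failure modes in FMEA."""
        return self.severity * self.occurrence * self.detection


def rank_failure_modes(modes: list[FailureMode]) -> list[FailureMode]:
    """Highest-risk failure modes first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


def and_gate(*probabilities: float) -> float:
    """Fault tree AND gate: all independent input events must occur."""
    return math.prod(probabilities)


def or_gate(*probabilities: float) -> float:
    """Fault tree OR gate: at least one independent input event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)
```

As a usage example, a hypothetical top event "B2B Extranet unavailable" that occurs if the router fails (p = 0.02) or if both redundant authentication servers fail (p = 0.05 each) evaluates to `or_gate(0.02, and_gate(0.05, 0.05))`, or about 0.0225. Note how the redundant pair contributes only 0.0025, which is the kind of insight CFIA formalizes.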
Model Associated Metrics

As the tools are used, the service model picture begins to take shape. Once the picture has evolved to the satisfaction of the team, metrics associated with model elements are attached to it. Attached Key Performance Indicators may relate to user wait time, business metrics, or IT metrics like systems and transaction performance. Mature organizations will likely already have many of the IT KPIs in place, though their data population may not be automated. Business metrics may be similarly available. For organizations that don't yet have KPIs in place to measure success, this may require an additional activity to find and document relevant metrics that make sense and provide value specific to the business.
This metrics assignment is another of the highly important steps in our process. Organizations implement a BSM system because they want to obtain these metrics and be notified when they go out of specification. As we discussed in Chapter 2, organizations at lower levels of maturity have more work to do in implementing BSM, primarily because they need to develop the necessary Key Performance Indicators.
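Attaching a KPI to a model element amounts to recording the metric, the element it belongs to, and the specification limits it must stay within; the BSM system then flags readings that fall outside those limits. The following is a minimal Python sketch under those assumptions; the KPI names, limits, and readings are hypothetical and do not reflect any particular BSM product's configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Kpi:
    element: str                   # service model element the KPI is attached to
    metric: str                    # metric name, e.g. "checkout_wait_seconds" (hypothetical)
    lower: Optional[float] = None  # minimum acceptable value, if any
    upper: Optional[float] = None  # maximum acceptable value, if any

    def in_spec(self, value: float) -> bool:
        """True if the value lies within whichever limits are defined."""
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


def out_of_spec(kpis: list[Kpi], readings: dict[str, float]) -> list[str]:
    """Return alert messages for KPIs whose latest reading violates their limits.

    KPIs with no reading are skipped here; missing data is treated as a gap, not a breach.
    """
    alerts = []
    for kpi in kpis:
        value = readings.get(kpi.metric)
        if value is not None and not kpi.in_spec(value):
            alerts.append(f"{kpi.element}: {kpi.metric}={value} out of spec")
    return alerts
```

Keeping the limits on the KPI rather than in the alerting code means the business analysts can revise specifications in later iterations without touching the collection logic.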
Build the Service Model

The process above continues filling out the picture, adding metrics of interest to its structure. The concluding task in Step 3 is finalizing the service model with all its requisite components. Earlier we discussed how the service model shouldn't look like the graphic on the right of Figure 4.1. But at some point, the business processes that make up the model must link into the data-generating tools that feed each process. It is here, in this final task, that individual processes and functions are mapped to IT functions. So where an actual database may drive the Inventory Database function, or where a set of network devices may enable the B2B Extranet function, those linkages are created at this point.
Depending on the enabling BSM software chosen, this may be an automated process or a manual one. It is reasonable to expect that some of the service model creation can be automated through the enabling software. After all, it is that software that ultimately runs the model, so auto-discovery features are a good starting point for model creation. At the same time, it is unreasonable to assume that the software will be able to fully discover and draw the model without operator input. There are simply too many possibilities for processes and their ties into network devices and applications.
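Because auto-discovery is a starting point rather than the final word, one reasonable design is to keep discovered mappings and operator-supplied mappings separate and let the operator's entries win. The Python sketch below assumes both are simple dictionaries from a business function to a list of IT configuration items; the CI names are hypothetical, and no particular BSM product's discovery API is implied.

```python
def merge_mappings(discovered: dict[str, list[str]],
                   operator: dict[str, list[str]]) -> dict[str, list[str]]:
    """Combine auto-discovered and operator-defined function-to-CI mappings.

    Operator entries replace discovered ones for the same business function;
    functions known to only one source are kept as-is. Inputs are not modified.
    """
    merged = {function: list(cis) for function, cis in discovered.items()}
    for function, cis in operator.items():
        merged[function] = list(cis)
    return merged


def unmapped_functions(model_functions: list[str], mapping: dict[str, list[str]]) -> list[str]:
    """Business functions in the model that still have no IT resources behind them."""
    return [f for f in model_functions if not mapping.get(f)]
```

Running `unmapped_functions` after each merge gives the team a short worklist of model functions that still need manual linkage before measurement can begin.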
Figure 4.2: Once the service model is fully realized, the next step is to connect its processes to the IT
functions that drive its data. This mapping is used by the BSM system to populate the model with metrics
information.
Step 4 Measurement
Once the modeling is complete, our next step is to link the designated monitoring and measuring tools into the service model. It is within this step that much of the work within the enabling BSM software tool begins. Here, for those services and their associated metrics previously identified, categorized, and modeled, we'll begin the process of actually measuring the metrics we aim to obtain. In later steps we'll take this information, analyze it for gaps in service, and use it to drive change within the environment design.
In Step 0 of this process, we discussed how much of the BSM implementation process does not necessarily require heavy involvement from the IT organization. Step 4, however, brings much of the work needed from IT specialists. Here, the team will be implementing or otherwise coding the necessary connectors that pull data from disparate systems into the BSM software platform. Those skills are often highly specialized and specific to the type of software platform the BSM system attempts to connect into. It is important for IT to be part of the process up until this point so they can prepare the systems from a technical standpoint for the monitoring plug-ins necessary to begin measuring.
Remember too that BSM is not intended to rip and replace existing monitoring systems. Nor, in many cases, is it intended to be a systems monitoring system of its own. The organization likely already has monitoring tools in place that leverage technologies like SNMP, NetFlow, WMI, WS-Management, and other management protocols through which monitoring data is already being collected and stored. The BSM implementation can simply pull from that data for its metrics needs.
Implement Data Collection

The first task here is to tie the BSM system into infrastructure monitoring. This tie-in may use built-in connectors that feed data into the BSM system, or regular output exported from the monitoring system. These data collection tools are configured to query for and collect the metrics identified in the model designed in the previous step.
Some examples of existing data collection aggregators and elements that may already be present
in your environment include:
Java Message Service
WS-* / WS-Management
JDBC Data Access
Log File Reading and Scripts
Command-Line Interface Tools
Messaging and Messaging Connectors
File Transfer Statistics
Port Services (HTTP, HTTPS, DNS, LDAP, etc.)
Windows Management Instrumentation
Storage Management Tools
Enterprise Monitoring & Eventing Tools
SNMP and other Network Monitoring Tools
Databases and Database Monitoring Tools
Inventory and Inventory Management Tools
CMDB & Service Desk Connectors
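Where no built-in connector exists, the regular-output route can be as simple as reading a periodic export from the existing monitoring system and keeping only the metrics the model asks for. The Python sketch below assumes a hypothetical CSV export with timestamp, host, metric, and value columns; real monitoring products have their own formats and APIs, which this does not attempt to reproduce.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import TextIO


@dataclass(frozen=True)
class Sample:
    timestamp: datetime
    host: str
    metric: str
    value: float


def collect(export: TextIO, wanted: set[tuple[str, str]]) -> list[Sample]:
    """Parse a monitoring export, keeping only the (host, metric) pairs the service model needs.

    Rows whose timestamp or value cannot be parsed are skipped instead of aborting the import.
    """
    samples = []
    for row in csv.DictReader(export):
        if (row["host"], row["metric"]) not in wanted:
            continue
        try:
            samples.append(Sample(datetime.fromisoformat(row["timestamp"]),
                                  row["host"], row["metric"], float(row["value"])))
        except ValueError:
            continue
    return samples


if __name__ == "__main__":
    export = io.StringIO(
        "timestamp,host,metric,value\n"
        "2024-03-04T09:00:00,sql-prod-01,cpu_percent,71.5\n"
        "2024-03-04T09:00:00,sql-prod-01,disk_free_gb,n/a\n"
        "2024-03-04T09:00:00,print-01,queue_length,3\n"
    )
    wanted = {("sql-prod-01", "cpu_percent"), ("sql-prod-01", "disk_free_gb")}
    for sample in collect(export, wanted):
        print(sample)
```

Silently skipping bad rows keeps one malformed line from blocking a whole import, but in practice those skips should be counted and reported, since they are exactly the kind of gap the next section looks for.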
Recognize that this step is non-trivial. Whereas previous steps involved moving boxes around on paper to find the correct representation of the model, here we are actually manipulating computer code to enable the connections. This work should follow the same Configuration Management practices hopefully already in place within the organization, which require substantial testing and validation before integration into a production environment.
The project team can avoid one very important gotcha by ensuring that enough time is built into the project schedule to acquire the proper talent and to properly build, test, and implement these connectors.
End user experience monitoring tools may additionally be attached to the customer-facing interfaces of externally facing systems. We'll talk at length about these types of tools in the next chapter. For now, know that the code frameworks that typically drive these customer-facing tools often include built-in monitoring toolsets that allow for integration into the BSM system. Processes like synthetic queries and scripted actions can simulate the load of a particular user and determine that user's wait time (i.e., their experience) while using the system. This information ties into KPIs associated with customer satisfaction.
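One common way to turn measured wait times into a customer-satisfaction KPI is the Apdex (Application Performance Index) convention, which this chapter does not itself prescribe. The sketch below assumes the synthetic queries report wait times in seconds and that the business has chosen a target threshold T: responses at or under T count as satisfied, responses up to 4T count as tolerating (half credit), and anything slower counts as frustrated. The sample wait times are invented.

```python
def apdex(wait_times_sec, target_sec):
    """Apdex score in [0, 1]: satisfied (<= T) count fully,
    tolerating (T < t <= 4T) count half, frustrated (> 4T) count zero."""
    if not wait_times_sec:
        raise ValueError("no samples")
    satisfied = sum(1 for t in wait_times_sec if t <= target_sec)
    tolerating = sum(1 for t in wait_times_sec if target_sec < t <= 4 * target_sec)
    return (satisfied + tolerating / 2) / len(wait_times_sec)

# Hypothetical wait times (seconds) recorded by scripted synthetic queries
waits = [0.8, 1.1, 1.9, 2.5, 3.0, 9.5, 1.2, 0.7]
score = apdex(waits, target_sec=2.0)  # 5 satisfied, 2 tolerating, 1 frustrated -> 0.75
```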
The BSM environment may also tie into Service Desk applications to gain a time-based understanding of how user experience drives incoming requests and complaints. One effective KPI for measuring the quality of an external service is the rate of incoming tickets, monitored alongside end user experience data. By doing this, an organization can discover the pain points for its particular brand of customer. Some customers are more willing than others to tolerate pain within systems, and the rate at which tickets are generated gives a better understanding of how those users are actually experiencing the system.
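A simple first check of whether user experience is driving tickets is to correlate the two series over the same time buckets. The sketch below computes a Pearson correlation coefficient between hypothetical hourly mean response times and hourly ticket counts; the numbers are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical hourly buckets: mean page response time (s) and new tickets opened
response_time = [1.2, 1.3, 1.1, 4.8, 5.2, 1.4, 1.2, 3.9]
tickets = [2, 1, 2, 9, 11, 3, 2, 7]
r = pearson(response_time, tickets)
```

A coefficient near 1 suggests that slow hours and busy ticket hours move together, though correlation alone does not prove that slowness caused the tickets.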
Measure Services & Gaps
As the team begins to implement data collection tools around the network, the BSM system will begin measuring the quality of each listed service that makes up the model. Areas where data is not yet arriving will show as gaps in the desired metrics. Because we have not yet reached the step where we implement reporting and dashboards to visualize those metrics, careful attention to system data as it arrives is how the team will identify where KPIs are being measured and where gaps still exist.
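Spotting gaps is essentially a set difference between what the model expects and what has actually arrived. A minimal sketch, assuming the model's expected metrics and the metrics received so far are both available as simple Python collections (the service and metric names are hypothetical):

```python
def find_gaps(model_metrics, received):
    """Return {service: [missing metrics]} for every service still lacking data."""
    gaps = {}
    for service, metrics in model_metrics.items():
        missing = sorted(set(metrics) - received.get(service, set()))
        if missing:
            gaps[service] = missing
    return gaps

model_metrics = {
    "Inventory Processing System": ["inventory_delay_ms", "requests_per_sec"],
    "Order Processing System": ["orders_per_sec"],
    "Credit Card Auth System": ["auth_latency_ms", "declines_per_min"],
}
received = {
    "Inventory Processing System": {"inventory_delay_ms", "requests_per_sec"},
    "Credit Card Auth System": {"auth_latency_ms"},
}
gaps = find_gaps(model_metrics, received)
```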
At this point, reviewing the incoming data against the KPIs the organization currently has in place, or wants to have in place, is an excellent double-check of the service model. The service model, though considered frozen by the project team in its first iteration, may require additional work to pull the data the system needs. This can also manifest as calculations that lack the data necessary to properly represent loss as a measure of service quality.
Step 5: Data Analysis
Once the initial connectors are in place, the BSM system begins collecting data from the various systems throughout the network. When configured with appropriate metrics and the logic associated with those metrics, the BSM system applies cross-device and cross-application computations to determine health and quality status.
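The exact rollup logic varies by product, so treat the following as one illustrative policy rather than how any particular BSM system computes health: each service takes the worst status found among itself and everything it depends on. The dependency map, status names, and statuses are assumptions.

```python
# Status levels ordered from best to worst
LEVELS = ["OK", "DEGRADED", "DOWN"]

def rollup(service, depends_on, own_status, memo=None):
    """Worst-of policy: a service is only as healthy as the worst status among
    itself and everything it depends on. Assumes the dependency graph has no cycles."""
    if memo is None:
        memo = {}
    if service in memo:
        return memo[service]
    worst = own_status.get(service, "OK")  # components with no reading count as OK
    for child in depends_on.get(service, []):
        child_status = rollup(child, depends_on, own_status, memo)
        if LEVELS.index(child_status) > LEVELS.index(worst):
            worst = child_status
    memo[service] = worst
    return worst

# Hypothetical dependency map and component-level statuses
depends_on = {
    "B2B Web System": ["Inventory Processing System", "Order Processing System"],
    "Inventory Processing System": ["Inventory Database"],
    "Order Processing System": ["Inventory Database"],
}
own_status = {"Inventory Database": "DEGRADED"}
web_status = rollup("B2B Web System", depends_on, own_status)
```

A worst-of rule is easy to reason about, but it treats every dependency as critical; a model with redundant components (for example, two load-balanced databases) would need a rule that tolerates a single failure.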
Within Step 5, we begin analyzing the incoming information and trending it to see whether the data we are actually receiving aligns with the data we expected to receive. Once this process is complete and we are assured of the model's validity, we can begin to analyze the system to see where gaps in service occur. These gaps may show up as poor service quality or customer ratings, system overloading, slow element response times, or low transaction throughput.
Two tools for finding these gaps, which we'll discuss later in this section, are fault trees and impact trees. Built from the service model, these two tools identify the root causes and the overall impact of a service degradation or outage.
Analyze Returned Monitoring Data
Initial incoming data arrives in a relatively raw format. This raw data often needs to be converted into a format usable by the calculations required elsewhere within the system. The conversion may involve multiple steps, as data may need to be refactored more than once depending on the target metrics that require it. For example, performance data may arrive in a binary format and need to be refactored into a numerical format for analysis alongside outage metrics. It may then require additional refactoring into a third, time-based format so that it can be compared with performance data from other kinds of systems.
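The binary-to-numeric-to-rate example above can be sketched in Python with the standard `struct` module. The record layout here (a big-endian 32-bit timestamp followed by a 64-bit cumulative transaction counter) is a hypothetical format chosen for illustration; real collectors will differ.

```python
import struct

# Hypothetical wire format: big-endian uint32 epoch seconds + uint64 cumulative count
RECORD = struct.Struct(">IQ")

def decode_counters(blob):
    """Step 1: binary payload -> list of (timestamp, cumulative_count) tuples."""
    if len(blob) % RECORD.size:
        raise ValueError("truncated record in binary payload")
    return [RECORD.unpack_from(blob, offset) for offset in range(0, len(blob), RECORD.size)]

def to_rates(samples):
    """Step 2: cumulative counts -> per-second rates between consecutive samples.
    Assumes timestamps strictly increase and the counter never resets or wraps."""
    return [
        (t1, (c1 - c0) / (t1 - t0))
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
    ]

payload = b"".join(RECORD.pack(ts, count) for ts, count in
                   [(1000, 50_000), (1010, 100_500), (1020, 152_000)])
samples = decode_counters(payload)
rates = to_rates(samples)  # [(1010, 5050.0), (1020, 5150.0)]
```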
Validate Measurements & Costing Assumptions
During this learning mode for the model, it is also important to validate that the measurements identified by the project team are to scale and use the correct units. Converting KPIs into measurable statistics can be a complex mathematical and logical task that involves comparing and converting units across multiple systems, and it depends on each individual system's capability to supply data in the correct units. It's also essential to validate the computed measurements against the original data to make sure they are computationally correct.
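A common safeguard is to convert every incoming value into one canonical unit per quantity before any KPI math runs, and to reject unknown units outright rather than guess. The sketch below assumes milliseconds as the model's canonical latency unit; the conversion table and the plausibility range are illustrative assumptions.

```python
# Conversion factors into the model's canonical latency unit (milliseconds)
TO_MILLISECONDS = {"us": 0.001, "ms": 1.0, "s": 1000.0}

def normalize_latency(value, unit):
    """Convert a latency reading into milliseconds; refuse units we don't know."""
    try:
        return value * TO_MILLISECONDS[unit]
    except KeyError:
        raise ValueError(f"unknown latency unit: {unit!r}") from None

def is_plausible(latency_ms, low=0.0, high=60_000.0):
    """Flag readings outside a sane range, a common symptom of a unit mix-up."""
    return low <= latency_ms <= high

from_seconds = normalize_latency(0.4, "s")
from_millis = normalize_latency(400, "ms")
```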
One function of validating the necessary measurements is the ratification of costing assumptions made during model generation. The initial determination of the cost associated with a loss in service quality, or a loss of service altogether, may have been based on faulty or misleading information, or the data arriving in the BSM system may simply not confirm the assumptions. For the system to be reliable, the earlier costing assumptions must be reconciled against the actual data arriving in the system.
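Reconciling a costing assumption can be as simple as comparing the modeled cost per minute of downtime with what the collected data implies. This sketch assumes the BSM data can attribute an amount of lost revenue to each outage; the function name, tolerance, and all figures are hypothetical.

```python
def check_cost_assumption(assumed_cost_per_min, outages, tolerance=0.25):
    """Compare the modeled downtime cost with what the data implies.

    outages: list of (minutes_down, revenue_lost) pairs taken from BSM data.
    Returns (observed_cost_per_min, within_tolerance)."""
    total_minutes = sum(minutes for minutes, _ in outages)
    total_lost = sum(lost for _, lost in outages)
    observed = total_lost / total_minutes
    deviation = abs(observed - assumed_cost_per_min) / assumed_cost_per_min
    return observed, deviation <= tolerance

# The model assumed $500 per minute; two recorded outages suggest otherwise
observed, holds = check_cost_assumption(500.0, [(30, 21_000), (10, 6_000)])
```

Here the data implies $675 per minute, 35% above the assumption, so the costing assumption would be sent back for review.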
Build Fault Tree Analyses
Once the model is validated as correct, the data within the model can be compared with the desired metrics to help identify where individual components of the system are not performing to specification. With the model in place and functional, the reduction in quality of each individual element can be related to how that element affects the whole. For example, in Figure 4.3 the completed service model shows how a change in the performance of the Inventory Database impacts each of the services that rely on it.
Here we can see that the transaction rate of our Inventory Database has gone above our specified limit of 5000 transactions per second. That reduction in quality directly impacts the Inventory Processing System's capability to respond to inventory requests fast enough. It also affects the Order Processing System's capability to fulfill orders, because every fulfilled order changes the level of inventory. Ultimately, each of these metrics relates directly to the customer's satisfaction or dissatisfaction with their experience.
We've discussed how a major component of the reporting step involves making information digestible for its specific consumer. Here we see how this information is immediately valuable to multiple classes of consumer, depending on how it is presented. Nontechnical executives and business leaders get a high-level representation of the system and its associated (lack of) service quality. They can use this data to make decisions about the business in general or about additional purchases to augment the design. Administrators gain the information they need to quickly troubleshoot the problem.
It's worth noting here that before this model was in place, the organization may not have been able to trace how a loss of performance in a down-level system left customers unhappy. BSM's built-in fault tree analysis tools provide both the IT department and the business leadership with the data they need to make the right decisions. That decision may be to purchase a second, load-balanced Inventory Database or to vertically scale the existing one.
[Figure 4.3: Service model diagram. Components: Mission-Critical B2B Web System ("Customers Unhappy!"); Customer Account Auth. System; Inventory Processing System ("Inventory Delay > 400 ms"); Order Processing System ("Orders < 50/sec"); Customer Account Database; Inventory Database ("Transactions < 5000/sec"); Credit Card Auth System; External Credit Service Proxy; B2B Extranet; Credit Card Extranet.]
Figure 4.3: Our completed service model begins to show how a fault in the Inventory Database (in this case, the number of transactions per second going above specification) can impact the systems above it that rely on that database.
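The downward walk that a fault tree performs can be sketched as a recursive search for the deepest out-of-spec components beneath a failing service. The dependency edges below are a hypothetical reading of the components in Figure 4.3, and the readings are invented; only the two limits echo the 5000 transactions-per-second and 400 ms figures shown in the model.

```python
# Hypothetical dependency edges loosely based on the components in Figure 4.3
DEPENDS_ON = {
    "B2B Web System": ["Customer Account Auth System",
                       "Inventory Processing System",
                       "Order Processing System"],
    "Customer Account Auth System": ["Customer Account Database"],
    "Inventory Processing System": ["Inventory Database"],
    "Order Processing System": ["Inventory Database", "Credit Card Auth System"],
    "Credit Card Auth System": ["External Credit Service Proxy"],
}

def root_causes(node, out_of_spec, depends_on=DEPENDS_ON):
    """Return the deepest out-of-spec components at or below `node`: an
    out-of-spec component counts as a root cause only if nothing beneath it
    is also out of spec."""
    causes = set()
    for child in depends_on.get(node, []):
        causes |= root_causes(child, out_of_spec, depends_on)
    if not causes and node in out_of_spec:
        causes.add(node)
    return causes

# Upper limits per component, and invented current readings
LIMITS = {"Inventory Database": 5000,           # transactions per second
          "Inventory Processing System": 400}  # inventory delay, ms
readings = {"Inventory Database": 5600, "Inventory Processing System": 650}
out_of_spec = {c for c, limit in LIMITS.items() if readings.get(c, 0) > limit}
causes = root_causes("B2B Web System", out_of_spec)
```

Both the Inventory Database and the Inventory Processing System are out of spec, but only the database is reported, because the processing system's problem is explained by something beneath it.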
Build Impact Analyses

Impact analyses are generated in concert with fault trees. This tool illustrates to the viewer the anticipated impact on the service of a component outage or reduction in service. Notice that the information in Figure 4.3's fault tree helps in troubleshooting the problem down to the individual element level. From the perspective of an impact analysis, Figure 4.4 shows another representation of our service model with the same failure state. Here we gain a perspective on how that fault is impacting our operations.
You can see that the same out-of-spec transaction rate is impacting the Inventory Processing System, to the tune of 37 inventory changes or updates delayed beyond specification. The Order Processing System similarly cannot keep up with the load from the B2B Web System, leaving 17 orders in a wait state prior to approval and final purchase. Ultimately, 20 users elect to abandon their transactions (i.e., the drop rate) due to this problem. Relating the drop rate further to lost revenue allows the BSM service model to quantify the loss of revenue associated with the Inventory Database's problem.
Whereas the fault tree visualizes the troubleshooting component of fixing the problem, the
impact tree assists with recognizing how the problem affects users.
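Where the fault tree walks down, an impact analysis walks up: starting from the failing component, it collects every service that depends on it, directly or indirectly. The sketch below does this with a breadth-first search over inverted dependency edges, then converts the drop rate into an estimated revenue loss. The edges, the conversion rate, and the average order value are all hypothetical.

```python
from collections import deque

def impacted(component, depends_on):
    """Every service that depends on `component`, directly or indirectly."""
    # Invert "X depends on Y" edges into "Y is used by X" so we can walk upward
    used_by = {}
    for parent, children in depends_on.items():
        for child in children:
            used_by.setdefault(child, []).append(parent)
    seen, queue = set(), deque([component])
    while queue:
        node = queue.popleft()
        for parent in used_by.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def lost_revenue(dropped_users, conversion_rate, avg_order_value):
    """Hypothetical model: dropped users x share who would have bought x average order."""
    return dropped_users * conversion_rate * avg_order_value

depends_on = {
    "B2B Web System": ["Customer Account Auth System",
                       "Inventory Processing System",
                       "Order Processing System"],
    "Customer Account Auth System": ["Customer Account Database"],
    "Inventory Processing System": ["Inventory Database"],
    "Order Processing System": ["Inventory Database", "Credit Card Auth System"],
}
affected = impacted("Inventory Database", depends_on)
estimate = lost_revenue(20, 0.5, 1250.0)  # 20 dropped users from the example
```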
[Figure 4.4: Impact analysis view of the same service model. Components: Mission-Critical B2B Web System ("20 User Drop Rate"); Customer Account Auth. System; Inventory Processing System ("37 Inventory Changes Delayed"); Order Processing System ("17 Orders in Wait State"); Customer Account Database; Inventory Database ("Transactions < 5000/sec"); Credit Card Auth System; External Credit Service Proxy; B2B Extranet; Credit Card Extranet.]
Figure 4.4: Relating the information within our service model to an impact analysis, we begin to see how the out-of-specification performance of our Inventory Database is directly impacting other functionality. Ultimately, our system is seeing a drop rate of 20 users due to this problem.
Step 6: Improvement
Step 6 in our process asks the question, "So, now what do we fix and why?" Up until this point, our process has been driven by the need to populate the framework with data and to analyze the collected data to find where gaps exist relative to the best design of our system. Here in Step 6, we finally get to take what we've learned thus far and turn it into productive change for the service under monitoring.
It's worth mentioning here that it is entirely possible that no improvement is needed. Monitoring is, by nature, a long-term activity, so a considerable amount of time may actually pass between Step 5 and Step 6. Two situations can explain finding nothing to improve.
First, there may be no problems whatsoever with the service. Though we all wish this were the case in our systems, the selection criterion for our first service was to find the one that was already causing us the most pain, so the likelihood of this is low. What is more likely is the second situation.
Second, finding no actionable data within our system means that something within our model is likely missing. That may be an unknown connection to an IT function, a missing step in the process, or a metric that is either missing or mischaracterized. If the project team reaches Step 6 and finds little to act upon, circling back to Steps 2 or 3 for additional discovery may be in order.
Locate Problem Domains

Assuming that the service model in place is accurate and the metrics being collected provide value to the organization, visualization tools within the BSM system, such as the fault trees and impact trees discussed above, will help the Process Analysts and the Systems Administrators locate problem domains. It is important here for these two halves of the team to work hand-in-hand. Often the problems found are IT-related problems that manifest as an overall reduction in service quality.
Fault trees and impact trees are not the only visualization tools provided by BSM to aid in
troubleshooting. Chapters 7 and 8 will both touch on additional tools that BSM can provide to assist
with this step.
Identify and Resolve Gaps

Less often, but arguably more critical, is the identification of process gaps, where the process itself is actually causing the problem. In these cases, incorporating BSM data into the justification provides quantifiable evidence of the need to change the process. For example, if the BSM system finds that an unnecessary step in the Credit Card Authorization System is adding an out-of-spec delay to completing a transaction, the team may decide that the solution is to remove that step from the process.
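If the BSM system records how long each step of a process takes, finding the step that blows its time budget is straightforward. A sketch, assuming per-step durations in seconds and a budget per step; the step names for the Credit Card Authorization System and all timings are hypothetical.

```python
from statistics import median

def over_budget_steps(step_timings, budgets):
    """Return {step: median_seconds} for steps whose median exceeds their budget."""
    flagged = {}
    for step, durations in step_timings.items():
        typical = median(durations)  # median, so one slow outlier doesn't flag a step
        if typical > budgets[step]:
            flagged[step] = typical
    return flagged

# Hypothetical process steps, budgets, and observed durations (seconds)
budgets = {"validate_card": 0.5, "fraud_screen": 1.0,
           "duplicate_address_check": 2.0, "capture": 0.5}
timings = {
    "validate_card": [0.2, 0.3, 0.25],
    "fraud_screen": [0.8, 0.9, 0.7],
    "duplicate_address_check": [6.0, 7.5, 5.5],
    "capture": [0.3, 0.4, 0.35],
}
slow_steps = over_budget_steps(timings, budgets)
```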
Revise the Service Model
Lastly, in some cases the service model itself may be incorrectly represented. Especially during early run-throughs of the model, it is probable that the service model lacks some component, metric, or connection that it needs to properly process incoming data. This problem is arguably the most difficult to recognize and resolve, because incoming data may mask the true problems with the model itself. Earlier in this chapter, we discussed how the model should not be considered a static, immobile construct but instead an organic representation. Networks change. Applications and services are added to and removed from the environment. Because the model's IT underpinnings, or the processes themselves, may change over time, continuous review of the model and its connections to the various IT functions is critically necessary.
Step 7: Reporting
Our final step in running through the seven steps is setting up the dashboards and associated reports. This step comes last because it gives the project team time to analyze data and refine the model before tying the model down to specific reporting functions. Once reporting is configured, it tends to freeze the model, because any change to the model, its characteristics, or its metrics will impact the reports.
Once reports are made available to consumers, those consumers typically grow reliant on them. So, be aware that making reporting available is a step that often involves strong configuration control.
We will discuss the process of creating reports and dashboards in detail in Chapters 6 and 7, so for this chapter we'll review only the necessary steps at a high level.
Implement Dashboards
Dashboards are often "skinned" web sites that dynamically update data as necessary. We say "skinned" because a visualization image is often used as the base layer. Atop this base layer are overlaid data representations such as stoplight charts, control charts, Pareto charts, and other tools that represent numerical data in graphical form. There are some best practices for developing dashboards that we'll discuss in detail in Chapter 7. For now, know that this step will involve equal measures of graphic design, data manipulation and visualization, and knowledge of web platforms.
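Of the chart types mentioned, the control chart is the one with the most arithmetic behind it. The sketch below computes a center line and limits at three standard deviations from a baseline period and flags new points that fall outside them. It is a simplified variant: classic individuals charts usually estimate sigma from the average moving range, whereas this uses the sample standard deviation. The response times are made-up example data.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """(lower limit, center line, upper limit) at +/- 3 sample standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(points, limits):
    """Points falling outside the control limits."""
    lower, _, upper = limits
    return [p for p in points if p < lower or p > upper]

baseline = [1.0, 1.2, 0.9, 1.1, 1.0, 1.1, 0.9, 1.0, 1.2, 1.0]  # response times, s
limits = control_limits(baseline)
alarms = out_of_control([1.1, 2.5, 0.95], limits)
```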
Implement Notification
Notification elements can also be added to the BSM system. As the problems in our chapter example showed, the executives and business leaders in the organization want to be notified when situations occur that they can understand and resolve. They aren't necessarily as interested when individual IT functions (like John's extranet router) have problems they are unable to act upon. Once the BSM implementation is in place, notification components can be enabled that provide them with valuable data.
As a continuation of our chapter example, once FCG implements its own BSM system, it may consider removing Dan from all the device notifications he currently receives. Instead, he may want to know when the drop rate for the web site goes beyond a certain threshold. The failure of an individual router means little to him, and his position and skill set can't add value to its resolution. But when the web site's drop rate moves outside acceptable parameters, that situation impacts FCG's bottom line, which is a problem he can identify with and help to resolve.
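The shift described for Dan amounts to routing rules keyed on the kind of event rather than on the device. A minimal sketch; the event fields, recipient names, and drop-rate limit are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    source: str   # a device name or a business service name
    kind: str     # "device" or "business_kpi"
    metric: str
    value: float

def recipients(event, drop_rate_limit=10.0):
    """Hypothetical routing: device faults go to IT only; a drop rate above the
    limit also goes to the executive who owns the business outcome."""
    if event.kind == "device":
        return ["it-operations"]
    if event.kind == "business_kpi" and event.metric == "drop_rate" and event.value > drop_rate_limit:
        return ["it-operations", "dan"]
    return []
```

Usage: `recipients(Event("extranet-router-1", "device", "interface_down", 1))` notifies only IT, while a drop rate of 20 on the B2B web site reaches Dan as well.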
Hand-off to Operations
The final step in this process is the ultimate hand-off to operations. This step involves fully documenting the configuration, entering that configuration into the organization's configuration control tool, and either dissolving the project team or redirecting it to another spin through the seven steps for another service.
Another component of this hand-off is documenting the procedures necessary to monitor and maintain the model. Spinning up another project team for minor updates to the model can be a waste of resources. Thus, one final task for the project team is to document those procedures in enough detail that individuals can later effect change.
A Carefully Planned Implementation Is a Successful Implementation

This chapter has attempted to provide some real-world ideas and information on how best to begin implementing BSM software and methodology within an organization. We've talked about how each of the seven steps iteratively brings the project team from an empty sheet of paper to a fully realized service model with attached metrics, IT functions, reporting, and notifications. Throughout the last three chapters, we've talked about BSM's intent to provide information that is understandable to its consumer. In this chapter, we've shown how implementing it actually realizes that expectation.
One element of BSM that we've glossed over to this point is the measurement of experience from the perspective of the end user. Looking at a service's endpoints provides valuable information not typically gathered by other monitoring tools, and the information gathered there populates specific metrics within the service model. In Chapter 5, our next chapter, we will sidestep into this discussion. We'll talk about how to set up End User Experience Monitoring, how best to manage it, and where it ties into the greater BSM picture.
Chapter 5
Chapter 5: End User Experience Monitoring

It's 4:05 p.m., late on a sunny Friday afternoon, and Dan finds himself staring longingly out the window of his office. He's thinking about his upcoming weekend golf game as he finishes up what he believes to be an easy tag-up call with his biggest customer. The conversation starts off well, with both executives chatting about recent vacations, swapping golf handicaps, and discussing the health of their respective companies.
But then his fellow executive, Joe Gear of Glass Emporium, lobs a grenade right into the middle of Dan's impending weekend: "Oh, and one last thing. You know, Dan, we've always been frank with each other. We've known each other for years. Heck, GE's bought millions of dollars of inventory from FCG."
"Uh huh...," mutters Dan as he waits for the other shoe to drop.
"I really hate to have to say this," says Joe, "but my people have been complaining more and more to me about the quality of that Web site you're running over there. When it works, it works great. But sometimes it just seems to freeze on us. Since you moved most of the purchasing over to this online store, our people are kind of trapped when your site freezes or we get error messages."
Dan groans silently to himself as Joe continues, "Dan, they're starting to complain to me about it. If the news is getting up to my level that the site's got problems, then you and I both know that your site's got problems."
Stunned, Dan sits back in his chair. He and John have been working hard on this monitoring and alerting project for just this reason. Six months ago his pager was going off all the time, waking him at all hours of the night and irritating his wife to no end. He now sports a new computer monitor on his desk that shows him the status of his systems. It notifies him when areas of the system go down, so John doesn't have to report to him when problems occur.
What's odd is that, other than the occasional blip, that monitor hasn't shown anything major in weeks.
"Tell me more about these problems you're seeing," Dan asks, concerned. "You guys are our biggest customer. If your people are waiting around for our Web site to catch up, then that just raises the question of how many of our other customers are just... going... elsewhere..."
Joe hears his pain. "Well, you know I'm no computer expert, but from the reports I'm getting from my people down in purchasing, the site will work just fine for an hour, or a day, or even a week. But then sometimes it'll just freeze. People will be entering orders and the site will take 3 or 4 minutes to refresh a page. Sometimes instead of an order complete page we just get an error page. More often than not, the transaction is still there when we come back into the interface, and that's good. But all this extra time spent waiting for your system is costing me money with my guys sitting on their hands.
"Now, none of this is intended to be any kind of threat. We're friends. Our business relationship has been great for many years. But Dan, you've got to do something about that Web system. My people are pulling their hair out. Check it out. You should experience it for yourself some time."
Dan thanks Joe for the candid information. They agree to get together soon for another round of
eighteen and an answer to this problem. Dan then picks up the phone and calls John for a report
from his IT Manager.
"Nothing to report here, sir," John reports. "Since we updated our monitoring system to watch for performance metrics on the servers, we found a few servers that needed extra memory or another processor. Those were all upgraded months ago. Since then, other than the occasional processor spike, we haven't noticed much in the way of problems. You should be seeing this too, now that we've got that monitor on your desk. You're seeing the same info that I'm seeing."
Dan responds, "But I'm getting reports from our customers, just one today in fact, who say that their experience with using the Web site has been really bad. Freezes. Lockups. Error pages. The whole experience isn't good sometimes."
"Well, there's nothing that I can report from this side. We're running one of the best monitoring platforms you can buy," John proudly exclaims. "We're monitoring dozens of performance counters on everything from routers to switches to the servers themselves, and I can't report on anything that's out of the ordinary."
Dan finds himself a bit ruffled by John's flippant response. He's concerned about the experience his customers are having when they interface with his company. He's purchased some very expensive systems monitoring equipment, and his other monitor shows a happy, healthy system. But somehow all of that monitoring equipment still isn't capturing the essence of his customers' experience.
"Meet me in my office now," Dan instructs John. "We've got more strategizing to do."
System Counters Alone Cannot Fully Represent the End User's Experience
One of the key tenets of BSM is to provide to the business an understanding of its systems and
how those systems relate to profitability and the satisfaction of its customers. BSM does this
through the concept of Service Quality. In Chapter 1, we talked about how the quality of a
service is impacted not only by that service but also its supporting services:
A loss in a sub-system to a business service feeds into the total quality of that service. A
reduction in the performance of a system reduces its quality. And, most importantly, an
increase in response time for a customer-facing system reduces its service quality.
We talk so heavily about this measurement of service quality because it is that quantitative measurement that helps the business understand its customers' experience with the services it provides. If those services are not of high quality, customers may take their business elsewhere. If services do not perform appropriately, a business's public relations can suffer. Ultimately, if a business cannot serve its customers to the level they require, that business cannot continue to operate.
One problem, however, with quantitatively measuring that level of quality lies in the device-centric nature of IT itself. In Chapter 2, we talked at length about the maturity of the IT organization and how a maturing IT organization finds itself better aligned with the needs of its business. As we'll find out in this chapter, another benefit of a maturing IT organization is a much better understanding of the types of metrics that are critical to correctly measuring its services' quality.
[Figure 5.1: FCG's service components (External B2B Web Cluster, Kerberos Auth. System, Java-based Inventory System, ERP System, LDAP Database, Oracle Database, 3rd Party Credit Proxy System, B2B Extranet Router, Credit Card Extranet Router), annotated with typical system counters such as % Processor Time, Available MBytes, % Disk Time, Web Cache Hit Rate, Rejected Requests, Current Connections, Database Reads / Sec, Log Writes Average Latency, Packets Inbound / Sec, Dropped Packets, Pages / Sec, and Context Switches / Sec.]
Figure 5.1: FCG's monitoring system is watching system counters at multiple levels, but those counters aren't telling the story of its users' experience. This figure highlights typical counters often enabled on many systems. But these counters alone don't show the entire picture of what FCG's customers are seeing.
Looking at the Wrong Set of Data

Now what do we mean when we talk about the types of metrics that are critical? Take a look at
Figure 5.1 and think about the problems seen in our chapter example. In the situation described
earlier, John and Dan have worked hand-in-hand to set up system performance counters on their
systems. Those counters are watching the servers, network devices, and interconnections to see
where performance goes below set thresholds. But, as we see from the phone call with Joe,
measuring those counters at the systems level is not necessarily telling the true tale of the users
who log in and make use of the external Web system.
In our chapter example, the performance for individual processor and memory use may show
that individual systems are performing as normal. We may have plenty of available memory. Our
processors may never spike to more than 80% utilization. Our network switches and routers may
show a mere 10% utilization of the underlying network. But for some reason, the Web site still
slows for users from time to time.
The Egg Timer Problem
Obviously something is missing from the equation. What we have in the situation described
earlier is much like an egg timer. An egg timer performs a very useful task when cooking. This
little tool dings when a preset number of minutes has elapsed. That noise notifies us when our
eggs are ready to be taken out of the boiling water.
But does that timer actually tell us about the quality of those eggs when the prescribed time arrives? Is the tool looking into the egg to verify that the contents are truly hard-boiled, or is it merely telling us that the time in which an egg should be done has elapsed?
Both of these questions relate to the problems in our chapter example. FCG has implemented a comprehensive monitoring platform that has the rich ability to peer into multiple classes of devices and report on their performance data. They can return and report on metrics that show the health of individual systems and network devices. They can alert when that performance goes above preconfigured thresholds. But are all those metrics, like the ones that Figure 5.1 shows, actually representing the experience of the users on the system? The intent of this chapter is to show that they are not.
System Counters Are Critical to the Systems Administrators and End User Experience Is Critical to the System Users
First and foremost, it is not the intent of this chapter to argue that these metrics are unnecessary.
In fact, quite the opposite is true. These metrics are critical for the best possible operations of the
servers that run the computing environment. Systems administrators everywhere will tell you
that being alerted when a processor is spiked at 99% utilization is necessary to quickly resolve
that problem. In the same vein, administrators need to know when a server is running out of
memory. Maybe a runaway process is consuming more than its fair share of memory and needs
reconfiguring or patching. Maybe the system itself needs additional memory installed to support
its workload.
These metrics are critical to the administrators of the system. However, an additional set of metrics that can represent the experience of a user on the system is also necessary. Those metrics are typically gathered in one of two ways:
First, it is necessary for some automated mechanism to simulate the experience of completing key business transactions. This automated instrument is referred to as an "agent," and it can be installed onto one or more systems to complete its task.
Second, it is also critical to get a big picture of the entire environment. To do so, a tool can be installed into that environment that watches all the traffic passing through it. This tool watches for situations in which contention for resources may be causing problems. It may also look for individual transactions that don't complete or that take extra time to complete, and it may recognize when external (and otherwise unmonitored) forces are contributing to the problem.
In either of these two cases, the concept of a transaction is critical to recognizing what this sort of system is looking for. This end user experience monitoring tool needs to look for business transactions, or completed interactions between business systems, and at how those interactions are behaving. If transactions aren't behaving properly, there will likely be an impact to the overall operation of those systems on the network. Those delayed or failed transactions may not necessarily impact the overall performance of the server, but they do manifest in the users' experience.
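To see how a traffic-watching tool can find transactions that never complete or take too long, consider a sketch that pairs start and end events by transaction ID. It assumes a hypothetical collector that reports `(transaction_id, kind, timestamp)` tuples with timestamps in seconds; real products reconstruct transactions from network traffic in far more involved ways.

```python
def analyze_transactions(events, slow_after_sec, now):
    """Pair 'start'/'end' events by transaction ID.

    Returns (slow, incomplete): IDs that finished but took longer than
    slow_after_sec, and IDs still open after more than slow_after_sec."""
    starts, durations = {}, {}
    for txn_id, kind, ts in events:
        if kind == "start":
            starts[txn_id] = ts
        elif kind == "end" and txn_id in starts:
            durations[txn_id] = ts - starts[txn_id]
    slow = sorted(t for t, d in durations.items() if d > slow_after_sec)
    incomplete = sorted(t for t, ts in starts.items()
                        if t not in durations and now - ts > slow_after_sec)
    return slow, incomplete

events = [
    ("order-1", "start", 0.0), ("order-1", "end", 1.5),
    ("order-2", "start", 2.0), ("order-2", "end", 200.0),  # a multi-minute "freeze"
    ("order-3", "start", 5.0),                              # never completed
    ("order-4", "start", 299.0),                            # still in flight
]
slow, incomplete = analyze_transactions(events, slow_after_sec=30.0, now=300.0)
```

Note that neither problem transaction would necessarily move a processor or memory counter on any single server.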
In Chapter 3, we talked about some of the different mechanisms by which monitoring data can
be obtained. Over the years, these different types of data-gathering mechanisms have evolved to
provide ever better quality of information through different vectors. Each different mechanism of
collection provides data that the others cannot.
For example, an agentless solution can more easily monitor the interrelation between systems over the network than an agent-based solution can. However, because they are installed onto an individual server, agent-based solutions typically have more access to the inner workings of a system. Agent-based solutions can also repeatedly execute synthetic transactions against a system to judge its overall performance over time.
Let's take a look now at these two classes of End User Experience (EUE) monitoring and how they work. Each can work with the other to get a holistic picture of the system along with its interrelation with the rest of the computing environment.
Agent-Based Monitoring
The goal of agent-based monitoring is two-fold. First, by installing agents onto individual servers that make up a business service, the agents can look deeply into the processes and activities that make up that service. The agent can analyze behaviors within the server to look for individual transactions, the success or failure of those transactions, and the amount of time elapsed to complete those transactions. Because the agent is installed directly on the specific server of interest, that agent can be configured with relatively unrestricted access to gather and report on the information it needs from within the server.
This is a very important point: agentless monitoring mechanisms can only query a server through APIs that are published and enabled for external interfacing. These externally facing interfaces do not typically expose all the data within a server, usually for security or functionality reasons. Thus, the addition of agent-based monitoring improves the overall level of information to be processed by the EUE system.
Second, agents can also be installed onto clients throughout the network. The agents on these
clients are then programmed to emulate an end user performing key business transactions
throughout the day. Depending on the maturity of the EUE system in place, those instructions
may be capable of:
Logging onto a Web site, completing a transaction, and logging off.
Interfacing with a third-party packaged application such as SAP, Siebel, or other shrink-wrapped software to complete a common task.
Working with a thin-client application such as Microsoft Terminal Services or Citrix
Presentation Server to identify the quality of the server-to-client experience. These same
agents can simultaneously compare the server-to-back-end-server traffic with the
client-to-thin-client-server traffic to look for anomalies.
Synthetically manipulating records within a database or through a middleware software
package to identify areas of performance lag.
By installing these agents on systems across the network, the EUE system can compare the
results of each transaction with those of other agents to see where individual sites may be
experiencing problems. In many ways, the idea behind agent-based solutions is to measure the
total time necessary to complete a transaction from multiple locations, which helps identify the
characteristics and locations where poor application performance is experienced.
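To make the multi-location comparison concrete, here is a minimal Python sketch of what an agent and its central collector might do. It assumes each scripted step (log on, complete a transaction, log off) is available as a plain callable and that per-site results are gathered in one place. The function names, site names, and the 1.5x comparison factor are hypothetical illustrations, not any EUE product's API.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def run_synthetic_transaction(steps: List[Callable[[], None]]) -> float:
    """Run each scripted step in order and return the total elapsed seconds."""
    start = time.perf_counter()
    for step in steps:
        step()  # hypothetical scripted step, e.g. log on, purchase, log off
    return time.perf_counter() - start


def slow_sites(results: Dict[str, float], factor: float = 1.5) -> List[str]:
    """Return sites whose transaction time exceeds `factor` times the median of all sites."""
    baseline = median(results.values())
    return sorted(site for site, secs in results.items() if secs > factor * baseline)
```

For example, `slow_sites({"london": 2.1, "chicago": 1.9, "sydney": 4.8})` flags only `sydney`, whose time exceeds 1.5 times the 2.1-second median. Comparing against the median rather than the mean keeps one badly performing site from dragging the baseline up.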
Agentless monitoring, which we'll discuss in the next section, can require very little setup time to
configure. However, as you can see, agent-based monitoring requires a period of configuration to
identify the transactions of interest and record them into the agent. For mature EUE software
packages, this recording process can be relatively easy. The hard part is identifying the
applications and transactions that are of monitoring interest to the business.
In Chapter 4, we discussed the seven-step process to implement a BSM solution. Many of the same
processes that are used to build a BSM service model can be leveraged to help identify the right
transactions and service components to monitor. As with the service model creation process, this
transaction identification activity will be an organic, iterative process.
Agentless Monitoring
Agentless monitoring is much different from agent-based monitoring. Here, no code is installed on
the individual servers that make up the business service, and no transactions are synthetically
generated against the systems under management. Instead, we leverage a central solution that is
configured to watch all the traffic across the network. Once installed, the solution begins to look
for a series of known metrics that can be observed across the network:
When did a particular transaction start? When did it stop? Between what two servers,
services, and applications did it occur?
Did the transaction complete?
If the transaction did not complete, was it because the user cancelled it or was it due to a
network problem or poor performance?
If the transaction did complete, did it do so within an acceptable amount of time? How
much time was spent on the server, the network, and the desktop?
What are the network conditions across all hosts? Is one host consuming far more
bandwidth than normal? Why is that occurring? Is that consumption affecting
transaction completion?
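The bandwidth question in the list above can be answered with a simple comparison against each host's usual traffic. The Python sketch below assumes per-host byte counts for the current interval and a stored "normal" figure for each host. The 3x factor is an arbitrary, hypothetical choice, and a real tool would likely use a richer baseline.

```python
from typing import Dict, List


def bandwidth_outliers(current: Dict[str, int], normal: Dict[str, int],
                       factor: float = 3.0) -> List[str]:
    """Flag hosts whose observed bytes exceed `factor` times their normal byte count.

    Hosts with no recorded normal value are also flagged so they get a closer look.
    """
    flagged = []
    for host, observed in current.items():
        usual = normal.get(host)
        if usual is None or observed > factor * usual:
            flagged.append(host)
    return sorted(flagged)
```

Flagging hosts that have no baseline yet is a design choice: a host that suddenly appears on the segment is at least as interesting as one that is merely busier than usual.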
A concern in some networks is the promiscuous nature of agentless monitoring. An agentless EUE
tool is indeed watching many (and sometimes all) traffic types in a particular network segment. In
some environments, this may go against established security policies, so there may be political
pressure not to incorporate an agentless tool because of the type of collection it performs. That being
said, the benefits of an agentless monitoring solution must be weighed against the security liability of
allowing it on the network. Although the monitoring is promiscuous, many agentless monitoring tools
inspect only what they need in the network traffic and retain only the information necessary to
classify the results of that inspection. The agentless monitoring tool should also be able to mask out
sensitive information such as passwords. In many cases, the benefits to the organization far outweigh
any perceived security risks.
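As an illustration of masking, the Python sketch below hides sensitive fields in an HTML form submission (an `application/x-www-form-urlencoded` body) before the captured data is stored. The set of sensitive field names is a hypothetical example. Real tools typically let administrators configure this list, and other protocols would need their own decoders.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical set of field names treated as sensitive; a real tool would make this configurable.
SENSITIVE_FIELDS = {"password", "passwd", "pin"}


def mask_form_body(body: str, mask: str = "MASKED") -> str:
    """Replace the values of sensitive fields in a form-encoded request body."""
    pairs = parse_qsl(body, keep_blank_values=True)
    masked = [(key, mask if key.lower() in SENSITIVE_FIELDS else value) for key, value in pairs]
    return urlencode(masked)
```

For example, `mask_form_body("user=alice&password=s3cret")` returns `"user=alice&password=MASKED"`. The rest of the transaction stays available for timing and classification.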
This agentless solution, in combination with the business logic programmed into the BSM
service model, will determine the business impact associated with any transactions that did not
complete properly or within an acceptable amount of time. When a transaction does not complete
properly or in a timely manner, the service's quality is reduced. Wrapping this idea into the greater
picture of BSM, the reduction in service quality directly impacts the dollars-and-cents calculations
provided by the BSM system.
We'll talk more about the interconnections between EUE's performance logic and BSM's financial
logic later in this chapter.
Obviously, for this system to do its job, it has to understand the traffic it is receiving. If a
Web server is communicating with a Web browser client, that traffic needs to be understood as a
Web request followed by a Web server response, or as a series of requests and responses that
together make up a complete business transaction. This type of communication is programmatically
easy to understand. Mature EUE systems provide additional value when they can also translate
non-Web application traffic.
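One way to picture "understanding" Web traffic is pairing each request with its response. This Python sketch assumes the monitor has already reduced captured packets to (timestamp, connection ID, event type) tuples, which is a simplifying assumption. It matches each response to the oldest outstanding request on the same connection, because HTTP/1.1 returns responses on a connection in request order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Event = Tuple[float, str, str]  # (timestamp_seconds, connection_id, "request" or "response")


def pair_requests(events: List[Event]) -> List[Tuple[str, float]]:
    """Match each response to the oldest outstanding request on the same connection.

    Returns (connection_id, elapsed_seconds) pairs in the order the responses were seen.
    Responses with no outstanding request are ignored.
    """
    pending: Dict[str, Deque[float]] = defaultdict(deque)
    pairs: List[Tuple[str, float]] = []
    for ts, conn, kind in sorted(events):
        if kind == "request":
            pending[conn].append(ts)
        elif kind == "response" and pending[conn]:
            pairs.append((conn, ts - pending[conn].popleft()))
    return pairs
```

A series of such pairs on the same session is the raw material for assembling a complete business transaction.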
For example, if the EUE system understands the communication that occurs between an SAP
system and an Oracle back-end database, it can watch the traffic between those two systems and
look for individual transactions. The same holds true for any packaged application. When
evaluating an EUE system, look for one that includes the special decodes needed to translate
the traffic between the systems that ultimately make up the business service model.
As you can probably guess, a system that watches traffic all across the network must process a
huge mass of traffic. One of the most critical capabilities of an agentless EUE system is simply
knowing which kinds of traffic to process and which to discard.
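A crude first cut at deciding what to process is to keep only flows that touch services in the service model. In this Python sketch the port-to-service map is a hypothetical assumption: 80 and 443 for the web tier, 1521 as Oracle's default listener port, and 3200 as an assumed SAP dialog port. Real products use far richer classification than port numbers.

```python
from typing import Optional

# Hypothetical mapping of ports to services in a BSM service model.
SERVICE_PORTS = {80: "web", 443: "web-tls", 1521: "oracle", 3200: "sap-dialog"}


def classify_flow(src_port: int, dst_port: int) -> Optional[str]:
    """Return the service a flow belongs to, or None if the flow should be discarded."""
    for port in (dst_port, src_port):  # check both directions so responses are kept too
        if port in SERVICE_PORTS:
            return SERVICE_PORTS[port]
    return None
```

Anything that returns `None` is dropped before the expensive decoding and transaction-assembly stages run.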
Understanding the CNS Spread

In the end, the most important job of this system is converting this huge quantity of data into
something usable by its users. In many cases, individual issues within a business system
can be traced to one of three potential locations:
The clients that are attempting to make use of the system.
The network that allows those systems to communicate with each other.
The servers that process and respond to requests.
For any issue raised by an EUE system, the problem most often can be traced to one of
these three elements. As an example, for a transaction to complete, time is required for the client
to make a request, the network to transmit the request, the server to receive and respond to the
request, the network to return the response, and the client to process that response. A correctly
implemented EUE system should be able to provide a spread of timing information across each of
these client, network, and server (CNS) elements.
[Figure 5.2 diagram: Total Transaction Time = Time for Client to Make a Transaction Request + Time for Network to Transmit Transaction Request + Time for Server to Process Transaction Request + Time for Network to Transmit Transaction Response + Time for Client to Process Transaction Response]
Figure 5.2: The total time necessary to complete a transaction comprises multiple steps in the process.
The CNS spread identifies each of these elements and their relation to the total transaction time.
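The spread in Figure 5.2 is simple arithmetic once the right timestamps are known. The Python sketch below assumes six timestamps per exchange are available. In practice, an agentless monitor infers some of them from where it sits on the network. The field names are hypothetical. Client time covers building the request and processing the response, network time covers both transmissions, and server time covers processing.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TransactionTimestamps:
    """Hypothetical timestamps (in seconds) for one request/response exchange."""
    user_action: float        # user clicks; client starts building the request
    request_sent: float       # request leaves the client
    request_received: float   # request arrives at the server
    response_sent: float      # server finishes processing and sends the response
    response_received: float  # response arrives at the client
    rendered: float           # client finishes processing the response


def cns_spread(t: TransactionTimestamps) -> Dict[str, float]:
    """Split total transaction time into client, network, and server components."""
    client = (t.request_sent - t.user_action) + (t.rendered - t.response_received)
    network = (t.request_received - t.request_sent) + (t.response_received - t.response_sent)
    server = t.response_sent - t.request_received
    return {"client": client, "network": network, "server": server,
            "total": t.rendered - t.user_action}
```

With timestamps of 0.0, 0.25, 0.5, 2.5, 2.75, and 3.0 seconds, the spread is 0.5 s client, 0.5 s network, and 2.0 s server out of 3.0 s total. That result points the investigation at the server.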
This spread information can be used in multiple ways. As a troubleshooting tool, it comes in
handy for isolating where problems with transaction processing are occurring. As a component
of a notification system, it can alert administrators when individual components of transactions
are not completing within specifications. As a help desk tool, it can be used to help users
identify why their experience is not at its normal level. Most important, this information can be
used as a first step in understanding the true nature of the user's experience and what elements
are driving that experience level.
Figure 5.2 shows only a very simple example of the spread: the interaction between a single client
and a single server. Most business systems and their transactions involve communication among
multiple entities. EUE monitoring can capture that interrelation, which is one of its greatest value
propositions.
Watching How Users Interact with the System

Even more important, EUE need not be strictly a tool for troubleshooting and remediation. Once
its elements are in place, it provides an additional level of visibility into how users are interacting
with the system. Think of the situation we've set up already in this chapter. An EUE system with
both agentless and agent-based tools is monitoring the users' experience and the overall network
health from the perspective of individual transactions. That system is reporting on the health of
each transaction as well as its status.
This information can also be mined for useful metrics, telling the business how customers are
making use of its online systems. For example, maybe customers tend to steer clear of certain
areas of the system. This may be due to a design decision, or to a challenge in the users' workflow
that naturally leads them away from those areas. Or maybe the users simply aren't interested in
particular system areas.
A fully realized EUE implementation can help the business learn which areas of the system
are interesting to users. The business can then strategically recognize and exploit those interests
for additional financial gain or better customer satisfaction. Conversely, the same system can
look for points where users bail out of the system. At how many seconds of delay does a user go
elsewhere? When a particular page or screen is sent to the user, does that user interact with it,
navigate away from it, or leave the system entirely? This kind of user-specific information can be
critical to improving the user's experience with the system, and ultimately the bottom line of the
business.
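To answer "at how many seconds of delay does a user go elsewhere?", page views can be grouped by the delay the user experienced, and the abandonment rate can then be computed for each group. This Python sketch assumes each view has already been reduced to a (delay in seconds, abandoned?) pair. The 2-second bucket width is an arbitrary, hypothetical choice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def abandonment_by_delay(views: List[Tuple[float, bool]],
                         bucket_seconds: int = 2) -> Dict[int, float]:
    """Return the fraction of page views abandoned, grouped by response-delay bucket.

    Each view is (delay_seconds, abandoned). Each key is the bucket's lower bound in seconds.
    """
    totals: Dict[int, int] = defaultdict(int)
    abandoned: Dict[int, int] = defaultdict(int)
    for delay, left in views:
        bucket = int(delay // bucket_seconds) * bucket_seconds
        totals[bucket] += 1
        abandoned[bucket] += int(left)
    return {b: abandoned[b] / totals[b] for b in sorted(totals)}
```

A rate that jumps sharply between two adjacent buckets suggests the delay at which users start leaving.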
Thus, an EUE system can be as useful for the marketing department as it can be for the IT
department and business leadership.
An Example
Implementing EUE doesn't necessarily replace the typical system counters IT uses to measure
the total performance of its systems. Instead, it adds a new class of counters that watch
individual user interactions with the system. As users interact with the system, an EUE system
can measure those interactions, on a click-by-click basis, to gauge the user's overall experience
with the system. Although much of this measurement involves elapsed time and time delay,
timing is not the only tool.
Timing tells the tale of how long users are waiting on system elements, but the experience also
depends on individual transactions that don't complete or only partially complete. The true tale of
the user's experience is the aggregation of all these metrics.
Let's take a look now at what might have occurred had FCG implemented an EUE system to
augment what IT Director John called "the best monitoring platform money can buy."
Visibility
With traditional systems monitoring tools, the counters being measured are based on the
performance of the entire system. As a result, those counters may not pick up problems that
aren't large enough to affect the system as a whole. System counters typically watch for resource
overuse, but the timing delays that EUE watches for typically don't result in that level of
resource overuse. So FCG's whole-system counters provide no visibility into the specific type of
problem its web site is experiencing.
Had FCG implemented an EUE system to measure timing delays, that system would have
picked up on the individual transaction delays that caused users to wait multiple minutes between
clicks. That visibility would have alerted FCG to look for problems at a lower level of the
service model. Perhaps a piece of unoptimized code within the purchasing system was causing a
counter to time out in certain circumstances. The delay associated with that counter's timeout
could have been at the root of the problem. Only an EUE system can peer deep into the
individual transactions to see the precise conversation in which that counter's delay occurred.
Prioritization
Because FCG's system didn't include EUE monitoring, and because the problem didn't impact
whole-system counters, FCG was unaware that the problem was even occurring. Lacking that
visibility, FCG was unable to prioritize resources toward fixing the problem.
In other examples, EUE monitoring may identify multiple locations in which problems are
occurring, but it also provides data on which problems are truly affecting users. If the help desk
has opened a dozen problem tickets for issues on the web site, EUE can help identify which of
those problems are actually affecting the user population. This gives IT the ability to assign
resources first to the problems with the greatest business impact.
Resolution
IT administrators can't fix a problem when they can't see the problem itself. Without tools that
dig deeply into each individual transaction, identifying a problem's root causes is challenging.
Because whole-system counters do not necessarily describe the workload being done on a
particular network device completely, tools that can describe it are necessary.
EUE's agent-based tools have the ability to simulate transactions between a representative user
and the system itself. Those transactions can be run automatically throughout the day and from
multiple locations to form a representative understanding of how a sample user might be
experiencing the system. Without this capability, administrators would need to regularly and
manually click through the system to get a feel for its health.
Each measured transaction has its own spread of timing information, divided among the client,
network, and server. This spread is an excellent starting point for locating deviations. Drilling
down from that point, the administrator can view additional debugging information specific to
the transaction to further isolate the problem. Deconstructing the problem in this way speeds
resolution because it focuses troubleshooting efforts on the specific issue at
hand.
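A simple way to use the spread as a starting point is to compare each component against its own baseline. The Python sketch assumes that baseline client, network, and server times for the same transaction are already known and positive, for example as historical medians. It reports which component grew the most in relative terms, which is where drilling down would begin.

```python
from typing import Dict


def largest_deviation(current: Dict[str, float], baseline: Dict[str, float]) -> str:
    """Return the CNS component (client, network, or server) that grew most relative to its baseline."""
    ratios = {part: current[part] / baseline[part] for part in ("client", "network", "server")}
    return max(ratios, key=ratios.get)
```

For a baseline of 0.4 s client, 0.3 s network, and 1.0 s server, and a current reading of 0.5 s, 0.3 s, and 4.0 s, the server has grown fourfold while the others barely moved, so the function returns `"server"`.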
Improvement
Lastly, once the problem is known, it is easier for IT to identify how best to resolve it. IT's
typical response to many problems is to add hardware to the environment to support added load,
but with complex systems this is often not the most effective fix. Where traditional monitoring
shows no problem but EUE monitoring shows a delay, the cause may not be a hardware resource
shortfall at all; it may be a code fault or a misconfiguration. EUE tools allow IT to improve the
system more precisely, without defaulting to costly hardware expansion as its only tool for
resolution.
Impacted Technologies
Among other elements, the value of an EUE system is directly related to the classes of services
that system can interact with. For example, an EUE system that is limited to web traffic will
lack critical visibility into the packaged applications and legacy programs that typically interact
with back-end servers. When an EUE solution cannot translate the communication that occurs on
the back end, it cannot provide complete visibility into each transaction.
Let's take a look at five classes of business services that are typically part of a business
computing environment. For each, we'll analyze how an EUE system can impact its
operations.
Figure 5.3: A fully-realized EUE system should tie into multiple classes of business services as well as the
network they reside upon.
Web Front End

Web front ends are arguably where EUE is most visibly useful for external-facing services, and
they stand to gain substantial benefits from an EUE implementation. The web is nearly the
exclusive mechanism for external e-commerce, which makes web front ends the strongest
candidates for inclusion in a BSM system because of their impact on business profit and loss.
From a technical perspective, web-based front ends are also relatively easy to monitor, because
web traffic is highly transaction-based and easy to translate into a usable form. Because of these
paired benefits, web front ends were the target of early EUE implementations.
With web front-end systems, EUE has the ability to see into each user's interaction with the
system. Because nearly every change of state in a web system involves a user clicking on a web
page, the results of these clicks can be monitored and reported on. The interaction between the
client, the network, and the web server can be analyzed to look for problem areas or locations
where delays are experienced. Those delays can relate to the server not responding quickly
enough to a request for more data, or to a user reading a page or trying to locate the next click in
the interface.
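Separating the two kinds of delay described above is straightforward once click and page-load times are known. This Python sketch assumes one session's events are available, in order, as (click time, page complete time) pairs. That layout is a hypothetical simplification of what an EUE tool would record.

```python
from typing import List, Tuple


def split_wait_and_think(clicks: List[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Split one session's clicks into system wait times and user think times.

    Each entry is (click_time, page_complete_time) in seconds, in click order.
    Wait time: from the click until the page finished loading (delay caused by the system).
    Think time: from the page finishing until the user's next click (reading or hunting for a link).
    """
    waits = [done - clicked for clicked, done in clicks]
    thinks = [clicks[i + 1][0] - clicks[i][1] for i in range(len(clicks) - 1)]
    return waits, thinks
```

For clicks at 0.0, 9.5, and 12.0 seconds whose pages finished at 1.5, 10.0, and 16.0 seconds, the wait times are 1.5, 0.5, and 4.0 s and the think times are 8.0 and 2.0 s. The last page is the one worth investigating, while the 8-second pause is the user reading.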
This meta-analysis of the entire process of navigating through a web front end can assist the
systems administrator with the task of managing the system itself. It can also help the
webmaster build a web site that is friendly to its users. By analyzing the click-through
patterns of the site's users, the webmaster learns where additional effort or redesign is needed to
improve the overall experience.
Because web front ends are usually the face of the company for e-commerce, they are typically the
low-hanging fruit for an EUE implementation. They also tie closely into the BSM model that EUE
feeds its data into.
Chapters 6 and 7 will go into more detail on achieving this management and operational value out of
a BSM implementation. EUE and the data it provides is one component of the overall BSM service
model.
Packaged Applications
Most business systems don't stop with just the web server. Web front ends typically require
additional data from one or more enabling back-end services. In many cases, those services are
packaged applications like SAP or Siebel for ERP data, or Oracle plus Oracle Forms for database
connectivity and customized business applications. Unlike web services, where all traffic
relies on the common HTTP protocol for data transport and rendering, these packaged
applications may have their own protocols for getting data from the client to the server. These
applications may not use a web browser as their client-side rendering tool; they may have their
own desktop clients with additional or different functionality. Thus, an EUE system that watches
the traffic for these sorts of applications needs to understand the traffic that occurs between
client and server.
An effective EUE system will come equipped with the translations, or special decodes,
necessary to see into the traffic between the servers and clients that make up these packaged
applications. For packaged applications that distribute workloads across multiple servers, the
EUE system will also need to understand the server-to-server communication. This is necessary
because not all issues are directly related to the first-tier client-to-server traffic; some may occur
between the individual servers that work together to provide the packaged application's total
service.
When evaluating an EUE system, look for one that supports the packaged applications (typically
enterprise-level applications) that are components of your BSM service model. Good EUE
solutions should offer easy-to-use connectors that allow direct listening to the traffic between all
elements of your packaged applications and their clients and any web front ends.
Be careful: some EUE solutions include monitoring only of web transactions. This limitation will
restrict the level of information you can obtain from your packaged enterprise applications.
Thin Client
One class of packaged applications that requires special attention involves the delivery of
applications through a thin client interface. Thin client solutions such as Microsoft Terminal
Services or Citrix Presentation Server are positioned in front of applications to reduce the overall
effects of network latency and the bandwidth required to deliver those applications to their users.
Consider the situation where the network traffic (the conversation) between an application's
server and its client is particularly chatty. In this case, positioning the client far away from the
server in terms of network proximity negatively affects the application's response time. Because
of the network distance between the two halves, the traffic takes longer to get from client to
server, so the client operates much more slowly than it would if it were close to the server. Thin
client applications relieve this problem by positioning the client next to the server and passing
only screen updates and mouse/keyboard movements between client and server.
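The effect of distance on a chatty application can be estimated with back-of-the-envelope arithmetic. If each exchange must wait for the previous one, every round trip pays the full network round-trip time (RTT). The Python sketch below uses hypothetical figures (200 sequential round trips, 1 ms LAN RTT, 80 ms WAN RTT, 100 ms of processing, and two screen updates) and ignores bandwidth limits, so treat the numbers as illustrative only.

```python
def response_time_ms(round_trips: int, rtt_ms: float, processing_ms: float) -> float:
    """Sequential round trips each pay the full round-trip time; processing is added once."""
    return round_trips * rtt_ms + processing_ms


def thin_client_time_ms(round_trips: int, lan_rtt_ms: float, wan_rtt_ms: float,
                        processing_ms: float, screen_updates: int) -> float:
    """Chatty traffic stays on the LAN; only screen updates cross the WAN."""
    return response_time_ms(round_trips, lan_rtt_ms, processing_ms) + screen_updates * wan_rtt_ms


# Hypothetical figures: 200 sequential round trips, 1 ms LAN RTT, 80 ms WAN RTT, 100 ms processing.
remote_fat_client = response_time_ms(200, 80.0, 100.0)                       # 16,100 ms
thin_client = thin_client_time_ms(200, 1.0, 80.0, 100.0, screen_updates=2)  # 460 ms
```

Under these assumptions, moving the chatty client next to the server cuts the response time from about 16 seconds to under half a second, which is the kind of comparison an EUE CNS spread can help justify.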
EUE is useful for thin client applications in multiple ways. First, in situations where applications
are experiencing poor quality, an EUE system's CNS spread can be used to determine where the
delays are occurring. If it is determined that the client and server would perform better when
closely positioned, then EUE can justify the move to a thin client solution for the problem
application.
EUE data is also useful for identifying problems with existing thin client solutions. Because most
thin client solutions aggregate multiple users onto a single server, the actions of one user can
impact the experience of others. For example, one user whose activity consumes too many
processor resources will slow performance for everyone else on the server. A fully realized EUE
implementation can be used to determine whether the problem relates to the thin client server, the
application server, the network between them, or the network between the thin client server and
the client itself. In another example, where only a single server in a farm is experiencing a
problem, EUE can help isolate the problem server for a quick resolution.
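Isolating a single misbehaving server in a farm can start with comparing each server's typical response time against the farm as a whole. In this Python sketch the server names, sample data, and 2x factor are hypothetical. Medians are used so that a few slow samples do not mark a healthy server as a problem.

```python
from statistics import median
from typing import Dict, List


def outlier_servers(response_ms: Dict[str, List[float]], factor: float = 2.0) -> List[str]:
    """Flag farm servers whose median response time exceeds `factor` times the farm-wide median."""
    per_server = {name: median(samples) for name, samples in response_ms.items()}
    farm_median = median(per_server.values())
    return sorted(name for name, m in per_server.items() if m > factor * farm_median)
```

With samples of roughly 200 ms on `ts01` and `ts02` and roughly 900 ms on `ts03`, only `ts03` is flagged. The next step would be to look at that server's user sessions and processor load.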
Effective EUE systems should also be able to align traffic so that user-specific traffic is isolated not
only from client to thin client server but also from thin client server to back-end server. By
isolating traffic in this way, an end-to-end understanding of the traffic patterns can be used in
troubleshooting and remediation.
Middleware
Although middleware is not always an easy win for an EUE implementation, incorporating it
offers many of the same benefits as incorporating packaged applications. Because end users do
not necessarily work directly with the environment's middleware tools and code frameworks,
incorporating them into the total environment analysis can be challenging. However,
incorporating middleware monitoring into the overall EUE system ensures that the end-to-end
transaction is being monitored. An effective EUE system will include modules that can connect
into the pluggable frameworks that make up most middleware.
Databases
Databases are as challenging to incorporate as middleware, though they can be a critical
component of the overall performance and experience measurement process. Because databases
contain the whole of the data needed by the business system, their inclusion can be critical in
determining the overall health and performance of that system.
A database that is overloaded in terms of raw performance can add delay for every other member
of the system, because databases sit near the bottom of the service model. Including the necessary
database monitoring capabilities will help ensure that the full transaction measurement covers
client to server to data store, and back if necessary.
In addition to all these, it is also worth stating that the network itself and the devices that make up that
network are an impacted technology. Individual network components and their performance can have
a net effect on the overall measurement of user experience.
Importance to IT Goals
Thus far in this chapter, we've talked about the utility of an EUE implementation and how it
relates to the business as a whole. But there are specific benefits to IT that can be gained as well.
Traditionally, IT has relied on systems management and monitoring tools to provide the
information it needs to troubleshoot its environments. However, as we discussed earlier with our
egg timer metaphor, those tools provide shallow levels of data. A mature IT organization will
recognize the need for deeper levels of monitoring data to assist with the administration of its
systems. That same IT organization will see how the concepts of EUE can provide that data by
digging deeper into the individual transactions associated with a business service's operation.
In this section, let's take a look at a few of these IT-specific benefits of implementing EUE.
From aiding in problem identification and prioritization to augmenting pre-failure warnings, EUE
provides a framework for problem isolation. From an organizational standpoint, its information
also helps speed the troubleshooting process by eliminating the finger-pointing problem and
aiding inter-team communication. Most importantly, these benefits work together to enhance
vendor accountability and, ultimately, customer satisfaction with the system.
Problem Identification
Traditional monitoring systems have the capability to alert when a problem situation or SLA
breach occurs. However, the alert that these systems provide is typically limited to the individual
situation that tripped the alarm. Digging deeper into the problem's root cause is difficult, because
an alerted problem can comprise multiple individual sub-problems, or can be buried within
another layer of the system. Because of these limitations in visualizing the problem, most of the
time spent on many problems goes simply to identifying what went wrong.
As we discussed earlier in this chapter, EUE's focus on transactions means that a user's issue
with the system can be understood from many different levels. The spread of an application's use
of client, network, and server resources is an excellent starting point for identifying a problem's
root cause. This spread gives the troubleshooting administrator a more defined starting point for
tracking down the resolution to the problem.
Moreover, digging deeper into each individual transaction allows the system to alert the
administrator when problems occur at any step along the path of the transaction.
Deconstructing each individual mechanism that makes up the business system helps break the
service down into its atomic elements. This process of deconstruction is very similar to the process
used in generating the BSM service model.
Configuring the thresholds for this alerting is a necessary task for the administrator. Determining
what those alerting thresholds should be can be a time-consuming process. However, the benefits of
knowing when transactions are not within specifications often outweigh the effort.
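One common way to shorten threshold setting is to derive a starting threshold from historical data rather than guessing. The Python sketch below uses the baseline mean plus k population standard deviations, with k = 3 as an assumed default. This is one of several reasonable techniques (percentile-based thresholds are another) and not a method prescribed by BSM.

```python
from statistics import mean, pstdev
from typing import List


def baseline_threshold(history_s: List[float], k: float = 3.0) -> float:
    """Derive an alert threshold as the baseline mean plus k population standard deviations."""
    return mean(history_s) + k * pstdev(history_s)


def breaches(samples_s: List[float], threshold_s: float) -> List[int]:
    """Return the indexes of samples that exceed the threshold."""
    return [i for i, s in enumerate(samples_s) if s > threshold_s]
```

For a history of 2, 4, 4, 4, 5, 5, 7, and 9 seconds (mean 5, standard deviation 2), the threshold is 11 seconds, so samples of 12.5 and 11.1 seconds would raise alerts. The administrator can then tune k per transaction rather than starting from scratch.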
Prioritization
Even in mature IT environments, there are situations where multiple alerts go off at once. When
this occurs, it can be difficult to understand which of these alerts are important to the
functionality of the business and which are of lesser importance. For example, there may be a
dozen alerts active within the management system, eleven of which are actually minor problems
that do not require immediate attention, while the twelfth affects the entire user base for the
business. The process of understanding the true nature of each alert and prioritizing its
remediation can be augmented with the information brought forward through an EUE system.
Because of an EUE system's tie into BSM and the BSM service model, each element that makes up
the business service under management has an impact assigned to it. Those impacts relate to the
number of affected customers and the amount of dollar loss associated with a reduction of
service quality. When multiple alerts are presented at once, EUE and its tie into BSM help the IT
department understand the business impact of each alert. With this information, IT can resolve
the most critical and impactful issues first, while deprioritizing less critical problems.
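The prioritization described above amounts to sorting alerts by estimated business impact. The Python sketch assumes each alert can be mapped through the service model to a count of affected customers and a per-customer hourly loss. Both figures, and the alert names, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    name: str
    affected_customers: int        # customers whose transactions touch the failing element
    loss_per_customer_hour: float  # hypothetical dollar loss per affected customer per hour

    @property
    def hourly_impact(self) -> float:
        return self.affected_customers * self.loss_per_customer_hour


def prioritize(alerts: List[Alert]) -> List[str]:
    """Order alert names so the largest estimated hourly dollar impact comes first."""
    return [a.name for a in sorted(alerts, key=lambda a: a.hourly_impact, reverse=True)]
```

In the dozen-alert scenario, the one alert touching the entire user base would carry by far the largest `hourly_impact` and land at the top of the list, even if it arrived last.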
Pre-Failure Warnings
It is common for a user interface to experience a period of pre-failure before an actual failure
occurs. This pre-failure period may relate to an increasing load on the system or to a component
that trending shows will soon be unable to keep up with the demand placed upon it. What is not
common is recognizing this condition before the failure actually appears. In these situations,
only comprehensive trending and historical analysis can help the IT department find these issues
before they cause failures and augment the system with additional resources as necessary.
Too often in IT organizations at lower levels of maturity, service failures occur because IT
does not have enough information available to recognize when a system requires additional
resources, more computing power, or a reconfiguration. EUE can provide that information by
continuously monitoring the environment for transaction timing. Trending analysis can be done
on service and individual component performance as it relates to transaction speeds. When that
analysis points to an impending failure at some point in the future, IT is better prepared to add
resources as necessary. This also enhances the budgetary process, as fewer surprise
purchases are necessary for IT to maintain the environment.
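A basic form of the trending analysis described above is to fit a straight line to transaction times and extrapolate to the point where they would cross an acceptable threshold. The Python sketch uses ordinary least squares and assumes at least two distinct days of data. Real trending would likely account for seasonality and noise, so treat the projection as an early warning rather than a forecast.

```python
from typing import List, Optional


def projected_breach_day(days: List[float], response_s: List[float],
                         threshold_s: float) -> Optional[float]:
    """Fit a least-squares line to response time versus day and find where it reaches threshold_s.

    Returns the (possibly fractional) day, on the same axis as `days`, or None if the trend
    is flat or improving. Requires at least two distinct day values.
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(response_s) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, response_s))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return (threshold_s - intercept) / slope
```

For daily averages of 1.0, 1.5, 2.0, and 2.5 seconds on days 0 through 3 and a 4-second threshold, the fitted line crosses on day 6. That gives IT a few days' notice to add resources or reconfigure before users feel the failure.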
Consider the following non-IT example as a metaphor for this situation. What if the power company
didn't monitor power usage in various parts of the grid? Without pre-failure and trending analysis of
power usage, building and expansion in certain areas could cause a major loss in its ability to serve
power.
As IT organizations mature, their services grow towards a utility status similar to the power company.
In these cases it is possible for IT to maintain always-on service, planning for expansion rather than
being forced into it by external forces.
It can be further argued that as the IT organization matures, the business matures with it. The
business ultimately grows to require this always-on capability as IT discovers the ability to provide it.
Finger Pointing Prevention
When critical situations occur with a business system, business revenues are on the line until the
problem is fixed. Every second counts in these situations, so solving the problem quickly is
critical to operations. The problem in many of these situations, however, is that IT's typical
response is to get everyone into a room and break down the problem.
This isn't necessarily a bad mechanism for isolating a complex problem. IT individuals typically
have experience within a single component of IT: "I'm a network person." "You're a server person."
"Over there are the database people."
Few individuals truly understand the entire system end-to-end with the technical know-how to
diagnose problems as they occur. Thus, in many organizations, the circling-the-wagons approach is
the only way to get enough expertise in one place to track down the problem.
The problem here lies in IT personnel's ownership of their piece of the computing environment.
Professional pride means that each person in these meetings can default toward proving that the
problem does not lie within his or her scope of management. Individuals in these meetings are
thus incentivized to find the problem in other areas of the computing environment. This,
combined with the stress of the problem itself, can lead to finger pointing within the group.
EUE helps with the finger-pointing problem first and foremost through the information gleaned
from its CNS timing data. When a problem critical to operations occurs, the first step can be to
look for where the transaction's client, network, or server times vary from the baseline. Because
that timing information spans multiple systems and multiple platforms, it helps the
troubleshooting team track down the problem more quickly.
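The "compare against the baseline" step can be sketched as follows. This assumes, for illustration only, that the EUE system exposes one client, network, and server time per transaction as a simple dictionary, and that a 50% slowdown is the cutoff for flagging a tier. The function name and the tolerance are hypothetical, not taken from any particular product.

```python
from typing import Dict, List

TIERS = ("client", "network", "server")


def deviating_tiers(current: Dict[str, float],
                    baseline: Dict[str, float],
                    tolerance: float = 0.5) -> List[str]:
    """Return the CNS tiers whose current time exceeds the baseline by more
    than `tolerance` (0.5 means 50% slower), worst offender first."""
    ratios = {
        tier: (current[tier] - baseline[tier]) / baseline[tier]
        for tier in TIERS
        if baseline[tier] > 0
    }
    flagged = [tier for tier, ratio in ratios.items() if ratio > tolerance]
    return sorted(flagged, key=lambda tier: ratios[tier], reverse=True)


baseline = {"client": 50.0, "network": 80.0, "server": 120.0}
current = {"client": 55.0, "network": 85.0, "server": 480.0}
print(deviating_tiers(current, baseline))  # ['server']
```

In this example the output points the team straight at the server tier, which is the kind of evidence that shortens the meeting described above.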
Even more important is the expense of the group meetings themselves. Every hour spent bringing
together large numbers of people to identify the problem domain costs the organization time and
money. The opportunity cost of assembling key members of IT for problem resolution is the effort
they could otherwise spend either fixing the problem or performing other critical work. In
organizations with lower levels of maturity, these major problems can occur often. There, IT
finds itself in a state of perpetual firefighting, which limits its ability to move toward higher
levels of maturity.
A fully-realized EUE system can free these senior-level resources to enable them to work towards
strategic, maturing activities rather than tactical, firefighting activities.
Clear Problem Communication

Aligned with the section above, EUE also helps draw a clear distinction between problem domains.
IT individuals typically develop deep skills within their scope of management. Areas outside that
scope have a different vocabulary as well as different processes for administration and
troubleshooting. Thus, when problems span multiple scopes of management, the conversation
between IT individuals grows complex, which compounds the problem. For example:
The Cisco administrator doesn't speak Windows Server.
The Windows Server administrator doesn't speak Oracle databases.
The Oracle DBA doesn't speak SAP.
The SAP administrator doesn't speak Cisco.
Very few individuals, especially in enterprise environments, speak the language of every layer of
a business computing environment. Thus, a centralized framework is needed that can speak some
elements of all the necessary IT languages. That framework assists in locating and isolating
issues as they appear, but more importantly, it can talk to each individual in his or her primary
IT language. An EUE system is a potential framework that can support this functionality.
Vendor Accountability
Another issue entirely involves holding vendors' feet to the fire for the applications that the IT
organization must support. In most organizations, the computing environment is made up of a
number of individual applications that work together to provide the business service.
One common problem with this cross-pollination of applications is the tendency of individual
vendors to throw an issue over the wall when support is requested. For example, the database
vendor suggests that the problem is related to the middleware component, so a call to the
middleware vendor is necessary. The middleware vendor believes the problem lies within the
operating system, so a call to the operating system vendor's support is necessary. Getting all
three of these vendors on the phone at the same time, and more importantly the correct people
within each vendor's support organization, is challenging if not impossible.
Support technicians at many vendors are incentivized to close cases rather than to see them fully
resolved. Thus, some vendors tend to throw issues over the wall rather than work them through to
completion. This is particularly cumbersome when components from multiple vendors are part of
the same business service, and it can become functionally impossible when a vendor's product has
been heavily customized for the business.
The data provided by an EUE system serves as easily transferable documentation of the behavior
of a vendor's application. That data can be given to the vendor as clear evidence that the problem
lies within its product. In some cases, the information can also help pinpoint the problem directly
within the vendor's code. When code fixes and custom vendor patches are necessary to resolve a
particular problem, this documented evidence is essential.
This data helps in convincing the vendor that the problem does indeed require a code revision.
This same data can also then be used by the vendor in identifying the area in which the fix is
necessary.
Customer Satisfaction
Most importantly, all these elements tie into the BSM tenet of customer satisfaction. A service
with a high level of quality directly relates to improved customer satisfaction of that service.
When IT can proactively identify issues and resolve them before users notice, it is working at a
high level of maturity. That maturity helps IT align better with the needs of the business and
ultimately drive business profitability.
EUE Ties into BSM

It can be argued that, for the business leaders of an organization, EUE is all that matters. These
individuals care most about serving their customers and about how those customers affect the
bottom line. When customers are not getting the best possible experience from a business system,
it is the business leader who gets the call. At the same time, when customers enjoy working with
the business, they tend to keep coming back. Because of this prioritization, EUE is in many ways
an inseparable component of BSM.
Figure 5.4: Data from three components come together to fill the BSM picture: financial logic from the BSM service model, availability logic from traditional device monitoring, and performance logic associated with end user experience.
Necessary for a Complete Picture of BSM

We've talked in this chapter about how the EUE model for monitoring a user's experience draws
much of its framework from the same elements used to build the BSM model. This is not by
accident. As we discussed in Chapter 1:
[BSM] ingests availability and performance data and outputs quality-related metrics to
the business on the health of the network's business services. BSM applies a dollar value
to the reduction in quality of each identified service and serves up the information in the
form of dashboards viewable and understandable by both IT and business leaders.
As you can see here, and as pictured in Figure 5.4, incorporating EUE into BSM provides a major
portion of the data that feeds the service model. The service model itself is a construct of BSM,
and it is that service model that delivers the financial information associated with a business
service's quality to the right people. But there must be a set of driving data that feeds quality
information into that model. The combination of the BSM model's financial logic, the individual
device monitoring provided by typical systems monitoring solutions, and the experience-state logic
associated with EUE is what ultimately completes the solution.
Importance of Using Both Methods for Monitoring

Though not absolutely critical to the overall functionality of the tool, combining agent-based and
agentless monitoring provides the clearest picture of the environment. Agent-based solutions are
necessary for driving simulated accesses into the system and measuring the resulting behavior.
Agent-based solutions are important because critical timing information specific to users'
individual transactions can be measured from multiple points across the network. Agent-based
monitoring also provides a level of on-system data collection that is not possible through external
means. For many applications that make up business services, the data simply is not exposed
externally for collection. Thus, an external source cannot probe for some kinds of essential data
the way an on-system source can.
Additionally, installing agents that run synthetic, simulated transactions across the business
network allows user experience to be measured at multiple points. With agents deployed in
multiple locations, the experience at each location can be compared with the experience at other
locations in the computing environment. These measurements help the CNS spread isolate issues
that may be geographically related.
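A location comparison like the one just described might be sketched like this. It assumes, for illustration only, that each agent location reports a list of response times and that "more than 1.5 times the typical location" is a sensible flag. Both the rule and the sample location names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence


def slow_locations(timings_by_location: Dict[str, Sequence[float]],
                   factor: float = 1.5) -> List[str]:
    """Flag agent locations whose median response time is more than `factor`
    times the median of all per-location medians (hypothetical rule)."""
    medians = {loc: median(times)
               for loc, times in timings_by_location.items() if times}
    if not medians:
        return []
    overall = median(medians.values())
    return sorted(loc for loc, m in medians.items() if m > factor * overall)


# Illustrative samples (ms) from three hypothetical agent locations.
samples = {
    "HQ": [200, 210, 190],
    "Branch-A": [220, 205, 215],
    "Branch-B": [650, 700, 620],
}
print(slow_locations(samples))  # ['Branch-B']
```

Using medians rather than means keeps a single slow transaction at one site from skewing the comparison.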
Agentless monitoring is also crucial to obtaining the entire picture of the environment. Whereas
agent-based monitoring can simulate user load and approximate the user's experience through an
automated function, it lacks visibility into the big picture. An agentless monitoring tool gathers
network traffic statistics from across the entire network. Its value lies in helping find when other
elements within the network are the source of the problem.
For example, an agent-based simulation can show that a delay is occurring in a particular
transaction. But if that delay originates in a system outside the agent's scope of management, the
agent may not be able to determine the root cause. Agentless monitoring can sift through the
network traffic to identify a cause that lies outside the scope of the business service. Without
agentless monitoring, it is difficult to get this big-picture view of the entire computing
environment.
Proactive Awareness
Lastly, all this data is useless unless the business acts upon it. Knowing that a particular business
service is experiencing a loss of quality is useful only when IT knows what to do about it. Proactive
awareness is a function of higher levels of IT maturity, because of IT's enhanced knowledge of the
components that make up the computing environment:
IT has the historical data with which to understand how the system and its users evolve
over time.
IT is better prepared, from a technical perspective, to improve its bottom line from a
budgetary perspective.
IT has more capability for measured and planned growth rather than 11th-hour funding
requests when emergency resource needs arise.
IT grows more capable of working with the business on strategic activities like business
growth and service expansion rather than merely fulfilling service requests.
All of these relate to IT's ability to better serve the business and the customers of the business.
By being more proactive with the resources under its care and feeding, IT becomes more capable
of making better business decisions.
EUE Drives BSM's ROI

As this chapter has shown, EUE is a major component of a successful BSM implementation. It
provides one very important facet of the data needed to make decisions about the efficacy of the
business service under management. It illuminates the plight of the service's users, giving the
business an automated understanding of how effectively it is serving its customers. And it
ultimately provides business leaders with the financial data they need to understand the impact
their systems have on the business's bottom line.
Our next chapter starts a series of chapters on achieving the best value from a BSM solution.
First up is a discussion of achieving management value from the system. There, we'll further
discuss the ROI associated with a BSM system. We'll dig deep into the management-level
dashboards that provide decision-enabling data to business leaders. Most importantly, we'll
examine some best practices for setting up visualizations for business leaders and enterprise IT,
as well as the unique requirements of service providers and outsourcers.
Chapter 6: Achieving Management Value

It's 2:23 p.m. on a Thursday afternoon, a little more than 2 months after Dan's last call with Joe
Gear of Glass Emporium, one of his biggest clients. It's a perfectly uneventful day in what seems
like a string of them over the past month. Dan's mobile device sits quietly on his desk. Right next
to it also sits the new monitor John had installed about a month ago.
It's not so much that the monitor itself is new, even though it is. It's a little bigger than the old
one. The screen seems a little brighter, and Dan thinks it does a better job of working with the
executive look he's been building for his office. What's really important about that monitor
isn't its size or what it looks like. What's important is the data that it shows Dan.
You see, Dan used to have a monitor on his desk, quite a lot like this one. It was around the same
shape and size, maybe a little bigger actually. That monitor gave Dan a heads-up display about
all sorts of elements on the network. He could see when network devices went down or when they
showed too many Dropped Frames. He could learn the Processor Queue Length being
experienced by his servers. He could even drill down into specific problems to find out the status
in relation to their resolution.
But there was a problem with that monitor. He didn't know a Processor Queue Length from a
hole in the wall. And there was that embarrassing incident when he called into the Help desk
demanding to know who dropped the frames, and whether the devices directly below that one
would be impacted. He still chuckles a bit when he thinks about that poor Help desk person. The
kid slowly explained to him that dropped frames were a network problem and they didn't have
anything to do with calling in the Facilities department.
This new monitor Dan likes a lot. Rather than showing him a bunch of information that only the
kids in IT knew and cared about, this one shows him things that he has the ability to act upon. It
shows him a set of graphs that represent the number of people currently making use of FCG's
online system. It shows him some nice dials that point to the right when his services are available
to users. They point menacingly to the left when the system is having problems. He can see in
real time the number of users currently experiencing a delay in completing their transactions, a
problem that Joe was very concerned about in that painful call 2 months ago. For any of these
metrics, he gets an estimated dollar impact to revenue associated with every problem that
appears. Those numbers he understands.
Best of all, he can click on any of those charts to delve deeper. If his dial that shows service
availability (the one he looks at the most) moves to the left, he can click on it to drill down into
why that dial shifted leftward. That second-level information is fairly understandable too, but too
many clicks down into the detailed information gets his head swimming like it did with his old
monitor. The information is there if he needs it. Usually he doesn't.
As he's pondering a shift in one of those needles, the phone on his desk rings. Picking it up, he
finds his buddy and customer Joe on the other line. They'd finally gotten around to that golf
game last week, and Dan figured Joe was calling to gloat about his unbelievable shot on the 16th.
"Hey, Dan," Joe starts. "How's that short game of yours coming along?"
"Just as good as it was the other day," responds Dan. He let him win last week, or at least that's
what he keeps telling himself. "Or as bad, if you're calling to gloat."
Joe laughs. "Not at all! Actually I was here to talk a little about that Web site of yours again.
I've been getting some more reports from my people down on the first floor."
A shiver goes up Dan's spine, but just for a moment. "Not again," he thinks, as his eyes shift to
his monitor and its dials, all of which are pointing in the right direction. "I trust your people are
having a good experience with it? We've been putting in a bunch of new equipment designed
specifically to help us understand when you guys are having slowdowns or other problems."
"Actually, that's the reason why I'm calling," Joe continues. "Our guys are reporting great
responses in the past month. I just got out of our monthly tag-up meeting with the people down in
purchasing, and they asked me specifically to thank you and your team for whatever you've done
over the past month. Our productivity down there is up 20%."
"Well, thanks. After that phone call a couple of months ago, we made it our first priority to figure
out what was causing the problem and get it resolved," Dan explains. "In fact, we went quite a
bit further than that. We found out that the things we were watching for weren't really telling the
true story of what you guys were experiencing. So, we implemented some new technology that
helps us understand your experience a little better."
"Interesting..." Joe's voice trails off for a second. He continues, "Well, that's the other reason
behind my call. They've been raving so much about the changes over a short period of time. I'm
here to pick your brain as to what we could do with our own Web sites."
Dan beams. "Well, let me tell you about this new stuff. First of all, you've got to see this new
monitor on my desk..."
Obtaining and Maintaining Value in a BSM Implementation

Thus far in this guide, we've discussed topics associated with device and application monitoring.
We've talked about how deviations in behavior across the network can manifest as reductions in
the overall quality of business services. At their most elemental, those deviations can come in the
form of devices going down. When devices go down in a business system, that system loses the
ability to perform some or all of its functionality.
In our last chapter, we elevated that conversation above individual devices and their up/down
status. There, we talked about how the loss of a system subcomponent is really the easy part.
Understanding how a reduction in a subcomponent's service quality affects the system as a whole
is much more critical to the overall health of the business service. It is also much harder, because
those reductions in quality are difficult to find and measure without the proper toolsets in place.
Whole-system counters are the immature IT organization's only barometer for measuring the
health of its services. But, as discussed in Chapter 5, whole-system counters provide an
incomplete picture at best. Because business services often involve intercommunication among
multiple systems in a thread, end users can experience slowdowns and loss of service that don't
produce a noticeable change in whole-system counters. For example, if a coding error causes an
application to regularly enter an inappropriate wait state, that situation will not show up as a
noticeable change in a whole-system counter. As we learned in Chapter 5, the only way to truly
understand the experience of the user is to measure from the perspective of the user.
Figure 6.1: This and the next two chapters will discuss the achievement of value along three axes associated
with the implementation and use of a BSM solution.
To this point, we've been looking at the technical aspects of Business Service Management and
its complement, End-User Experience monitoring. We've talked about the technical and
process-based aspects of implementing such a system to the benefit of the organization. This
chapter and the next two step back from those discussions a bit to consider the value returned to
the business by implementing such a system.
In this chapter, we'll discuss the value associated with managing business systems. Here, we'll
talk about the potential return that can be obtained by enterprises, outsourcers, and end users
themselves. We'll show some examples of management dashboards that enable that return, and
how the information gained through those dashboards improves business leaders' ability to better
serve their customers.
In the following two chapters, we'll continue the conversation on value, delving into the
achievement of operational and IT value. In Chapter 7, we'll focus on how BSM's information
can reduce an organization's operational expenditures. We'll also talk about how BSM can serve
as a management umbrella under which management controls can be housed. There, we'll revisit
the topic of dashboards, discussing best practices in building effective ones.
Chapter 8 will conclude our conversation on value, focusing our discussion back on IT.
Business leaders like Dan in our chapter example gain higher-quality information through a
fully-realized BSM solution. But IT gains as well. IT gains a toolset that assists service desks
with problem identification, administrators and developers with resolution, and IT managers with
data that drives and justifies future purchases. In that chapter, we'll talk about the connectors
available to many BSM solutions. These connectors enable BSM to plug into various applications
and frameworks.
Let's focus our discussion now on obtaining, and ultimately maintaining, value in the
implementation and use of a BSM system. That value comes from a set of potential drivers that
benefit deployments in enterprises and with outsourcers and solution providers, as well as the
customers of a system.
Obtaining Value
We first need to break down the value obtained by an organization into two sets of categories.
First, some elements of value arrive through the implementation of BSM. Others arrive through
the use of that fully-realized solution. As you can see from our chapter example above, Dan's new
monitor arrives along with a whole new set of data. Though the monitor is new, the data that
comes with it has been reformatted so that it is now much easier to digest. His previous view
included a set of data that had value, though not to him. That information was useful to the
developers who code the system as well as the administrators who maintain it.
The information his new monitor shows him originates from End-User Experience monitoring
agents that are looking at individual transactions between users of the system and the system
itself. Those EUE agents are also looking at the transactions between subcomponents of his B2B
system. When those transactions drop below set thresholds, his dial moves to the left. When
transactions remain within desired levels of performance, his dials stay firmly on the right. His
use of the system means that he has a persistent heads-up display that provides him with a vision
of the overall health of the network.
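One way to turn per-transaction timings into a needle position like Dan's is sketched below. The text does not say how a BSM product maps thresholds onto its dials, so the mapping here (share of transactions within a threshold, scaled linearly to a -1.0 to +1.0 needle, with a hypothetical 95% target) is purely an illustrative assumption.

```python
from typing import Sequence


def dial_position(durations_ms: Sequence[float], threshold_ms: float,
                  target_ratio: float = 0.95) -> float:
    """Map the share of transactions completing within threshold_ms onto a
    dial from -1.0 (hard left) to +1.0 (hard right).

    Meeting target_ratio or better pins the dial right; below that, the
    needle moves left in proportion to the shortfall. With no data the
    needle rests at 0.0 (a hypothetical convention)."""
    if not durations_ms:
        return 0.0
    within = sum(1 for d in durations_ms if d <= threshold_ms)
    ratio = within / len(durations_ms)
    if ratio >= target_ratio:
        return 1.0
    # Linear scale: 0% within threshold -> -1.0, target_ratio -> +1.0.
    return 2 * ratio / target_ratio - 1


# Three of four transactions meet a 500 ms threshold.
print(round(dial_position([180, 220, 250, 900], threshold_ms=500), 2))  # 0.58
```

The point of the sketch is the separation of concerns: the EUE agents supply raw timings, and a simple, business-facing rule turns them into a single indicator a non-technical reader can act on.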
When his system begins experiencing problems that would affect his customers' experiences, he
can be proactive and communicate with them immediately. He can maintain critical business
relationships as problems occur rather than after they've lingered for a period of time. This
advance information prevents calls such as the one he experienced in our last chapter, where his
customers were forced to notify him when problems occurred.
This brings us to the second set of categories by which value can be obtained. There are both
tangible and intangible benefits associated with BSM. The tangible benefits align with an
improved capability to see where problem spots are within the system and resolve them. The
speed with which those problems can be resolved is a direct and tangible impact. The enhanced
situational awareness Dan gains through his monitor feeds into the intangible benefits of the
system. Dan's ability to maintain relationships with his customers is improved by that
capability and can be considered an intangible benefit.
Table 6.1 lists a few more of the benefits an organization can obtain through BSM. These
benefits are broken down by category:
Tangible Benefits, BSM Implementation:
Managing the impact of IT risk onto the business
Capability to better fulfill compliance regulations
Ability to better quantify risk
Potential for user self-service
Tangible Benefits, BSM Use:
Prioritization of IT activities based on business impact
Reduction of time-to-resolve and problem isolation/identification time
Potential of shift to utility computing capability
Reduced business impact associated with outages
Quantification of impact associated with business service outage or loss of quality
Intangible Benefits, BSM Implementation:
IT and business goal alignment
Breakdown of IT silos
Increase in level of IT automation
Improved user communication
Intangible Benefits, BSM Use:
Increase of IT level of maturity
Increase in business service availability and performance
Prioritization of IT activities
Enhanced communication between disparate IT groups
Table 6.1: A non-exhaustive list of value gained by an organization through the implementation of BSM. That
value is broken down into various categories.
Maintaining Value
Obtaining that value is one component, but maintaining it over time is another. One element of a
fully-realized BSM implementation that supports this is the set of metrics provided by the system
itself. The job of a BSM system is to provide metrics validating the health of systems and the
quality of services. Those same metrics can simultaneously be used to validate the value of the
BSM system itself.
More than anything, BSM is a tool to crunch complex monitoring data. Thus, a snapshot of system
metrics prior to its implementation can be compared with future snapshots to validate the value it
provides.
Let's look at some examples of how this works. Recognizing value over time for a BSM system
involves continually measuring that value. A BSM implementation does so through a series of
metrics, the most relevant of which is the Cost of Poor Quality (COPQ). This metric, which we
first discussed in Chapter 3, measures the quantity of lost or degraded transactions that occur over
a period of time. When an Average Revenue per Transaction metric is applied to that
measurement, the result is the total revenue opportunity lost over that unit of time.
This metric can be an excellent starting point for determining how a system that fails to meet
desired specifications affects the business. When BSM gathers this metric, it relates it to the
amount of business being lost to poor quality. As that data changes over time, it provides an
excellent measurement of how well a BSM system is improving the business's bottom line.
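The COPQ relationship described above (lost or degraded transactions multiplied by average revenue per transaction, tracked per period) can be sketched as follows. The function name, the weekly periods, and all figures are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict


def copq_by_period(lost_transactions: Dict[str, int],
                   avg_revenue_per_transaction: float) -> Dict[str, float]:
    """Revenue opportunity lost in each period: the count of failed or
    degraded transactions times the average revenue per transaction."""
    return {period: count * avg_revenue_per_transaction
            for period, count in lost_transactions.items()}


# Illustrative figures only; real counts would come from EUE monitoring.
lost = {"week 1": 1200, "week 2": 950, "week 3": 800}
copq = copq_by_period(lost, avg_revenue_per_transaction=35.0)
print(copq)  # {'week 1': 42000.0, 'week 2': 33250.0, 'week 3': 28000.0}
print(copq["week 1"] - copq["week 3"])  # 14000.0
```

The falling series is the signal of interest: the difference between the first and latest periods is one rough indication of the value recovered while BSM has been in place.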
Also useful is the nature of BSM's collection and calculation mechanisms themselves. Once in
production, a BSM system is unlike other management systems in that it automatically begins
creating metrics associated with its own value. This occurs naturally as part of BSM's
calculations of revenue impact. As it performs those calculations over time, identifying and
categorizing system characteristics, it concurrently calculates measurements of its own worth.
By monitoring these metrics over time, an organization can track the improvement of its
managed services attributable to BSM's involvement. Some metrics that assist in this validation
include:
Problem time to resolution
Number of unsuccessful transactions per day (historical)
Frequency of an unsuccessful transaction per unit of time (historical)
Target IT transaction improvement & Rate to target
Unsuccessful transactions per unit of time after resolution
Total minutes elapsed in processing transactions (historical)
Each of these metrics can be used both to measure the quality of the identified business service
and to determine the value of the system itself. For example, if the Problem time to resolution
metric decreases over time after BSM is implemented, it can be argued that BSM's data assisted
with the resolution of those problems. To further validate that assessment, one can align that
metric with others such as Number of unsuccessful transactions per day (historical) or Target IT
transaction improvement & Rate to target. Combining these metrics further supports the
conclusion that BSM is improving that business service's capability to serve customers.
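The cross-checking described above can be sketched as a before/after comparison. The sketch assumes, for illustration, that samples of each metric are available for a window before and a window after the BSM rollout, that lower is better for every metric supplied, and that a 10% drop counts as meaningful. The function names, threshold, and sample figures are all hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def percent_change(before: Sequence[float], after: Sequence[float]) -> float:
    """Percentage change in a metric's mean from the pre-BSM to the
    post-BSM measurement window (negative means the metric went down)."""
    b, a = mean(before), mean(after)
    return (a - b) / b * 100


def corroborated(windows: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                 min_drop_pct: float = 10.0) -> bool:
    """True when every metric fell by at least min_drop_pct percent.
    Assumes lower is better for every metric passed in."""
    return all(percent_change(before, after) <= -min_drop_pct
               for before, after in windows.values())


windows = {
    "time to resolution (hours)": ([6, 8, 10], [4, 5, 3]),
    "unsuccessful transactions per day": ([120, 100, 110], [60, 55, 50]),
}
for name, (before, after) in windows.items():
    print(name, percent_change(before, after))  # both -50.0
print(corroborated(windows))  # True
```

Requiring several independent metrics to move together is what makes the argument for BSM's contribution stronger than any single metric would.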
Calculating ROI
Specific Return on Investment (ROI) data can be challenging to calculate, so this section will not
attempt to build complex calculations based on cost and anticipated benefit. Instead, we'll review
some of the cost and benefit metrics that can be combined to illuminate potential investment
return. A proper ROI calculation requires three elements that together provide a complete
picture: the cost to implement, the anticipated cost savings from the new technology, and the
revenue benefits expected from its use. In the sections below, we'll discuss each of these in turn.
Figure 6.2: Adding together the cost savings benefits with the revenue benefits and subtracting the cost to
implement gives a good representation of a BSM implementation's ROI.
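The relationship in Figure 6.2 reduces to simple arithmetic, sketched below. The net figure follows the figure directly. Expressing that net return as a fraction of the cost to implement is a common convention added here for comparison, not something the figure specifies. All input values are hypothetical.

```python
from typing import Tuple


def bsm_return(cost_savings: float, revenue_benefits: float,
               cost_to_implement: float) -> Tuple[float, float]:
    """Net return as in Figure 6.2 (savings + revenue - cost), plus that net
    return expressed as a fraction of the cost to implement."""
    net = cost_savings + revenue_benefits - cost_to_implement
    return net, net / cost_to_implement


# Hypothetical multi-year totals, for illustration only.
net, ratio = bsm_return(cost_savings=150_000, revenue_benefits=250_000,
                        cost_to_implement=300_000)
print(net, round(ratio, 2))  # 100000 0.33
```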
Cost to Implement
Implementation costs for a BSM system relate in many ways to the cost of the software itself.
That cost includes the evaluation process, its installation, consulting and training services needed
to properly train internal staff on its operation, and hardware resources.
Of the three metrics that make up our ROI, those related to the cost to implement can be
considered the easiest to measure. They involve hard-dollar expenditures needed to find the
best-fit BSM system and bring it into the company. In addition to the elements noted above, it is
important to include the recurring costs associated with:
Software maintenance. A good rule of thumb for software maintenance costs is 18% of the
initial purchase price. Across multiple vendors, that estimate is typically the expected annual
expenditure needed to keep the software under maintenance.
Technical support. Depending on the vendor, additional costs may be required for
technical support. An important contractual element to consider when making a purchase is
whether technical support is included as a component of annual maintenance costs.
Hardware refresh. BSM systems are intended to be long-lasting solutions. Thus, a proper
ROI should additionally include hardware refresh costs at intervals, usually three or five
years. This ensures that as technology changes, hardware is regularly purchased to keep
up.
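The recurring items above can be folded into a multi-year cost to implement. This is a minimal sketch under stated assumptions: maintenance at the 18% rule of thumb and any separate support fee are paid every year of the horizon, and hardware is repurchased at its original price at each refresh interval that falls inside the horizon. The function name and all dollar figures are hypothetical.

```python
def cost_to_implement(license_price: float, one_time_services: float,
                      hardware: float, years: int,
                      maintenance_rate: float = 0.18,
                      annual_support: float = 0.0,
                      refresh_interval: int = 3) -> float:
    """Total cost over a `years`-long horizon (hypothetical model).

    Assumes maintenance (maintenance_rate x license price) and any separate
    support fee are paid every year, and that hardware is repurchased at the
    same price every refresh_interval years within the horizon."""
    if years < 1:
        raise ValueError("years must be at least 1")
    upfront = license_price + one_time_services + hardware
    recurring = years * (license_price * maintenance_rate + annual_support)
    # Hardware bought in year 0 is refreshed at year indexes refresh_interval,
    # 2 * refresh_interval, ... that fall inside the horizon (0 .. years - 1).
    refreshes = (years - 1) // refresh_interval
    return upfront + recurring + refreshes * hardware


# Five-year horizon with one hardware refresh at year 3.
print(cost_to_implement(200_000, 50_000, 40_000, years=5))  # 510000.0
```

Whether year-one maintenance is bundled into the purchase price varies by contract, so that assumption is worth checking against the actual vendor terms.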
Cost Savings Associated with Implementation

Cost savings associated with a BSM implementation usually come from faster resolution of
application-related problems. BSM enhances troubleshooting and resolution activities through its
deep monitoring of business systems. Thus, when those systems incur problems, technicians and
administrators have an improved suite of tools to bring them to resolution. The rate of that
resolution is an excellent metric to use in an ROI calculation. Some additional metrics that relate
to or feed into this metric include:
Target problem reduction. This is the anticipated reduction in time-to-resolve for
problems with BSM-monitored systems. Conservative estimates here are useful in
estimating not only the amount of time saved in solving problems but also a reduction in
the number of service breaches per unit of time.
Service desk load. Concurrent with the reduction in problems is a reduction in case load
to the service desk. When fewer issues become visible to users, those users are less
likely to need the services of the service desk. Mature organizations that know the cost
of each service desk ticket can translate reductions in load directly into savings.
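The two metrics above translate into dollars once an organization supplies its own baseline figures. This Python sketch uses a hypothetical `annual_cost_savings` helper and assumes saved technician time is valued at a flat hourly labor rate; it does not include tool-consolidation savings, which would be added separately.

```python
def annual_cost_savings(
    incidents_per_year: int,
    baseline_hours_to_resolve: float,
    target_problem_reduction: float,   # e.g. 0.25 means resolution is 25% faster
    labor_cost_per_hour: float,
    tickets_per_year: int,
    ticket_reduction: float,           # fraction of service desk tickets avoided
    cost_per_ticket: float,
) -> float:
    """Estimate yearly savings from faster resolution and lower service desk load.

    All rates are planning assumptions; conservative values are advisable.
    """
    for rate in (target_problem_reduction, ticket_reduction):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("reduction rates must be between 0 and 1")
    # Technician hours no longer spent on problems with BSM-monitored systems.
    hours_saved = incidents_per_year * baseline_hours_to_resolve * target_problem_reduction
    labor_savings = hours_saved * labor_cost_per_hour
    # Tickets that never reach the service desk, priced at the known per-ticket cost.
    desk_savings = tickets_per_year * ticket_reduction * cost_per_ticket
    return labor_savings + desk_savings


if __name__ == "__main__":
    # Hypothetical inputs: 400 incidents at 6 hours each, 10,000 tickets a year.
    print(annual_cost_savings(400, 6.0, 0.25, 80.0, 10_000, 0.25, 20.0))
```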
Chapter 6
Another potential metric relates to the burden of the systems management tools operating within the
environment. When the functionality of those tools can be consolidated, the number of
redundant tools in the environment can be reduced.
For many management tools, the highest cost of ownership relates to client management, or the
activities associated with managing the client components installed on each system. When the number of those tools can be reduced through the
implementation of a BSM system, the organization realizes a cost savings.
Revenue Benefits
Revenue benefits associated with the implementation of a BSM system typically relate to the
quality of transactions within the monitored system. When transaction quality can be measured
and compared historically, it provides a basis for calculating the added revenue realized.
Improved transaction quality directly relates to the overall quality of the service itself. Below are
some metrics used in the calculation of that quality:
Average revenue per successful transaction. This metric is the basis for many of the
calculations in this section. When revenue can be attributed to each transaction, it
provides the baseline against which revenue lost or gained through changes in
transaction quality can be measured.
Number of unsuccessful transactions per day. This metric is doubly useful during the
implementation of a BSM system. Prior to the EUE monitoring that arrives with a fully-realized BSM solution, it can be operationally challenging to measure the number of lost
transactions. Once that monitoring is enabled, the organization gets a first look at how
many transactions are actually being lost. This baseline can then be compared with
later measurements over time as BSM drives improvements to the business service.
Average time per transaction. This metric can be the primary measurement of transaction
quality when no failure is involved. The time elapsed to complete a transaction bears
directly on the user's ability to complete that transaction. When users are unable to
complete transactions within an appropriate amount of time, they may leave the system
rather than complete the transaction.
Total minutes used processing successful/unsuccessful transactions. This metric is a
count, over a given period, of the minutes spent processing either completed or non-completed transactions. Non-completed transactions are a waste of system
resources, doubly so as unsuccessful transactions often must be rolled back out of the
system. Between these two metrics, a shift in time from unsuccessful to successful
transactions indicates a more efficient use of available system resources.
User drop rate. Related to the above, when users grow frustrated with an un-optimized
system, they will eventually give up on it. BSM enhances revenue when improvement
activities driven by its information reduce this metric.
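These transaction metrics can be combined into a yearly revenue estimate. The following Python sketch is an assumption-laden illustration, not a standard formula: `annual_revenue_benefit` is a hypothetical helper, and it treats each eliminated failure or abandoned session as one additional successful transaction worth the average revenue per successful transaction.

```python
def annual_revenue_benefit(
    avg_revenue_per_success: float,
    failed_per_day: float,
    failure_reduction: float,     # fraction of today's failed transactions eliminated
    sessions_per_day: float,
    drop_rate: float,             # fraction of sessions users abandon today
    drop_reduction: float,        # fraction of those abandonments eliminated
    days_per_year: int = 365,
) -> float:
    """Estimate added yearly revenue from higher transaction quality.

    Assumes every recovered failure or abandonment becomes one successful
    transaction worth the average revenue per successful transaction.
    """
    recovered_failures = failed_per_day * failure_reduction
    recovered_drops = sessions_per_day * drop_rate * drop_reduction
    return (recovered_failures + recovered_drops) * avg_revenue_per_success * days_per_year


if __name__ == "__main__":
    # Hypothetical inputs: $40 per transaction, 500 failures/day, 20,000 sessions/day.
    print(f"${annual_revenue_benefit(40.0, 500, 0.2, 20_000, 0.05, 0.1):,.0f} per year")
```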
When calculating the ROI associated with these numbers, it is helpful to include a target improvement
rate for the BSM implementation. This target rate is the level of improvement the
organization wishes to achieve by implementing the system. When creating the ROI calculations, the
target improvement rate can be used as a lever for visualizing how changes in that rate affect the
overall return.
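Using the target improvement rate as a lever can be as simple as recomputing ROI across a range of rates. This Python sketch assumes, for illustration only, that combined yearly savings and revenue benefits grow linearly with the target rate; `benefit_per_point` (yearly benefit per percentage point of improvement) and the function names are hypothetical.

```python
def roi(total_cost: float, total_benefit: float) -> float:
    """Classic ROI: net gain divided by cost (0.25 means a 25% return)."""
    return (total_benefit - total_cost) / total_cost


def roi_at_rate(total_cost: float, benefit_per_point: float, years: int, rate_pct: float) -> float:
    """ROI when yearly benefit is assumed to scale linearly with the target
    improvement rate, expressed in percentage points."""
    return roi(total_cost, benefit_per_point * rate_pct * years)


def break_even_rate(total_cost: float, benefit_per_point: float, years: int) -> float:
    """Target improvement rate (in percentage points) at which ROI reaches zero."""
    return total_cost / (benefit_per_point * years)


if __name__ == "__main__":
    cost, per_point, years = 560_000, 7_000, 5   # hypothetical inputs
    for rate in (10, 16, 20, 30):
        print(f"{rate}% target -> ROI {roi_at_rate(cost, per_point, years, rate):+.0%}")
    print(f"break-even at {break_even_rate(cost, per_point, years):.1f}% improvement")
```

Printing the break-even rate alongside the sweep makes it easy to see how far below target the implementation could fall before the investment stops paying for itself.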
Management Visibility
Getting the most management value out of a BSM implementation also relates to the information
that system can provide. The information collected through traditional device and EUE
monitoring is only as good as its presentation to its consumers. Considering this, it is critical that
good dashboards be built that are suited to the individuals who require their information. As we
discussed back in Chapter 1, one of BSM's central tenets involves the digestibility of the
information provided.
In the following sections, we'll discuss how management visibility is obtained through the
implementation of effective dashboards.
The dashboards shown in the remainder of this chapter are intended to be used as examples of how
dashboards can be configured. Depending on the BSM solution chosen, dashboards may look
different or use different widgets to display data. Those used in the following sections show a broad
sample of how dashboards from any BSM solution can be used.
Visibility & Dashboards

Back in Chapter 1 we explained in general terms that once the service model is fully realized and
configured into the interface, and once appropriate agents are installed around the network, the
final task is the creation of dashboards that provide information specifically tailored for their
users. Through the remainder of this chapter and into the next, we'll dive deep into the dashboard
creation process. In the following sections, we'll show some examples of dashboards for use by
executives, the IT department, and even end users. For the customers of a BSM system, the
information contained within dashboards can be customized to their needs. So an outsourcer will
prefer contract fulfillment information, while enterprise IT will likely want device status
information.
The beauty of BSM is in its rich calculation engines that enable data to be factored in any way
necessary. The only limitation is the designer's skill in pulling and formatting the information
in the way that best suits its consumer.
Figure 6.3: An example of an IT management dashboard for a financial institution that shows system status
as well as business metrics (credit card transactions, costs).
What to Display
The hardest part of designing good dashboards is finding the best-fit quantity of data to include
on the main page. Dashboards typically run within an Internet browser window such as
Internet Explorer or Mozilla Firefox, so ensuring that dashboards are sized to work
with those browsers is also critical.
Figure 6.3 shows an example of a dashboard of interest to a financial institution. This dashboard
is specifically tuned towards the executives of that institution. Here, EUE agents are looking at
transactions across multiple sites and aggregating that information into a single view. The
executive gains a single-screen view of the health of the environment, while at the same time
getting financial information that relates to the health of transactions going through that system.
You'll immediately see that most of the information in Figure 6.3 is presented graphically. The
screen can be considered relatively busy, as it is full of information. However, the graphical
nature of the information makes it easy for the consumer to follow over time. When creating
dashboards, it is important to find the correct elements of information and present them so
that the eye is naturally drawn to information of interest. Green is typically used
to show health, while red is used to show unhealthy elements. In the same vein, upward
trending data typically indicates improving health and revenue, while downward trending data
indicates declining health and revenue.
The incorporation of widgets such as dials, heat charts, spread charts, and maps along with color
coding further helps the dashboard consumer.
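The color and trend conventions above are typically driven by thresholds behind each widget. This Python sketch shows one way to compute them; the function names, the threshold values, and the intermediate "yellow" state are assumptions for illustration, since the text names only green and red.

```python
def health_color(value: float, warn: float, critical: float, higher_is_better: bool = True) -> str:
    """Map a metric to a traffic-light color using per-metric thresholds.

    For higher-is-better metrics (availability), critical < warn.
    For lower-is-better metrics (response time), critical > warn.
    """
    if not higher_is_better:
        # Negate everything so one set of comparisons handles both directions.
        value, warn, critical = -value, -warn, -critical
    if value <= critical:
        return "red"
    if value <= warn:
        return "yellow"
    return "green"


def trend_direction(values: list[float]) -> str:
    """Compare the latest sample to the first to pick an up/down/flat arrow."""
    if len(values) < 2 or values[-1] == values[0]:
        return "flat"
    return "up" if values[-1] > values[0] else "down"


if __name__ == "__main__":
    # Hypothetical availability thresholds: warn below 99.9%, critical at or below 99.5%.
    print(health_color(99.7, warn=99.9, critical=99.5), trend_direction([99.9, 99.8, 99.7]))
```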
The goal with any dashboard is to create a picture whereby its consumer does not need to look
closely to understand what is going on. As with the dashboard in an automobile, the consumer
should be able to merely glance at the screen and immediately recognize health or problems.
What Not to Display

Initial dashboards are not meant to be static instruments. Rather, they are intended to be the first
layer in a series of data presentation screens provided to their consumer. If consumers see an
element of concern within the dashboard, they should be able to click that element to drill down
into additional and more detailed data.
That being said, it is critical not to overexpose information to the consumer at the top-most level.
Any overuse of textual information reduces a top-level dashboard's efficacy. In keeping with BSM's
tenet of digestibility, the information at that first layer should remain graphical whenever
possible. Reading text requires more processing on the part of the consumer,
and should be relegated to lower-level views.
The content of the dashboard must also be relevant. As was discussed in our chapter
example, Dan's first monitor included an incredibly rich interface showing which elements were
up, which were down, and which were being worked on. But none of this information was
relevant to Dan's job as COO. This sort of information is better suited to John, the IT Director,
than to Dan. Dan is much more interested in financial information like that shown in
Figure 6.3, with a minor representation of service health such as that displayed on the center-right of that image.
Access Control
Securing the information presented within the dashboard is as critical as securing the systems it
monitors. As is typical with business systems, information about transaction health, rate, or issues can
be considered sensitive information, the disclosure of which could negatively impact the
company. Thus, effective access controls must be put in place so that individuals can
see and work with the dashboards that relate to their jobs, while information that does not relate to
them is kept hidden from their view.
As an example of this, service desk employees usually do not need access to real-time financial
information. That data is useful to business analysts and executives. So it is a good idea to prevent
service desk employees from seeing this information.
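A common way to implement this is role-based access control, in which each dashboard lists the roles allowed to see it. The Python sketch below assumes hypothetical role names and dashboard identifiers; real BSM products expose their own permission models, which this does not represent. It denies by default, so a dashboard missing from the table is hidden from everyone.

```python
from enum import Enum, auto


class Role(Enum):
    SERVICE_DESK = auto()
    IT_OPERATIONS = auto()
    BUSINESS_ANALYST = auto()
    EXECUTIVE = auto()


# Hypothetical mapping of dashboards to the roles permitted to view them.
DASHBOARD_ACCESS: dict[str, set[Role]] = {
    "service_status": {Role.SERVICE_DESK, Role.IT_OPERATIONS,
                       Role.BUSINESS_ANALYST, Role.EXECUTIVE},
    "device_details": {Role.IT_OPERATIONS},
    "realtime_financials": {Role.BUSINESS_ANALYST, Role.EXECUTIVE},
}


def can_view(role: Role, dashboard: str) -> bool:
    # Deny by default: unknown dashboards are visible to nobody.
    return role in DASHBOARD_ACCESS.get(dashboard, set())


def visible_dashboards(role: Role) -> list[str]:
    """Dashboards to list in this role's navigation menu."""
    return sorted(name for name, roles in DASHBOARD_ACCESS.items() if role in roles)


if __name__ == "__main__":
    print(visible_dashboards(Role.SERVICE_DESK))
```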
Trend & Reaction Lines
Figure 6.4 continues our discussion by showing another representation, this one an executive
summary of various systems within an organization. Here, we also see additional widgets that
display historical information associated with the business service. In Figure 6.4, we see that
service quality has trended downward over a period of time. The current user impact
associated with that outage is shown on the lower-right.
Reaction lines are visual elements that let the consumer know when a situation has progressed to
the point where some action is required. Dashboard generation tools make it possible to
graphically represent the points at which those situations occur. Because reaction lines are
drawn graphically, consumers do not need to monitor textual data for problems.
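Detecting when a metric crosses a reaction line is a simple threshold-transition check. This Python sketch uses a hypothetical `reaction_line_crossings` function and assumes the dashboard alerts only on the transition into the action-required zone, rather than on every sample that stays there.

```python
from typing import List, Sequence


def reaction_line_crossings(
    series: Sequence[float], reaction_line: float, below_is_bad: bool = True
) -> List[int]:
    """Return the sample indices where the series enters the action-required zone."""

    def needs_action(v: float) -> bool:
        return v < reaction_line if below_is_bad else v > reaction_line

    crossings: List[int] = []
    previously_in_zone = False
    for i, value in enumerate(series):
        in_zone = needs_action(value)
        if in_zone and not previously_in_zone:
            crossings.append(i)   # report only the transition, not every sample
        previously_in_zone = in_zone
    return crossings


if __name__ == "__main__":
    monthly_quality = [99.2, 98.9, 97.5, 96.8, 98.1, 96.0]   # hypothetical data
    print(reaction_line_crossings(monthly_quality, reaction_line=97.0))
```

A service-quality series uses the default (falling below the line requires action), while a user-impact metric such as minutes of impact would pass `below_is_bad=False`.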
This dashboard can be an example of a first-level drill-down screen. When problems occur,
consumers want to know what they are related to. The upper-right of this dashboard shows
metrics for SLA fulfillment as well as the trend in monthly quality. This dashboard can help
an executive ascertain when problems occur and the impact associated with those
problems without being deluged with the technical details behind them.
Figure 6.4: This Executive Summary dashboard shows some trend lines based on user impact time and
service quality. Reaction lines notify dashboard consumers when a problem has hit a critical state and some
remedial action must be performed.
In Chapter 7, we'll review a comprehensive list of widgets that can be added into dashboards for
various purposes. These widgets are data-driven: the underlying data determines what each one
displays. Some widgets work better in some situations than others. In Chapter 7, we'll talk about the best practices
associated with their use and with building dashboards in general.
Management Control
In addition to providing visualizations of the business service environment, dashboards enable an
improved sense of control. When management has information at its fingertips,
it can make more informed decisions about the business. Depending on
the needs of the consumer, dashboards can be configured with data that enables the executive or
business analyst to make changes to the environment.
Figure 6.5: An example of a control dashboard, this operations details view shows detailed information about
the state of various locations and business services. For each, more detailed information is provided, giving
the consumer a specific view of what areas may require attention.
Control Dashboards
Control dashboards can exist at the top level or be configured as drill-down elements. The idea
with control dashboards is to provide enough information to their consumers (for example, IT
and executive management) that they can make effective decisions regarding the operation of the
environment. Good control dashboards also help with augmentation decisions. As environments
grow, they inevitably require purchases and upgrades to support the needs of their users. By
giving consumers information regarding performance, activities, and behaviors, control
dashboards let them enact change to the environment as necessary.
Figure 6.5 shows an example of a second-level dashboard that presents more detailed
information about multiple business services across multiple locations. Service quality
for any particular service is shown in the dials in the center, while history and business calendar
information is presented in the upper-left. Important here is the inclusion of textual explanations
of situations occurring within each business service and/or location.
The presentation of this information provides its consumer a more holistic view of the details
associated with a failure condition. This enables them to make better decisions in terms of
problem resolution or customer relationship development.
What to Display
These types of dashboards typically include Key Performance Indicators (KPIs) that show the
health of services within the network. Whereas top-level visualizations are best served by all-graphical representations, lower levels require the addition of textual information that validates
the images at the top level.
These dashboards typically come into play when business
services go out of specification. When thresholds are breached, the top-level dashboard will
raise an indicator showing the situation. Ideally, a dashboard's users need only a single glance
to recognize problems and start working toward their solutions. The consumer can then be given
the ability to drill down into that problem to see its cause, information about its resolution, and
any impacts that are occurring.
What Not to Display
It is important to realize that the drill-down linkages leading from top-level to
secondary-level dashboards need not stop there. At the point of secondary control,
the dashboard designer should often relegate highly specific data to a third-level dashboard. This
allows the same dashboard to serve multiple classes of users. Those with the technical
experience to understand and act upon specifics can drill down to third-level information.
Those without the experience or the job-related responsibilities can remain at the level of detail
useful to them.
Management Impact on Operations
The elevation of information to the level of business management provides transparency between
business management and IT operations. In organizations with technical components, business
leaders often suffer from a technology gap, where their experience with business concepts doesn't
align with the level of technology being used in service of their customers. This gap in
knowledge and experience can be especially problematic when executives are unaware of the
activities of the technical employees within their business. They may make decisions that don't
make sense from a technical perspective.
Recasting traditionally technical information into revenue targets and rates
understandable by the non-technical executive goes far in aligning the goals of IT and the
business. That alignment is a central tenet of Business Service Management.
SLA Measurement & Fulfillment
One specific type of dashboard useful for both management and IT is associated with Service
Level Agreement measurement and fulfillment. Back in Chapter 2 we talked about how
immature IT organizations have a tendency to set SLAs that relate only to individual device
health rather than the overall status of the business system. Immature IT organizations also tend
to set SLAs that are complicated or operationally infeasible to measure quantitatively.
BSM's data gathering and calculation tools allow measurable SLAs to be assigned to IT and outside
organizations. Most importantly, BSM allows for real-time collection of
data. When SLA counters can be collected and reported on in real time, fulfillment can be
recognized far more readily.
Figure 6.6: A dashboard widget, showing SLA measurements and their targets.
Consider the situation where an immature IT organization has put SLAs in place with the
business. When those SLAs are measured only at month's end or at the end of each quarter, it is
operationally challenging for IT to meet its goals. Because long periods of time elapse between
measurements, IT cannot see its status in relation to its goals, which makes those goals
difficult to meet. It is impossible to see how the efficacy of individual activities relates to the
improved or worsened accomplishment of each goal.
Only by providing regular updates through interfaces like dashboards can the completion of
remediation activities be easily related to the ultimate goal.
Figure 6.6 shows a dashboard widget that displays four specific SLAs and the target
associated with each. These SLAs relate to availability targets, and as the example shows,
three of the four goals have not been met for the period. The visualization shown in this
widget provides IT with a real-time rather than a monthly or quarterly measurement of its
success or failure in meeting its required SLA goals.
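Real-time SLA tracking amounts to recomputing attainment from the minutes observed so far and comparing it with the target. In this Python sketch, the `SlaStatus` class, its fields, and the 99.9% example target are hypothetical; it also reports how much downtime budget remains for the full period, the kind of in-period visibility that month-end reporting cannot provide.

```python
from dataclasses import dataclass


@dataclass
class SlaStatus:
    name: str
    target_pct: float        # e.g. 99.9 for a 99.9% availability SLA
    up_minutes: float        # minutes the service was available so far this period
    elapsed_minutes: float   # minutes elapsed so far this period

    def __post_init__(self) -> None:
        if self.elapsed_minutes <= 0 or not 0 <= self.up_minutes <= self.elapsed_minutes:
            raise ValueError("need 0 <= up_minutes <= elapsed_minutes and elapsed_minutes > 0")

    @property
    def actual_pct(self) -> float:
        return 100.0 * self.up_minutes / self.elapsed_minutes

    @property
    def on_track(self) -> bool:
        return self.actual_pct >= self.target_pct

    def downtime_budget_left(self, period_minutes: float) -> float:
        """Minutes of downtime still allowed in the whole period before a breach."""
        allowed = period_minutes * (1.0 - self.target_pct / 100.0)
        used = self.elapsed_minutes - self.up_minutes
        return allowed - used


if __name__ == "__main__":
    # Halfway through a 30-day month (43,200 minutes) with 20 minutes of downtime.
    s = SlaStatus("Online orders", 99.9, up_minutes=21_580, elapsed_minutes=21_600)
    print(f"{s.actual_pct:.3f}% on_track={s.on_track} budget={s.downtime_budget_left(43_200):.1f} min")
```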
Purchase / Upgrade Decisions

Another useful tool enabled through the incorporation of dashboards and visualizations is
the ability to see how assets are performing in relation to the goals of the business. The value of
assets in terms of their net return to the company is a challenging calculation when
completed using manual tools. Conversely, when automated systems constantly
measure individual assets, that relationship is much easier to see.
One way BSM augments purchase and upgrade decisions is through views into the health of
various assets within the system. In Figure 6.7, we see a dashboard widget specifically tuned
towards failure rates of particular systems. Drilling down into these failure rates can provide
additional information about the individual assets that make up that system. Mean Time to
Restore and Mean Time Between Failure metrics are shown on the left. The Pareto chart on the
right shows failure locations by quantity and percentage. It illustrates that the top causes of
problems impacting the services come from IT; it can also be read as 20% of the root causes
accounting for 80% of the impact on the overall business service. This helps practitioners prioritize
improvement where impact is greatest: in this case, IT.
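Both metrics, and the Pareto ordering, can be derived from outage records. This Python sketch makes assumptions the text does not: outages are `(start_hour, end_hour)` pairs within one observation window, MTBF is total uptime divided by the number of failures, and MTTR is total downtime divided by the number of failures. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def mtbf_mttr(outages: Sequence[Tuple[float, float]], window_hours: float) -> Tuple[float, float]:
    """Return (MTBF, MTTR) in hours for outages inside an observation window."""
    if not outages:
        return float("inf"), 0.0
    downtime = sum(end - start for start, end in outages)
    failures = len(outages)
    return (window_hours - downtime) / failures, downtime / failures


def pareto(causes: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Rank root causes by count, with the cumulative percentage for a Pareto chart."""
    ranked = Counter(causes).most_common()
    total = sum(count for _, count in ranked)
    rows, running = [], 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, 100.0 * running / total))
    return rows


if __name__ == "__main__":
    print(mtbf_mttr([(10, 12), (100, 101), (300, 303)], window_hours=720))
    for row in pareto(["IT"] * 6 + ["Network"] * 2 + ["Vendor", "User"]):
        print(row)
```

Reading down the cumulative column shows how few causes account for most failures, which is the 80/20 reading described above.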
Figure 6.7: Dashboards can also be used to show the utility and failure rates of assets. This information
provides insight into the need for future purchases or upgrades.
These metrics dovetail into those discussed above relating to transaction health. When poor
transaction health can be related to the inability of assets to keep up with the load, this is a key
indicator that additional purchases may be necessary. This quantitative information in the hands
of management helps justify new purchases. It can also reduce the cycle time associated with
purchase requests, as purchases are made when they are required.
Even more useful, when predictive analyses are run against existing transaction and health
trend lines, managers can begin the procurement process before failure situations
occur. Because asset procurement lead times can be long, predicting the need for additional assets
before they are required allows for graceful scaling of existing services without the costly downtime
associated with overuse.
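One simple form of predictive analysis is fitting a least-squares trend line to utilization history and projecting when it reaches a capacity threshold. The Python sketch below assumes linear growth, which real workloads may not follow; the function names and the procurement lead-time parameter are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def day_threshold_reached(samples: Sequence[Tuple[float, float]], threshold: float) -> Optional[float]:
    """Fit y = intercept + slope * day by ordinary least squares and return the
    day the trend line reaches `threshold`, or None if the trend is not rising."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples need at least two distinct days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return (threshold - intercept) / slope


def order_by_day(samples: Sequence[Tuple[float, float]], threshold: float,
                 lead_time_days: float) -> Optional[float]:
    """Latest day to start procurement so new capacity arrives before the threshold."""
    day = day_threshold_reached(samples, threshold)
    return None if day is None else day - lead_time_days


if __name__ == "__main__":
    history = [(0, 50.0), (10, 55.0), (20, 60.0), (30, 65.0)]   # (day, % utilization)
    print(day_threshold_reached(history, 80.0), order_by_day(history, 80.0, 45))
```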
Process Integration
Each of the above topics relates to the iterative improvement processes that are enabled through
the visualization of necessary information. Immature IT environments remain in that state
primarily because they lack information showing them where bottlenecks and
other problems exist within their environments.
Process improvement frameworks such as Six Sigma and ITIL assist with this activity. But
on their own, these frameworks are little more than instruction sets. Data is required to make correct
improvement decisions within an environment. That data can come from the elements and
transactions monitored through an EUE and/or BSM system.
Figure 6.8: A dashboard widget showing individual business services and their level of deviation from
thresholds. This information can be used along with process improvement frameworks like Six Sigma or ITIL in
improving technical and personnel processes.
Figure 6.8 shows an example of how data can feed into a process improvement framework such
as Six Sigma. In this example, the dashboard shows the status of individual business services and
the major components of those business services. For each of these, a sigma value is assigned
that reflects the amount of deviation from desired values present within the
system.
When the level of sigma for transaction performance goes beyond established thresholds, as is
the case with the Bad Debit subsystem of the Financial Planning service, a Cost of Poor Quality
value is assigned. In this case, the cost associated with going out of specifications is $258K. This
heads-up display provides process engineers and business analysts a view into the transactions
within a system, and helps them identify where deviation impacts corporate revenue.
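The text does not say how a given BSM product computes its sigma values. A common Six Sigma convention derives a sigma level from defects per opportunity using the inverse normal distribution plus a 1.5-sigma shift, and this Python sketch assumes that convention. It then flags services below a hypothetical minimum sigma and prices their defects at an assumed cost per defect as a simple Cost of Poor Quality estimate; the numbers are illustrative and do not reproduce the figure.

```python
from statistics import NormalDist
from typing import Dict, Tuple


def sigma_level(defects: int, opportunities: int, shift: float = 1.5) -> float:
    """Sigma level from defects per opportunity, using the conventional 1.5-sigma shift."""
    dpo = defects / opportunities
    if not 0.0 < dpo < 1.0:
        raise ValueError("defects must be between 0 and opportunities (exclusive)")
    return NormalDist().inv_cdf(1.0 - dpo) + shift


def cost_of_poor_quality(
    services: Dict[str, Tuple[int, int]], min_sigma: float, cost_per_defect: float
) -> Dict[str, float]:
    """Return an estimated COPQ for each service whose sigma falls below min_sigma.

    `services` maps a service name to (defects, opportunities); services with
    zero defects are skipped because they cannot fall below any threshold.
    """
    flagged: Dict[str, float] = {}
    for name, (defects, opportunities) in services.items():
        if defects and sigma_level(defects, opportunities) < min_sigma:
            flagged[name] = defects * cost_per_defect
    return flagged


if __name__ == "__main__":
    services = {"Bad Debit": (5_000, 100_000), "Payments": (3, 1_000_000)}  # illustrative
    print(cost_of_poor_quality(services, min_sigma=4.0, cost_per_defect=50.0))
```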
Fitting BSM into the Overall Operational Scheme

Alone and without attention, any of these dashboards provides little value to the organization. If
the data they provide is not watched and used on a constant basis, dashboards are little more than
pretty pictures. BSM is a tool that provides consumers with information that allows them to make
better business decisions. Fitting BSM into the overall operational scheme requires the
incorporation of procedures so that key personnel are monitoring the dashboards and know what actions to take
when problems occur.
In addition to this Pavlovian response when red lights appear on screen, it is equally critical
for business analysts and process analysts to make use of this data. These individuals can analyze
the data within the context of existing business activities to provide additional insight into those
activities. As was explained in Chapter 4 when we discussed the team members necessary for a
BSM implementation, those same analysts will, over time, find additional ways in which BSM's
data can benefit the organization.
End User Visibility & Control

Individuals within management are not the only consumers of dashboard information. Others
inside and outside the organization can also make use of the information gathered by a
BSM solution. In the next few sections, we'll discuss three classes of users that also gain from
the implementation of a BSM system.
The first of these is end-users themselves. These individuals are the ultimate consumers of our
systems. They can be classified into two different groups:
Technical. Technical end-users are often those who still work within the organization but
are users of the business systems under management. These insiders often
have a high requirement for transaction information in order to perform their jobs.
Transaction information can be factored in ways that enable them to see trends in use
and environment states.
Non-technical (business users). Non-technical end-users are often those completely
outside the organization. They may be customers of the company or individuals who
make use of the data provided by that company.
As an example of each of these, think first of a large mortgage brokerage. The consumer of their
loan origination system is likely highly technical. They likely will want rich information about
the status of mortgage metrics, their location, their processing status, and information about the
industry as a whole.
Conversely, an example of a non-technical individual is the customer of an external B2C
system. If an individual wants to purchase products from a company's web site, they don't
require industry and trending data. But they may want information about the status of that web
site. The scope of data they require is narrower than in the mortgage broker example. To them,
simple status information about their individual order and the web site is what is
useful.
Figure 6.9: An end-user dashboard for a technical consumer. This visualization aggregates end-user
transaction data for an example government system.
System Status
As you can see in Figure 6.9, end-user consumers are predominantly interested in status
information about the services they consume. The visualizations there show sample data
associated with the level of repair of city elements, housing data, and rate payments. End users of
the system in this example are less interested in the quality of transactions going through the
system. Instead, they're interested in the transactions themselves.
One specific example of service quality that is of interest to end-users is the overall availability
of the system. BSM solutions are better than static "The System is Down" screens for end-users because they can provide more accurate information about the status of the outage and the
expected time for services to be restored. These types of data can be categorized into:
Projected time to repair. When problems occur with a business system that impact the
end user, those users are more interested than anything in how long the system
is going to be down. If a system they rely on for regular transactions goes down, knowing
that it is expected to return in 60 minutes is more valuable to them than retrying the system over and over again until it returns.
Automated incident information. Providing information to users about the scope of an
outage is of particular value to the troubleshooting administrator. When systems go down
or partially stop functioning, users naturally keep trying to access the
system until it returns. When many thousands of users hit "Retry" repeatedly, they place
unnecessary stress on an already problematic system. By providing users with
information about a system's return to service, the administrator lets them go about other
duties until that time has elapsed.
Scheduled outages. Systems also typically have known outage windows. Those windows
may fall during slow periods for one time zone, but the global nature of the Internet
means that one time zone's slow period is the middle of the day for another. When
businesses work globally, informing users of scheduled outages allows them to change
their use patterns so they are not impacted by the outage.
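An end-user status page can compute the projected return-to-service time and show it in each user's local time, which addresses both the projected-repair and global scheduled-outage points above. This Python sketch uses only the standard `datetime` module; the `OutageNotice` class and its message wording are hypothetical, and it uses fixed UTC offsets for brevity where a production system would use named time zones with daylight-saving rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class OutageNotice:
    service: str
    start_utc: datetime        # must be timezone-aware
    projected_minutes: int     # projected time to repair or planned window length
    scheduled: bool

    def restore_time_utc(self) -> datetime:
        return self.start_utc + timedelta(minutes=self.projected_minutes)

    def message_for(self, user_tz: timezone) -> str:
        """Render the notice in the user's local time so they can plan around it."""
        start = self.start_utc.astimezone(user_tz)
        end = self.restore_time_utc().astimezone(user_tz)
        kind = "Scheduled maintenance" if self.scheduled else "Unplanned outage"
        return (
            f"{kind}: {self.service} is unavailable from {start:%Y-%m-%d %H:%M} "
            f"until approximately {end:%Y-%m-%d %H:%M} ({start:%Z})."
        )


if __name__ == "__main__":
    notice = OutageNotice("Online Banking", datetime(2024, 3, 1, 2, 0, tzinfo=timezone.utc), 90, True)
    for tz in (timezone(timedelta(hours=-5), "EST"), timezone(timedelta(hours=9), "JST")):
        print(notice.message_for(tz))
```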
Outsourcers & Service Providers

Outsourcers and service providers are another group of specialized users for BSM systems. They
will make use of all the functions we've discussed up to this point with a fully-realized BSM
solution and its visualizations. But they also have a primary focus that other groups do not:
contract fulfillment.
Within outsourcer relationships, budgets and costs are typically created with very narrow
margins for error. Errors of even small impact can negatively affect the
overall contract between the outsourcer and its customer. A minor deviation can turn an
entire contract into a loss for the outsourcer.
Because of this nature of outsourcer contracts, BSM's data helps outsourcers reconfigure
contract line items and workflow on the fly. Similar to how BSM's data helps IT organizations
recognize the efficacy of their activities in relation to their SLAs, outsourcers use BSM's data to
validate the fulfillment of their contracts and remediate any breaches.
Figure 6.10: A dashboard widget that shows critical Operational Level Agreements and Underpinning
Contracts for an outsourcer. This information helps the outsourcer adjust resources in real time as contract
conditions change.
Cost & Risk Reduction

Outsourcer contracts are highly scrutinized as they are put in place. Outsourcers are typically
highly mature in their models of contract fulfillment, having highly-refined calculations in place
to help them develop profitable contracts with very low margins. Of particular
interest to outsourcers is the data used to fulfill those calculations. As outsourcer calculations are
typically complex, the maintenance and regular updating of reports associated with those
calculations can be unwieldy. Utilizing BSM's internal calculations and logic, along with its ability
to collect data from pre-existing management toolsets, makes BSM solutions easy to
implement.
Contract Compliance
Along with the reduction of cost and risk in these relationships is the defined need to identify
where breaches in contracts occur as well as the source of such breaches. Contract breach
situations with outsourcers can be highly expensive, and relationships are often complex, with
outsourcer IT and customer IT co-mingling activities within the same environment. So
recognizing the true root cause of a breach can help the outsourcer point the finger in
the correct direction when problems occur. This ability to locate problem domains translates into
a significant reduction in risk to the outsourcer.
Enterprise IT
Enterprise IT is yet another consumer of the information provided by BSM. Although much of
our conversation to this point has concerned the movement away from a device-centric approach
to monitoring, IT still has the job of care and feeding of individual devices and applications. As
BSM's data collection tools can ride atop existing traditional device monitoring
systems, the information gathered by those systems can be brought into BSM for IT
consumption.
This setup has the advantage of unifying the tools employed by all branches of the organization
to manage their business systems. Enterprise IT can make use of the same suite of visualization
tools used by executives, application developers, and business analysts. This allows all groups to
speak the same language and leverage the same toolset in identifying problems, finding
solutions, and ultimately managing the environment.
Figure 6.11: An example BSM dashboard for business and IT executives illustrating each business function
area for a bank (e.g., claims handling, telephone banking service, e-commerce). The traffic light colors
show the quality performance for the services supporting these banking areas.
Cost & Risk Reduction
Integrating toolsets across all entities within the organization reduces the overall cost of supporting the environment, largely by eliminating redundant utilities. It also improves IT's ability to work hand-in-hand with other organizational teams during augmentations or problem-solving events. When all teams work together within a single interface, many eyes are looking at the same data. This ultimately feeds into a data-driven corporate culture.
Customer Satisfaction
All of this relates back to the central tenet of BSM, which is improving customer satisfaction for systems under monitoring. When revenue is positively impacted through the incorporation of BSM's data and the subsequent analysis of that data, the management value associated with its use is validated.
BSM Enables an Ongoing Measurement of Management Value
This chapter has attempted to show how management value can be both obtained and maintained through the implementation of a BSM solution. We've talked about the potential return on an investment in a BSM solution. That ROI is related to the cost of the solution combined with the cost savings and revenue benefits it provides. We then continued that conversation by showing what can be considered the most important part of BSM: its data visualizations. Here, we've shown some sample visualizations and how those samples can be used by various consumers to drive their decisions. Each class of consumer requires a different level of data, and BSM's calculating engine enables each class to see the data it requires, with the calculations it requires.
Our next chapter will dig even deeper into these visualizations. There, we'll talk about achieving a post-implementation reduction in operational expenditures. We'll discuss how BSM can become an umbrella for managing all of an organization's business services. We'll then dive into the service model itself and how that model feeds each dashboard and its visualizations. We'll conclude with a discussion of effective dashboards, how to build them, and the components available in BSM solutions today to make that happen.
Chapter 7: Achieving Operational Value
It's 11:25am, about 6 months after the arrival of Dan's new monitor, and he finds himself in FCG's cafeteria, attempting to beat the noon rush. Things at FCG have been relatively quiet over the past few months. With much of the operational care-and-feeding of the IT department behind him, Dan has found himself working on strategic initiatives in other places to widen the scope of the company.
Dan has always liked FCG's cafeteria. They brew a decent cup of coffee, and he usually makes his way down for seafood day on Fridays. As COO, one of his early responsibilities was finding just the right catering company, and he has always felt pride in having chosen one of the best.
Picking up his tray and working through the line, Dan runs into John Brown. John's attention appears divided between the potato-encrusted tilapia and the company announcements displaying on the television mounted on the wall above the cafeteria line.
"Long time, no talk to, eh John. How're things going in IT these days?" Dan asks his apparently distracted IT Director.
John looks deep in thought, taking an extra second to respond to Dan's question. "Not bad. Not bad at all... Hey, have you been paying much attention to these new TVs that Facilities has mounted all around the company? They're using them as a tool to show company announcements and other news. You know, like quarterly revenue goals," he motions to the chafer holding today's fish entrée, "and today's tilapia special."
Dan beams. "Actually I have. They were originally my idea. I was having lunch with one of our customers a couple of months ago and saw something much the same there. Thinking about how tired I was getting of seeing all those company news emails, I realized what a great idea they were for getting that kind of info to people. So, I bought about a dozen flat-panels and had them strung up all over the place. What do you think?"
"I love the idea," John says, staring at the news of the day, "but I'll admit they are distracting when I'm trying to decide between the sandwiches and the special of the day."
"Try the tilapia. It's my favorite. Hey, have you seen the last set of profitability reports? We're up 15% over last quarter," comments Dan.
John looks at Dan suddenly, the look of a light bulb going off in his eyes. "No, I haven't, Dan, but I just had an amazing idea. We've got to sit down and think about this a little. Have you got a few minutes?"
Dan blinks. "Sure. An amazing idea, eh? Alright, I'll do even better. If this idea is as good as you say, then lunch is on me."
Both men grab their lunch and sit down at the tables close to the cafeteria line. John makes sure
to position both men so that they can see the television showing the news of the day.
"The system we in IT put together to relay that information to the televisions is pretty extensible," John says, pointing to the television. "All we need to provide it is a network location of something we want to display (a PowerPoint deck, a Web site, or even a video stream), and it'll play it in any order that we want. Most of the time we're just rotating the PowerPoints HR puts together with the news of the day. They're set to update every 6 seconds."
"OK," Dan comments, "that's why we bought the system. What's your idea?"
John sits back in his chair, still with that light-bulb-going-off look on his face. "Here's my idea. That BSM system that we've been using for a while, we've pretty much got the configuration of that system down pat. We're using it as a tool in IT not only for monitoring but also as a central location for many of the otherwise disparate management toolsets we used to use. You and the other executives and financial types are using its data to keep you up-to-date on our financials. Even our customers are now getting to see parts of it, what with the new status and outage notification screens that it automatically drops onto the Web site when we're having issues."
"Go on," Dan urges.
"Well, with these new TVs, we've got a new tool whereby we can keep the whole employee base informed about the status of our company. What if we started providing up-to-the-second info on our financial status from BSM? How well we're meeting our goals. How well we're doing with sales. Those sorts of things." On fire now, John continues, "Kind of like a tool for keeping morale up when we're doing well financially. When we're not doing so well, it can serve as a reminder that we need to buckle down. Fiscal transparency to the employees, and all that."
John continues, "All that data is already in BSM in real time. All we need is to create a new dashboard to display it. We'd have to be careful about providing too much info so we're not giving away any secrets. But we could put together some dials and heat charts with generic fiscal health info. We'd just rotate that Web site with HR's info and the special-of-the-day. What do you think?"
"Interesting. I think I owe you lunch," says an impressed Dan as he picks up the check.
Post-Implementation Operational Achievement
As with any investment, getting the most return is critical to determining its worth to the business. A BSM system's internal calculations engine is itself involved in determining value and return. This determination works not only for the systems it monitors; the same calculations associated with return can be used to validate the BSM system itself. As such, BSM, through its measurements and internal financial calculations, is capable of determining its own value.
As you can see in Figure 7.1, at the highest level, a BSM system is intended to be a sort of black box. The input into that box is a set of raw data arriving through its own End User Experience (EUE) instrumentation or through various connectors that plug into other management systems.
Back in Chapter 5, we talked at length about how EUE augments traditional monitoring with a validation of how the user is experiencing the system. The other connectors that make up BSM's data ingest were discussed at a high level in Chapter 4. In our next chapter, we'll discuss those connectors in much more detail.
On the right side of Figure 7.1, we see the output of BSM's calculations: a series of visualizations that can be used to validate system health, understand the financial impact of IT systems, and ultimately make decisions based on data that has been put into a digestible format.
[Figure 7.1 diagram labels: Raw Data, Connector Data, System Information, Service Status, Information, Visualizations, Dashboards, Actionable Data]
Figure 7.1: In many ways, BSM's internal computational logic is like a black box. Raw data from connected systems goes in one end. Visualizations of that information in digestible formats are output on the other end.
Our chapter example involves a story in which FCG is using the internal financial logic within its BSM installation in new and unique ways to provide value to its general employee base. As you'll see in this chapter and, hopefully, throughout this guide, BSM's ability to gather and integrate data is comprehensive and covers many areas within IT. The only limitation is your own imagination in developing dashboards and other heads-up displays that are useful to their consumers.
In this chapter, we'll discover some of the ways in which added return can be gained through the implementation of a BSM system. Much of that return comes through the reduction of operational expenditures on the part of any dashboard consumer. Referring to Figure 7.1, we'll focus our attention in this chapter on the ingest and output portions of a BSM system, and on how BSM's involvement with those linkages enhances its return to the business. Specifically, we'll talk about BSM's capability to operate as a management umbrella, consolidating many traditional management consoles under its unified interface. We'll then discuss BSM's visualizations and how their extensibility allows them to be used by many different classes of users. We'll conclude this chapter with a look at the various data blocks that can be made part of a dashboard.
Reducing Operational Expenses (OPEX)
As the narrative that makes up our chapter example shows, BSM's black box is highly useful for formatting raw data into forms that make sense to multiple classes of users. Executives and financially based users can leverage the financial information to gain a real-time understanding of the contribution of IT-based income to the business bottom line.
At the same time, other users can benefit from this information as well. As we'll see in a moment, BSM's functionality can become a management umbrella under which many common management tools can be unified. BSM includes the ability to perform IT service measurement and reporting from the perspective of IT as well as that of the customer and the business. By unifying disparate management tools and adding a business-oriented layer to IT service management, BSM reduces operational costs, especially around problem resolution processes and the impact of problems on the business.
Also worth considering is the extensibility of the information this black box can provide to users. By horizontally scaling a BSM system's user base, as was done with the televisions in our chapter example, BSM's toolsets can reduce the overall operational expenses associated with managing IT systems.
BSM Correlates and Consolidates to Make Sense of the Data
Let's start by looking at the management tools that are necessary in the operation of any IT environment. To properly understand the communication between individual elements on the network, as well as to quantify the health of systems, various types of management tools are often used. As Figure 7.2 shows, these tools can include systems management and monitoring systems, application monitoring systems, and others. Each of these has a role to play within the environment:
The application monitoring system watches the applications in an environment, compares them with baseline performance, and reports when applications fail or perform outside their expected ranges.
The systems monitoring system ensures that systems are running properly and verifies that servers and networks are running to pre-established baselines.
The service desk ties in incident logging, categorization, prioritization, and so on.
The CMDB serves as a repository of information that represents the authorized configuration of the significant components of the IT environment. A CMDB helps an organization understand the relationships between these components and track their configuration.
The systems management system augments all of these by enabling the baseline itself, providing for policy-based and centrally controlled changes to the environment as well as the maintenance of system configuration.
[Figure 7.2 diagram labels, top to bottom: Presentation Layer / Dashboards; Service Model; Instrumentation Data; Systems Management, Systems Monitoring, Application Monitoring, CMDB, Service Desk]
Figure 7.2: The information from, and activities associated with, disparate management toolsets can be centralized through the BSM implementation, where business rules are applied to make sense of all the data.
BSM's black box allows for the aggregation of instrumentation data from each of these systems into a single location. More importantly, its calculation engine allows information from individual management systems to be related to each other. Because BSM comes equipped with its own suite of tools for acting on the information it receives, it uses the service model to apply business rules, taking data from disparate sources and turning it into information that is meaningful to the business. It is possible to use the BSM system as the overarching umbrella for the management of many facets of an IT environment. In the next few sections, let's take a look at how this can be done by connecting BSM to the other management systems in an environment.
Unifying Management Controls
The number of individual management controls within any IT environment is often large. As individual toolsets, they can be challenging to work with in combination. Though these tools may be segregated by function and by product, the types of activities desired from each control are often relatively similar across all IT infrastructures. The daily administration of an IT environment, no matter what the industry or company size, is usually the same. Many of the same types of actions are necessary to best administer the environment:
Management tools. Most organizations have several management and monitoring tools already in place. While a BSM system doesn't replace these existing tools, it makes it easy to integrate data from other IT infrastructure management products, service desk software, configuration management databases, and other applications across multiple platforms. BSM complements management and monitoring tools by filtering out the noise and turning a multitude of IT data into information that makes sense to the business.
Notification. As with management toolsets, notification elements can also be segregated. Network element notification can be enabled through one protocol and service, while server and application notification is enabled through an entirely different one. When notification systems are segregated from each other, it grows challenging to identify root causes in an environment, because notifications associated with one part of the environment come from one location while other notifications come from elsewhere. When BSM ties into segregated notification systems, it centralizes the point of notification and eases the pain of discovery.
Reporting. Lastly, reporting on the activities within the IT environment usually differs in form and function depending on the system performing the reporting. When multiple systems are used to manage reporting in an environment, problems are likely to surface when relating items identified in one system to those in another. BSM's internal calculations provide a mechanism for relating system data to each other for this purpose.
Figure 7.3: An example of how BSM calculations can be entered into the system. Here, the calculations are used to validate network SLA compliance. The information gathered to fulfill these calculations can come from many different sources.
Operational Visibility
BSM also provides return in terms of overall visibility into, and control over, IT systems. Because IT systems are based on intercommunication between hundreds or thousands of disparate elements all residing on a common network, it is not possible to see the environment in a physical sense. Instruments are necessary to do the seeing for its operators. These instruments, provided by the point monitoring tools and the data aggregated into the BSM solution, provide a human-readable representation of the health and operation of services on the network. This representation provides value to the business in a few key ways:
Situational Awareness. When instruments are provided with the best possible data with which to perform their calculations, the users of an environment get the best possible understanding of the function of that environment. Situational awareness refers to the ability of a system's users to recognize what is going on within the network. BSM provides this by translating and reducing huge quantities of data into levels that are consumable by its users. Later in this chapter, we'll talk about some of the individual data blocks that are used in BSM visualizations to enable this.
Prioritization of Resolution According to Business Impact. Associated with the visibility of systems is the recognition of how best to effect change upon them. When issues occur, it is often difficult to understand which are causing the most impact to the user base. A problem that at first blush appears to be of greatest importance can actually have little impact on users. BSM's calculations relating user impact to revenue impact help IT direct problem-solving toward the issues that need attention most.
Characterization and Resolution of Problems. Along with the point above comes the proper understanding of the problem itself. BSM's incorporation of data from multiple types of monitoring systems, as well as from its own EUE instrumentation, gives the troubleshooter the necessary awareness of the problem itself.
Identifying Fault Domain & Root Cause. Because problems might at first be masked by other factors, most of the time spent resolving component problems in IT typically goes to problem identification. Finding the location where the problem exists (the fault domain) as well as the actual source of the problem (the root cause) is challenging without proper situational awareness. Lacking these capabilities, IT can spend too much time tracking down a problem in the wrong location. BSM's toolsets, especially its agent-based and agentless EUE monitoring, go far toward identifying the individual transactions related to the issue. Reducing the time spent in problem identification can significantly reduce the operational costs of a problem, as its time-to-restore metrics are greatly improved.
BSM as an Extensible Visualization Tool
Thus far we've talked about the use of BSM as a tool to centralize the management of its ingest connections, those to the left of the black box in Figure 7.1. But, as we've discussed numerous times before, BSM's primary mission is to provide a way in which digestible data can be presented to multiple classes of users. This capability enhances the decision-making powers of those who make use of this data. It provides measurable and quantifiable metrics by which IT and business-based decisions can be made and ultimately justified.
The example at the beginning of this chapter was written specifically to illustrate how those visualizations can be leveraged by multiple user classes. As John says to Dan, "We're using it as a tool in IT not only for monitoring but also as a central location for many of the otherwise disparate management toolsets we used to use. You and the other executives and financial types are using its data to keep you up-to-date on our financials. Even our customers are now getting to see parts of it, what with the new status and outage notification screens that it automatically drops onto the Web site when we're having issues."
What we see in this interchange is the extensibility of the data that comes out of BSM's internal calculations engine. The same raw data that provides financial specifics to Dan can be reformatted into more generalized information for consumption by the employee base. This illustrates how BSM's calculations and its visualizations work hand-in-hand to provide the necessary situational awareness for each user class. In the section following this one, we'll talk about some specific examples of visualization data blocks that can be used to describe data as part of these calculations. But first, let's take a look at three example classes of users who can realize a return from BSM's data.
For each class of user, we'll talk about a short set of metrics of value. Management users appreciate financial information, while IT wants to know the status of the services it is chartered to support. Customers of the business service want to know the status of that service, when it may be down, and when they can expect it to return to service.
For Management
Management and executives find value in BSM's summarization of the transactions being made within monitored business services. This group of people is incentivized to ensure the current and future profitability of the company. When companies use business services and IT as a means of bringing income into the company, any situational awareness associated with the rate and movement of that income is of value to this group.
Moreover, as the resolution of the data provided to this group increases, they become better able to make decisions about the products transacted through the business system. They can make business decisions about changing those products. For example, they can alter a product's presentation or market it in different ways. When those changes occur, the data presented to this group lets them see immediately how those activities relate to the rate of sale or other factors of importance.
It's important to mention, too, that products are not the only focus for BSM's financial and management visualizations. If BSM is tied instead into services managed by the business system, the same kinds of monitoring and visualizations can be provided to management.
If you take another look through the example visualizations shown in Chapter 6, you'll see that many of the sample implementations there relate to service-based industries and the continual improvement associated with their activities.
KPIs. Key Performance Indicators (KPIs) are a central and critical meter for this group of people. At the management level, KPIs often measure the performance of the business as a financial unit. Without delving into specific kinds of KPIs, in traditionally manual systems these valuable indicators may be presented to management only at intervals. Decreasing the time between these intervals increases the resolution of the data, and with it the quality of the information being provided to management. BSM does this by measuring and reporting on KPIs in real or near-real time.
Overall Service Quality. Related to business metrics and KPIs is the measurement of service quality as a whole. This single metric gives management an at-a-glance understanding of the functionality of the business system of interest. As we'll see in the data blocks later on, visualizations associated with overall service quality can be created in ways that make it very obvious to management the exact point when service becomes degraded and when it again returns to acceptable service. Knowing this information assists them with performing their management duties.
Business Impact. Related not only to the quality of the service being provided by a system but also to how problems are resolved is the idea of business impact. When overall service quality for a service drops below acceptable levels, some activity or element is the cause of that change. Often, multiple problems are present simultaneously on a system. It is important that available resources be assigned to fix the problems with the greatest business impact first. The measurement of business impact provides management with critical information to this end.
For IT
IT's needs can be much different from those of business management. Whereas business management concerns itself with the viability of the company as a whole, IT's responsibilities are scoped toward management of the computing environment. As such, IT will be interested in validating the health and functionality of the systems that drive business services. That said, it is important to recognize that the same types of ingested data that fulfill the needs of management are often used to populate metrics for IT. In terms of providing return back to IT, the following metrics are valuable to IT's daily activities and long-term planning:
Service Level Statistics and Compliance. Mature IT organizations should have mature SLAs in place for managing their relationship with the business. As stated in previous chapters, the problem with many SLAs lies in quantitatively measuring their fulfillment. As a manual task, this can be highly time-consuming and provides results only at intervals. The calculations within a fully realized BSM instance can do this automatically and at much shorter intervals. By providing proactive SLA information more frequently, BSM helps IT better gauge how changes to the environment directly impact service quality.
Mean Time to Restore. Problems within IT environments are a fact of life. Issues with complex computing equipment happen all the time. Managing how those problems are resolved is one of the major tasks of IT management. Tracking this metric helps IT understand how well its resources are positioned. It also helps IT reposition resources to improve problem resolution.
Application/System Performance and Availability. Measuring the performance and availability of both applications and individual systems helps IT identify bottlenecks within systems. When either of these components experiences low performance, hardware upgrades may be needed. Measuring this performance over time helps IT plan for these needs ahead of time, preventing costly short-fuse upgrades. In terms of availability, IT's long-term measurement of availability helps justify purchases and validates the work being done in support of its systems.
Affected Users. As discussed above, identifying which users are affected, and how many, helps position troubleshooting resources in the best way possible. When multiple problems occur simultaneously, relating each problem to the number of users it affects means that higher-impact problems are resolved first.
Application Processing Provisioning. Lastly, BSM helps identify the levels of resources needed by individual application processes or threads. When throttling is enabled on system resources, BSM provides data to assist IT with identifying the correct levels of resources to provision.
For Customers
Customers are a different group entirely from the other two discussed in this section. Because external customers are typically non-trusted or semi-trusted members of the computing environment, the level and type of data provided to them should be much less than what is provided to internal employees. Even internally, as discussed in our chapter example, any data released to televisions around the corporation will likely need to be highly scrutinized to eliminate the possibility of disclosing sensitive information.
That being said, customers of a business system are often outside the organization. Thus, as the ultimate end users of the system, they are most likely to want information about overall system status. Three metrics are particularly helpful in providing this level of information to end-user customers:
Scheduled Outages. When a BSM system is integrated with a notification system for scheduled outages, end users can receive rich alerting. Consider the problem of being an end user when this information is not available. When a scheduled outage makes the system unresponsive and the end user has no information about the timing of the outage, the user is forced to re-attempt entry at regular intervals until the system is again responsive. This involves a time cost for the end user. Given a notification that shows when the system can be expected to be available again, users can attend to other tasks until the expected return-to-service time.
Outage Notification. Outage notifications are similar in concept to the metric above but also include unscheduled outages. As outlined above, when a business system experiences an unscheduled outage due to a problem, providing users with a notification about the problem lets them determine their next course of action. Ultimately, providing more information of this type to users means greater user satisfaction during non-nominal periods.
Infrastructure Status. When simple outage notifications are augmented with additional data regarding individual components of the business system, this status information can also be of value. Information about individual system status gives more technical users better explanations of the behavior they are currently seeing on the system.
With rare exception, any data provided to end users regarding the status of their experience helps
increase their level of satisfaction with the system.
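A customer-facing notice of the kind described under Scheduled Outages and Outage Notification can be generated from an outage record while deliberately exposing only generic information. The record fields and wording below are hypothetical. The internal cause is intentionally left out of the public message, in keeping with the caution above about how much data external users should see.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Outage:
    service: str
    scheduled: bool
    expected_return: Optional[datetime]
    internal_cause: str  # for IT only; never included in customer notices


def customer_notice(outage: Outage, now: datetime) -> str:
    """Build a generic, customer-safe status message for an outage."""
    kind = "scheduled maintenance" if outage.scheduled else "an unexpected problem"
    notice = f"{outage.service} is currently unavailable due to {kind}."
    if outage.expected_return and outage.expected_return > now:
        notice += f" Expected return to service: {outage.expected_return:%Y-%m-%d %H:%M}."
    else:
        # No (or an already-passed) estimate: avoid promising a time.
        notice += " We are working to restore service and will post updates here."
    return notice


outage = Outage("Online Banking", True, datetime(2024, 3, 2, 6, 0), "DB patching")
print(customer_notice(outage, now=datetime(2024, 3, 2, 2, 30)))
```

Publishing an expected return-to-service time is what spares users from repeatedly retrying the system during the outage.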
Example Visualization Data Blocks
In this chapter, we've talked in general terms about the types of return associated with some of BSM's potential user classes. We've discussed how BSM's visualizations provide valuable information to those users, assisting them with making decisions and managing the system as a whole. But a picture is worth a thousand words, and the hard part with the general terms used to this point is truly understanding the types of visualizations that can be used within a dashboard or other interface.
In this section, let's take a visual tour of some examples of data blocks that can be part of a dashboard. These are intended to be examples only. Depending on the BSM solution chosen, these data blocks may look slightly different; others may be available, or those shown here may be unavailable within the software package. The intent of this section is to provide a representative sample of the various ways in which data can be represented visually through the interface. For each sample, we'll include a short description of the type of data block and the types of implementations in which it may be useful.
Availability Charts
Figure 7.4: A representative sample of an availability chart.
Availability, whether in terms of user or service availability, measures the amount of time over a period during which a service can be used by its consumers. Because availability can be measured either for the service itself or for its users' ability to make use of that service, we may want to include both metrics.
The charts above measure this timing over a period of time, in this case 24 hours. By providing
this measurement, the viewer can see immediately the overall health of the system.
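As a rough illustration of the arithmetic behind an availability chart, the following Python sketch computes the percentage of a 24-hour window during which a service was up, given a list of outage intervals. The function name `availability_pct` and the sample outages are hypothetical, and the sketch assumes outages do not overlap; a real BSM product gathers this data from its own monitoring sources. Feeding it service-down intervals gives service availability, while feeding it the intervals during which users could not reach the service gives user availability.

```python
from datetime import datetime, timedelta


def availability_pct(window_start, window_end, outages):
    """Percentage of the window during which the service was available.

    `outages` is a list of (start, end) datetime pairs, assumed not to overlap.
    Any part of an outage that falls outside the window is ignored.
    """
    window_seconds = (window_end - window_start).total_seconds()
    down_seconds = 0.0
    for start, end in outages:
        # Clip each outage to the measurement window before counting it.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            down_seconds += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (window_seconds - down_seconds) / window_seconds


if __name__ == "__main__":
    day = datetime(2024, 1, 1)  # hypothetical 24-hour window
    outages = [
        (day + timedelta(hours=2), day + timedelta(hours=2, minutes=30)),
        (day + timedelta(hours=23), day + timedelta(hours=25)),  # runs past midnight
    ]
    print(f"{availability_pct(day, day + timedelta(hours=24), outages):.2f}%")
```

Only the hour before midnight of the second outage counts toward this window, so 90 minutes of downtime yields 93.75%.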
Control Charts
Figure 7.5: A representative sample of a control chart.
Control charts are general-purpose tools for conveying numerical information. They are used to
measure the value of a metric over a period of time. The specific metric in the chart above is
irrelevant to our discussion; what matters is that any metric of interest can be tracked over a
preconfigured period of time using this type of chart.
Dial Charts
Figure 7.6: A representative sample of a dial chart.
A generalization of the chart used above to measure availability is the dial chart itself. Dial
charts are handy for both financial and IT-based metrics because of the very obvious way in
which they present their data. Typically, bad levels of a metric are placed on the left side of the
dial and good levels on the right. We say "good" and "bad" here rather than "low" and "high"
because for some metrics a high value may represent a bad condition. As you can see in the
example above, this chart displays the service value metric, but any bounded metric can be used
with this type of data block.
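The good-on-the-right convention can be expressed as a small normalization step. This sketch, built around a hypothetical `dial_position` function, maps any bounded metric onto a needle position from 0.0 to 1.0 and flips the scale for metrics where a high reading is bad, so that bad readings always land on the left.

```python
def dial_position(value, minimum, maximum, higher_is_better=True):
    """Map a bounded metric to a needle position from 0.0 (far left) to 1.0 (far right).

    Bad readings always land on the left and good readings on the right.
    """
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    # Keep out-of-range readings pinned to the ends of the dial.
    clamped = min(max(value, minimum), maximum)
    fraction = (clamped - minimum) / (maximum - minimum)
    # For metrics where high is bad (e.g. error rate), flip the scale.
    return fraction if higher_is_better else 1.0 - fraction


if __name__ == "__main__":
    print(dial_position(75, 0, 100))                           # service value: 75 of 100
    print(dial_position(2.5, 0, 10, higher_is_better=False))  # error rate: 2.5% on a 0-10% scale
```

Both sample calls place the needle three quarters of the way to the right: a service value of 75 is fairly good, and so is a low error rate of 2.5%.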
Metrics Charts
Figure 7.7: A representative sample of a metrics chart.
In some cases, actual values may be of more interest to the consumer than a graphical
representation. In these cases, metrics charts can provide the actual numerical values of the
metrics being gathered by the system. In the example above, we are measuring the Mean Time
Between Failures (MTBF) and Mean Time to Restore (MTTR) metrics for a series of elements. For
each element, a value indicating whether the SLA was met is also shown.
Metrics charts are generic in that any related values can be placed in the same chart. Also handy
is the ability to attach rules to particular columns. For example, in the chart above, the SLA
column can be configured not as a direct measurement but as a test based on the values of the
other columns. This feature allows the data block author to provide text values for numerical
measurements when added clarity is needed.
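To make the rule-based column concrete, here is a minimal sketch that derives MTBF and MTTR for each element and then computes the SLA column as a test on those values rather than as a direct measurement. It uses the common definitions (operating time divided by the number of failures, and restoration time divided by the number of failures); the class and function names, element names, figures, and SLA thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ElementStats:
    name: str
    uptime_hours: float    # time in service during the reporting period
    restore_hours: float   # time spent restoring service after failures
    failures: int

    @property
    def mtbf(self) -> float:
        """Mean Time Between Failures: operating time divided by the number of failures."""
        return float("inf") if self.failures == 0 else self.uptime_hours / self.failures

    @property
    def mttr(self) -> float:
        """Mean Time to Restore: restoration time divided by the number of failures."""
        return 0.0 if self.failures == 0 else self.restore_hours / self.failures


def sla_column(stats: ElementStats, min_mtbf: float, max_mttr: float) -> str:
    """Rule-based text column computed from the other columns, not measured directly."""
    return "Met" if stats.mtbf >= min_mtbf and stats.mttr <= max_mttr else "Missed"


rows = [  # hypothetical 30-day figures
    ElementStats("Web tier", uptime_hours=718.0, restore_hours=2.0, failures=4),
    ElementStats("Database", uptime_hours=712.0, restore_hours=8.0, failures=2),
]

if __name__ == "__main__":
    for row in rows:
        print(f"{row.name:<10} MTBF={row.mtbf:6.1f}h  MTTR={row.mttr:4.1f}h  "
              f"SLA={sla_column(row, min_mtbf=100.0, max_mttr=1.0)}")
```

With these thresholds the web tier meets its SLA, while the database misses it on restore time even though its MTBF is higher, which is exactly the kind of judgment a text column makes easier to read.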
Pareto Charts
Figure 7.8: A representative sample of a Pareto chart.
Pareto charts are a type of bar chart that plots multiple values in descending order from left to
right. In the example above, the bars along the bottom are read from left to right, with the highest
value on the left. Values are often expressed as percentages, but this is not required.
Pareto charts also show, for each value, the cumulative percentage of the total quantity
measured. In the example above, the value for IT is approximately 47 to 48 units of outages,
which represents 40% of the total across all elements in the chart. Moving from left to right, the
line graph shows the cumulative percentage for each element's bar plus all the bars to its left.
The chart thus depicts the top sources of problems for the overall service, with IT first, intranet
second, and so on.
Pareto charts are most often used to measure quality. Their main purpose is to highlight the most
important factors among a set of factors.
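The ordering and cumulative line of a Pareto chart follow from two steps: sort the values in descending order, then keep a running total expressed as a percentage of the grand total. The sketch below shows this in Python; the function name and the outage counts are hypothetical, with the counts chosen only so that the top element accounts for 40% of the total, as in the example described above.

```python
def pareto(counts):
    """Return (label, value, cumulative_pct) tuples, largest value first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, value in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += value
        # Cumulative share of this bar plus every bar to its left.
        rows.append((label, value, 100.0 * running / total))
    return rows


if __name__ == "__main__":
    outages = {"Intranet": 30, "Email": 22, "IT": 48, "Payroll": 8, "CRM": 12}  # hypothetical
    for label, value, cumulative in pareto(outages):
        print(f"{label:<9} {value:>3}  {cumulative:6.1f}%")
```

IT comes first at 40.0% of the total, and the cumulative line reaches 100.0% at the last bar.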
6 Sigma Charts
Figure 7.9: A representative sample of a 6 Sigma chart.
When the success and failure of processes can be measured quantitatively, 6 Sigma charts can
provide a visual indication of process quality. We'll talk more about 6 Sigma in Chapter 9, but
for now, know that these charts show the levels of successful and failed process instances over a
period of time. They are handy for finding areas in which processes are failing at unacceptable
levels.
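One common way to put numbers behind a 6 Sigma chart is defects per million opportunities (DPMO), compared against the conventional 6 Sigma target of 3.4 DPMO. The text does not specify how any given BSM product computes such a chart, so treat this as an assumption: the sketch counts each failed process instance as one defect, and the function name and daily figures are hypothetical.

```python
SIX_SIGMA_TARGET_DPMO = 3.4  # conventional 6 Sigma quality target


def dpmo(failed, total, opportunities_per_instance=1):
    """Defects per million opportunities for one reporting period."""
    if total <= 0 or opportunities_per_instance <= 0:
        raise ValueError("total and opportunities_per_instance must be positive")
    # Multiply before dividing to keep whole-number results exact.
    return failed * 1_000_000 / (total * opportunities_per_instance)


if __name__ == "__main__":
    # Hypothetical daily counts: (process instances run, instances that failed)
    daily = {"Mon": (1_000_000, 3), "Tue": (800_000, 12)}
    for day, (total, failed) in daily.items():
        value = dpmo(failed, total)
        status = "within target" if value <= SIX_SIGMA_TARGET_DPMO else "above target"
        print(f"{day}: {value:.1f} DPMO ({status})")
```

Plotting one such value per period gives the success-versus-failure trend the chart displays; here Monday (3.0 DPMO) is within target and Tuesday (15.0 DPMO) is not.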
Outage Impact Charts
Figure 7.10: A representative sample of an outage impact chart.
When outages occur, it is important to learn quickly the number and class of users affected by
the outage. Internal calculations based on user count and revenue per user can augment outage
impact charts with rich levels of data. In the chart above, we see that three of the four services in
our example are currently down. Of those three, the highest impact is currently being felt by the
HR service, so resolving problems there will bring the largest number of users back to service.
These charts help with prioritizing resources during
outage events.
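A minimal sketch of the prioritization just described: keep only the services that are down, then rank them by affected users, using estimated revenue at risk as a secondary measure of impact. The `Service` fields, every service name other than HR, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    is_down: bool
    affected_users: int
    revenue_per_user_hour: float  # hypothetical figure taken from business data

    @property
    def revenue_at_risk_per_hour(self) -> float:
        return self.affected_users * self.revenue_per_user_hour


def prioritize(services):
    """Down services ordered by users affected, then by revenue at risk."""
    down = [s for s in services if s.is_down]
    return sorted(down, key=lambda s: (s.affected_users, s.revenue_at_risk_per_hour), reverse=True)


services = [  # hypothetical: four services, three of them down
    Service("Email", is_down=False, affected_users=0, revenue_per_user_hour=0.0),
    Service("HR", is_down=True, affected_users=1200, revenue_per_user_hour=5.0),
    Service("Intranet", is_down=True, affected_users=800, revenue_per_user_hour=2.0),
    Service("CRM", is_down=True, affected_users=150, revenue_per_user_hour=40.0),
]

if __name__ == "__main__":
    for s in prioritize(services):
        print(f"{s.name:<9} users={s.affected_users:>5}  revenue at risk/h={s.revenue_at_risk_per_hour:,.0f}")
```

Note that CRM carries as much revenue at risk as HR despite far fewer users; a business that prefers to rank by revenue would only need to swap the order of the sort key.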
Service Statistics Charts
Figure 7.11: A representative sample of a server statistics chart.
Along with the charts above, and similar to the metrics chart shown previously, service statistics
charts are valuable when actual data values are of interest to a consumer. In the example above,
we show actual downtime values for a sample service. These charts are particularly handy as
drill-down elements: the consumer can first spot a problem through a more graphical element and
later drill down to the actual values when desired.
Stoplight Charts
Figure 7.12: A representative sample of a stoplight chart.
Stoplight charts notify the user much as stoplights notify drivers when they must stop or may
proceed through an intersection. In a stoplight chart, however, red indicates poor performance of
the metric in the column for the service in the row. Green indicates acceptable performance, while
yellow indicates something in between. The power of stoplight charts comes from the ability to
define specifically what each color means, so the thresholds for green and red can, and likely
will, differ from chart to chart.
The value of this abstraction appears when the business later decides to change the values it
considers good versus bad. The chart and its notifications need not change; only the data that
drives the chart changes on the back end.
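The separation between a stoplight chart and its thresholds can be sketched by keeping the thresholds in a plain data table that the color logic reads at run time. The metric names, threshold values, and the `stoplight` function are hypothetical; the point is that retuning what counts as good or bad means editing the table, not the chart.

```python
# Thresholds live in data, separate from the chart, so the business can retune them.
# metric name -> (green limit, red limit, higher_is_better); values are hypothetical.
THRESHOLDS = {
    "availability_pct": (99.9, 99.0, True),
    "response_ms": (300, 1000, False),
}


def stoplight(metric, value, thresholds=THRESHOLDS):
    """Return "green", "yellow", or "red" for a metric value using the threshold table."""
    green, red, higher_is_better = thresholds[metric]
    if higher_is_better:
        if value >= green:
            return "green"
        if value <= red:
            return "red"
    else:
        if value <= green:
            return "green"
        if value >= red:
            return "red"
    return "yellow"


if __name__ == "__main__":
    print(stoplight("availability_pct", 99.5))  # between the two limits
    # Retuning "good" means editing the table; stoplight() itself is unchanged.
    relaxed = dict(THRESHOLDS, response_ms=(2000, 3000, False))
    print(stoplight("response_ms", 1500), stoplight("response_ms", 1500, relaxed))
```

The same 1500 ms reading shows red under the original table and green under the relaxed one, with no change to the code that draws the color.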
Heat Charts
Figure 7.13: A representative sample of a heat chart.
Heat charts are a particular type of stoplight chart that tracks status over time. As you can see in
the chart above, the Intranet service experienced a period of middling performance between
05:00 and 09:00, followed by a drop into bad performance. These charts are more powerful than
stoplight charts because they add an extra axis of data, namely the value of the
stoplight color over time.
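Because a heat chart is a stoplight chart repeated over time, one row can be built by applying the same color rule to each time slot. In this sketch the 0 to 100 quality scores, the cut-offs, the function names, and the hourly samples are hypothetical, chosen to resemble the Intranet row described above.

```python
def cell_color(score, green_at=90.0, red_below=70.0):
    """Classify a 0-100 quality score as G, Y, or R; the cut-offs are hypothetical."""
    if score >= green_at:
        return "G"
    if score < red_below:
        return "R"
    return "Y"


def heat_row(hourly_scores):
    """One heat-chart row: the stoplight color for each time slot, in order."""
    return "".join(cell_color(score) for score in hourly_scores)


if __name__ == "__main__":
    # Hypothetical hourly scores for one service, 00:00 through 11:00.
    intranet = [95, 96, 94, 97, 93, 80, 78, 82, 75, 60, 55, 58]
    print("Intranet", heat_row(intranet))
```

The printed row is `Intranet GGGGGYYYYRRR`: green until 05:00, yellow from 05:00 until 09:00, then red. Stacking one such row per service gives the full chart.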
Business Calendars
Figure 7.14: A representative sample of a business calendar.
When businesses move to global operations, the spread of time zones between sites and business
services adds a layer of complexity to scheduling outages and providing peak levels of service.
Business calendars help the consumer identify peak and off-peak periods of service across all
time zones. This information comes in handy in identifying the best times of day to perform
activities on the system.
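As a sketch of the kind of calculation a business calendar performs, the code below counts how many sites are within local business hours at each UTC hour and reports the hours with the fewest busy sites, which are candidates for scheduled outages. The site list, the 09:00 to 17:00 business day, the function names, and the fixed UTC offsets are hypothetical simplifications; a real calendar must also handle daylight saving time and local holidays.

```python
# Hypothetical sites with fixed UTC offsets in hours (daylight saving time ignored).
SITES = {"New York": -5, "London": 0, "Singapore": 8}


def busy_sites(utc_hour, sites=SITES, open_hour=9, close_hour=17):
    """Sites whose local time falls within business hours at the given UTC hour."""
    return [name for name, offset in sites.items()
            if open_hour <= (utc_hour + offset) % 24 < close_hour]


def quietest_hours(sites=SITES):
    """UTC hours with the fewest sites in business hours (candidate outage windows)."""
    load = {hour: len(busy_sites(hour, sites)) for hour in range(24)}
    lowest = min(load.values())
    return [hour for hour, count in load.items() if count == lowest]


if __name__ == "__main__":
    for hour in range(24):
        print(f"{hour:02d}:00 UTC  {len(busy_sites(hour))} site(s) open")
    print("Quietest hours (UTC):", quietest_hours())
```

For these three sites the quietest hours are 00:00, 22:00, and 23:00 UTC, while 14:00 to 16:00 UTC is a peak because London and New York overlap.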
Service Quality (Real Time) Charts
Figure 7.15: A representative sample of an instantaneous or point-in-time service quality chart.
BSM is all about service quality that directly impacts the business, and these charts show service
quality at a single instant. The chart above shows how well each of the measured services is
performing. As with the other charts, their power lies in the ability to change the validation logic
in the background as the business identifies new or updated thresholds for what it considers
quality service.
Service Quality (History) Charts
Figure 7.16: A representative sample of a historical service quality chart.
Similar in function to the chart that Figure 7.15 shows, but adding the extra axis of time,
historical service quality charts provide historical quality information to the consumer. Much as
heat charts extend stoplight charts, these charts tell the consumer when a particular service may
have regressed into poor quality. In the example above, the measured service is performing very
poorly, with two movements into the completely down
state.
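Counting how often a service regressed into the down state, as in the example above, amounts to finding transitions in a sequence of status samples. The status labels, the `regressions` function, and the sample history in this sketch are hypothetical.

```python
def regressions(history, state="down"):
    """Indexes at which a service moved into `state` from some other state."""
    return [i for i in range(1, len(history))
            if history[i] == state and history[i - 1] != state]


if __name__ == "__main__":
    # Hypothetical hourly quality samples for one service.
    samples = ["good", "poor", "down", "down", "poor", "down", "poor"]
    moves = regressions(samples)
    print(f"{len(moves)} movements into the down state, at samples {moves}")
```

Consecutive "down" samples count as a single movement, so this history yields two regressions, at samples 2 and 5.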
Image Maps
Figure 7.17: A representative sample of an image map.
With any of the charts we've talked about thus far, it is occasionally useful to plot them against
an image of some form. The image enhances the visual notification associated with the metric.
Most often, these image maps are area maps or geographical maps, but they can use any image
that makes sense to the user and the chosen metrics. As you can see in the image above, we are
relating status (red versus green) to particular areas on the globe.
Drill-Down Reports
Figure 7.18: A representative sample of a drill-down report.
With any of these elements, the ability to create hyperlinks from one data block to another gives
the user added information. Here, drill-down reports provide specific information that explains
why a graphic appears the way it does. In the example above, clicking on a representative image
drills down to specifics about individual services and their
status.
Service Trees
Figure 7.19: A representative sample of a service tree.
Service trees relate service quality and drill-downs in another fashion. These charts give the user
a single view of multiple metrics in a tree view. Here, we can see the rollup status of various
services. By clicking the plus sign next to any service, we drill down to the dependent services
below that major heading. Creating multiple levels, often based on the service model itself, gives
the user a single-glance view of the entire environment. Linking service tree data blocks to
drill-down reports gives the consumer a holistic way of
determining exactly what is causing problems in the environment.
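Tree rollup is often implemented as a worst-of rule, in which a parent shows the most severe status among itself and its dependencies. That policy is an assumption here (products may weight dependencies differently), and the `Node` class, the severity scale, and the service names are hypothetical.

```python
from dataclasses import dataclass, field

SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class Node:
    name: str
    status: str = "green"  # the node's own measured status
    children: list = field(default_factory=list)

    def rollup(self) -> str:
        """Worst-of rollup: a node is only as healthy as its least healthy dependency."""
        statuses = [self.status] + [child.rollup() for child in self.children]
        return max(statuses, key=SEVERITY.__getitem__)


def render(node, depth=0):
    """Print the tree fully expanded, as if every plus sign had been clicked."""
    print("  " * depth + f"{node.name}: {node.rollup()}")
    for child in node.children:
        render(child, depth + 1)


ordering = Node("Online ordering", children=[  # hypothetical service model
    Node("Web front end"),
    Node("Order database", status="yellow"),
    Node("Payments", children=[Node("Card gateway", status="red")]),
])

if __name__ == "__main__":
    render(ordering)
```

The red card gateway propagates up through Payments to the top-level service, which is the path a user would follow, one plus sign at a time, to find the cause.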
BSM and its Visualizations Provide Return through OPEX Reduction

Operational expenses are recurring, which means that reducing them impacts the bottom line
over and over again. As we've intended to show in this chapter, implementing BSM in a business
service environment can provide a measure of return to many different user classes.
Management and financial individuals gain rich access to data with high resolution and fast
turnaround. IT gains a single management interface for performing its regular activities with
systems and applications. End users gain a greater understanding of the state of the system they
are using. All of this arrives through the data collected and calculated by the
BSM system.
In our next chapter, we'll close out our three-chapter series on obtaining maximum value from a
BSM implementation. There, we'll focus on the benefits gained by IT. We'll talk about how IT
can tie BSM information in with business data, service desk data, and other infrastructure
metrics. We'll also focus on how BSM can model different types of data (from infrastructure
data to business data) to generate meaningful information for users. As a result, targeted user
groups can make better-informed decisions in their day-to-day work.
Download Additional eBooks from Realtime Nexus!

Realtime Nexus, The Digital Library, provides world-class expert resources that IT
professionals depend on to learn about the newest technologies. If you found this eBook to be
informative, we encourage you to download more of our industry-leading technology eBooks
and video guides at Realtime Nexus. Please visit http://nexus.realtimepublishers.com.

tm

The Definitive Guide To

Chapter 1: The Power of Business Service Management

Mission Critical B2B

Core Web Site

The Intent of this Guide

Business Service ManagementMore than a Framework

Elements Needed by Business Leadership

System performance metrics

Customer wait time

System performance thresholds

Customer drop rate

Network latency percentages

End user experience metrics

What Is a Business Service?

Example Business Services

Managing Business Services

Unacceptable ServiceThroughout any period during the customers shopping and/or

Unavailable ServiceThe customer may be able to successfully login, receive a

3rd Party Credit

Dashboards and Service Visibility

Alignment of IT and the Business

Once the IT organization understands the complexities of the business services as

End User Experience Monitoring

Should include OLAs .

Service Visualization is the idea of providing a graphical abstraction of a business service

Real-Time Service Visualization is the idea of providing a graphical abstraction of a business

BSM Empowers Decision Makers

Chapter 2: The Alignment of IT and Business

The Chasm Between IT and the Business

Responsibilities and Priorities

Overall Business Strategy

Why this chasm? Principally, due to an individuals scope of responsibility. As an individuals

Supporting existing infrastructure

Expanding the business

Users Become Customers

Figure 2.2: As IT grows, key indicators as to its maturity become apparent.

The Gartner IT Maturity Curve

Initiate problem management process

Alert and event management

Mature problem, asset, and change management processes

Defined services, classes, and pricing

IT business metric linkage

IT improves business processes

BSMs Impact at the Various Maturity Levels

Key Steps Toward BSM

Management data integration

Real-time service dashboards

Service desk integration

Real-time service augmentation

ITs Old Focus

ITs New Focus

Figure 2.5: Five common behaviors can be inhibitors to alignment.

Why Invest in BSM?

Revenue generating or revenue/cost impacting

Critical to the business

Supported by the infrastructure of IT

Integrated with business processes

Provided by a service organization, whether internal or external

Low Risk Implementation

Systems Management Systems Monitoring

The Value of Alignment

Chapter 3: IT Service Management Evolution