Business Service Management
Greg Shields
Introduction
Introduction to Realtimepublishers
by Don Jones, Series Editor
For several years now, Realtime has produced dozens and dozens of high-quality books that just
happen to be delivered in electronic format, at no cost to you, the reader. We've made this
unique publishing model work through the generous support and cooperation of our sponsors,
who agree to bear each book's production expenses for the benefit of our readers.
Although we've always offered our publications to you for free, don't think for a moment that
quality is anything less than our top priority. My job is to make sure that our books are as good
as (and in most cases better than) any printed book that would cost you $40 or more. Our
electronic publishing model offers several advantages over printed books: You receive chapters
literally as fast as our authors produce them (hence the "realtime" aspect of our model), and we
can update chapters to reflect the latest changes in technology.
I want to point out that our books are by no means paid advertisements or white papers. We're an
independent publishing company, and an important aspect of my job is to make sure that our
authors are free to voice their expertise and opinions without reservation or restriction. We
maintain complete editorial control of our publications, and I'm proud that we've produced so
many quality books over the past years.
I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if
you've received this publication from a friend or colleague. We have a wide variety of additional
books on a range of topics, and you're sure to find something that's of interest to you, and it
won't cost you a thing. We hope you'll continue to come to Realtime for your educational needs
far into the future.
Until then, enjoy.
Don Jones
Table of Contents
Introduction to Realtimepublishers.................................................................................................. i
Chapter 1: The Power of Business Service Management................................................................1
The Intent of this Guide ...................................................................................................................4
Business Service Management: More than a Framework.............................................................4
The Chasm Between IT and the Business............................................................................5
What Is a Business Service? ................................................................................................6
Example Business Services..................................................................................................8
Managing Business Services..............................................................................................10
Dashboards and Service Visibility.....................................................................................13
Elements of BSM...........................................................................................................................14
Alignment of IT and the Business .....................................................................................14
The Evolution of IT Service Management.........................................................................14
Implementing BSM............................................................................................................15
End User Experience Monitoring ......................................................................................16
Achieving Management Value ..........................................................................................16
Achieving Operational Value ............................................................................................16
Achieving IT Value............................................................................................................17
ITIL and Six Sigma............................................................................................................17
Important Definitions.....................................................................................................................17
Business Impact Management ...........................................................................................17
Service Level Management................................................................................................18
Real-Time Service Visualization .......................................................................................18
Operational Metrics ...............................................................................................19
Service/Asset Metrics ............................................................................................19
Business Metrics ....................................................................................................19
Executive Views ....................................................................................................19
Fault Trees .............................................................................................................19
Impact Trees...........................................................................................................20
Business Calendar..................................................................................................20
Process Integration.............................................................................................................20
Workflow ...............................................................................................................20
Six Sigma ...............................................................................................................21
ITIL ........................................................................................................................21
BSM Empowers Decision Makers.................................................................................................21
Chapter 2: The Alignment of IT and Business ..............................................................................22
The Chasm Between IT and the Business......................................................................................23
Responsibilities and Priorities............................................................................................24
Early IT ..............................................................................................................................25
Users Become Customers ...........................................................................................26
Proactive IT........................................................................................................................26
Alignment Inhibitors......................................................................................................................27
No Common Dialog...........................................................................................................27
Mismatched Expectations ..................................................................................................28
Technology-Focused Metrics.............................................................................................28
Siloing ................................................................................................................................28
Reactive Mode IT ..............................................................................................................29
The Gartner IT Maturity Curve......................................................................................................29
Chaotic ...............................................................................................................................30
Reactive..............................................................................................................................31
Proactive ............................................................................................................................32
Service................................................................................................................................33
Value ..................................................................................................................................33
BSM's Impact at the Various Maturity Levels ..................................................................34
IT Focus Is Changing.....................................................................................................................35
Revenue Impact .................................................................................................................36
Competitive Advantage .....................................................................................................36
Agility ................................................................................................................................37
Reactive to Proactive IT.....................................................................................................37
Why Invest in BSM?......................................................................................................................38
Where it Works ..................................................................................................................39
The Dashboard Audience.......................................................................................40
Technicians and Administrators ............................................................................40
Managers................................................................................................................41
Executives ..............................................................................................................41
Where It Doesn't Work......................................................................................................41
Low Risk Implementation..................................................................................................42
Cost Containment Aspects.................................................................................................43
Governance and Compliance Aspects ...............................................................................43
The Value of Alignment ................................................................................................................43
Chapter 3: IT Service Management Evolution ..............................................................................44
Maturity Impacts IT Goals.............................................................................................................45
What Is an IT Service?...................................................................................................................46
Service Management..........................................................................................................47
The Timeline of Management and Monitoring..............................................................................48
Early Management .............................................................................................................50
Proprietary Agents .............................................................................................................50
Native/Agentless ................................................................................................................51
Focus on Value ..................................................................................................................53
The Evolution of Service Management Targeting.........................................................................56
Network Availability and Utilization.................................................................................57
Server Performance............................................................................................................57
Troubleshooting and Predictive Analysis ..........................................................................58
End User Experience..........................................................................................................59
J2EE & .NET Application Performance................................................................59
Service Level Management................................................................................................61
Business Service Management ..........................................................................................62
An Example ...................................................................................................................................62
Network Availability and Utilization.................................................................................63
Server Performance............................................................................................................63
Troubleshooting and Predictive Analysis ..........................................................................64
End User Experience..........................................................................................................64
Service Level Management................................................................................................65
Business Service Management ..........................................................................................65
Moving Along the Evolutionary Curve .........................................................................................66
Speeds Troubleshooting.....................................................................................................66
Improves Performance .......................................................................................................67
Fills Out Systems Vision ...................................................................................................67
Enables Proactive Management.........................................................................................67
Summary ........................................................................................................................................68
Chapter 4: Implementing BSM......................................................................................................69
BSM Provides a Business Focus to IT Operations ........................................................................70
Three Reasons to Implement BSM ................................................................................................71
Understand the Critical to Quality Services.......................................................................71
Manage Daily Risk and Improve Business Decision Making ...........................................71
Initiate Service Improvement Activities ............................................................................71
The Seven Steps of a BSM Implementation ..................................................................................72
Step 0: Preparation .......................................................................................................................72
Identify Key Project Members...........................................................................................72
Identify Stakeholders and Build the Project Plan ..............................................................73
Step 1: Selection...........................................................................................................................74
Identify Critical and Measurable Business Services..........................................................74
Assess Services ..................................................................................................................75
Assess Cost to the Business ...............................................................................................75
Step 2: Definition .........................................................................................................................76
Define Services ..................................................................................................................76
Define Service Requirements ............................................................................................78
Define Problems and Opportunities...................................................................................79
Define Critical Success Factors .........................................................................................79
Step 3: Modeling..........................................................................................................................79
Model Defined Services and Dependencies ......................................................................80
Model Associated Metrics .................................................................................................81
Build the Service Model ....................................................................................................81
Step 4: Measurement....................................................................................................................83
Implement Data Collection ................................................................................................84
Measure Services & Gaps..................................................................................................85
Step 5: Data Analysis...................................................................................................................86
Analyze Returned Monitoring Data...................................................................................86
Validate Measurements & Costing Assumptions ..............................................................86
Build Fault Tree Analyses .................................................................................................87
Build Impact Analyses.......................................................................................................88
Step 6: Improvement....................................................................................................................89
Locate Problem Domains...................................................................................................90
Identify and Resolve Gap...................................................................................................90
Revise the Service Model ..................................................................................................90
Step 7: Reporting .........................................................................................................................91
Implement Dashboards ......................................................................................................91
Implement Notification......................................................................................................91
Hand-off to Operations ......................................................................................................92
A Carefully Planned Implementation Is a Successful Implementation .........................................92
Chapter 5: End User Experience Monitoring.................................................................................93
System Counters Alone Cannot Fully Represent the End User's Experience...............................94
Looking at the Wrong Set of Data .....................................................................................96
The Egg Timer Problem .................................................................................................96
System Counters Are Critical to the Systems Administrators and End User Experience Is Critical
to the System Users........................................................................................................................97
Agent-Based Monitoring ...................................................................................................98
Agentless Monitoring.........................................................................................................99
Understanding the CNS Spread....................................................................................100
Watching How Users Interact with the System ...............................................................101
An Example .................................................................................................................................102
Visibility ..........................................................................................................................102
Prioritization ....................................................................................................................102
Resolution ........................................................................................................................103
Improvement ....................................................................................................................103
Impacted Technologies ................................................................................................................104
Web Front End.................................................................................................................105
Packaged Applications.....................................................................................................106
Thin Client .......................................................................................................................106
Middleware ......................................................................................................................107
Databases .........................................................................................................................107
Importance to IT Goals ................................................................................................................108
Problem Identification .....................................................................................................108
Prioritization ....................................................................................................................109
Pre-Failure Warnings .......................................................................................................109
Finger Pointing Prevention...........................................................................................110
Clear Problem Communication........................................................................................111
Vendor Accountability.....................................................................................................111
Customer Satisfaction ......................................................................................................112
EUE Ties into BSM .....................................................................................................................112
Necessary for a Complete Picture of BSM ......................................................................113
Importance of Using Both Methods for Monitoring........................................................114
Proactive Awareness........................................................................................................115
EUE Drives BSM's ROI..............................................................................................................115
Chapter 6: Achieving Management Value...................................................................................116
Obtaining and Maintaining Value in a BSM Implementation .....................................................118
Obtaining Value ...............................................................................................................119
Maintaining Value ...........................................................................................................120
Calculating ROI ...............................................................................................................121
Cost to Implement................................................................................................122
Cost Savings Associated with Implementation....................................................122
Revenue Benefits .................................................................................................123
Management Visibility.................................................................................................................124
Visibility & Dashboards ..................................................................................................124
What to Display ...............................................................................................................125
What Not to Display ........................................................................................................126
Access Control .................................................................................................................126
Trend & Reaction Lines...................................................................................................127
Management Control ...................................................................................................................128
Control Dashboards .........................................................................................................128
What to Display ...............................................................................................................129
What Not to Display ........................................................................................................129
Management Impact on Operations .................................................................................129
SLA Measurement & Fulfillment ....................................................................................130
Purchase / Upgrade Decisions .........................................................................................131
Process Integration...........................................................................................................132
Fitting BSM into the Overall Operational Scheme..........................................................133
End User Visibility & Control .....................................................................................................133
System Status ...................................................................................................................135
Outsourcers & Service Providers.................................................................................................135
Cost & Risk Reduction ....................................................................................................136
Contract Compliance .......................................................................................................136
Enterprise IT ................................................................................................................................137
Cost & Risk Reduction ....................................................................................................138
Customer Satisfaction ......................................................................................................138
BSM Enables an Ongoing Measurement of Management Value ................................................138
Chapter 7: Achieving Operational Value.....................................................................................139
Post-Implementation Operational Achievement..........................................................................141
BSM Correlates and Consolidates to Make Sense of the Data....................................................143
Unifying Management Controls ......................................................................................145
Operational Visibility.......................................................................................................146
BSM as an Extensible Visualization Tool ...................................................................................147
For Management ..............................................................................................................148
For IT ...............................................................................................................................149
For Customers..................................................................................................................150
Example Visualization Data Blocks ............................................................................................151
Availability Charts ...........................................................................................................151
Control Charts..................................................................................................................152
Dial Charts .......................................................................................................................152
Metrics Charts..................................................................................................................153
Pareto Charts....................................................................................................................154
6 Sigma Charts.................................................................................................................155
Outage Impact Charts ......................................................................................................156
Service Statistics Charts...................................................................................................157
Stoplight Charts ...............................................................................................................158
Heat Charts.......................................................................................................................158
Business Calendars ..........................................................................................................159
Service Quality (Real Time) Charts.................................................................................159
Service Quality (History) Charts .....................................................................................160
Image Maps......................................................................................................................161
Drill-Down Reports .........................................................................................................161
Service Trees....................................................................................................................162
BSM and its Visualizations Provide Return through OPEX Reduction ......................................163
Copyright Statement
© 2008 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that
have been created, developed, or commissioned by, and published with the permission
of, Realtimepublishers.com, Inc. (the "Materials") and this site and any such Materials are
protected by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web
site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be
held liable for technical or editorial errors or omissions contained in the Materials,
including without limitation, for any direct, indirect, incidental, special, exemplary or
consequential damages whatsoever resulting from the use of any information contained
in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal,
non-commercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, services marks and logos that are the property of
third parties. You are not permitted to use these trademarks, services marks or logos
without prior written consent of such third parties.
Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent
& Trademark Office. All other product or service names are the property of their
respective owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtimepublishers.com, please contact us via e-mail at
info@realtimepublishers.com.
Chapter 1
[Editor's Note: This eBook was downloaded from Realtime Nexus, The Digital Library. All
leading technology guides from Realtimepublishers can be found at
http://nexus.realtimepublishers.com.]
Figure 1.1: Monday morning's problem with a "minor IT system" actually drove a problem into a much larger
enterprise-wide system. (Diagram labels: Minor IT System, Moderate Data Processing System.)
Turns out that the "minor IT system" wasn't so minor after all. In its end-of-month metrics back
to the business, FCG's IT reports no SLA violations for the month. Our Monday morning
problem didn't rise to the level requiring identification to the business, so IT's breach report
didn't include the outage. However, what IT never realized is that its "minor IT system" is
actually a small component of a much larger enterprise-wide problem.
The business actually felt the problem quite a bit more than IT did because that "minor IT
system" was a low-level component of a system thread linking all the way to FCG's Tier I
business-to-business (B2B) Web system. This B2B Web system handles all the purchasing,
returns, and delivery information for FCG's glass purchases worldwide. The role of the minor IT
system is to feed special-order delivery routing information to a moderate data processing system
that in turn feeds into the site's returns subsystem.
FCG's 6-hour outage between 2:15am and 8:15am caused all incoming special orders to crash at
the most delicate step in the process: after the order had been taken and charged but before it
was completed with routing and delivery information. Because of that error, any special orders
received within that 6-hour period were charged to customer accounts but no delivery or routing
information was captured.
The end result of today's problem is that the ordering department must undergo a time-consuming
and manual process to identify the failed orders and work with each customer
individually to populate its delivery and routing data. This manual process costs the business
money, at a cost level typically associated with a Tier I outage.
You can see in the example the dissonance between IT's priorities and those of the business.
What IT sees as a "minor IT system" actually affects the business in a highly critical way. IT is
not necessarily at fault for mislabeling the system. They're doing their job the best way
they can. Where the fault lies is in the essential translation between the priorities of IT and those of
the entire business.
Adding to today's problem is the global nature of FCG's business. Though the problem occurred
between the early hours of 2:15am and 8:15am (MST) in the United States, where virtually no
B2B business is being transacted, business was only just beginning 9 hours ahead in Europe, the
Middle East, and Africa (EMEA). Sixteen hours ahead, Japan and the rest of Asia-Pac were just
finishing their days. Understanding the business calendar and business periods means
recognizing how time shifts affect a globalized company.
Figure 1.2: Tracking FCG's problem along a graph of time shows the skew created by time zones. An impact
that occurs at 2:15am in one part of the world affects another part of the world in the middle of the workday.
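To make the time skew concrete, the short Python sketch below converts the outage window into local time for each region. It is only an illustration: the calendar date is hypothetical, EMEA is treated as a single fixed offset of UTC+2 (9 hours ahead of MST, as described above), and Japan is taken at UTC+9. Fixed offsets are used so that daylight saving rules do not enter the picture.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are assumptions made for this illustration.
MST = timezone(timedelta(hours=-7), "MST")
REGIONS = {
    "US (MST)": MST,
    "EMEA (UTC+2, assumed)": timezone(timedelta(hours=2)),
    "Japan (UTC+9)": timezone(timedelta(hours=9)),
}


def outage_in_local_time(start, end):
    """Return the outage window expressed in each region's local time."""
    return {
        name: (start.astimezone(tz), end.astimezone(tz))
        for name, tz in REGIONS.items()
    }


# Hypothetical Monday; the guide does not give a calendar date.
start = datetime(2008, 1, 7, 2, 15, tzinfo=MST)
end = start + timedelta(hours=6)

for name, (local_start, local_end) in outage_in_local_time(start, end).items():
    print(f"{name}: {local_start:%a %H:%M} to {local_end:%a %H:%M}")
```

Under these assumptions, an outage that IT experiences overnight runs from 11:15 to 17:15 for EMEA, which is most of that region's working day.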
BSM is more than just a framework. It is a fully defined category of software and
implementation guidelines. It ingests availability and performance data and outputs
quality-related metrics to the business on the health of the network's business services. BSM applies a
dollar value to the reduction in quality for each identified service and serves up that information
on dashboards viewable and understandable to both IT and business leaders. Taking it one step
further, BSM represents the combination of Monitoring + Money.
The Chasm Between IT and the Business
Chapter 2 will provide a much more detailed discussion of the alignment between IT and the
business. That chapter will discuss how IT is maturing past the days of pure firefighting and the
break/fix mentality, and it will cover the differences in vocabulary and prioritization that
can be inhibitors to attaining a high level of organizational maturity. But let's take just a minute
here to talk about the dissonance between the mindset of the IT guys in the basement and the
executives on the top floor.
For most of the early years of computers and networking, IT organizations have used rich tools
to monitor the status of computer hardware and software. Utilizing such technologies as SNMP
for network and UNIX devices, Windows Management Instrumentation (WMI) for Windows
devices, Java Message Service (JMS) for Java applications, or any of the many Web Services
protocols, systems administrators have been able for years to query network-attached devices for
status, inventory information, performance metrics, and active configurations. Connecting these
technologies to a centralized Network Management System enables the IT department to build a
single-screen view into the network.
Figure 1.3: Mature IT organizations have for years incorporated Network Management Systems to monitor
and notify when network-enabled devices incur problems.
That single-screen view helps to enlighten IT as to the health of the network and the devices and
applications that make up that network. If a device goes down, the Network Management System
notifies the administrator through a pop-up alert, an email, or a page to a mobile device that the
system has gone offline. Help desks everywhere have installed heads-up displays where green
lights go red when bad things happen.
When criteria for performance are preconfigured into the system, the same Network
Management System can notify administrators when performance dips below preset thresholds.
Highly mature IT organizations even define auto-remediation actions to occur when
preconfigured events occur. A mature IT shop has probably been proactively monitoring such
elements for years.
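As a rough sketch of the threshold idea just described, the Python below checks one metric reading against a preset minimum, raises an alert when the reading dips below it, and runs an optional auto-remediation action. The metric name, the threshold value, and the remediation function are hypothetical; a real Network Management System exposes its own configuration for these.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ThresholdRule:
    metric: str
    minimum: float  # alert when a reading drops below this value
    remediate: Optional[Callable[[], str]] = None  # optional auto-remediation


def evaluate(rule: ThresholdRule, reading: float) -> List[str]:
    """Return the actions taken for one reading (empty if it is healthy)."""
    if reading >= rule.minimum:
        return []
    actions = [f"ALERT: {rule.metric}={reading} below {rule.minimum}"]
    if rule.remediate is not None:
        actions.append(rule.remediate())
    return actions


def restart_web_pool() -> str:
    # Hypothetical remediation; a real action would call into the managed system.
    return "REMEDIATE: restarted web pool"


rule = ThresholdRule("requests_per_second", 50.0, restart_web_pool)
print(evaluate(rule, 80.0))
print(evaluate(rule, 20.0))
```

Note that the rule only knows about one device-level metric. Nothing in it says what the dip means to the business, which is the gap the rest of this section describes.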
Where the chasm occurs is in the definition of what's important. IT tends to deem the status of
each individual device important. If a device is up, the light stays green. Business leaders have
different priorities, though. For a business leader, importance is best measured by customer
satisfaction, external service availability, and the capability to meet customer needs. If a
customer completes a Web site transaction and is satisfied with the results, the business leader's
light stays green.
But who really owns the service and is responsible for its quality? Is it the business leader who
pays for and relies upon the service? Or is it the IT organization that watches it, manages it, and
ensures it remains up and operational? According to BSM, it is a combination of the two. With
BSM, and the tools that feed its framework, each half of the ownership is provided with the
information it needs to make the best decisions within its universe of control. Table 1.1
highlights some examples of this idea.
Elements Needed by IT
Device availabilities
Service quality
Table 1.1: With BSM, information about a system and the services that reside on that system is broken down
into elements useful to its stakeholder.
For the purposes of BSM, a business service in its most basic form is one
whose operation can be quantified in terms of dollars and cents. If a service can be measured by
some amount of cash that moves during its processing (and that is therefore missing when it fails to
be processed), then it becomes a good candidate for a business service.
The complexity in defining such services lies in finding the lines of demarcation
between individual services. This process of breaking down a business service into its disparate
components is the next step in the BSM process and is really the most critical activity. Some
sample questions to consider:
Do we consider our order processing system a single business service? Or can we break
down the system into an order entry service, an order processing service, and an order
notification service?
Is our external Web site a business service? Does it provide quantifiable cash flow?
Is our internal Web site a business service? Will the loss of it impact our ability to
complete the daily flow of business, and if so by how much?
Is the functioning of our internal Windows domain a business service? What functions
rely on its faithful operation?
Which of our network devices and applications are most critical to operations and which
power only tangential operations?
You can see here that breaking down these services into ever smaller and smaller subcomponents
can be a daunting task. But it is the interrelation of these interconnected service subcomponents
that eventually builds what BSM calls a service model. If each business service subcomponent is
akin to a city on a map, the service model is the complete map including all the roads that
connect those cities.
BSM's service model lies at the core of its processing power. It is within the BSM service model
that dependencies between services are described and where individual service subcomponents
are logically interconnected.
Figure 1.4: In BSM's service model, services and service subcomponents are atomized and interconnected to
show dependencies. Upper-level services rely on lower-level services for functionality. The Quality of Service
(QoS) of lower-level services drives the QoS for those above them.
Figure 1.4 shows an example of six business services. Each of these business services has a
quantifiable point of demarcation. Services at the bottom provide data processing of some form
useful to services that lie above them. Services higher in the model require the
subordinate services below them to function properly.
The preceding model diagram shows a single core service. Generically, that service could be The
Company eCommerce Site. This business service is the ultimate endpoint in which the business
interacts with the customer, or essentially that portion of the internal network that the customer
sees. But that service relies upon a set of dependencies to function properly. Perhaps the two
second-level services are The Ordering Database and The Customer Database. Each of these, in
turn, relies on other subordinate services. These third-level services could be abstractions for real
network constructs such as The External Network, The Customer Authentication System, and The
Data Encryption System.
What is critical in determining the points of demarcation between such services is that they are
not necessarily aligned to network objects or individual applications. We do not define our
service model as The Network Switch that connects to The Network Database Server that itself
feeds The eCommerce Server Cluster. Instead, BSM requires that for most services, you add a
level of abstraction between the physical network object or application and the business
processes that it enables.
Figure 1.5: Filling in the blanks from Figure 1.4, you see how business services interrelate.
Figure 1.6: A more detailed service model outlines the discrete functions of FCG's mission-critical B2B Web
systems and the dependencies between them. (Diagram labels: Mission-Critical B2B Web System; Customer
Account Auth. System; Inventory Processing System; Order Processing System; Customer Account Database;
Inventory Database; Credit Card Auth System; External Credit Service Proxy; B2B Extranet; Credit Card
Extranet.)
Figure 1.6 shows the example business system broken down into its various logical components.
Each of these components resides on one or more physical devices within one or more data
centers. But, more importantly, each of these components is critical in some measure to the
operation of the B2B Web system.
Immediately related to the core system itself are its Customer Account Authorization System, the
Inventory Processing System, and the Order Processing System. The Customer Account
Authorization System is used to store customer credentials using its dependent Customer Account
Database. It also allows for the authentication of customers using the B2B Web site. Its related
database stores account personalization information as well as logins and passwords.
The Inventory Processing System is used to manage the workflow associated with recognizing
inventory levels and acting on the information it gets from its dependency, the Inventory
Database. The Inventory Processing System also updates customer personalization information
within the Customer Account Database to note previous orders and to suggest potential future
orders.
That same Inventory Database also works with the Order Processing System. The Order
Processing System's responsibility is to ensure that a customer transaction is completed and
logged correctly when requesting a unit of inventory. Because all orders in the example system
are paid for via credit card, the Order Processing System depends on the Credit Card
Authentication System to process credit card information. An External Credit Card Proxy is used
to complete those transactions. Two separate networks are relied upon for the functionality of the
entire system. Those are the B2B Extranet and the separate Credit Card Extranet that is FCG's
connection to the credit card provider. You should immediately see the complexities involved in
deconstructing what seems a simple business service into its component elements.
Chapter 4 on Implementing BSM discusses this complex process in more detail.
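One way to picture the service model described above is as a simple dependency map. The Python sketch below records the dependencies named in the text and rolls the state of lower-level services up to the services that rely on them. Several points are assumptions made for illustration only: where the two extranets attach in the model, the three numeric states, and the "worst state wins" roll-up rule. An actual BSM product would let you define its own propagation logic.

```python
# Each service maps to the services it depends on, following the FCG example.
# Attaching the B2B Extranet to the Web system and the Credit Card Extranet
# to the proxy is an assumption; the text only says the whole system relies on both.
SERVICE_MODEL = {
    "B2B Web System": [
        "Customer Account Auth System",
        "Inventory Processing System",
        "Order Processing System",
        "B2B Extranet",
    ],
    "Customer Account Auth System": ["Customer Account Database"],
    "Inventory Processing System": ["Inventory Database", "Customer Account Database"],
    "Order Processing System": ["Inventory Database", "Credit Card Auth System"],
    "Credit Card Auth System": ["External Credit Card Proxy"],
    "External Credit Card Proxy": ["Credit Card Extranet"],
    "Customer Account Database": [],
    "Inventory Database": [],
    "B2B Extranet": [],
    "Credit Card Extranet": [],
}

OK, DEGRADED, DOWN = 0, 1, 2  # hypothetical state scale


def rolled_up_state(service, own_state, model=SERVICE_MODEL):
    """Worst state among a service and everything it depends on."""
    worst = own_state.get(service, OK)
    for dependency in model[service]:
        worst = max(worst, rolled_up_state(dependency, own_state, model))
    return worst


# A fault on the Credit Card Extranet surfaces at the top-level service.
observed = {"Credit Card Extranet": DOWN}
print(rolled_up_state("B2B Web System", observed))
print(rolled_up_state("Inventory Processing System", observed))
```

The point of the sketch is the shape of the data: the model names business functions and their relationships, not switches and servers.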
Acceptable Service: The customer logs into the system and receives a successful logon
within an acceptable amount of time. They navigate through the system to find the items
of interest, also within an acceptable period of time. Once they are ready to complete their
transaction, the purchase is completed using a payment method of their choosing that
responds quickly and without exposing error, delay, or security compromise to the
customer.
With BSM, the monitoring and notification tools present in the suite need to provide information
to both IT and business leaders that actively validates the state of the service in real-time. Is the
B2B Web site currently in State 1 or has it degraded to State 2 or even State 3? And, if the
service has degraded to an unacceptable state, what impact in whole dollars is experienced by the
company per period of time?
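The "impact in whole dollars per period of time" question can be sketched with simple arithmetic. In the Python below, each service state is assigned a fraction of hourly revenue that is lost while the service sits in that state. The fractions, the hourly revenue figure, and the timeline are all hypothetical values chosen for illustration; real figures would come from the business.

```python
# Hypothetical share of hourly revenue lost in each service state.
LOSS_FRACTION = {1: 0.0, 2: 0.25, 3: 1.0}


def impact_dollars(state, revenue_per_hour, hours):
    """Whole-dollar impact of spending `hours` in a given state."""
    return round(LOSS_FRACTION[state] * revenue_per_hour * hours)


def total_impact(timeline, revenue_per_hour):
    """Sum the impact over a list of (state, hours) intervals."""
    return sum(
        impact_dollars(state, revenue_per_hour, hours) for state, hours in timeline
    )


# Example: 2 hours acceptable, 1.5 hours degraded, then a 6-hour outage.
timeline = [(1, 2.0), (2, 1.5), (3, 6.0)]
print(total_impact(timeline, revenue_per_hour=12_000))
```

Even a crude model like this turns a red light on a console into a number a business leader can act on.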
Another of BSM's key benefits is its ability to better troubleshoot service degradation as it occurs.
When the service model is built with a high level of granularity and end user experience metrics are
configured into the system, BSM provides an excellent mechanism for drilling down into specific
problem sets. As 80 percent of troubleshooting is often just finding where the problem lies, this
feature speeds problem resolution. Chapter 5 will discuss the benefits and key components for end
user experience monitoring.
To help you further understand the role of BSM in defining these states and the notifications that
occur when state changes happen, let's look at Gartner's further definition of BSM and what is
needed for a software package to qualify as a BSM tool:
To qualify for the BSM category, a product must support the definition, storage, and
visualization of IT service topology or dependency maps. It must gather real-time
operational status data from underlying applications and IT infrastructure components.
And it must process status data against the object model to communicate real-time IT
service status.
Thus, managing business services means ingesting real-time status data from the physical
systems that make up business services and translating that into the abstracted service model.
That data can come from any number of places: system-based, application-based, or even
code-based sources, such as Java, SAP, or CMDB APIs.
Once the data arrives, it is the job of the BSM system to apply predefined logic to that data to
determine the quality of each system. All this information is pushed in real-time to the
communication mechanisms (alert notifications, dashboards and reports) defined by the
administrator. I'll talk about those mechanisms in a minute.
I haven't yet talked about the underlying applications and IT infrastructure components that a BSM
system relies upon for its monitoring data. Chapter 5 will do so and Chapter 8 will provide another
discussion in greater detail. However, it is worth mentioning that a BSM tool need not be the tool that
creates monitoring data. A BSM tool need only be capable of ingesting monitoring data and acting on
that data using the notification concepts that make up the BSM framework. Furthermore, BSM is not
intended to be a service catalog itself. Nor is it alone business activity monitoring or a business
process automation tool.
Before leaving the topic of managing business services, it is important to take a quick look at
what business services are not. Because the focus is so heavily on abstractions of physical
constructs into business processes and the health of those business processes, mirroring business
services to already defined business processes is an effective mechanism for encapsulating them.
Figure 1.7: Although BSM can incorporate elements of physical infrastructure into the service model, it is not
intended to be an IT-centric view of the overall system. This image is therefore an incorrect abstraction for
the example B2B service. (Diagram labels: External B2B Web Cluster; Kerberos Auth. System; Java-based
Inventory System; ERP System; LDAP Database; Oracle Database; B2B Extranet Router; Credit Card Extranet
Router.)
Conversely, the incorrect way to abstract the business service is via a purely IT-centric or
device-centric approach. Doing service modeling in this way is no different from standard IT service
management. It serves only to provide the viewer with a device-centric view of the health of the
business service and complicates efforts to understand how the service impacts customer
satisfaction.
Chapter 4 will talk in greater detail about this process of building the service model.
Both Chapter 6 and Chapter 7 will discuss this topic in greater detail.
Elements of BSM
This guide is broken into 10 chapters, each of which will discuss one facet of BSM. Intended to
function as interdependent building blocks, each chapter draws on its predecessors to flesh out
the BSM picture. As you can see in Figure 1.8, these building blocks start with a description of
the as-is situation in most IT cultures. We'll sidestep into BSM's value proposition related to
many networks' existing deficiencies and move through the implementation activity with a side
conversation on the experience of the end user. The guide will then branch into three ways to add
management, operational, and IT value, and conclude with a discussion linking BSM with other
management frameworks such as ITIL and Six Sigma.
Figure 1.8: The chapters of this guide incorporate building blocks to guide the conversation towards a full
understanding of BSM as a viable and effective solution.
Implementing BSM
Closing out the introductory material on the status of IT and its need for mature tools like BSM,
Chapter 4 will begin the process of explaining the design, installation, and configuration tasks
required to stand up a BSM instance in your environment.
The chapter will discuss the seven steps of a BSM implementation, starting with design tasks all
the way through implementation and constant improvement phases:
We begin with the Preparation Phase where project plans are outlined and project teams
are identified.
In the Selection Phase, you assess critical and measurable business services using the
criteria discussed earlier in the chapter and analyze each service's cost to the business.
The Definition Phase takes the input from the Selection Phase and makes key decisions
on which services to bring under management immediately, which to delay, and which to
remove from the project scope.
The Modeling Phase begins the process of data collection. Here, you tie identified
services into existing or new monitoring tools for data gathering and begin the process of
building the service model.
Once the initial service model is created and data gathering is complete, you continue
into the Measurement Phase. This phase involves itself with the measurement of services
over time, identification of gaps in monitoring, and validation of costing assumptions.
The Data Analysis Phase ingests the data gathered in the Measurement Phase and
completes more rigorous analysis on that data to begin building fault and impact analysis
models.
Lastly, you implement the key Reporting Phase, where dashboards and other
visualization tools are implemented for key stakeholders to use.
Achieving IT Value
No conversation about the value proposition of an IT system is complete without discussing how
that system provides value to IT itself. Chapter 8 does just that. IT's needs for management and
monitoring are well established. However, BSM provides heretofore unrecognized additional
value through its unique way of looking at data. The chapter will discuss the business, service
desk, configuration, response time, and infrastructure metrics data available to IT within a fully
realized BSM implementation.
It will then dig deep into the IT technology that BSM implementers must understand to link the
BSM system into other systems on the network. We'll explore management protocols such as
SNMP, WMI, WS-Management, enterprise messaging, and the Java Message Service and how
these tools are necessary for BSM to link into system data. The chapter will conclude with a
review of the data collection capabilities of a best-in-class BSM system and how these external
data sources connect to BSM.
ITIL and Six Sigma
BSM is a top-down, phased approach that first considers what's most critical to the business. Its
framework for deployment is based on industry-standard practices. Two of these industry
practices, ITIL and Six Sigma, complement BSM to provide tangible return on investment.
Combining ITIL, Six Sigma and BSM provides rich capabilities for continual quality
improvement with a focus on the business.
Chapter 9 will discuss the ties that connect ITIL and Six Sigma with BSM. It will talk about the
practices and how they interrelate and how a business can use built-in BSM tools to populate Six
Sigma thought-driving and planning discussions. The chapter will also discuss ITIL and Six
Sigma best-practice metrics that are importable into a BSM infrastructure to immediately gain
the benefits of these complementary ideas.
Important Definitions
The next nine chapters will begin the process of educating you on the needs, processes, and
benefits involved in building BSM into your network. But before concluding this chapter's
review on high-level topics, let's take a few minutes to discuss important key concepts that
you'll encounter again and again throughout this guide. This section introduces concepts specific
to BSM and BSM implementations that will help you understand the necessary underlying
technology and processes associated with BSM.
Business Impact Management
Business Impact Management (BIM) is the idea of network management that monitors the status
of IT devices but not necessarily from a device-centric approach. BIM tools track QoS across
multiple devices but report on a service as a single entity that relies upon those devices.
Where BIM tools differ from traditional management and monitoring tools is in correlating
performance and event data across multiple IT facets for a roll-up view on business system
health. As an example, a traditional management tool may be able to notify administrators when
the network is slow or inoperable. But a BIM tool can wrap this performance shortfall
information with data from the application itself to get a holistic view of the entire system
performance.
Service Level Management
Service Level Management is an ITIL construct that defines the process of constructing,
adjudicating with stakeholders, implementing, and documenting an agreed-upon level of service
for a particular IT system or subsystem as well as the management of the customer relationship.
The following list highlights examples of Service Level Management:
Service Level Management can occur between an IT organization and the business to
outline the specific and quantitative expectations of service quality to be provided by
IT.
It can occur between the business and its customers, contractually outlining expectations
for service levels from the business to its hosted customers.
It can be contracted between a business and its resource providers. This might seek to
ensure that the business obtains the QoS it requires to provide services to its customers in
turn. It can also provide a basis for contractual remediation when the business does not
receive the contracted level of service.
Penalty avoidance (for providers) and customer satisfaction are among the reasons
organizations have Service Level Management (SLM) in place.
Service Level Management typically deals with an organization's service catalog and the performance
metrics associated with those services.
Real-Time Service Visualization
A proper definition of Real-Time Service Visualization requires the term to be broken down into
its two halves and defined piecemeal:
Real Time means simply that the data involved with a system is not snapshot-based but is
instead abstracted to relevant visualization tools as it arrives into the system. Real time is
best contrasted with traditional report-based data, which reaches the consumer only after
collection and preparation.
Operational Metrics
Operational Metrics are those metrics used to represent the day-to-day health and quality of a
particular business service. Operational Metrics are typically measurements of status and
performance over time based on the behavior of a particular business system. These metrics are
concerned with the availability of a business system, its throughput and observed performance,
and its response time. Operational Metrics are used most often to understand the technical
quality of a system.
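As a small worked example of two such metrics, the Python below computes availability over a reporting period and an average response time. It assumes a 30-day month (720 hours) and reuses the 6-hour outage from the FCG example; the response-time samples are made up for illustration.

```python
from statistics import mean


def availability_pct(period_hours, outage_hours):
    """Percentage of the period during which the service was available."""
    return 100.0 * (period_hours - outage_hours) / period_hours


def average_response_seconds(samples):
    """Mean of the observed response times, in seconds."""
    return mean(samples)


# 30-day month with one 6-hour outage: (720 - 6) / 720, or about 99.17 percent.
print(f"{availability_pct(720, 6):.2f}%")
print(f"{average_response_seconds([0.8, 1.2, 1.0]):.2f} s")
```

An availability figure above 99 percent looks healthy as a purely technical measure, which is why operational metrics alone did not reveal the business cost of FCG's outage.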
Service/Asset Metrics
Service and Asset Metrics are those used to identify, inventory, and generally understand the
physical characteristics of a particular service or asset. These metrics can be used to understand
the characteristics and effectiveness of individual services or assets and potentially drive
decisions as to their utility, efficacy, necessity, and reusability.
Business Metrics
Business Metrics are those that relate the function and processing of an item, a process, or an
activity to how it impacts the financial position of the business. For items, business metrics can
relate to an item's age, its utility, and various elements of financial return on the item. For processes and
activities, this can relate to the efficacy of the process to produce value and/or the quantification
of any value provided by the process.
Executive Views
Executive Views are constructs within dashboard views that are specifically tailored for
consumption by non-technical business leaders. Executive views are critical components in a
mature BSM solution because they empower executives with the knowledge they need to
validate the health and quality of a business system. The BSM tenet of digestibility emphasizes
the ability for executives to understand, or digest, the information contained within their
visualization tool.
Fault Trees
A Fault Tree is a visualization tool used in a Fault Tree Analysis. In these diagrams, an undesired
effect is listed as the root of a logic tree. Each potential situation that could add cause to that
undesired effect is listed on the tree as branches towards its root. Subsequent situations that add
cause to upward-level causes are connected below cause items. Fault Trees are useful in the
identification of root cause for a particular problem and help with the visualization of the current
and future potential situations to identify and track affecting problems in a system.
Impact Trees
Impact Trees are used as a visualization tool in identifying what connected systems could be
impacted by a fault within a particular system or system subcomponent. The element at the
bottom of the tree is typically the faulted item and all objects connected upwards from that item
are recognized to be in a faulted or partially faulted state.
One of the added benefits associated with the creation of the service model is the built-in
knowledge of how services impact each other. Thus, an Impact Tree can be created easily by
utilizing the service model's interconnections.
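To illustrate how an Impact Tree falls out of the service model's interconnections, the Python sketch below inverts a small dependency map and walks upward from a faulted item to collect every service that relies on it, directly or indirectly. The service names are a simplified, hypothetical subset of the FCG example, and the breadth-first walk is just one reasonable way to do the traversal.

```python
from collections import deque

# Hypothetical subset of a service model: service -> services it depends on.
DEPENDS_ON = {
    "B2B Web System": ["Order Processing System", "Inventory Processing System"],
    "Order Processing System": ["Inventory Database", "Credit Card Auth System"],
    "Inventory Processing System": ["Inventory Database"],
    "Credit Card Auth System": [],
    "Inventory Database": [],
}


def impacted_by(faulted, depends_on=DEPENDS_ON):
    """List every service above `faulted` in the model, nearest first."""
    # Invert the map so each service points at the services that rely on it.
    parents = {}
    for service, dependencies in depends_on.items():
        for dependency in dependencies:
            parents.setdefault(dependency, []).append(service)

    impacted, queue = [], deque([faulted])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in impacted:
                impacted.append(parent)
                queue.append(parent)
    return impacted


print(impacted_by("Credit Card Auth System"))
print(impacted_by("Inventory Database"))
```

Because the tree is derived from the same model used for monitoring, it stays current as long as the model does.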
Business Calendar
When an organization expands to global operations, that organization inherits the intrinsic time
skew that occurs across numerous and far-flung time zones. Because of this time skew, the time
frames for activity on network devices and applications change drastically. Because employees
or customers may reside in significantly separated time zones, activities on the network can
impact different geographical regions at different times of day.
The Business Calendar not only defines the operational periods of a service but also takes into
account scheduled downtimes, as well as the importance of the various schedule periods, such as
peak and off peak. The business calendar is time-zone aware, so truly global services can be
modeled and supported. The business calendar functionality also can automatically work out the
calendars of the supporting infrastructure from the business systems.
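A minimal sketch of a time-zone-aware business calendar might look like the Python below. It classifies a moment in time as scheduled downtime, peak, or off peak for a given region. The class name, the fixed UTC offsets, the 08:00 to 18:00 peak window, and the sample date are all assumptions for illustration, and weekends and holidays are deliberately ignored to keep the example short.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta, timezone


@dataclass
class BusinessCalendar:
    tz: timezone          # region's offset (fixed, as a simplification)
    peak_start: time      # local start of the peak period
    peak_end: time        # local end of the peak period
    downtimes: list = field(default_factory=list)  # (start, end) aware datetimes

    def period(self, moment: datetime) -> str:
        """Classify a time-zone-aware moment for this region."""
        for start, end in self.downtimes:
            if start <= moment < end:
                return "scheduled downtime"
        local = moment.astimezone(self.tz).time()
        if self.peak_start <= local < self.peak_end:
            return "peak"
        return "off peak"


MST = timezone(timedelta(hours=-7))
us = BusinessCalendar(MST, time(8), time(18))
emea = BusinessCalendar(timezone(timedelta(hours=2)), time(8), time(18))

# Hypothetical date; 2:15am MST is the start of the example outage.
outage_start = datetime(2008, 1, 7, 2, 15, tzinfo=MST)
print("US:", us.period(outage_start))
print("EMEA:", emea.period(outage_start))
```

The same instant is off peak for one region and peak for another, which is exactly the distinction a business calendar exists to capture.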
Process Integration
Process Integration encapsulates the idea of combining the processes from two separate entities
into a single, cohesive business activity. Process integration between disparate elements of a
system or disparate systems can involve the integration of the individual actions or code of those
systems. Across multiple business partners or between partner and customer, process integration
can involve data manipulation and activity manipulation to ensure that the outward data flows
from one organization correctly meet with the inward data flow of another. Use of industry
standardized processes helps to alleviate the cost associated with integration as both
organizations or system elements will utilize equal or equivalent mechanisms for ingest,
processing, and output of process data.
Workflow
Workflow is the sequence of steps necessary to complete an action while following the business
and technical rules of the acting organization. Workflow for a particular process can entail the
positioning of data, its processing, approval for that processing, the completion of tasks
associated with the data, and the logging of the activity's completion as well as other steps in the
process.
Workflow includes the processes intended to guide data from its creation, through its use and
storage, and until its destruction. Integrating workflow rules with BSM means that elements
brought to operator attention can be adjudicated according to predefined rules and stored for later
referral.
Six Sigma
Six Sigma provides a quantitative methodology of continuous process improvement and
reducing costs, by reducing the amount of variation in process outcomes to a level suitable for
the given organization. It pursues data-driven, fact-based decision-making in which decisions are
tied to corporate objectives. It uses an implementation of measurement-based strategy that
focuses on process improvement and variation reduction (Source: Six Sigma for IT
Management, Sven den Boer et al, June 2006, Page 15).
ITIL
ITIL is a framework of best practices that can be used to assist organizations in developing their
IT Service Management process-driven approaches. ITIL recognizes five principal elements that
give guidance on the provisioning of quality IT services and the processes and facilities needed
to support them: Service Strategy, Service Design, Service Transition, Service Operation, and
Continual Service Improvement.
Chapter 2
Where the difficulty often presents itself in these situations is in translating what is important to
IT into information that is digestible to executive leadership. If business leaders can't understand
the kind of data they're being presented at the monthly metrics meeting, they can't make good
decisions on what to do with that data. This chapter talks about the dissonances between what IT
believes is important and what the business leaders want to see.
When these two groups speak the same language and share the same priorities, we say that they
have achieved alignment. This chapter will discuss the alignment issue as well as common
failures in alignment. We'll talk about why misalignment occurs and what IT can do to develop
itself both culturally and technologically to resolve the problem. Throughout the discussion,
we'll incorporate what we've learned so far from Chapter 1 to show how the implementation of
BSM into the operating environment helps enable the alignment of IT and the business.
[Org chart boxes, as recovered from the figure: CEO; CIO; IT Director; Database Manager, Network
Manager, Server Team Manager, Applications Manager, Field Tech Manager, Help Desk Manager;
Database Administrator, Network Administrator, UNIX Server Administrator, Windows Server
Administrator, Applications Administrator, Field Technician, Help Desk Employee.]
Figure 2.1: An example org chart for an IT organization. Those individuals above the dotted line are typically
responsible for the overall business strategy, while those below the dotted line typically deal with daily
operational issues.
Figure 2.1 shows a typical organizational chart for an IT department. This chart shows the IT
Director reporting to a CIO, who ultimately reports to the CEO. The IT Director's direct reports
are the managers responsible for their portion of the network. Co-equals at this fourth tier in the
organization are each of the managers who lead their team of administrators. Those
administrators are typically identified as responsible engineers for specific portions of the
network: Bill the Windows Administrator is ultimately responsible for the functionality of
Windows Active Directory (AD). Jane the Network Administrator specifically manages the
network gear in the company's DMZ to the Internet. Bob in Applications really only manages
the B2B sales application.
Also shown in Figure 2.1 is a dotted line representing the line of demarcation between those
individuals typically responsible for overall business strategy and those who handle daily
operations. Notice the bottom-heavy nature of the org chart in relation to that dotted line. Due to
the high positioning of the dotted line, it is here where we see the biggest chasm between the
goals of IT and those of the business.
Unlike individuals in sales and marketing who work with business-level goals as part of their daily
operational tasking, individuals in IT are often insulated from business decisions. The summation of
this insulation, along with the vocabulary created and used by IT, is a large contributor to our chasm.
Business Goals
Availability
Profitability
Managing change
Managing risk
Break-fix
Customer Service
Table 2.1: The goals of Early IT are often the least aligned with those of the business. As the business
attempts to grow itself, IT finds itself struggling to manage the existing infrastructure.
This independence predominantly occurs because early networks can be quickly thrown together
to support the needs of the burgeoning business. The details of redundancy, service resiliency,
and manageability are swept to the side as the priority is to simply get the service operational.
The initial network hypergrowth combined with the hypergrowth of the early business often
means plenty of firefighting for IT.
It's worth stating here that firefighting and otherwise reactive modes associated with Early IT should
not be considered a black mark on IT itself. Reactive-mode IT is a necessary evil of any new
business endeavor. This firefighting is more an indication of the sheer magnitude of effort necessary
to build and manage a modern business network.
IT that begins to operate proactively begins to see the forest for the trees. They begin
understanding the metrics necessary to identify the health of the network. They start to recognize
the natural cycle of IT purchasing and plan more carefully for those purchases. And they begin to
recognize their role in the greater business, providing calculated support for the necessary
business services as they're needed. The best Proactive IT teams align seamlessly with the
business and its goals.
Let's take a look now at a more formalized model of IT Maturity, presented by Gartner. This
model expands upon what we've discussed in this section to talk about the key indicators
associated with the maturity level of an IT organization. The intent here is to highlight how the
organization's maturity parallels its alignment to business. In all of this, we'll discuss how
the tenets of BSM are a catalyst for solidifying that alignment.
Alignment Inhibitors
No conversation on alignment is complete without a discussion of the behaviors that tend to
prevent that alignment from occurring. We've already discussed alignment inhibitors throughout
the earlier text, but for completeness, let's discuss each in turn. As we consider the roadblocks to
getting IT and the business on the same page, think about where these elements are present in
your organization. Is your IT team a cohesive part of the business strategy or do they sit in their
own part of the building walled off from the rest of the employee base? Across the following
items, alignment between IT and the business principally means that IT and the business know
each other and say "hi" in the hallways as they pass by.
No Common Dialog
Our "conversation in the hallway" analogy rings true perhaps most specifically in terms of
vocabulary. If IT and the business are utterly incapable of communicating with a common
dialog, IT will forever be relegated to the left half of our maturity curve. Furthermore, when IT
and the business are incapable of talking, the business itself suffers. Competitors who figure out the
role of their own IT infrastructures don't get left behind in terms of competitive advantage.
Two things must happen for the common dialog to occur. First, IT needs to figure out
mechanisms for reducing the technical complexity in their communication. Like any good
college speech communication class, IT must learn to tune the conversation to the listener. From
a metrics approach, the understanding of how finance is realized in each business process is a
key component. For a BSM implementation, this component is critical as one of the major steps
in BSM is the identification of granular business processes and the assignment of dollar values to
each. Thus, it can be argued that the common dialog is really the first step in incorporating BSM.
The second element that must occur, though business leaders may not agree, is the need for
them to understand IT. If the business strictly sees IT as a cost center full of techies, alignment
can never occur. Business leaders need to see the value in IT-branded metrics as well. It is as
difficult for IT to derive service quality metrics based solely on business metrics as it is solely on
technology-based ones. The commonality of quantification goes both ways.
Mismatched Expectations
As was explained in the chapter example, the availability expectations of FCG's IT department
didn't match what really happened. Knowing of a service outage and responding to it needs to be
augmented with the knowledge of how that outage affects the business. This is the central tenet
of BSM. Once each service is granularized and analyzed with an eye towards business impact,
those expectations begin to align. Interestingly enough, though the lack of a common dialog is
arguably the biggest inhibitor to alignment, mismatched expectations often cause
the most impact to the business.
Technology-Focused Metrics
The idea of mismatched expectations feeds directly into the issues surrounding technology-focused
metrics. When metrics are created as technology-centric, they lose the total realization of
the customer's experience with the environment.
Consider our example critical B2B system. FCG's technology-focused metrics identified only a
very small outage to the environment, although that very small outage actually caused, from the
users' perspective, a huge problem. The loss of a single system may sound inconsequential on
paper, but the total impact to the business is large.
Only by flipping the metrics 180 degrees can we illustrate the system from the perspective of the
system's users. Ultimately those users are the reason for that system's creation in the first place,
so it is only rational that the system's measure of success be based on the
satisfaction of its users.
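The gap between the two perspectives is easy to see with a little arithmetic. The sketch below uses made-up numbers (20 servers, one 6-hour outage on a server that the B2B service depends on); they are illustrative assumptions, not figures from the FCG example.

```python
# Hypothetical month: 20 servers, one critical server down for 6 hours.
HOURS_IN_MONTH = 30 * 24
servers = 20
outage_hours = 6

# Technology-focused view: fraction of all server-hours that were up.
device_availability = 1 - outage_hours / (servers * HOURS_IN_MONTH)

# User-focused view: the service was unusable for the whole outage,
# because (by assumption in this sketch) it depends on that one server.
service_availability = 1 - outage_hours / HOURS_IN_MONTH

print(f"device view:  {device_availability:.4%}")
print(f"service view: {service_availability:.4%}")
```

The outage is identical in both calculations; only the denominator changes. Spread across every server-hour in the environment it nearly disappears, while measured against the one service the users care about it is twenty times larger.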
Chapter 3 will incorporate a detailed discussion about the maturation of monitoring tools and
techniques that ultimately culminates with BSM. Through BSM's framework and tools, we can flip
upside-down our traditional ways of thinking about monitoring data and enable metrics that more
closely align to the needs of the business.
Siloing
Siloing is the concept of individual toolsets or teams working in insulated environments where
their activities may not necessarily be communicated elsewhere. In siloed environments, the
activities can be unnecessarily replicated elsewhere in the environment, wasting resources on the
duplication.
From the perspective of misalignment, siloing can also represent the lack of communication and
mismatched goals between departments in an organization (Source:
http://en.wikipedia.org/wiki/Silo_effect). When IT-internal goals are siloed away from the rest of
the business as a whole, they lose the necessary collaboration that drives alignment.
Exacerbating this issue is the fact that there are significant silos within IT as well. The network,
the servers, and the applications are all managed within separate silos, leading to metrics being
technology-focused, rather than business-focused. IT goals must be a component of business
goals and worked on in collaboration with the business if the two are to effectively merge.
Reactive Mode IT
Our last inhibitor is one we've discussed at length in this chapter. When IT is operating at or
above 100% capacity with just the daily care and feeding of the network, it is impossible for
strategic thinking to occur. Though the process involves an initial cost, in order to elevate IT out
of strictly reactive mode, some team members must be permanently or quasi-permanently set
aside for the tasks of strategic thinking and long-term planning.
There's an old IT joke that goes something like, "I don't have the time to automate this process. I'm
too busy doing it manually!"
Figure 2.3: In the Gartner model, Proactive IT is only the third step towards maturity. Gartner's model also
adds the trigger points and associated benefits enjoyed by organizations that achieve each level in the
model.
What drives the rightward movement of an IT culture is its acceptance of planning and
automation components. You'll see right off that Gartner identifies Proactive IT as only the third
step in the maturity process. With Gartner's model, once an organization gets to the Proactive
stage, they're only halfway to fully recognized maturity. The reason for this is the addition of
service-oriented thinking to the automation and planning components associated with the
Proactive stage. Service-oriented thinking aligns with the business service model that BSM
resides upon. Let's take a look at the characteristics associated with each of the stages in the
Gartner model with an eye towards how that stage integrates with the tenets of BSM.
Chaotic
Earlier in this chapter, we discussed Early IT and the associated mindset. That mindset integrates
well into the Chaotic stage of the Gartner model. In the Chaotic stage, Gartner identifies a few
key characteristics (Source: These and all characteristics to follow are from the Gartner IT
Management Process Maturity Model, Transforming IT Operations into IT Service Management,
Data Center 2003, Deb Curtis and Donna Scott):
Ad-hoc
Undocumented
Unpredictable
Multiple Help desks
Minimal IT operations
User call notification
These six characteristics identify an IT environment that is highly un-optimized. So much so, in
fact, that the IT department has no capability of even understanding the underlying environment
itself. A problem that occurs in this environment is likely not realized until a user notifies IT that
the problem has occurred. Automatic notification capabilities are not established. No
documentation of services is available to track linkages between services and service
dependencies. The lack of control renders the environment highly unpredictable.
In the Chaotic environment, tools are purchased for tactical reasons and freeware and open
source tools are often chosen above enterprise-level tools due to initial cost barriers. However,
it's important to note that some IT organizations in the Chaotic state have actually over-invested
in tools on the premise that purchasing a tool equates with improving service. And, because
they don't have mature processes, every department has its own tool, duplicating cost and
effort. Toolsets are siloed as are the personnel who use those toolsets. The culture in Chaotic
environments can involve organizational infighting as lines of demarcation are not well-established.
Relating the Chaotic environment to our discussion of BSM, it is very difficult, if not
impossible, to bootstrap a BSM implementation into an environment that lacks even a modicum
of definition. The organization will likely require a shift to the right before it can consider a BSM
solution.
Reactive
That first shift to the right for a Chaotic organization is a move to the Reactive stage. Without
repeating what we've already discussed about Reactive IT, let's look at a few of the
characteristics Gartner identifies with Reactive organizations:
Best effort
Fight fires
Inventory
Monitor availability
As you can see, the first shift adds a host of benefits to the IT organization. Although service
availability is still at a "best effort" stage and SLAs are likely not yet implemented, the
organization is at this point beginning the process of understanding its environment through the
inventory and availability monitoring process. Inventory and monitoring data (through
predominantly up/down monitoring) are feeding management databases even if the data is not
being acted upon.
IT within most companies resides in this stage of maturity. And interestingly enough, many IT-minded
professionals prefer to work in environments at this stage. At this stage, IT is still a bit of
the "wild west" but without the complete adhocracy associated with the Chaotic stage. Change
control measures are voluntary and unplanned service outages may still occur due to
miscommunication between the various components of IT.
With an eye towards BSM, environments in the Reactive mode are fully capable of
implementing a BSM solution. However, the implementation of that solution and its associated
service model will involve heavy documentation and formalizing of the environment, taking the
"wild" out of the "west" if you will. Implementing a BSM solution at this stage will organically
shift the organization's maturity another notch to the right. Because of the nature of IT culture at
this stage, this shift can be painful for the employees within the organization.
As stated earlier, it is not necessarily bad that an IT organization lies in the Reactive stage. It means only that
the organization has not invested the time and material into elevating key personnel out of firefighting
and into analysis and automation activities. The slow and steady incorporation of automation activities
has the tendency to organically drive this move. It need not be a dramatic change from Reactive to
Proactive.
Proactive
Our previous discussion ended here with the Proactive stage, but Gartner uses this as a stepping
stone to the higher levels of IT service orientation. Proactive stage IT enjoys some very useful
benefits, and only here do those benefits begin the process of aligning IT goals with those of the
business itself:
Monitor performance
Analyze trends
Set thresholds
Predict problems
Automation
The key determinant in identifying a Proactive stage IT organization is the use and analysis of
performance data and how that data relates to the end user experience. Less-mature organizations
still don't have a good answer to the questions, "why is the server slow today?", "who is
impacted?", and "how long has this been a problem?"
With Proactive stage IT, the organization begins the process of actually using and acting upon
the data collected by the inventory and monitoring solutions implemented in previous stages and
adds the crucial component of monitoring performance from the end-user's perspective. Here, IT
begins the process of understanding the underpinnings of the network infrastructure and
how they impact service availability. The maturity of internal processes at this stage gives IT the
ability to truly fulfill SLA guidelines because the actual capability of the
network is known.
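The Proactive behaviors listed above (analyze trends, set thresholds, predict problems) can be sketched in a few lines. The example below fits a straight line to a handful of daily disk-usage samples and estimates when a threshold will be crossed. The sample values, the 90% threshold, and the function name are hypothetical, and a simple least-squares line is only one of many ways a monitoring tool might project a trend.

```python
# Hypothetical daily disk-usage samples (percent full) and alert threshold.
samples = [61.0, 62.5, 64.0, 65.5, 67.0]
THRESHOLD = 90.0

def days_until_threshold(values, threshold):
    """Estimate days from the last sample until `threshold` is crossed,
    using the slope of a least-squares line through the samples."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None  # usage is flat or falling; no crossing predicted
    return (threshold - values[-1]) / slope

print(days_until_threshold(samples, THRESHOLD))
```

A Reactive organization would learn about the full disk from a user call; the point of the projection is that the same monitoring data, once analyzed, gives advance warning instead.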
End-user experience monitoring will be discussed in detail in Chapter 5.
Very important here is that one major issue still lingers in the Proactive stage: SLA guidelines
and upward flow of metrics remain IT-focused and not business-focused. This lack of business
focus is the single issue that keeps IT from reaching the next stage of maturity.
Organizations that implement management and monitoring at this phase are making good use of
the data, but that use is still IT-centric. Implementing a BSM methodology and solution at this
phase will actually provide the greatest return for the business. The general service model is
relatively understood, if not used, at this point. It is here that the IT organization has the maturity
level to understand the cause and effect associated between availability of the service model and
how it affects business operations. A BSM implementation at this phase will rather quickly move
IT that critical additional shift to the right and will do so with the greatest return on the initial
investment.
In Chapter 1, we said that BSM really is the combination of Monitoring + Money. It is this ability to
relate monitoring data to monetary business impact that shifts an IT organization to the Service and
Value stages.
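As a small illustration of "Monitoring + Money," the sketch below attaches a revenue rate to each service and prices a list of outages. The service names, dollar rates, and outage records are invented for the example; in practice the rates would come from the business and the outages from the monitoring system.

```python
# Hypothetical revenue rates (dollars per hour) for two business services.
revenue_per_hour = {"b2b_sales": 12000.0, "intranet": 0.0}

# Hypothetical outage records, as a monitoring tool might export them.
outages = [
    {"service": "b2b_sales", "minutes": 45},
    {"service": "intranet", "minutes": 240},
]

def outage_cost(outage):
    """Estimate lost revenue for one outage as rate multiplied by duration."""
    rate = revenue_per_hour[outage["service"]]
    return rate * outage["minutes"] / 60

# Rank outages by business impact rather than by duration.
for o in sorted(outages, key=outage_cost, reverse=True):
    print(f'{o["service"]}: ${outage_cost(o):,.2f}')
```

Ranked by duration, the intranet outage looks four times worse; ranked by money, the shorter B2B outage is the only one that matters to the bottom line.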
Service
Few organizations mature on their own to the Service stage. Here, the IT organization truly
understands its role in the daily operations and long-term viability of the business. In Chapter 1,
we talked about how BSM helps illuminate the quality of a business service. Here within the
Service stage is where the dollars and cents figures associated with that quality of service are
actually identified and acted upon. That financial description of a service is the main component
of ensuring a successful BSM implementation.
Here in the Service stage, IT enjoys the following benefits:
Real-time infrastructure
Business planning
It is generally accepted that no organization has really reached this stage yet, though a fully
realized BSM implementation has every capability of launching a willing organization into this
stage. Whereas the Service stage is the embodiment of BSM's central tenets, the incorporation
of BSM's data into automated business decision-making and real-time service control and
management can enable a tightly defined organization to recognize this stage.
Chaotic
Notification
Environment documentation
Network (in)stability awareness
Reactive
Proactive
Service
Value
Table 2.2: At each level along the maturity model, an organization can move towards IT and business
alignment.
IT Focus Is Changing
Whether your organization chaotically fixes problems as they break or predictively recognizes
pre-failure warnings and auto-reconfigures resources to suit, IT in all types of organizations is
slowly maturing across the board. As the science of IT continues to formalize with business
process frameworks like ITIL and others, computing environments need not necessarily always
begin at the Chaotic stage.
Moving through the stages of maturity in an initial IT implementation is getting easier as
common processes and best practice approaches become more generally available to the public.
Along with that automatic maturity comes a slow change in the focus of IT away from the
device-centric and product-centric behaviors of the past and towards a service-centric focus.
This industry-wide change means that organizations are spending less time in the Chaotic
and Reactive stages. The formalization of IT business processes also makes it much easier to
incorporate technologies such as BSM that augment those processes with data and automation.
Four major topics should immediately come to mind when considering how IT's focus is
maturing towards an integrated component of business: its impact on total business revenue, how
a well-oiled IT infrastructure is a competitive advantage to business, the ability for agile IT to
enhance the agility of business, and how the movement to Proactive IT directly benefits the
business bottom line.
[Figure 2.4 contrasts two sets of labels. Old focus: Device Availability; Revenue Negative / Cost
Containment; Technology Focus; Support the Business. New focus: Service Availability; Revenue
Neutral / Revenue Positive; Business Impact Focus; Part of the Business.]
Figure 2.4: For organizations at every point in the maturity curve, maturity naturally occurs due to
industry-wide effects. IT's old focus on cost containment and supporting the business is slowly
transforming to a co-equal relationship.
Revenue Impact
Traditionally, the IT organization has been viewed by the business as a loss center. A necessary evil of
all modern businesses, even those not recognized as technology companies, IT traditionally
centers its budgetary activities around cost containment. If IT can trim costs through process
standardization and automation, the resulting giveback to the organization at year's end can be
maximized.
This "least worst" way of budgeting for IT expenditures has had the effect over time of reducing
IT's ability to service its customers. Organizations that historically incentivized IT upper
management through cost containment goals often found themselves hurting in the long run as
expensive technology investments age and new mechanisms for access are made available.
Exacerbating this problem is the inability for many IT organizations to find effective IT-based
metrics that tie into business goals. IT organizations that lack metrics for quantifying the value of
IT to the business have the most difficulty in validating justification for new projects and
initiatives. Often in these organizations, getting new technology in the door is a measure of the
"coolness" factor or the "this-product-is-no-longer-supported" factor rather than any quantified
financial benefit.
Contrast this set of behaviors with those in companies that have determined a best-fit set of
metrics relating IT value back to the business bottom line. In these organizations, metrics are
readily available to justify IT expenditures. Rather than relying on the budgetary handouts of
executive leadership, IT serves as co-equal with the business in identifying and exploiting
business opportunities.
In many companies, this changeover to a revenue-neutral or revenue-positive environment goes
far toward rapidly maturing IT and aligning its goals with those of the business. Relating this
conversation back to BSM, the technologies and frameworks that comprise BSM assist with the
value quantification process. With BSM's concern for the quality of a service comes ready-made
metrics for identifying its real business value.
Competitive Advantage
Along with the changeover from a revenue-negative to a revenue-positive organization, IT gains
the ability to drive competitive advantage for business. As an example, look at a widely spread
organization with employees working outside the traditional brick-and-mortar office. IT's
incorporation of rich, process-aligned remote access tools for non-traditional workers
automatically provides a competitive advantage to the business as a whole. Competitors who
require field workers to work through inadequate interfaces incur a time cost per field worker per
transaction per day for each quantity of data those field workers need to input. Avoiding that time
cost translates into a business advantage.
In our example, mature IT organizations that are able to recognize the business value of correctly
implemented remote access solutions often get the approval to implement them. Then, after
implementation, correctly defined IT metrics ensure that such a system continues to be valued by
the business.
When IT matures to the level at which it is considered a co-equal part of the business that not
only enables business but also drives business, the business as a whole will enjoy an advantage
over competitors. This elevation doesn't come easily. Through organizational maturity comes the
necessary commonality of language and business relation.
Agility
Very similar to the concept of competitive advantage is the mature IT organization's ability to
rapidly reconfigure as necessary for the maximized functionality of business. Businesses,
especially SMB and mid-market businesses, require rapid shifts of resources as the market and
economy change. Mature IT will understand the requirements of constantly shifting business
and implement technologies and processes that can operate in today's business environments.
IT organizations that constantly find themselves "catching up" as the needs of the business
change are not incorporating the tools and technologies necessary for automation and rapid
service deployment. The instrumentation data and related logic associated with BSM and its
frameworks help identify where automation provides the best value to business. As not all forms
of automation provide good return, a mature IT organization will use its monitoring, inventory
data, and trend lines to quickly determine the best fit for new tools and services.
Agility is a key indicator of an IT organization's stage of maturity.
Reactive to Proactive IT
Lastly is the key element of long-term planning. Immature IT organizations tend to wade in the
daily tasking of system care and feeding with little look towards the future. These organizations
often find themselves overwhelmed when a major service upgrade is forced upon them by
vendors. IT organizations that find themselves paying extra to use yesterday's technologies are
likely not performing the necessary planning. As the focus in IT changes and IT continues to
develop a common dialog with the business, these planning issues fall to the side as the business
budget entangles itself with the planning activities of IT.
[Figure 2.6 layers, as recovered from the figure: Executive Views; Business Metric Views; Service &
Asset Metric Views; Operational Metric Views. Questions shown alongside the layers: What is the
Business Impact? What is the Problem's Effect? What is the Problem's Cause?]
Figure 2.6: The data visualizations or views created by BSM build upon each other. Each representative
consumer is given real-time insight into the answers that make sense to their responsibilities.
Where it Works
There are some obvious places where BSM works best within an organization. Business services
as defined by BSM are:
Understanding that, BSM really only works when quantitative revenue metrics can be created for
the service in question. To fully implement BSM, the service must have some impact on the
revenue bottom line and that impact must be able to be quantified into terms of dollars and cents.
So what about infrastructure servers like Domain Name Servers (DNS) and Windows AD
servers? Though their outage may not necessarily directly impact a business service, the potential
exists that their loss can trace to some loss of service capability.
Most computers rely on DNS for name resolution functionality. So losing that service can have an
impact on how your servers talk to each other.
In any case, BSM implementations are the easiest when the business already has an underlying
understanding of its internal business processes. The BSM service model should relate more to
those processes than the underlying hardware infrastructure. Even for those whose processes
are not well-defined, though, the service model creation process often results in a better
understanding of the componentization of those processes.
Relating back to our chapter example where John Brown and his IT team spend hours of time
each month on gathering, crunching, and reporting statistics, the real-time component of BSM's
data gathering goes far toward reducing that process's overall recurring cost. As an example, let's
assume that FCG implements a fully recognized BSM implementation that includes the service
model and all attached metrics required by the business. In this example, John Brown's process
of gathering month-end statistics is reduced from a multi-person, multi-day task involving the
compilation of data from numerous systems in multiple data formats to simply printing off the
desired dashboard from the BSM system.
Obviously this is an extreme example, because our calculations on return dont include the work
involved with setting up the dashboards and configuring data connectors to each disparate
system. But what's important here is the concept that the automation components intrinsic to the
BSM toolset enable this data to be gathered. Once implemented, organizations that use BSM
tools can organically come to understand their data points of interest and incorporate them into
the proper visualization as they see fit.
The Dashboard Audience
As you can see in Figure 2.6, the typical organization of our resulting BSM visualization layers
data on top of each other. Each of the consumers of data in the business is presented with the
information relevant to their level of responsibility. What is critical here to understand is that
BSM centralizes visualization data from numerous otherwise segregated systems into a single
view. Rather than needing to look through two or three or four separate views using separate
tools to get an understanding of the network environment, BSM's data consolidation and
business logic rules allow for a one-screen picture of the network.
Technicians and Administrators
Following on with our description of Figure 2.6, the operational metrics typically received from
device monitoring systems align with our inventory system data for asset management. These
items help IT staff in the trenches understand the status of device health, location, and
composition on the network. Those individuals get the information they need to track problems and
identify trouble spots on the network. This helps them answer the question: "What is the
problem's cause?"
Chapter 2
Managers
Managers, both operational and strategic, also gain the luxury of the same data provided for
systems administrators. Adding to that data are the asset metrics that help them determine best
fit for existing asset classes as well as planning data to help with expansions and future
purchases. With BSM's quality-based logic added into the mix, managers can be alerted when
preconfigured management reaction lines are crossed and can react accordingly. Because the
manager, both in IT and elsewhere, is typically responsible for the level of customer interaction
and customer service associated with the business service under management, that manager can
come to answer the question: "What is the problem's effect?"
Executives
Elevating our discussion to the level of the executive, their desire for device-based information
is typically relatively low. The executive who cares about an individual router's failure in a
remote data center is relatively rare. What the typical executive does care about, however, is how
that router's failure affects the ability of their company to perform business. BSM's built-in
logic capabilities can parse data from numerous management systems to get a holistic
understanding of the network's health.
Where BSM excels over monitoring systems is in that logic's ability to link to a
dollars-and-cents view associated with the outage. As is the case in our chapter example, the FCG
leadership would have had much more information to work with had they known that the "minor IT
system's" outage linked so directly to the ultimate functionality of their critical B2B system.
Some examples of Key Performance Indicators (KPIs) that may be of value to the FCG executives are
Web site drop rate, number of orders placed or shipped, and customer satisfaction.
With that information readily in hand, the executive can take action concurrent with the outage
to maintain continuity of business and business relationships. As we show in the earlier figure,
the desire of the executive is to answer the question: "What is the business impact?"
Where It Doesn't Work
We can't talk about where BSM works unless we also talk about where it won't work, though
this half of the discussion is really the inverse of our list of potential business services. If
your business organization utilizes its network solely for the purposes of internal workflow
processing, BSM may not be the best fit for you. As stated numerous times, for BSM to be
valuable, a dollar figure must be placed upon a business service (and, therefore, the lack of it).
If the business network is used strictly for difficult-to-quantify internal workflow, BSM will not
provide the same level of value as it would to a network that impacts the corporate bottom line.
[Figure 2.7 diagram labels: BSM System; Instrumentation Data; Application Monitoring System;
Asset Management System]
Figure 2.7: BSM's data typically arrives from other preexisting management systems' data gathering
efforts. Connectors are configured to pull instrumentation data from those systems into BSM for
further calculation.
Cost Containment Aspects
BSM has further benefits associated with cost containment activities. As we've learned in our
chapter example, the cost of even a single server outage can be high. Thus, increasing our total
effective uptime by even a small percentage has far-reaching budgetary implications.
As an example, if we assume that a highly critical customer-facing system has a desired uptime
of 99.5%, that equates to slightly more than 3.5 hours of downtime per month. If BSM's service
model and the service dependencies that link its elements help us arrive faster at a solution when
an outage occurs, we can translate that into a higher uptime. Increasing our uptime percentage
from 99.5% to 99.6% buys the organization roughly 43 additional minutes of uptime per month. With
highly competitive, high burn-rate organizations measuring loss at many hundreds of dollars per
user per month, this gain translates into thousands of dollars of recurring cost containment,
merely by having a better visual representation of the problem domain.
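The uptime arithmetic above can be checked with a short calculation. The following is a minimal sketch that assumes a 30-day month (720 hours); the function names are illustrative and not part of any BSM product.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month (720 hours)


def monthly_downtime_hours(uptime_pct, hours_per_month=HOURS_PER_MONTH):
    """Return the hours of downtime per month implied by an uptime percentage."""
    return (100.0 - uptime_pct) / 100.0 * hours_per_month


def uptime_gain_minutes(old_pct, new_pct, hours_per_month=HOURS_PER_MONTH):
    """Return the extra minutes of uptime per month gained by raising the target."""
    saved_hours = (monthly_downtime_hours(old_pct, hours_per_month)
                   - monthly_downtime_hours(new_pct, hours_per_month))
    return saved_hours * 60.0


if __name__ == "__main__":
    # Downtime budget at 99.5% uptime, in hours per month.
    print(round(monthly_downtime_hours(99.5), 2))
    # Minutes of uptime gained per month by moving from 99.5% to 99.6%.
    print(round(uptime_gain_minutes(99.5, 99.6), 1))
```

Under the 30-day assumption, the 99.5% target allows 3.6 hours of downtime per month, and the move to 99.6% recovers a little over 43 minutes. A longer month shifts both figures slightly upward.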
Governance and Compliance Aspects
Lastly, the additional costs of maintaining governmental and industry compliance records often
overwhelm unprepared IT organizations. The reality of many compliance regulations is that
some auditable technical control must be in place to manage and monitor systems for compliance
violations. Depending on the compliance regulation, those controls may have different objectives,
such as preventing the release of Personally Identifiable Information or protecting data of a
financial nature. Incorporating BSM helps regulators recognize that the organization has done its
due diligence to understand its security posture and to be notified rapidly when configurations
go out of baseline or security triggers are flagged. The historical reports and real-time
dashboards of a BSM implementation readily show the security officer and auditors that those due
diligence controls are in place and operational.
Chapter 3
"What the heck is a 'DEN RTR'," he thinks as he starts to dial up his IT Director John Brown,
"and why do I care if it failed a ping response? These sorts of middle-of-the-night mobile device
beeps must be commonplace for the IT guys. But I'm too busy and too near retirement to get
blasted out of bed like this once a week. How do these guys do this all the time?"
John answers the phone with an equally bleary voice. "What's a D-E-N-R-T-R and what
happens when it fails a ping response?" Dan asks his IT top gun.
John responds groggily, "G'mornin', Dan. That means one of our backup routers in the
customer DMZ couldn't be reached by the monitoring system. Lack of a ping response tells us
that it's not talking on the network."
"Is this bad?"
"Not really," John explains. "We've got another router on that network that load balances with
the router that went down. When either one of them goes down, the other has enough bandwidth
to handle the load until we get it back up. It's nothing to really worry about. One of my guys will
fix it when they get in later this morning. That router's been giving us trouble anyway. It's
probably time to replace it with a newer model."
"Nothing to worry about, eh? That's what you said last month when we lost that 'minor IT
system'," Dan thinks to himself as he hangs up the phone and tries to catch one more hour of
sleep.
You see, a little over a month ago that "minor system's" outage caused FCG a six-figure cost
overrun in their accounting department. And since then that's all that's been on Dan's mind.
Outages. All the time, it seems.
Because of that outage and IT's mischaracterization of it, Dan has asked to have his mobile
device paged by the notification system whenever anything shows a problem. But in asking he
never realized just how often things went down. What's worse is that they always seem to go
down in the middle of the night, and always right before a big customer presentation the next
day. He's beginning to think that asking for this level of detail was a mistake, but he doesn't
want to back down now.
You see, it was Dan's job on the line when he got called into the CEO's office to explain the
situation and the budgetary hit last month. He doesn't want to go through that experience again.
"He needs a new router, then?" he thinks as he falls back asleep. "We'll discuss that in the
morning."
What Is an IT Service?
According to the IT Infrastructure Library (ITIL), an IT Service is defined as:
A service provided to one or more customers by an IT service provider. An IT service is
based on the use of information technology and supports the customer's business
processes. An IT service is made up from a combination of people, processes and
technology and should be defined in a service level agreement.
Taking this one step further, we should also define a business service. A business service is an
IT service that directly supports a business process, as opposed to an infrastructure service,
which is used internally by the IT service provider. The term is also used to mean a service that
is delivered to business customers by business units. Successful delivery of business services
often depends on one or more IT services.
Deconstructing this statement in relation to traditional IT managed devices and applications, an
IT Service can encapsulate the processing of data; the transmission of that data through
processing elements; the visualization, manipulation, and administration of that data through
users and administrators; and the data itself.
So in layman's terms, an IT Service is something ultimately recognizable by that service's
consumer. If the consumer can "see the forest for the trees" associated with their description of
the service, then we have done a good job of creating the service as categorizable.
There is one major difference here between the ITIL definition and how a service is internally
defined by many IT organizations. Those organizations not at higher levels of maturity often
leave off the most important component of the definition: The business and workflow processes
that enable the successful use and management of that data as it proceeds from creator to
consumer. In effect, what many lack is the reason for the data's existence.
Figure 3.1: Numerous CIs make up an IT Service. Many IT organizations lack the organizational maturity to
include business process as a manageable CI.
This removal of business relevance from service definitions, especially when dealing with the
monitoring needs of an organization, leaves an incomplete picture. Lacking those processes and
procedures leads to a techno-centric definition of IT Services, which inhibits their alignment
with the business processes that rely on them.
We've discussed at length the need for business-centric definitions in any mature understanding
of a business service. Throughout the rest of this chapter, we'll justify this need by filling in
the historical gaps of how that data's definition has evolved over time.
Service Management
Identifying services is one thing, but ultimately managing those services is yet another. One
definition of IT Service Management is the implementation of a strategy that defines, controls,
maintains, and enhances the IT Services for the enterprise. It embodies people, processes, and
technology in order to provide quality of service (QoS) for business objectives and operational
goals.
As we've discussed in Chapter 1, one of the major components of either a BSM installation or
really any Network & Systems Management (NSM) rollout is the identification, classification,
and granularization of IT Services into their atomic components. Removing BSM completely
from the picture, think about the steps you take in setting up monitoring within a traditional
NSM environment:
Notification: Once identified and grouped, those groups are assigned notification rules
based on the need for IT to recognize state changes for elements within the group.
Element groups of high criticality are given more stringent notification rules than those of
lesser impact on operations.
Remediation Assignment: Once the model becomes relatively static and the comfort
level of the organization increases, automatic remediation actions can be assigned to
regular events. Obviously the addition of automatic remediation components has an
element of risk, so rarely does this step in the process see use in full production.
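The notification step described above can be sketched in a few lines. This is a minimal illustration only: the criticality levels, channel names, delay values, and device names are all assumptions made for the example and do not come from any particular NSM product.

```python
from dataclasses import dataclass, field

# Hypothetical notification rules keyed by group criticality. The channel names
# and delay values are illustrative assumptions.
NOTIFICATION_RULES = {
    "high": {"channels": ["page", "email"], "max_delay_minutes": 0},
    "medium": {"channels": ["email"], "max_delay_minutes": 15},
    "low": {"channels": ["daily_report"], "max_delay_minutes": 1440},
}


@dataclass
class ElementGroup:
    """A named group of monitored elements sharing one criticality level."""
    name: str
    criticality: str
    elements: list = field(default_factory=list)


def notifications_for(group, element, new_state):
    """Return the notifications to send when an element in a group changes state."""
    if element not in group.elements:
        raise ValueError(f"{element} is not a member of group {group.name}")
    rule = NOTIFICATION_RULES[group.criticality]
    return [
        {
            "channel": channel,
            "message": f"{element} is now {new_state}",
            "max_delay_minutes": rule["max_delay_minutes"],
        }
        for channel in rule["channels"]
    ]


if __name__ == "__main__":
    # Hypothetical group: two routers treated as highly critical.
    dmz = ElementGroup("customer-dmz-routers", "high", ["den-rtr-01", "den-rtr-02"])
    for alert in notifications_for(dmz, "den-rtr-01", "unreachable"):
        print(alert)
```

Keeping the rules in one table, separate from group membership, means a group can be reclassified by changing its criticality without touching the elements inside it.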
Early Management: Network Management; Open Systems Management; SNMP-based Management
Proprietary Agents: On-System Code; Vendor-specific Agents; Little Inter-Software Integration
Native / Agentless: Open Standards / APIs; Common Collection & Transport; System-based Protocols
Value Focus: End User Experience; Data Quality Trumps Quantity; Improvement Trumps Reaction;
Business Value of IT
Figure 3.2: Four discrete timeframes in the evolution of systems management and monitoring.
Early Management
In the beginning, there was the Simple Network Management Protocol. SNMP was originally a
network-focused protocol designed to provide a common framework for devices to relate state
changes to a centralized Network Management System (NMS). SNMP describes both the
protocol that transports state change information around the network and the framework that
defines each network device's remote monitoring and configuration capabilities.
SNMP's original goal was to provide the administrator information about the status of network
devices. Originally relegated to those devices only, the protocol and central NMS could notify an
administrator when a network device such as a router, switch, or firewall dropped off the
network, began losing an excess of packets, or otherwise entered into an undesired state. Early
NMS were, and in many cases continue to be, completely device-centric. Typical heads-up
interfaces display network maps with stoplight charts displaying the status and health of network
devices.
As time progressed, additional components were added to SNMP to manage elements other than
network devices. SNMP Management Information Bases (MIBs) were extended into the server
and application space as well as environmental control devices within the data center. SNMP's
device-agnostic architecture meant that virtually any device that retains on-board state
information had the ability to push that information to a central NMS through the SNMP
transport protocol. In fact, SNMP's capabilities were, and are, so generically designed that it
continues to this day as a major tool in network management for many devices across the
network.
Proprietary Agents
Over time, some of SNMP's weaknesses came to light as administrators attempted to
use it as a tool for managing configuration change on network devices. Whereas
SNMP is an excellent tool for reading device information, using it as a tool for writing a
configuration was found to have security implications. Also, while many devices eventually
gained the ability to push information through SNMP, few adapted that ability for
updating on-board information. Moreover, even as servers and applications started down the road
of providing SNMP-capable interfaces for monitoring state change, not all server and
application functionality was exposed.
Into this vacuum of remote management capabilities came a set of proprietary software
solutions that could provide the necessary management. Although SNMP was embraced by
virtually all vendors of network devices, those who built systems and applications found
themselves moving towards other third-party tools along with their on-board agents to provide
better access to system and application information.
It's important for us to stop here to talk a little about those early attempts at systems and
application management. The early attempts, many of which are still in use today, utilized
proprietary agents: pieces of code that needed to be installed directly onto every system under
management.
The agent would regularly inventory the system for configuration and state information, wrap the
result into a transmittable package, and send that package across the network to the NMS. The
use of installable agents for systems management had both its pros and cons (see Table 3.1).
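The inventory, package, and send cycle of an agent can be sketched as follows. This is a simplified illustration using only the Python standard library; the payload format (compressed JSON), the UDP transport, and the function names are assumptions for the example, since real proprietary agents each use their own formats and protocols.

```python
import json
import platform
import socket
import time
import zlib


def collect_state():
    """Inventory a few configuration and state facts about the local system."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "collected_at": time.time(),
    }


def build_package(state):
    """Wrap the inventory into a compact, transmittable payload."""
    return zlib.compress(json.dumps(state).encode("utf-8"))


def unpack_package(payload):
    """Reverse build_package; this is the step a receiving NMS would perform."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


def send_to_nms(payload, host, port):
    """Send one payload to the NMS over UDP (fire-and-forget, no delivery check)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def run_agent(host, port, interval_seconds, cycles):
    """Collect, package, and send on a fixed interval for a number of cycles."""
    for _ in range(cycles):
        send_to_nms(build_package(collect_state()), host, port)
        time.sleep(interval_seconds)
```

Because the collection logic lives in the agent, adding a new fact to `collect_state` is all it takes to extend what the NMS receives, which reflects the flexibility that agent-based tools offer.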
Pros
Cons
Table 3.1: The pros and cons of proprietary agent-based systems management and monitoring.
It should be noted here that agent-based utilities are not necessarily a bad thing. In fact, many
modern management and monitoring tools continue to make excellent use of agents as a system
component. Agent-based utilities have the ability to provide more management capabilities to the
administrator because additional capabilities can always be coded into the agent. And, all things
being equal, agent updates are often much easier than system updates.
The biggest downside of agent-based tools is the non-value-added cost of managing agent
installations. This can be a time-intensive process, and many tools don't include mechanisms for
locating rogue hosts on the network. The presence of these non-managed endpoints can skew
statistical reporting for agent-based systems.
Native/Agentless
As OS vendors began catching up to the needs of administrators for centralized manageability,
they began recognizing the value of including some of the agent's function within the native OS
code. By providing agent-like code within the system and leveraging open standards for the
categorization, function, and transport of that agent-like data, it grew more and more possible to
manage systems purely from within OS-internal APIs.
Four components of native or agentless systems were required for the proper identification,
storage, and transport of configuration and state information to independent NMS systems.
Those components, pictured in Figure 3.3, are the state collection component, the storage
component, the API, and the transport component.
[Figure 3.3 diagram labels: Agentless System Under Management; On-Board Collection Component;
On-Board State Database; On-Board API; Open Standards Transport Protocol; Network;
Management & Monitoring System]
Figure 3.3: The four components of agentless monitoring.
Your BSM implementation will likely include connections to agent-based as well as agentless
management components.
One major requirement, and arguably the requirement that delayed the large-scale incorporation
of agentless systems, is the need for industry agreement on the API and transport component.
Think about the receiving NMS system. For that NMS to properly work with numerous agentless
systems of different vendors and classifications, economies of scale in terms of APIs and
transport protocols are necessary. Otherwise, the NMS vendor would need to code individual
interfaces for each type of system and each type of on-board API, something they are not likely
to do due to cost implications. So until that industry agreement was realized, there were
few entries into agentless NMS.
Agentless systems also include both pros and cons as components of their architecture (see Table
3.2).
Pros
Cons
Table 3.2: The pros and cons of agentless systems management and monitoring.
As you can see from Table 3.2, the agentless architecture is not necessarily a panacea for solving the
problems associated with proprietary agent systems. Agentless systems introduce problems of their
own as they solve others. The major limitation of agentless systems lies in their inability to rapidly add
new capabilities; they rely on the capabilities built into the system to provide the bulk of the
functionality. OS vendors also typically update their code less regularly than would the vendor that
supplies an agent. Often, an OS vendor will update its APIs only at major milestone code releases. Thus,
additional functionality can take an extended period of time to be realized in the market.
Focus on Value
Our last phase in the timeline actually moves away from evolving the management components
on the individual endpoints. This phase concerns itself with the addition of logic for things like
service contract fulfillment, service-based solutions, framework fulfillment, event integration,
service monitors, and end user response time evaluation, among others. This phase in the timeline
of management and monitoring concerns itself less with data collection and more with
improving the quality of the collected data, so its integration is decoupled from individual
endpoint management. Incorporating the added logic means few if any changes to the
individual endpoints.
[Figure 3.4 diagram labels: Service Level Valuation System; On-Board Service Level Logic
Processing; Visualization Processing; Database; Proprietary Agent API; Proprietary Transport
Protocol; Open Standards Transport Protocol; Network; Management & Monitoring System]
Figure 3.4: Valuation in NMS arrives as a Service Level Logic Processing component either within the
traditional NMS or separate from it but leveraging its data collection capabilities.
In fact, this addition can operate as a function segregated from the NMS itself, operating as a tool
that leverages NMS-collected data for further processing. The lesson here is that wrapping a
value system on top of existing NMS systems allows for the continued use of those systems
while non-destructively adding the value recognition system into the operating environment.
Let's take a look at each of these example capabilities in turn:
Framework population: IT frameworks are only valuable once they are populated with
data. The introduction of management and monitoring into frameworks brings value to
their end results. Many IT frameworks assist with the troubleshooting and resolution
process but may be cumbersome to implement during times of critical outage. When data
is automatically populated into these frameworks, their use during outage incidents is
likely to provide more value.
We'll talk more about these capabilities and how they enhance operational value of a system in
Chapter 7.
BSM is one tool that enables each of these components to be added into an existing NSM
environment. BSM implementations typically do not involve a "rip-and-replace" of existing
tools, so their incorporation is a low-risk activity. As we'll discuss in Chapter 8, BSM software
typically arrives with sets of data collection tools and aggregation functions to pull data from
existing NSM tools, both those that incorporate proprietary agents and newer agentless tools.
Because BSM can ingest data through multiple disparate toolsets, it enjoys the pros of each
toolset while pushing down the responsibilities of each toolset's cons onto that toolset's data
collection system. With this understanding of the timeline of management and monitoring, let's
continue our historical analysis to discuss how the targeting of service management has evolved
over the years towards the need for BSM's concept of service quality.
Figure 3.5: The elements in Service Management targeting center towards BSM's service valuation.
Network Availability and Utilization
The network is the common intermediary that all computer systems rely upon. Adding to this is
the multi-server nature of most business systems today. In today's networks, with rare exception,
any form of data processing requires the cooperation of numerous systems connected by the
network for proper functionality.
Thus, it only makes sense that this common touch point was usually the first component to come
under management and monitoring for most networks. When the network goes down, virtually
all data processing comes to a halt. As today's businesses require constant network uptime in
order to complete their daily tasks, any outage of the network quickly becomes critical.
Going beyond simple availability is the need for proper management of network utilization and
performance. Data processing needs as they move from system to system can require varying
levels of network bandwidth for their completion, and the management of that bandwidth and its
use is highly critical. Highly immature networks do not typically have any measure of network
utilization understanding. Thus, it is common for new services to be added to the network until a
user-noticeable change in performance is realized.
As the old saying goes, "Once you're thirsty, it's long past the time you should have taken a
drink." It is critically important for the stability and continued operation of the network that
its managers understand what kind of data traverses it and at what volume. Understanding these
calculations goes far toward ensuring that network augmentation activities occur before they're
needed.
Server Performance
As the business grows more reliant on its network, each of its individual servers that make up a
data processing thread grows more important. Early businesses leveraged centralized computing
for most data processing, which meant fewer endpoints to monitor for problems. But as
interdependence of servers in a path became more complex, the computing model grew to a
heavy focus on decentralization and those endpoints grew geometrically. Moreover, data
processing requirements can be drastically different based on the needs of its consumers:
Resource-intensive operations can occur at times that conflict with other needs.
The scheduling of network infrastructure activities, which can be very highly resource
intensive, can conflict or interrupt business data processing.
We've already talked about the complexities associated with the business calendar. Once a business
begins engaging in e-commerce or worldwide operations, the issues associated with multiple time
zones complicate the scheduling of business processing. This scheduling involves complex
mathematics (and therefore software and processes) to ensure conflicts do not occur.
Because of the growing needs of the business for always-on processing, the importance of managing
and monitoring these activities and their impact on server performance grows as the business's
reliance on its infrastructure grows. Incorporation of performance management and monitoring
activities on individual servers ensures the visualization of resource use in a way that is
actionable by its administrators and capable of being planned by IT and business analysts.
Relating this need to BSM and our discussion on IT maturity, the problem with server
performance is usually not the recognition that it needs to occur. Rather, the problem often is in
what to monitor. With thousands of potential counters on each system, and differing counters
based on system or device type and installed applications, incorrectly separating the wheat from
the chaff in terms of counter efficacy is an inhibitor to good server performance management.
Troubleshooting and Predictive Analysis
Once an organization understands its network and the servers that reside on that network, and
incoming data is tuned such that monitors are watching for valued data, only then can that data
be used for troubleshooting and predictive analysis. If you fully understand the incoming data
that you want to receive, you can relate that to the data you're actually receiving.
Through logical subtraction of these two elements, we can begin analyzing the quality of the
services under management:
When performance data goes above thresholds, we can recognize and resolve a resource
overuse situation.
When availability data consistently goes under thresholds, we can predict when to
implement compensating mechanisms for load balancing or failover states.
It is within this phase that we can begin assigning thresholds to individual system states. Later,
we'll augment those thresholds with an assignment of quality. That quality assignment
eventually becomes the dollars-and-cents valuation needed to implement BSM.
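The comparison of expected data against received data can be sketched as a pair of threshold checks. This is a minimal illustration under stated assumptions: the metric kinds, the status labels, and the 80% "consistently" cutoff are choices made for the example, not values defined by BSM.

```python
def classify_sample(kind, value, threshold):
    """Compare one observed value against its expected threshold.

    kind is "performance" (trouble when the value rises above the threshold)
    or "availability" (trouble when the value falls below it).
    """
    if kind == "performance":
        return "overuse" if value > threshold else "ok"
    if kind == "availability":
        return "degraded" if value < threshold else "ok"
    raise ValueError(f"unknown metric kind: {kind}")


def consistently_degraded(samples, threshold, min_fraction=0.8):
    """Return True when at least min_fraction of availability samples fall below the threshold.

    min_fraction is an illustrative assumption for what "consistently" means.
    """
    if not samples:
        return False
    below = sum(1 for value in samples if value < threshold)
    return below / len(samples) >= min_fraction


if __name__ == "__main__":
    print(classify_sample("performance", 95.0, 90.0))
    print(classify_sample("availability", 99.9, 99.5))
    print(consistently_degraded([98.0, 97.5, 96.0, 95.0], 99.5))
```

A single sample over a performance threshold calls for immediate resolution, while the availability check looks at a run of samples, which matches the predictive use described in the list above.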
Figure 3.6: J2EE and .NET application performance bridges the layers of Server Performance,
Troubleshooting & Predictive Analysis, and End User Experience.
These types of applications are specifically called out because of the nature of their pluggable
monitoring interfaces as well as their custom nature. Many businesses have incorporated
custom-built applications on these platforms for the purposes of business-specific data
processing. Often, these types of applications are coupled with Web front-ends and
customer-facing interfaces, making them highly critical in the eyes of a business's customers.
Unlike Commercial Off The Shelf (COTS) software, these home-grown applications are built
in-house and may not necessarily contain needed management interfaces within their proprietary
code. Therefore, the best way to manage these types of applications is often through the APIs
within the code frameworks on which they reside. The more an organization leverages home-grown
applications residing on pluggable frameworks, the more its critical operations rely on them.
Mature organizations leverage tools such as these for End User Experience to monitor and
manage those applications.
An Example
Let's now take what we've learned and incorporate it into a real-world example of a business
system, exploring how the management and monitoring of that business system can evolve from
network availability through each of the layers discussed earlier, ending up with a BSM
realization. In this example, we'll look at a simple Web-based customer-facing system, hereafter
referred to as "the system." That system is composed of the components identified in Figure 3.7.
Figure 3.7: Our example system includes twelve components. Each component's role is critical to the
operations of the system.
As you can see in Figure 3.7, the system encompasses twelve individual components:
A directory server
An e-commerce server
And the six network connections that interconnect them to each other and to the Internet
Each of these components must work in concert for a customer from the outside to properly
connect to the Web server, locate the item they want to purchase, create and make use of an
account, and purchase the item. The outage of any of these components will cause a related
degradation in the total service quality for this system's customers.
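The "all components must work in concert" rule can be expressed as a simple dependency check. The sketch below is illustrative only: it models a subset of the example system, and the component identifiers are assumptions made for the example rather than names taken from Figure 3.7.

```python
# Illustrative subset of the example system's components. The identifiers are
# assumptions made for this sketch; the full system has twelve components.
REQUIRED_COMPONENTS = [
    "internet_connection",
    "web_server",
    "directory_server",
    "ecommerce_server",
]


def service_status(component_up):
    """Return ("available", []) or ("degraded", [failed component names]).

    component_up maps a component name to True (up) or False (down).
    A component missing from the mapping is treated as down.
    """
    failed = [name for name in REQUIRED_COMPONENTS
              if not component_up.get(name, False)]
    return ("degraded", failed) if failed else ("available", [])


if __name__ == "__main__":
    state = {name: True for name in REQUIRED_COMPONENTS}
    print(service_status(state))
    state["internet_connection"] = False
    print(service_status(state))
```

Returning the list of failed components alongside the status keeps the link between a service-level symptom and the device-level cause, which is the dependency information a service model is meant to preserve.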
Network Availability and Utilization
What should be immediately obvious, even to the inexperienced observer, are the linkages
between each of the components in this system. Because the picture shows only single connections
between components, the loss of any single component negatively impacts the ability
of the system to service its customers. If the system's Internet connection goes down in the
middle of the night, the system won't be able to service customers attempting to purchase items.
Additionally, should this business release a new product or product update that causes its
customers to want to purchase the item in large numbers, the network utilization between
disparate components could get oversaturated. Notification during these types of events that the
network is oversaturated will help to identify that there is a run on the product.
Server Performance
Going one step further in the obvious components of our system, it is likely that the owners of
the system will also want to know how utilization of that system by its customers affects the total
performance of the system. As we discussed earlier in this chapter, at this phase, we're merely
looking to see overall component performance as a measure of the system's capability to service
its customers. As an example, if the resource metric % Processor Use rises above 90% for the
E-Commerce component, there is a reasonable expectation that the system may be having issues
keeping up with the demands of its incoming customers.
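A processor-use check like the one above is often paired with a rule that ignores brief spikes. The following sketch flags the metric only after several consecutive samples exceed the threshold; the three-sample window is an assumption made for illustration, and the class name is hypothetical.

```python
from collections import deque


class SustainedThresholdMonitor:
    """Flag a metric only after it stays above a threshold for several samples."""

    def __init__(self, threshold=90.0, window=3):
        self.threshold = threshold
        # Keep only the most recent `window` samples.
        self.recent = deque(maxlen=window)

    def observe(self, value):
        """Record one sample; return True when the whole window exceeds the threshold."""
        self.recent.append(value)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(v > self.threshold for v in self.recent)


if __name__ == "__main__":
    monitor = SustainedThresholdMonitor(threshold=90.0, window=3)
    for sample in [95.0, 97.0, 85.0, 92.0, 93.0, 96.0]:
        print(sample, monitor.observe(sample))
```

In this run only the final sample triggers the flag, because it is the first point at which three consecutive readings are all above 90%.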
Figure 3.8: The data in BSM's visualizations can correlate the measure of poor quality alongside that
measure's cost of poor quality and superimpose the cost to upgrade to help make better management
decisions.
Improves Performance
Knowing the performance of a system means you can effect change in that performance.
Starting with network and working into whole system performance, the administrator
immediately gains a very rough insight into the inner workings of that machine. But trying to
look at whole system data to track down a deeply application-based problem is like cutting your
grass with a chainsaw. The solution works, but it's not adapted specifically for the problem.
Moving along the evolutionary curve results in data that is more tightly focused towards the
problem domain. Adding to this is the nature of monitoring data itself. As monitoring data and
the tools that visualize it grow in quality, they gain drill-down capabilities to narrowly define
the issues that are affecting performance.
Fills Out Systems Vision
Computer systems are unlike physical systems in that it is not possible without tools to see the
electricity as it goes by on the wire. When building a physical system, it is easy to visualize the
machinations of the system because they're right in front of its operator. But computer systems
require specialized tools just to see into the system.
Incorporating the correct suite of tools alongside the right process framework with which to use
those tools helps the operator better see into the inner workings of the system. Adding advanced
dashboards provides for better management decisions by non-technical users as well.
Enables Proactive Management
As stated in the previous section, it's difficult for non-technical users to fully manage a system
when they don't have vision into the workings of that system. Leveraging instrumentation and
effective processing of that instrumentation data means that managers can manage more effectively.
Computer systems are meant to ease processes in the physical world, and not complicate them.
Thus, making the use of those systems as easy as possible for the individuals who need to
manage their interface with the physical world enables IT to make better, more proactive
management decisions.
Summary
In this chapter, we've added a technical understanding to what we learned in the previous chapter
on the culture of IT organizations. Here, we've discussed BSM's relation to the bigger picture of
service management and monitoring as well as provided a historical basis for how that vision
came into being. We've talked about the evolution of service management and the targeting of
service management to elements within the business system. Along with that, we've taken what
we've learned and provided an extended example detailing how the movement along that
evolutionary curve improves the quality of information coming into the business on the health
and quality of their critical systems. Throughout all of this, we've talked about how each element
in the history of service management eventually makes its way to BSM.
In the next chapter, we'll move away from our introductory conversations on BSM and dive
straight into the process of implementing it into a production environment. That discussion will
involve the eight steps of a BSM implementation: Preparation, Selection, Definition, Modeling,
Measurement, Data Analysis, Improvement, and Reporting. For each step, we'll discuss the
necessary tasks and elements to complete to ensure a successful implementation.
Chapter 4
"Heck, here's what I'd love. Get me another little monitor I can sit right here on my desk that
just shows me how our systems are doing, what our customers are feeling when they're doing
business with our Web site. Something that'll give me the warm fuzzy that our systems are up,
we're still meeting our numbers, and our customers are happy. Can you get me one of those?"
John groans silently to himself, "Performance data? Now he wants systems performance data
too? How am I going to get that to him as useable product?"
Dan sees John's concerns with his line of thinking, but he also recognizes the need for both John
and his IT department to start thinking strategically. Maybe he can turn this challenge into an
advantage for FCG. "Here's what I'm going to do. It's just about time we start setting
performance goals for next year. I'm going to set a goal for you for next year to figure out the
answer to this problem. I'll take care of finding the funding and any business analysts you need.
You just figure out the technology."
John stands up to leave the office, wondering how he's going to figure this one out. Dan stops
him with a grin, "Oh, and John. Do it fast. My wife hasn't been too happy either with your 4am
D-E-N-R-T-Rs."
Step 0: Preparation
Before beginning any project, the identification of team members and stakeholders is critical for
the division of responsibility within the project. Here within this step are a few key points
necessary to ensure that the project begins down the correct path.
Identify Key Project Members
First, as was discussed in Chapter 1, one of the most critical components of identifying a
project team is the assurance of non-technocentrism. Although at first blush a BSM
implementation can involve much impact from the IT organization, BSM in and of itself is a
process-centric tool. The incorporation of too much technical input into the project team at the
outset tends to turn a BSM implementation into little more than an IT Service
Management implementation (that is, one with an inappropriate focus on IT elements).
That being said, a BSM project team should include the following members:
Executive Sponsor: The role of the executive sponsor is to fund the project and ensure
that the project stays within scope, budget, and relevance to its needs within the
organization. The Executive Sponsor will likely not be a regular contributing member to
the team, other than to provide overall guidance.
Business Service Manager: Generally also the project manager, the Business Service
Manager is tasked with ensuring the overall success of the project as well as reporting its
status upwards to executive management. From a technical standpoint, the Business
Service Manager is responsible for defining the business services of relevance and
assisting with the development of their requirements.
Business Service Analyst(s): In conjunction with the Business Service Manager, any
Business Service Analysts assigned to the project team have the responsibility for
identifying and isolating individual business services, their requirements, and linkages
between business services. Their job here is to create the business service model and
populate that model with the necessary risks, linkages, and controls. Once the service
model is built and implemented and Step 5, Data Analysis, has begun, the role of the
Business Service Analyst is to monitor and interpret the data being generated vis-à-vis
the model. This individual need not have a technical background, but rather a
deep understanding of the underlying business processes.
IT Specialist(s): Once the service model is identified, that model must be connected into
data gathering and service monitoring tools. This function may be a part of the BSM
system itself or, more likely, may be a component of existing monitoring and management
tools. The role of IT Specialists is to facilitate the proper connection of those tools into
the BSM system.
Step 1: Selection
Step 1 embodies the identification of services that will ultimately become a part of the service
model. Here the analysts on the team will analyze business services from a process focus and
identify the lines of demarcation between individual services. Important here is that Step 1 is
merely an inventory and identification function. We are not yet defining services and their
representation. Here, we are merely getting our hands around those services that are in-scope, of
value to us, and out-of-scope for this iteration of the project.
The lead-in to this section mentioned that the BSM implementation can be a process that is never
truly complete. The project team must be very careful at the outset of the project, indeed at this
phase, to keep the initial scope aligned to services that are low-hanging fruit.
You need not necessarily identify all the services and processes in your organization during your first
pass through the seven steps. Greater success comes from running through the steps more often
with fewer elements in the model than from the inverse. Iterating through the steps with a smaller model,
especially during the initial adoption, provides early wins for the implementation.
Figure 4.1: Copied here from Chapter 1 are our representations of a good service model breakdown on the
left based on the interrelation of business processes. On the right is an incorrect model breakdown focused
on individual devices.
Assess Services
While inventorying the services that the business provides, the team also assesses how easily each
service can be categorized and quantified. Some business services are only
tangentially related to the established Key Performance Indicators of the business, and so will be
more difficult to quantify during early passes through the seven steps. The idea for our first pass
is to find those services that are most critical to the business and yet are easy to incorporate into
the service model.
Priority one here is to pick services that are most central and most critical to the business. Priority two
is to choose those easiest to work with. The reason for this is that the action of quantifying easy
services within an organization iteratively reveals new touch points for the later quantification of the
hard services in later cycles.
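The two priorities above can be sketched as a simple ordering rule: rank candidates by criticality first, and break ties by how easy they are to work with. The 1-to-5 scales, the `CandidateService` record, and the function names are assumptions made for illustration; the project team would substitute its own scoring.

```python
from dataclasses import dataclass


@dataclass
class CandidateService:
    name: str
    criticality: int  # hypothetical scale: 1 (peripheral) to 5 (central to the business)
    ease: int         # hypothetical scale: 1 (hard to quantify) to 5 (easy to quantify)


def rank_candidates(candidates):
    """Order by criticality (priority one), then by ease (priority two), then by name."""
    return sorted(candidates, key=lambda c: (-c.criticality, -c.ease, c.name))


def first_pass(candidates, limit=3):
    """Return the names of the few services to carry into the first iteration."""
    return [c.name for c in rank_candidates(candidates)[:limit]]
```

Keeping `limit` small reflects the advice to iterate with fewer elements in the model rather than attempting everything at once.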
Step 2: Definition
Once the inventorying of services is complete and the selection of candidate services for initial
inclusion into the model has been made, those services need to be defined in terms relative to
BSM. Within Step 2, Definition, the team will identify and solidify the boundaries of the
services of interest.
Define Services
The first step here is to gain as much knowledge as possible about the structure, behavior,
necessity, and relevance of the service. This service might have ties into other services unknown
to the project team or may have elements that make it more or less difficult to define as later
steps begin deconstructing its dependencies. So by defining each service as comprehensively as
possible in this initial step, much is learned about their inputs and outputs.
One mechanism for best documenting the service characteristics is to use a spreadsheet. Identify
categorizations of interest about the service that will assist in later plugging this service into the
BSM model. Some of those categorizations could relate to those in Table 4.1 below.
When creating this spreadsheet, ensure that each cell within the spreadsheet is atomic. Hybridizing
data within an individual cell means that the category has not been broken down as far as is
necessary.
Another handy tool for visualizing individual services is to draw up use cases along with
the data flow or process flow associated with each use case. As an example, for a
purchasing system, the use case might include each of the components of a purchase, from
browsing, to inventory validation, to shopping cart population, to checkout, to item delivery.
Categorization         Description / Utility
Business Purpose
Users
Service Hours
Location of Service
Code Ownership
Outage Impact
RTO / RPO
Dependencies           Lastly, what other services does this service rely upon? This
                       information will be heavily used in creating the service model.
Table 4.1: The table above provides a list of possible characteristics that could be used to identify a service in
the model.
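One way to keep each spreadsheet cell atomic is sketched below, using the categorizations from the table as column names. The field names, the split of RTO / RPO into two numeric columns, and the separate dependencies sheet with one dependency per row are all illustrative assumptions rather than a prescribed layout.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ServiceDefinition:
    name: str
    business_purpose: str
    users: str
    service_hours: str
    location: str
    code_ownership: str
    outage_impact: str
    rto_hours: float  # recovery time objective, kept in its own column
    rpo_hours: float  # recovery point objective, kept in its own column


def services_sheet(definitions):
    """Render one CSV row per service, one characteristic per cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ServiceDefinition)])
    writer.writeheader()
    for definition in definitions:
        writer.writerow(asdict(definition))
    return buffer.getvalue()


def dependencies_sheet(dependencies):
    """Render one CSV row per (service, depends_on) pair instead of a combined cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["service", "depends_on"])
    for service, targets in dependencies.items():
        for target in targets:
            writer.writerow([service, target])
    return buffer.getvalue()
```

Storing each dependency on its own row avoids a hybrid cell such as "Inventory Database; Credit Card Auth System" and makes the list easier to reuse when the service model is built.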
Define Service Requirements
Now that we've come to understand the nature of the service in a narrative format, we need to
translate the requirements of that service into quantifiable metrics we can use to measure its
quality. This may be one of the most important activities to be done for each
individual service, as it identifies the numbers that the BSM system's
mathematical logic uses to translate a loss of service quality into a numerical result. Three
elements must minimally be identified and values assigned:
Reliability: Slightly more difficult is the scoping of how often this service can become
inoperative or undergo a loss in service performance. Some services have a greater
tolerance for an outage. Some services include redundancy features that limit the scope of
an individual element outage. Depending on how the service was categorized, those
redundancy features may or may not be included in this calculation. Be aware that this
information will be used heavily in identifying and measuring the quality of the service as
compared with the desired level of service. Metrics to use here include acceptable mean
time between failures and acceptable mean time to repair.
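As a rough illustration of the reliability metrics named above, the sketch below derives mean time between failures (MTBF) and mean time to repair (MTTR) from a list of outages and compares them with the acceptable values. It uses the common definitions MTBF = uptime / failures and MTTR = downtime / failures; the `Outage` record, the hour-based units, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    start_hour: float  # hours since the start of the measurement period
    end_hour: float


def reliability_metrics(outages, period_hours):
    """Compute MTBF, MTTR, and availability for one measurement period."""
    failures = len(outages)
    if failures == 0:
        return {"mtbf_hours": None, "mttr_hours": None, "availability": 1.0}
    downtime = sum(o.end_hour - o.start_hour for o in outages)
    mtbf = (period_hours - downtime) / failures
    mttr = downtime / failures
    return {"mtbf_hours": mtbf, "mttr_hours": mttr, "availability": mtbf / (mtbf + mttr)}


def meets_requirements(metrics, min_mtbf_hours, max_mttr_hours):
    """True when measured reliability is within the acceptable MTBF and MTTR."""
    if metrics["mtbf_hours"] is None:  # no failures were observed in the period
        return True
    return metrics["mtbf_hours"] >= min_mtbf_hours and metrics["mttr_hours"] <= max_mttr_hours
```

For a 720-hour month with two outages totalling six hours, this gives an MTBF of 357 hours and an MTTR of 3 hours.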
Step 3: Modeling
Once an inventory of the desired business services has been collected, the connection of those
services can begin. Looking above in Table 4.1, each service should have a list of dependencies.
Those dependencies will go far in helping the team identify the connections between services.
The resulting service model will be a top-down decomposition of the business service in relation
to its constituent components. One artifact of this process will be the creation of ever more
detailed hierarchical diagrams identifying business processes in relation to the processes and
resources that support them.
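The top-down decomposition can be sketched by walking each service's dependency list recursively. The function names and the indented-text rendering are illustrative assumptions; the dependency data would come from the Dependencies entries gathered during Step 2.

```python
def decompose(service, dependencies, depth=0, path=()):
    """Return (depth, name) pairs, top-down, for a service and everything supporting it."""
    if service in path:
        # A loop in the dependency lists would make the hierarchy endless.
        raise ValueError("circular dependency: " + " -> ".join(path + (service,)))
    rows = [(depth, service)]
    for child in dependencies.get(service, []):
        rows.extend(decompose(child, dependencies, depth + 1, path + (service,)))
    return rows


def render(rows):
    """Draw the hierarchy as indented text, two spaces per level."""
    return "\n".join("  " * depth + name for depth, name in rows)
```

A component that supports several services appears once under each of them, which makes shared dependencies easy to spot in the resulting outline.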
Failure Mode and Effects Analysis (FMEA): This is a tool used to identify and categorize
the risks associated with potential failures within a system or a process. This tool
identifies the possible failures that can occur within a system and prioritizes these failures
by the seriousness of their potential consequences, their frequency of occurrence, and the
ease in detecting them. FMEA is most often used as a bottom-up approach to failure
detection. This augments the top-down approach to generating our BSM model. Here, the
FMEA tool assists with identifying how the failure of a dependent component can impact
the quality of the top-level service.
Fault Tree Analysis (FTA): A more top-down approach is the completion of a Fault
Tree Analysis against the system. This thorough system for documenting the probability
of fault amongst various logically linked situations helps in categorizing the risk of a
system and where that risk may manifest. FTA is handy for adding numerical values into
the BSM system.
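Both tools reduce to small calculations, sketched below. The FMEA part uses the conventional risk priority number (severity x occurrence x detection, each rated 1 to 10), which is common FMEA practice rather than something this guide specifies. The FTA part combines probabilities through OR and AND gates under the assumption that the underlying events are independent. All names and ratings are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    description: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost certain to be detected) to 10 (unlikely to be detected)

    @property
    def rpn(self):
        """Risk priority number: higher values deserve attention first."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Order failure modes from highest to lowest risk priority number."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


def or_gate(probabilities):
    """Probability that at least one independent input event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Probability that all independent input events occur together."""
    return math.prod(probabilities)
```

In a fault tree, the output of one gate feeds the next gate up, so the top-level probability is built by nesting these two functions along the tree.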
Figure 4.2: Once the service model is fully realized, the next step is to connect its processes to the IT
functions that drive its data. This mapping is used by the BSM system to populate the model with metrics
information.
Step 4: Measurement
Once the modeling is complete, our next step is to link the designated monitoring and measuring
tools into the service model. It is within this step where much of the effort within the enabling
BSM software tool begins. Here, for those services and their associated metrics previously
identified, categorized, and modeled, we'll begin the process of actually measuring the metrics
we aim to obtain. In later steps we'll take this information, analyze it for gaps in service, and use
it to drive change within the environment design.
In Step 0 of this process, we discussed how much of the BSM implementation process does not
necessarily require heavy involvement with the IT organization. However, with Step 4 comes
much of the work needed by IT specialists. Here, the team will be implementing or otherwise
coding the necessary connectors that pull data from disparate systems into the BSM software
platform. Those skills are often highly specialized and specific to the type of software
platform the BSM system attempts to connect into. It is important for IT to be part of the process
up until this point so they can prepare the systems from a technical standpoint for the monitoring
plug-ins necessary to begin measuring.
Remember too that BSM is not intended to rip and replace existing monitoring systems. Nor in many
cases is it intended to be a systems monitoring system of its own. The organization likely already has
monitoring tools in place that leverage technologies like SNMP, NetFlow, WMI, WS-Management,
and other management protocols on which monitoring data is already being collected and stored. The
BSM implementation can simply pull from that data for its metrics needs.
End user experience monitoring tools may additionally be attached to the customer-facing
interfaces of externally facing systems. We'll talk at length about these types of tools in the next
chapter. But for now know that the code frameworks that typically drive these customer-facing
tools often have built-in monitoring toolsets that allow for integration into the BSM system.
Processes like synthetic queries and scripted actions can simulate the load of a particular user
and determine their wait time (that is, their experience) while using the system. This information
ties into KPIs associated with customer satisfaction.
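A scripted action of this kind can be sketched as a list of named steps whose wait times are recorded. The step callables here are placeholders: in practice each would issue a real request against the customer-facing interface. The function names and the injectable `clock` parameter are assumptions made so the sketch stays self-contained.

```python
import time


def run_synthetic_transaction(steps, clock=time.perf_counter):
    """Run each (name, callable) step in order and record its wait time in seconds."""
    timings = {}
    for name, action in steps:
        started = clock()
        action()
        timings[name] = clock() - started
    # Total wait is the sum of the individual step times recorded above.
    timings["total"] = sum(timings.values())
    return timings


def exceeds_wait_budget(timings, budget_seconds):
    """True when the simulated user waited longer than the acceptable budget."""
    return timings["total"] > budget_seconds
```

Running the same script on a schedule yields a time series of wait times that can feed a customer-satisfaction KPI.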
The BSM environment may also tie into Service Desk applications to get a time-based
understanding of how user experience drives incoming requests and complaints. One effective
KPI for measuring the quality of an external service is to monitor for incoming tickets alongside
end user experience monitoring. By doing this, an organization can discover what the pain points
are with their particular brand of customer. Some customers may be more or less willing to
handle elements of pain within systems. The rate of generated ticket workloads can drive a better
understanding of how those users are experiencing the system.
Measure Services & Gaps
As the team begins to implement data collection tools around the network, the BSM system will
begin measuring the quality of each listed service that makes up the model. Areas where data is
not yet incoming will show as gaps in desired metrics. We are not yet to the step where we can
begin implementing reporting and dashboards to visualize those metrics, so careful attention to
system data as it arrives into the system will identify where KPIs are being measured and where
gaps still exist.
At this point, a review of the existing data coming in as related to KPIs currently in place or
desired to be in place by the organization is an excellent double-check against the service model.
The service model, though considered frozen in its first iteration by the project team, may
require additional work to pull the necessary data required by the system. This can also manifest
as calculations that are lacking necessary data to properly represent loss as a measure of service
quality.
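Before dashboards exist, one simple way to watch for these gaps is to compare the metrics the model requires against the metrics that have actually arrived. The sketch below assumes a hypothetical mapping of service names to required metric names and a stream of (service, metric) pairs observed so far.

```python
def find_metric_gaps(required, received):
    """Return {service: [missing metric names]} for metrics with no incoming data.

    required: dict mapping a service name to the metric names the model needs.
    received: iterable of (service, metric) pairs seen in the incoming data.
    """
    seen = {}
    for service, metric in received:
        seen.setdefault(service, set()).add(metric)
    gaps = {}
    for service, metrics in required.items():
        missing = set(metrics) - seen.get(service, set())
        if missing:
            gaps[service] = sorted(missing)
    return gaps
```

An empty result suggests every required metric has at least one data point; a persistent entry points at a connector or a part of the service model that needs more work.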
Once the initial connectors are in place, the BSM system begins collecting data from the various
systems throughout the network. The BSM system, when configured with appropriate metrics
and logic associated with those metrics, will apply cross-device and cross-application
computations to determine health and quality status.
Within Step 5, we will begin the process of analyzing the incoming information and trending that
information to see if the data we are actually receiving aligns with the data we intended to
receive. Once this process is complete and we are assured of the validity of the model, we can
begin to analyze the system to see where gaps in service occur. This may be based on bad service
quality or customer ratings, system overloading, element response time, or transaction
throughput.
Two tools used to find these gaps that we'll discuss later in this section are Fault Trees and
Impact Trees. As components of the service model, these two tools identify where the root causes
and overall impacts of a service degradation or outage occur.
Analyze Returned Monitoring Data
Initial incoming data arrives in a relatively raw format. This raw data often needs to be converted
into a format useable by the calculations required elsewhere within the system. The process of
converting this data may involve multiple steps as data may require multiple refactoring based on
the target metrics required for it. As an example, performance data may arrive in a binary format
and need to be refactored into a numerical format for analysis in comparison with outage metrics.
It may require additional refactoring into a third format based on measured time to compute its
data in comparison with performance data from other kinds of systems.
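The conversion described above might look like the following sketch, which unpacks fixed-size binary records and derives a per-second rate from them. The record layout (little-endian timestamp, transaction count, and interval length) is entirely hypothetical; a real connector would follow whatever format its source system documents.

```python
import struct

# Hypothetical layout: uint32 timestamp, uint32 transaction count, uint16 interval (seconds)
RECORD = struct.Struct("<IIH")


def decode_records(payload):
    """Convert raw binary records into numeric rows with a per-second rate."""
    if len(payload) % RECORD.size:
        raise ValueError("payload is not a whole number of records")
    rows = []
    for timestamp, count, interval in RECORD.iter_unpack(payload):
        if interval == 0:
            raise ValueError("record with zero-length interval")
        rows.append({
            "timestamp": timestamp,
            "transactions_per_second": count / interval,
        })
    return rows
```

Normalizing to a per-second rate is the second refactoring step the text alludes to: it lets data sampled over different intervals be compared against the same target metric.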
Validate Measurements & Costing Assumptions
During this "learning mode" for the model, it is also important to validate that the measurements
as identified by the project team are to scale and include correct units. Converting KPIs into
measurable statistics can be a complex mathematical and logical task which can involve unit
comparison and conversion between multiple systems. This depends on each individual system's
capability to supply the data in the correct units. It's also essential to validate measurements
taken against the original data to make sure they are computationally correct.
One function of validating necessary measurements is the ratification of costing assumptions
made during the model generation. The initial determination as to the cost associated with a loss
in service quality or a loss of service altogether may have been based on faulty or misleading
information. Or the data arriving into the BSM system may not confirm the assumptions. It is
important here for the reliability of the system that the earlier costing assumptions are related to
the actual data arriving in-system.
Build Fault Tree Analyses
Once the model is validated as correct, the data within the model can be analyzed in comparison
with desired metrics to help identify where individual components of the system are not
performing to specifications. Now that the model is in place and functional, the reduction in
quality of each individual element can be related to how that element affects the whole. For
example, in Figure 4.3 the completed service model now shows how a change in performance of
the Inventory Database impacts each of the services that rely on the Inventory Database.
Here we can see that the performance of our Inventory Database has gone above our desired
specification of 5000 transactions per second. That reduction in quality directly impacts the
Inventory Processing System's capability to respond to inventory requests fast enough. It also
affects the Order Processing System's capability to fulfill orders, as a fulfilled order will change
the level of inventory. Ultimately, each of these metrics directly relates to the customer's
satisfaction or dissatisfaction with their experience.
We've discussed how a major component of the reporting step is making information
digestible for its specific consumer. Here we see how this information is
immediately of value for multiple consumer classes, depending on how it is presented. Nontechnical executives and business leaders can get a high-level representation of the system and
associated (lack of) service quality. They can use this data to make decisions about the business
in general or additional purchases to augment the design. Administrators gain necessary
information to help them quickly troubleshoot the problem.
It's worth noting here that prior to having this model in place the organization may not have been able
to trace how unhappy customers were impacted by a loss of performance in a down-level system.
BSM's built-in fault tree analysis tools provide both the IT department as well as the business
leadership the data they need to make the right decisions. That decision may be to purchase a
second load-balanced Inventory Database or vertically scale the existing one.
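The tracing described here amounts to walking the service model upward from the faulty component. The sketch below does that over a dependency mapping; the function name is hypothetical, and any edges supplied to it are the reader's interpretation of the service model, not data taken from a BSM product.

```python
def impacted_services(dependencies, degraded):
    """Return every service that relies, directly or indirectly, on a degraded component.

    dependencies: dict mapping a service to the components it relies upon.
    """
    # Invert "service -> what it relies on" into "component -> who relies on it".
    dependents = {}
    for service, targets in dependencies.items():
        for target in targets:
            dependents.setdefault(target, set()).add(service)

    impacted, frontier = set(), [degraded]
    while frontier:
        current = frontier.pop()
        for parent in dependents.get(current, ()):
            if parent not in impacted:
                impacted.add(parent)
                frontier.append(parent)
    return sorted(impacted)
```

Given edges in which the Inventory Processing System and Order Processing System both rely on the Inventory Database, and the top-level Web system relies on both, a fault in the database surfaces all three as impacted.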
[Diagram. Components shown: Mission-Critical B2B Web System, Customer Account Auth. System, Inventory Processing System, Order Processing System, Customer Account Database, Inventory Database, Credit Card Auth System, External Credit Service Proxy, B2B Extranet, Credit Card Extranet. Annotations shown: Customers Unhappy!; Inventory Delay > 400 ms; Orders < 50/sec; Transactions < 5000/sec.]
Figure 4.3: Our completed service model begins to show how a fault in the Inventory Database (in this case,
the number of transactions per second going above specifications) can impact the systems above that rely
on that database.
[Diagram. Components shown: Mission-Critical B2B Web System, Customer Account Auth. System, Inventory Processing System, Customer Account Database, Inventory Database, Order Processing System, Credit Card Auth System, External Credit Service Proxy, B2B Extranet, Credit Card Extranet. Annotations shown: 20 User Drop Rate; 37 Inventory Changes Delayed; 17 Orders in Wait State; Transactions < 5000/sec.]
Figure 4.4: Relating the information within our service model to an impact analysis, we begin to see how the
out of specification performance of our Inventory Database is directly impacting other functionality.
Ultimately, our system is seeing a 20 User Drop Rate due to this problem.
Step 6: Improvement
Step 6 in our process asks the question, "So, now what do we fix and why?" Up until this point
our process has been driven by the need to populate the framework with data. Also a component
up to this point is the analysis of the collected data to find where gaps exist in the best design of
our system. Here in Step 6 we finally get to take what we've learned thus far and turn it into
productive change for the service under monitoring.
It's worth mentioning here that it is entirely possible that no improvement may be needed. Monitoring
is, by nature, a long-term activity. Thus, there may be a considerable delay between Step 5 and
Step 6. Two elements can characterize this time delay.
First, there may be no problems whatsoever with the service. Though we all wish this were the case in
our systems, the selection criterion for our first service was to find the one that was already causing us
the most pain. So, the likelihood of this occurring is low. The second case is more likely.
Second, finding no actionable data within our system means that something within our model is likely
missing. That may be an unknown connection to an IT function, a missing step in the process, or a
metric that is either missing or mischaracterized. If the project team finds themselves reaching Step 6
and finding little to act upon, circling back to Steps 2 or 3 may be in order for additional discovery.
Step 7: Reporting
Our final step in running through the seven steps is setting up the dashboards and associated
reports. This step is placed last because it gives the project team time to analyze data and refine the
model before tying down the model to specific reporting functions. Once reporting is configured,
it has the tendency to freeze the model. This is the case because model and characteristic or
metric changes will impact reports.
Once reports are enabled for consumers, they typically grow reliant on them. So, be aware that
making reporting available is often a step that involves strong configuration control.
We will discuss the process of creating reports and dashboards in detail in Chapters 6
and 7. So for this chapter we'll review the necessary steps only from a high level.
Implement Dashboards
Dashboards are often "skinned" web sites that dynamically update data as necessary. We say
"skinned" because a visualization image is often used as the base layer. Atop this base layer are
overlaid data representations like Stoplight charts, Control charts, Pareto charts, and other tools
that represent numerical data in a graphical format. There are some best practices for developing
dashboards that we'll discuss in detail in Chapter 7. But for now know that this step will involve
equal measures of graphic design, data manipulation and visualization, and knowledge of web
platforms.
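The data side of a Stoplight chart is little more than a mapping from a metric value to a color, as in the minimal sketch below. The function name, the two thresholds, and the assumption that higher values are worse (as with a drop rate) are illustrative choices; a dashboard product would supply its own configuration.

```python
def stoplight(value, warn_at, critical_at):
    """Classify a metric where higher values are worse, such as a drop rate."""
    if warn_at > critical_at:
        raise ValueError("warn_at must not exceed critical_at")
    if value >= critical_at:
        return "red"
    if value >= warn_at:
        return "yellow"
    return "green"
```

The graphic design work then consists of drawing that color over the appropriate element of the base-layer image.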
Implement Notification
Notification elements can also be added into the BSM system. Similar to the problems we saw
within our chapter example, the executives and business leaders in the organization want to be
notified when situations occur that they can understand and resolve. They aren't necessarily as
interested when individual IT functions (like John's extranet router) have problems that they are
unable to act upon. Once the BSM implementation is in place, notification components can be
enabled that provide valuable data.
As a continuation of our chapter example, once FCG implements their own BSM system, they
may consider removing Dan from all the device notifications he is currently receiving. Instead,
he may want to know when the drop rate for the web site goes beyond a certain threshold. The
failure of an individual router means little to him and his position, and his skill set can't add value to
its resolution. But when the web site's drop rate goes outside acceptable parameters, that situation
impacts FCG's bottom line, which is a problem that he can identify with and help to resolve.
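The routing idea in the chapter example can be sketched as a subscription table keyed by alert level. The `Alert` record, the two levels ("device" and "business"), and the subscription table are hypothetical; they simply encode the suggestion that Dan receives business-level alerts only, while John continues to receive device alerts.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source: str   # e.g. "extranet router" or "web site drop rate"
    level: str    # "device" or "business" (hypothetical classification)
    message: str


# Hypothetical subscriptions following the chapter example.
SUBSCRIPTIONS = {
    "Dan": {"business"},
    "John": {"device", "business"},
}


def recipients(alert, subscriptions=SUBSCRIPTIONS):
    """Return the sorted names of everyone subscribed to this alert's level."""
    return sorted(name for name, levels in subscriptions.items() if alert.level in levels)
```

With this table a router failure reaches only John, while a drop-rate alert reaches both Dan and John.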
Hand-off to Operations
The final step in this process is the ultimate hand-off to operations. This step involves fully
documenting the configuration, entering that configuration into the organization's configuration
control tool, and dissolving the project team or redirecting it to another spin through the seven
steps for another service.
Another component of this hand-off is the set of procedures necessary to monitor and maintain the
model. Spinning up another project team for minor updates to the model can be a waste of
resources. Thus, one final task for the project team will be to document those procedures in
enough detail that individuals later on can effect change.
Chapter 5
"Nothing to report here, sir," John reports. "Since we updated our monitoring system to watch
for performance metrics on the servers, we found a few servers that needed extra memory or
another processor. Those were all upgraded months ago. Since then, other than the occasional
processor spike, we haven't noticed much in the way of problems. You should be seeing this too,
now that we've got that monitor on your desk. You're seeing the same info that I'm seeing."
Dan responds, "But I'm getting reports from our customers, just one today in fact, who say that
their experience with using the Web site has been really bad. Freezes. Lockups. Error pages. The
whole experience isn't good sometimes."
"Well, there's nothing that I can report from this side. We're running one of the best monitoring
platforms you can buy," John proudly exclaims. "We're monitoring dozens of performance
counters on everything from routers to switches to the servers themselves, and I can't report on
anything that's out of the ordinary."
Dan finds himself a bit ruffled by John's flippant response. He's concerned about the experience
his customers are feeling when they interface with his company. He's purchased some very
expensive systems monitoring equipment. His other monitor shows a happy, healthy system.
But somehow all of that monitoring equipment still isn't capturing the essence of his customers'
experience.
"Meet me in my office now," Dan instructs John. "We've got more strategizing to do."
[Diagram. Systems shown: External B2B Web Cluster, Kerberos Auth. System, Java-based Inventory System, ERP System, LDAP Database, Oracle Database, B2B Extranet Router, Credit Card Extranet Router. Counters shown: % Processor Time, Available MBytes, % Disk Time, Web Cache Hit Rate, Rejected Requests, Current Connections, GetRequests, Database Reads / Sec, Database Writes / Sec, Log Writes Average Latency, Database Cache Size, Table Opens / Sec, Pages / Sec, Bytes Sent / Sec, Page File % Usage, Interrupts / Sec, Context Switches / Sec.]
Figure 5.1: FCG's monitoring system is watching for system counters at multiple levels, but those counters
aren't telling the story of their users' experience. This figure highlights typical counters often enabled on
many systems. But these counters alone don't show the entire picture of what their customers are seeing.
Second, it is also critical to get a "big picture" of the entire environment. In order to do
so, a tool can be installed into that environment that watches all the traffic that passes
by in that environment. This tool watches for situations in which contention for resources
may be causing problems. Or, it may look for individual transactions that don't complete
or take extra time to complete. It may also recognize when external (and otherwise
unmonitored) forces may be contributing to the problem.
In either of these two cases, the concept of a "transaction" is critical to recognizing what this sort
of system is looking for. This end user experience monitoring tool needs to look for business
transactions or completed interactions between business systems, and how those interactions are
behaving. If transactions aren't behaving properly, there will likely be an impact to the overall
operations of those systems on the network. Those delayed or failed transactions may not
necessarily impact the overall performance of the server, but they do manifest in the user's
experience.
In Chapter 3, we talked about some of the different mechanisms by which monitoring data can
be obtained. Over the years, these different types of data-gathering mechanisms have evolved to
provide ever better quality of information through different vectors. Each different mechanism of
collection provides data that the others cannot.
For example, an agentless solution can more easily monitor the interrelation between systems
over the network than an agent-based solution can. However, due to their installation onto an
individual server, agent-based solutions typically have more access to the inner workings of a
system. Agent-based solutions can also repetitively execute synthetic transactions to a system to
judge their overall performance over time.
Let's take a look now at these two types of End User Experience (EUE) monitoring classes and
how they work. Each can work with the other to get a holistic picture of the system along with its
interrelation with the rest of the computing environment.
Agent-Based Monitoring
The goal of agent-based monitoring is two-fold. First, by installing agents onto individual servers
that make up a business service, the agents can look deeply into the processes and activities that
make up that service. The agent can analyze behaviors within the server to look for individual
transactions, the success or failure of those transactions, and the quantity of time elapsed to
complete those transactions. Because the agent is installed directly to the specific server of
interest, that agent can be configured with relatively unrestricted access to gather and report on
the information it needs from within the server.
This is a very important point: agentless monitoring mechanisms can only query a server through
APIs that are published and enabled for external interfacing. These externally facing interfaces do not
typically expose all the data within a server, usually for security or functionality reasons. Thus, the
addition of agent-based monitoring improves the overall level of information to be processed by the
EUE system.
Second, agents can also be installed onto clients throughout the network. The agents on these
clients are then programmed to emulate an end-user performing key business transactions
throughout the day. Depending on the maturity of the EUE system in place, those instructions
may be capable of:
Interfacing with a third-party packaged application such as SAP, Siebel, or other shrink-wrapped software to complete a common task.
By installing these agents on systems across the network, the EUE system can compare the
results of each transaction with those of other agents to see where individual sites may be
experiencing problems. In many ways, the idea with agent-based solutions is to determine the
total time necessary to complete a transaction from multiple locations to help identify the
characteristics and locations where poor application performance is experienced.
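To make the cross-site comparison concrete, here is a minimal Python sketch. It is not any vendor's implementation: the site names, the timings, and the rule of flagging a site whose median time exceeds 1.5 times the median across all sites are assumptions made purely for illustration.

```python
from statistics import median


def slow_sites(timings_by_site, factor=1.5):
    """Return sites whose median transaction time exceeds `factor` times
    the median across all sites (an assumed, illustrative rule)."""
    site_medians = {site: median(times) for site, times in timings_by_site.items()}
    overall = median(site_medians.values())
    return sorted(site for site, m in site_medians.items() if m > factor * overall)


# Hypothetical seconds-to-complete for one synthetic transaction,
# as reported by agents at three locations.
samples = {
    "headquarters": [1.1, 1.3, 1.2],
    "branch_east": [1.4, 1.2, 1.5],
    "branch_west": [4.8, 5.1, 4.9],
}
print(slow_sites(samples))
```

A real EUE system would likely compare against a historical baseline per site rather than a simple cross-site median, but the idea of comparing the same transaction from multiple locations is the same.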
Agentless monitoring, which we'll discuss in the next section, can require very little setup time to
configure. However, as you can see with agent-based monitoring, there is a period of configuration
necessary to identify the transactions of interest and record them into the agent. For mature EUE
software packages, this recording process can be relatively easy. The hard part is in identifying the
applications and transactions that are of monitoring interest to the business.
In Chapter 4, we discussed the seven-step process to implement a BSM solution. Many of the same
processes that are used to build a BSM service model can be leveraged to assist in the process of
identifying the right transactions and service components to monitor. As with the service model
creation process, this transaction identification activity will be an organic, iterative process.
Agentless Monitoring
Much different than agent-based monitoring is the concept of agentless monitoring. Here, code is
not installed to the individual servers that make up the business service nor are any transactions
synthetically generated to the systems under management. Instead, we leverage a central solution
that is configured to watch for all the traffic across the network. Once installed, the service
begins to look for a series of known metrics that can occur across the network:
When did a particular transaction start? When did it stop? Between what two servers,
services, and applications did it occur?
If the transaction did not complete, was it because the user cancelled it or was it due to a
network problem or poor performance?
If the transaction did complete, did it do so within an acceptable amount of time? How
much time was spent on the server, the network, and the desktop?
What are the network conditions across all hosts? Is one host consuming inordinately
more bandwidth than normal? Why is that occurring? Is that consumption affecting
transaction completion?
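The questions above can be sketched as a small classification routine. This is only an illustration under stated assumptions: the `Transaction` record, its field names, the host names, and the three-second acceptability limit are all hypothetical and do not come from any particular EUE product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    """One observed transaction; all field names are hypothetical."""
    source: str
    destination: str
    start: float                  # seconds since some reference point
    end: Optional[float]          # None if the transaction never completed
    cancelled_by_user: bool = False


def classify(tx, acceptable_seconds=3.0):
    """Label a transaction as ok, slow, cancelled, or failed."""
    if tx.end is None:
        return "cancelled" if tx.cancelled_by_user else "failed"
    elapsed = tx.end - tx.start
    return "ok" if elapsed <= acceptable_seconds else "slow"


observed = [
    Transaction("web01", "db01", start=0.0, end=1.2),
    Transaction("web01", "db01", start=5.0, end=12.5),
    Transaction("client7", "web01", start=9.0, end=None, cancelled_by_user=True),
    Transaction("client9", "web01", start=9.5, end=None),
]
print([classify(tx) for tx in observed])
```

Distinguishing a user cancellation from a network or performance failure is the hard part in practice; here it is reduced to a single flag that the monitoring tool is assumed to have set.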
A concern in some networks is the promiscuous nature of agentless monitoring. An agentless EUE
tool is indeed watching many (and sometimes all) traffic types in a particular network segment. In
some environments, this may go against established security policies. Thus, there may be political
pressure not to incorporate an agentless tool due to the type of collection it is performing. That being
said, the benefits associated with an agentless monitoring solution must be weighed against the
security liability associated with allowing it on the network. Although the monitoring is
promiscuous, many agentless monitoring tools operate by inspecting just what they need in the
network traffic and retaining only the information necessary to classify the results of that inspection.
The agentless monitoring tool should also have the ability to mask out sensitive information such
as passwords. In many cases, the benefits to the organization far outweigh any perceived security
risks.
This agentless solution, in combination with the business logic programmed into the BSM
service model, will determine the business impact associated with any transactions that did not
complete properly or within a proper amount of time. When a transaction does not complete
properly or in a timely manner, the service's quality is reduced. Wrapping this idea into the greater picture of
BSM, the reduction in service quality directly impacts the dollars-and-cents calculations
provided by the BSM system.
We'll talk more about the interconnections between EUE and performance logic and BSM's financial
logic later in this chapter.
Obviously, in order for this system to do its job, it has to understand the traffic it is receiving. If a
Web server is communicating with a Web browser client, that traffic needs to be understood as a
Web request followed by a Web server response. It can also be a series of requests and responses
that make up a complete business transaction. This type of communication is programmatically
easy to understand. Mature EUE systems provide added value when those systems
can additionally translate non-Web application traffic.
For example, if the EUE system understands the communication that occurs between an SAP
system and an Oracle back-end database, it can watch the traffic between those two systems and
look for individual transactions. The same holds true with any packaged application. When
considering an EUE system, consider one that includes the special decodes that can translate
traffic as necessary between the systems that ultimately make up the business service model.
As you can probably guess, for a system that is watching traffic all across the network, the sheer
mass of traffic that system needs to process is huge. One of the most critical parts of an agentless
EUE system is merely to know what kinds of traffic to process and which to discard.
The network that allows those systems to communicate with each other.
For any issue that is raised by an EUE system, the problem most often can be related to one of
these three elements. As an example, for a transaction to complete, there is a quantity of time
required for the client to make a request, the network to transmit the request, the server to receive
and respond to the request, the network to return the response, and the client to process that
response. A correctly implemented EUE system should be able to provide a spread of timing
information for each of these elements.
Figure 5.2: The total time necessary to complete a transaction is comprised of multiple steps in the process.
The CNS Spread identifies each of these elements and their relation to the total transaction time.
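As a rough illustration of the client, network, and server spread, the sketch below adds up the time attributed to each element and reports each one's share of the total. The function name and the sample timings are assumptions for the example; how a real EUE system attributes time to each element is not specified here.

```python
def cns_spread(client_s, network_s, server_s):
    """Split a transaction's total time into client, network, and server shares."""
    total = client_s + network_s + server_s
    if total <= 0:
        raise ValueError("total transaction time must be positive")
    return {
        "total_seconds": total,
        "client_pct": 100 * client_s / total,
        "network_pct": 100 * network_s / total,
        "server_pct": 100 * server_s / total,
    }


# Hypothetical timings, in seconds, for one transaction.
spread = cns_spread(client_s=0.5, network_s=1.5, server_s=2.0)
print(spread)
```

In this made-up case the server accounts for half of the total transaction time, which would make it the first place to look.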
This information on the "spread" can be used in multiple ways. As a troubleshooting tool, it
comes in handy for isolating where the problems with transaction processing are occurring. As a
component of a notification system, it can alert administrators when individual components of
transactions are not completing within specifications. As a help desk mechanism, it can be used
to assist users with identifying why their experience is not at its normal level. Most important,
this information can be used as a first step in understanding the true nature of the user's
experience and what elements are driving that experience level.
Figure 5.2 shows only a very simple example of the spread. This example shows the interaction
between a single client and a single server. Most business systems and their transactions involve the
communication between multiple entities to complete a transaction. It is that interrelation that can be
captured by EUE monitoring and is one of its greatest value propositions.
An Example
So implementing EUE doesn't necessarily replace typical system counters used by IT in
measuring the total performance of their systems. Instead, it adds a new class of counters that
watch for individual user interactions with the system. As users interact with the system, an EUE
system can measure those interactions, on a click-by-click basis, to ascertain a feeling for what
the user's overall experience is with the system. Though much of this measurement involves
elapsed time and time delay, this is not the only tool.
Time tells the tale of how long users are waiting on system elements, but the experience
also relates to individual transactions that don't complete or only partially complete. The true
tale of the user's experience is the aggregation of all these metrics.
Let's take a look now at what might have occurred had FCG implemented an EUE system to
augment what IT Director John called "the best monitoring platform money can buy."
Visibility
With traditional systems monitoring tools, the counters being measured are based on the
performance of the entire system. So those counters may not necessarily pick up problems when
they aren't of a nature large enough to affect the system as a whole. System counters typically
watch for resource overuse. But the timing delays that EUE is watching for typically don't result
in that level of resource overuse. So, the specific type of problem FCG's web site is
experiencing is not visible to their whole-system counters.
Had FCG implemented an EUE system to measure timing delays, their system would have
picked up on the individual transaction delays that caused users to wait multiple minutes between
clicks. That visibility would have alerted them to look for problems at a lower level of the
service model. Perhaps a piece of un-optimized code within the purchasing system was causing a
counter to time out in certain circumstances. The delay associated with that counter's timeout
could have been at the root of the problem. Only an EUE system can peer deep into the
individual transactions to see the precise conversation in which that counter's delay occurred.
Prioritization
Because FCG's system didn't include EUE monitoring, and because the problem didn't impact
whole system counters, they were unaware that the problem was even occurring. FCG was
unable to prioritize resources towards fixing the problem because of their lack of visibility.
In other examples, EUE monitoring may identify multiple locations in which problems are
occurring. But it also provides data as to which systems are truly affecting users. If a dozen
open problem tickets are created by the help desk for issues on the web site, EUE
can help identify which of those problems are actually affecting the user population. This grants
IT the ability to assign resources first to the problems with the greatest business impact.
Resolution
IT administrators can't fix a problem when they can't see the problem itself. Lacking the tools
that dig deeply into each individual transaction, it is challenging to identify problem root causes.
Because whole system counters do not necessarily completely describe the workload being done
on a particular network device, it is necessary to use tools that can.
EUE's agent-based tools have the ability to simulate transactions between a representative user
and the system itself. Those transactions can be run automatically throughout the day and from
multiple locations to form a representative understanding of how a sample user might be
experiencing the system. Lacking this capability, administrators would need to regularly and
manually click through the system to get a feel for its health.
Specific to each measured transaction is its spread of timing information between ownership by
the client, network, and server. This spread is an excellent starting point for locating deviances.
Drilling down from that point, additional debugging information specific to the transaction can
be viewed by the administrator to further isolate the problem. Deconstructing the problem in this
way speeds resolution because it helps to focus troubleshooting efforts to the specific issue at
hand.
Improvement
Lastly, once the problem is known, it is easier for IT to identify how best to resolve that problem.
IT's typical response to many problems is to add hardware to the environment to
support added load. But in many cases with complex systems this is not the most effective fix.
Where traditional monitoring shows no problem but EUE monitoring shows a delay, the problem
may not be attributed to a hardware resource shortfall. It may be attributed to a code fault or a
misconfiguration. EUE tools allow IT to more correctly improve the system without defaulting to
costly hardware expansion as its only tool for resolution.
Impacted Technologies
Among other elements, the value of an EUE system is directly related to the types of service
classes that system can interact with. For example, an EUE system that is limited to web traffic
only will lack critical visibility into the packaged applications and legacy programs that typically
interact with back-end servers. When an EUE solution cannot translate the communication that
occurs on the back-end, then a complete vision into each transaction is not fully recognized.
Let's take a look at five classes of business services that are commonly part of a typical business
computing environment. For each, we'll analyze how an EUE system can impact their
operations.
Figure 5.3: A fully-realized EUE system should tie into multiple classes of business services as well as the
network they reside upon.
Packaged Applications
Most business systems don't stop with just the web server. Web front ends typically require
additional data from one or more enabling back-end services. In many cases, those services are
packaged applications like SAP or Siebel for ERP data, or Oracle plus Oracle Forms for database
connectivity and customized business applications. Unlike web services, where all web traffic
relies on the common HTTP protocol for data transport and rendering, these packaged
applications may have their own protocols for getting data from the client to the server. These
applications may not necessarily use a web browser as their data rendering tool at the client side.
They may have their own desktop clients that have additional and/or different functionality.
Thus, the EUE system used to watch the traffic for these sorts of applications needs to
understand the traffic that occurs between client and server.
An effective EUE system will come equipped with the translations or special decodes
necessary to see into the traffic between the servers and clients that make up these packaged
applications. For packaged applications that use multiple servers for distribution of various
workloads, the EUE system will also need to understand the server-to-server communication as
well. This is necessary because not all issues are directly related to the first-tier client-to-server
traffic. Some issues may occur between the individual servers that work together to make up the
total service provided by the packaged application.
When considering an EUE system, look for those that can support the packaged applications
(typically enterprise-level applications) that are components of your BSM service model. Good
EUE solutions should support easy-to-use connectors that allow for the direct listening for traffic
between all elements of your packaged applications in relation to clients and any web front ends.
Be careful with some EUE solutions. They may only include monitoring of web transactions. This
limitation will restrict the level of information you may require out of your packaged enterprise
applications.
Thin Client
One class of packaged applications that requires special attention involves the delivery of
applications through a thin client interface. These applications, such as Microsoft Terminal
Services or Citrix Presentation Server, are positioned in front of applications to reduce the overall
effects of network latency or the bandwidth required to deliver that application to its users.
Consider the situation where the network traffic (the conversation) between an application's
server and its client is particularly chatty. In this case, positioning the client far away from the
server in terms of network proximity means that that application's response time is negatively
affected. Because of the network distance between the two halves, the traffic takes a longer
amount of time to get from client to server. This increased time means that the client will operate
much slower than in the case where the client is close in proximity to the server. Thin client
applications relieve this problem by positioning the client next to the server and passing only
screen updates and mouse/keyboard movements between client and server.
The use of EUE for thin client applications is multifold. First, in situations where applications
are experiencing poor quality, an EUE system's CNS spread can be used to determine where the
delays are occurring. If it is determined that the client and server would perform better when they
are closely positioned, then EUE can justify the move to a thin client solution for the problem
application.
Also useful with thin client applications and the analysis of EUE data is the determination of data
problems for existing thin client solutions. Due to the aggregation of multiple users onto a single
server in most thin client solutions, the actions of one user can impact the experience of others.
For example, one user whose activity on the server uses too many processor resources will cause
a slowing down of performance for all others on the server. A fully-realized EUE
implementation can be used to determine if the problem relates to the thin client server, the
application server, the network between them, or the network between the thin client server and
the client itself. In another example where only a single server in a farm is experiencing a
problem, EUE can assist with isolating the problem server to help with a quick resolution.
Effective EUE systems should also be able to align the traffic in such a way as to isolate
user-specific traffic not only from client to server but also from thin client server to back-end server. By
isolating traffic in this way, an end-to-end understanding of the traffic patterns can be used in
troubleshooting and remediation.
Middleware
Although middleware is not always an "easy win" for an EUE implementation, its incorporation
can benefit from many of the same factors associated with packaged applications. As end users
do not necessarily work directly with the environment's middleware tools and code frameworks,
their incorporation into the total environment analysis can be challenging. However,
incorporating middleware monitoring into the overall EUE system ensures that the end-to-end
transaction is being monitored. An effective EUE system will include modules that allow
connection into the pluggable frameworks that make up most middleware.
Databases
Databases are as challenging as middleware applications, though they can be a critical
component of the overall performance and experience measurement process. As databases
contain the whole of the data needed by the business system, their inclusion can be critical in
determining the overall health and performance of that system.
Databases that are overloaded in terms of raw performance can specifically impact the delay
associated with all other members of the system. This is due to their position near the bottom of
the service model. Inclusion of necessary database monitoring capabilities will help ensure that
the full transaction measurement includes client to server to data store, and back if necessary.
In addition to all these, it is also worth stating that the network itself and the devices that make up that
network are an impacted technology. Individual network components and their performance can have
a net effect on the overall measurement of user experience.
Importance to IT Goals
Thus far in this chapter we've talked about the utility of an EUE implementation and how it
relates to the business as a whole. But there are specific benefits to IT that can be gained as well.
Traditionally, IT has relied on systems management and monitoring tools to provide them with
the necessary information they need to troubleshoot their environments. However, as we
discussed earlier with our egg timer metaphor, those tools provide shallow levels of data. A
mature IT organization will recognize the need for deeper levels of monitoring data to assist with
the administration of its systems. That same IT organization will see how the concepts of EUE
can provide that data by digging deeper into the individual transactions associated with a
business service's operation.
In this section, let's take a look at a few of these benefits specific to IT that can be gained by
implementing EUE. From aiding in problem identification and prioritization to augmenting
pre-failure warnings, EUE provides a framework for problem isolation. From an organizational
standpoint, its information also helps in speeding the troubleshooting process by eliminating the
"finger pointing" problem and aiding in inter-team communication. Most importantly, these
work together to enhance vendor accountability and ultimately customer satisfaction with the
system.
Problem Identification
Traditional monitoring systems have the capability to alert when a problem situation or SLA
breach occurs. However, the alert that these systems provide is typically limited to the individual
situation that tripped the alarm. Digging deeper into the problem's root cause is limited, because
an alerted problem can be comprised of multiple, individual sub-problems, or can be one that is
buried within another layer of the system. Because of these limits on visualizing the
problem, the major time element associated with many problems is simply identifying what
went wrong.
As we discussed earlier in this chapter, EUE's focus on transactions means that a user's issue
with the system can be understood from many different levels. The spread of an application's use
of client, network, and server resources is an excellent starting point for the identification of a
problem's root cause. This spread provides the troubleshooting administrator a more defined
starting point for tracking down the resolution to the problem.
Moreover, digging deeper into each individual transaction allows the system to alert the
administrator when problems occur at every step along the path of the transaction.
Deconstructing each individual mechanism that makes up the business system helps with the
atomization of each service element. This process of deconstruction is very similar to the process
used in generating the BSM service model.
Setting the thresholds for this alerting is a task the administrator must complete.
Determining what those alerting thresholds should be can be a time-consuming process.
However, the benefits of knowing when transactions are not within specifications
often outweigh the effort.
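A per-step threshold check might look something like the sketch below. The step names, the measured times, and the threshold values are all invented for the example; choosing real values is exactly the time-consuming work described above.

```python
def breached_steps(step_seconds, thresholds):
    """Return the transaction steps whose measured time exceeds its threshold.

    Steps without a configured threshold are ignored.
    """
    return [
        step
        for step, seconds in step_seconds.items()
        if step in thresholds and seconds > thresholds[step]
    ]


# Hypothetical thresholds and measurements, in seconds, for a purchase transaction.
thresholds = {"login": 2.0, "search": 3.0, "checkout": 5.0}
measured = {"login": 1.1, "search": 4.2, "checkout": 4.9}
print(breached_steps(measured, thresholds))
```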
Prioritization
Even in mature IT environments there are situations where multiple alerts go off at once. When
this occurs it can be problematic to understand which of these alerts are important to the
functionality of the business and which are of lesser importance. For example, there may be a
dozen alerts active within the management system, but eleven of those alerts are actually minor
problems that do not require immediate attention. One of those alerts could be one that impacts
the entire user base for the business. The process of understanding the true nature of the alert and
prioritizing its remediation can be augmented with the information brought forward through an
EUE system.
Due to an EUE system's tie into BSM and the BSM service model, each element that makes up
the business service under management has an impact assigned to it. Those impacts relate to the
number of affected customers and the amount of dollar loss associated with a reduction of
service quality. When multiple alerts are presented at once, EUE and its tie
into BSM help the IT department understand the business impact of each alert. With this
information, IT then has the resources it needs to resolve the most critical and impacting issues
first, while de-prioritizing less critical problems.
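One way to picture this prioritization is to sort the active alerts by the impact the service model assigns to them. The sketch below is a simplification under stated assumptions: the alert names, the customer counts, the dollar figures, and the choice to rank by dollar loss first and affected customers second are all hypothetical.

```python
def prioritize(alerts):
    """Order alerts from greatest to least business impact.

    Ranks by estimated dollar loss per hour, then by affected customers
    (an assumed ordering chosen for illustration).
    """
    return sorted(
        alerts,
        key=lambda a: (a["dollars_per_hour"], a["affected_customers"]),
        reverse=True,
    )


# Hypothetical alerts with impacts taken from an imagined service model.
alerts = [
    {"name": "disk space warning", "affected_customers": 0, "dollars_per_hour": 0},
    {"name": "checkout latency", "affected_customers": 1200, "dollars_per_hour": 8000},
    {"name": "report export slow", "affected_customers": 40, "dollars_per_hour": 150},
]
for alert in prioritize(alerts):
    print(alert["name"])
```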
Pre-Failure Warnings
It is common that a user interface experiences a period of pre-failure before an actual failure
occurs. This pre-failure period may relate to an increasing load on the system or a component
that trending shows will soon not be able to keep up with the demand placed upon it. What is not
common is the recognition of this condition occurring before the failure actually appears. In
these situations, only comprehensive trending and historical analysis can assist the IT department
with finding these issues before they happen and augmenting the system with additional
resources as necessary.
Too often with IT organizations at lower levels of maturity, service failures occur because IT
does not have enough information available to recognize when a system requires additional
resources, more computing power, or a reconfiguration. EUE can provide that information by
continuously monitoring the environment for transaction timing. Trending analysis can be done
for service and individual component performance related to transaction speeds. When that
analysis points to an impending failure at some point in the future, IT is better prepared to add
additional resources as necessary. This also enhances the budgetary process, as fewer surprise
purchases are necessary for IT to maintain the environment.
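As a rough sketch of this kind of trending analysis, the example below fits a straight line to historical transaction times and estimates how long it will be until the trend crosses an acceptable limit. The straight-line fit, the daily sampling, and the sample values are assumptions for illustration; real trending tools may use very different models.

```python
def days_until_threshold(samples, threshold):
    """Estimate days after the last sample until the trend reaches `threshold`.

    `samples` is a list of (day, seconds) pairs. A least-squares line is
    fitted to them. Returns None when there is no upward trend to project.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0 or sxy <= 0:
        return None  # all samples on one day, or times are flat or improving
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, crossing_day - last_day)


# Hypothetical average checkout time (seconds) measured on four consecutive days.
history = [(0, 1.0), (1, 1.5), (2, 2.0), (3, 2.5)]
print(days_until_threshold(history, threshold=4.0))
```

With these made-up numbers the transaction time grows by half a second per day, so the four-second limit would be reached three days after the last measurement.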
Consider the following non-IT example as a metaphor for this situation. What if the power company
didn't monitor power usage in various parts of the grid? Lacking pre-failure and trending analysis of
power usage could mean that building and expansion in certain areas could cause a major loss in its
ability to serve power.
As IT organizations mature, their services grow towards a utility status similar to the power company.
In these cases it is possible for IT to maintain always-on service, planning for expansion rather than
being forced into it by external forces.
It can be further argued that as the IT organization matures, the business matures with it. The
business ultimately grows to require this always-on capability as IT discovers the ability to provide it.
Finger Pointing Prevention
When critical situations occur with a business system, business revenues are on the line until the
problem is fixed. Every second counts in these situations, so solving the problem quickly is
critical to operations. The problem within many of these situations, however, is that the typical
response by IT is to get everyone into a room and break down the problem.
This isn't necessarily a bad mechanism for isolating a complex problem. IT individuals typically
have experience within a single component of IT. "I'm a network person. You're a server person.
Over there are the database people."
Few individuals truly understand the entire system from end-to-end with the technical know-how
to understand problems as they occur. Thus, the circling-the-wagons approach in many
organizations is the only way to get enough experience in one location to track down the
problem.
The problem here lies in IT personnel's ownership of their piece of the computing
environment. Professionalism on the part of individuals means that each person in these meetings
can default towards proving why the problem does not lie within their scope of management.
Individuals in these meetings are incentivized by professional pride to find the problem in other
areas of the computing environment. This, combined with the stress of the problem itself, can
lead to "finger pointing" within the group, with each person trying to find the problem in other areas
of the environment.
EUE assists with the "finger pointing" problem first and foremost through the information
gleaned through its CNS timing data. Here, when a problem occurs that is critical to operations,
the first step can be to look for where the transaction's client, network, or server times vary from
the baseline. The timing information across multiple systems and multiple platforms assists the
troubleshooting team with more quickly tracking down the problem.
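The baseline comparison can be sketched in a few lines. The baseline and observed timings below are invented, and picking the single component with the largest increase over baseline is an assumed simplification of what a troubleshooting team would actually examine.

```python
def largest_deviation(baseline, observed):
    """Return (component, extra_seconds) for the component furthest above baseline."""
    deltas = {
        part: observed[part] - baseline[part]
        for part in ("client", "network", "server")
    }
    worst = max(deltas, key=deltas.get)
    return worst, deltas[worst]


# Hypothetical timings, in seconds, for the same transaction.
baseline = {"client": 0.5, "network": 0.5, "server": 1.0}
observed = {"client": 0.5, "network": 0.75, "server": 4.0}
print(largest_deviation(baseline, observed))
```

Starting the meeting with a result like this points the group at the server tier first, rather than leaving each team to argue why its own area is not at fault.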
Even more important is the expensive nature of the group meetings themselves. Bringing
together large numbers of people to identify the problem domain costs the organization in both
time and money, given the per-hour cost of each attendee. The opportunity cost of bringing key
members of IT together for problem resolution is the effort that could otherwise be spent on
either fixing the problem or performing other necessary critical work. In organizations with
lower levels of maturity, these major problems can occur often. Here, IT finds itself in a state of
perpetual "firefighting," which limits its ability to move towards higher levels of maturity.
A fully-realized EUE system can free these senior-level resources to enable them to work towards
strategic, maturing activities rather than tactical, firefighting activities.
Very few individuals, especially in enterprise environments, speak the language across all
the layers of a business computing environment. Thus, a centralized framework is necessary that
can speak some elements of all the necessary IT languages. That framework assists in locating
and isolating issues as they appear, but more importantly it is recognized as one that can talk to
each individual in their primary IT language. An EUE system is a potential framework that can
support this functionality.
Vendor Accountability
Another issue entirely involves "holding feet to the fire" for vendors and the
applications that the IT organization must support. In most organizations the computing
environment is made up of a number of individual applications that work together to provide the
business service.
One common problem with this cross-pollination of applications is the tendency of individual
vendors to "throw an issue over the wall" when support is requested. As an example, the
database vendor suggests that the problem is related to the middleware component. So, a call to
the middleware vendor is necessary. The middleware vendor believes the problem lies
within the operating system. So, a call to the operating system vendor's support is necessary.
Getting all three of these vendors on the phone at the same time, and more importantly the
correct people within each vendor's support organization, is challenging if not impossible.
Support technicians at many vendors are often incentivized to close cases rather
than fully resolve them. Thus, some vendors will tend to throw issues over the wall rather
than work them through to completion. This is particularly cumbersome when multiple
components from multiple vendors are part of the same business service. It is often functionally
impossible when large levels of business-specific customization are done with the vendor's
product.
The data provided by an EUE system provides easily transferable documentation about the
behavior of a vendor's application. Data from the EUE system can be provided to the vendor as
clear documentation that the problem lies within their product. In some cases, this information
can be used to assist with directly pinpointing the problem within their code. When code issues
and custom vendor patches are necessary to fix a particular problem, this documented evidence
is essential.
This data helps in convincing the vendor that the problem does indeed require a code revision.
This same data can also then be used by the vendor in identifying the area in which the fix is
necessary.
Customer Satisfaction
Most importantly, all these elements tie into the BSM tenet of customer satisfaction. A service
with a high level of quality directly relates to improved customer satisfaction with that service.
When IT can proactively identify issues and resolve them without attracting the notice of the
user, then it is working at a high level of maturity. That high level of maturity helps IT align
better with the needs of the business, and ultimately drive business profitability.
Figure 5.4: Data from three components come together to fill the BSM picture: financial logic from the BSM
service model, availability logic from traditional device monitoring, and performance logic associated with
end user experience.
Proactive Awareness
Lastly, all this data is useless unless the business acts upon it. Knowing that a particular business
service is experiencing a loss of quality is only useful when IT knows what to do. Proactive
awareness is a function of higher levels of IT maturity due to IT's enhanced knowledge of the
components that make up the computing environment:
• IT has the historical data with which to understand how the system and its users evolve
over time.
• IT is more prepared from a technical perspective to impact its bottom line from a
budgetary perspective.
• IT has more capability for measured and planned growth rather than 11th hour funding
requests when emergency resource needs arise.
• IT grows more capable of working with the business on strategic activities like business
growth and service expansion rather than merely fulfilling service requests.
All of these relate to IT's ability to better service the business and the customers of the business.
By being more proactive with the resources under its care and feeding, IT grows more capable of
making better business decisions.
Chapter 6
As he's pondering a shift in one of those needles, the phone on his desk rings. Picking it up, he
finds his buddy and customer Joe on the other line. They'd finally gotten around to that golf
game last week, and Dan figured Joe was calling to gloat about his unbelievable shot on the 16th.
"Hey, Dan," Joe starts, "How's that short game of yours coming along?"
"Just as good as it was the other day," responds Dan. He let him win last week, or at least that's
what he keeps telling himself. "Or as bad, if you're calling to gloat."
Joe laughs, "Not at all! Actually I was here to talk a little about that Web site of yours again.
I've been getting some more reports from my people down on the first floor."
A shiver goes up Dan's spine, but just for a moment. Not again, he thinks as his eyes shift to
his monitor and its dials, all of which are pointing in the right direction. "I trust your people are
having a good experience with it? We've been putting in a bunch of new equipment designed
specifically to help us understand when you guys are having slowdowns or other problems."
"Actually, that's the reason why I'm calling," Joe continues, "Our guys are reporting great
responses in the past month. I just got out of our monthly tag-up meeting with the people down in
purchasing, and they asked me specifically to thank you and your team for whatever you've done
over the past month. Our productivity down there is up 20%."
"Well thanks. After that phone call a couple of months ago, we made it our first priority to figure
out what was causing the problem and get it resolved," Dan explains, "In fact, we went quite a
bit further than that. We found out that the things we were watching for weren't really telling the
true story of what you guys were experiencing. So, we implemented some new technology that
helps us understand your experience a little better."
"Interesting..." Joe's voice trails off for a second. He continues, "Well, that's the other reason
behind my call. They've been raving so much about the changes over a short period of time. I'm
here to pick your brain as to what we could do with our own Web sites."
Dan beams, "Well, let me tell you about this new stuff. First of all, you've got to see this new
monitor on my desk..."
Figure 6.1: This and the next two chapters will discuss the achievement of value along three axes associated
with the implementation and use of a BSM solution.
To this point we've been looking at the technical aspects of Business Service Management and
its complement, End-User Experience monitoring. We've talked about the technical and
process-based aspects of implementing such a system to the benefit of the organization. This
chapter as well as the next two chapters will deviate from those discussions a bit to consider the
value returned to the business by implementing such a system.
In this chapter we'll discuss the value associated with managing business systems. Here, we'll
talk about the potential return that can be obtained by enterprises, outsourcers, and end users
themselves. We'll show some examples of management dashboards that enable that return, and
how the information gained through those dashboards improves business leaders' ability to better
service their customers.
In the following two chapters we'll continue the conversation on value, delving into the
achievement of operational and IT value. In Chapter 7, we'll focus on how BSM's information
can reduce operational expenditures for an organization. We'll also talk about how BSM can be a
management umbrella, under which management controls can be housed. There, we'll revisit the
topic of dashboards, discussing best practices in building effective ones.
Chapter 8 will conclude our conversation on value, focusing our discussion back onto IT.
Business leaders like Dan in our chapter example gain higher quality information through a
fully-realized BSM solution. But IT gains as well. IT gets a toolset that
assists service desks with problem identification, administrators and developers with resolution,
and IT managers with data that drives and justifies future purchases. In that chapter we'll talk
about the connectors available to many BSM solutions. These connectors enable BSM to plug
into various applications and frameworks.
Let's focus our discussion now on obtaining and ultimately maintaining value in the
implementation and use of a BSM system. That value comes from a set of potential drivers,
which benefit deployments in enterprises, outsourcers and solution providers, as well as
the customers of a system.
Obtaining Value
We first need to break down the value obtained by an organization into two sets of categories.
First, there are some elements of value that arrive through the implementation of BSM. Others
arrive through the use of that fully-realized solution. As you can see through our chapter example
above, Dan's new monitor arrives along with a whole new set of data. Though the monitor is
new, the data that comes with it has been reformatted such that it is now much easier to digest.
His previous view included a set of data that had value, though not to him. That information
was useful to the developers who code as well as the administrators who maintain the system.
The information his new monitor shows him originates from End-User Experience monitoring
agents that are looking at individual transactions between users of the system and the system
itself. Those EUE agents are also looking at the transactions between subcomponents of his B2B
system. When those transactions drop below set thresholds, his dial moves to the left. When
transactions remain within desired levels of performance, his dials stay firmly on the right. His
use of the system means that he has a persistent heads-up display that provides him with a vision
of the overall health of the network.
When his system begins experiencing problems that would affect his customers' experiences, he
can be proactive and communicate with them immediately. He can maintain critical business
relationships as problems occur rather than after they've lingered for a period of time. This
advance information prevents calls such as the one he experienced in our last chapter where his
customers are forced to notify him when problems occur.
This brings us to the second set of categories by which value can be obtained. There are both
tangible and intangible benefits associated with BSM. The tangible benefits align with an
improved capability to see where problem spots are within the system and resolve them. The
speed with which those problems can be resolved is a direct and tangible impact. The enhanced
situational awareness Dan gains through his monitor feeds into the intangible benefits of the
system. Dan's ability to maintain relationships with his customers is affected through that
capability and can be considered an intangible benefit.
Table 6.1 lists a few more of the benefits an organization can obtain through BSM. These
benefits are broken down by category:
Categories: BSM Implementation, BSM Use, Tangible Benefits, Intangible Benefits.
Table 6.1: A non-exhaustive list of value gained by an organization through the implementation of BSM. That
value is broken down into various categories.
Maintaining Value
Obtaining that value is one component, but maintaining it over time is yet another. One
component of a fully-realized BSM implementation that complements this has to do with the
metrics provided by the system itself. What we mean by this is that the job of a BSM system is to
provide metrics validating the health of systems and the quality of services. Those same metrics
can be used to simultaneously validate the value of the BSM system itself.
More than anything, BSM is a tool to crunch complex monitoring data. Thus, a snapshot of system
metrics prior to its implementation can be compared with future snapshots to validate the value it
provides.
Let's look at some examples of how this is the case. Recognizing value over time for a BSM
system involves the continued measurement of that value. A BSM implementation does that
through a series of metrics, the most relevant of which is called the Cost of Poor Quality
(COPQ). This metric, which we first talked about in Chapter 3, measures the quantity of lost or
deprecated transactions that occur over a period of time. When an Average Revenue per
Transaction metric is related to that measurement, this provides an overall understanding of the
total revenue opportunity lost over that unit of time.
This metric can be an excellent starting position from which to determine how a system that fails to
meet desired specifications impacts the business. When BSM gathers this metric, it is then related to
the amount of business being lost associated with poor quality. That data as it changes over time
provides an excellent measurement of how well a BSM system is impacting a business's ultimate
bottom line.
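The COPQ arithmetic described above can be sketched as follows. This is a simplified illustration under the assumption that COPQ is the count of lost transactions multiplied by the average revenue per transaction; the function name and the sample figures are hypothetical.

```python
def cost_of_poor_quality(lost_transactions: int, avg_revenue_per_transaction: float) -> float:
    """Revenue opportunity lost over a period: lost transactions times average revenue each."""
    return lost_transactions * avg_revenue_per_transaction


# Hypothetical daily figures measured before and after improvement work.
copq_before = cost_of_poor_quality(lost_transactions=120, avg_revenue_per_transaction=45.0)
copq_after = cost_of_poor_quality(lost_transactions=80, avg_revenue_per_transaction=45.0)
print(f"Daily COPQ before: {copq_before:,.2f}")
print(f"Daily COPQ after:  {copq_after:,.2f}")
print(f"Daily revenue opportunity recovered: {copq_before - copq_after:,.2f}")
```

Comparing the same calculation across snapshots is what turns the metric into a measure of the BSM system's own value.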
Also useful is the nature of BSM's collection and calculation mechanisms themselves. Once in
production, a BSM system is unlike other management systems in that it automatically begins
creating metrics associated with its own value. This occurs naturally as a part of BSM's
calculations of revenue impact. As it goes through its calculations over time identifying and
categorizing system characteristics, it concurrently calculates value measurements of its own
worth.
By monitoring these metrics over time, an organization can track the improvement of their
managed services related to BSM's involvement. Some metrics that assist in this validation
include:
Each of these metrics can be used concurrently in measuring the quality of the identified
business service along with determining the value of the system itself. As an example, if the
metric for "Problem time to resolution" decreases over time after the implementation of BSM, it
can be argued that BSM's data assisted with the resolution of those problems. To further validate
that assessment, one can align that metric with others such as "Number of unsuccessful
transactions per day (historical)" or "Rate to target IT transaction improvement". Combining these
metrics further supports the conclusion that BSM is improving that business service's
capability to serve customers.
Calculating ROI
Specific Return on Investment data can be challenging to calculate. So, this section will not
attempt to build complex calculations based on cost and anticipated benefit. Instead, in this
section we'll review some of the cost and benefit metrics that can be merged together to
illuminate potential investment return. In order to calculate a proper ROI, three elements are
necessary that merge together to provide a complete picture. Those three generic elements are
the cost to implement, the anticipated cost savings associated with the addition of the new
technology, and the revenue benefits expected with its use. In the sections below, we'll discuss
each of these in turn.
Figure 6.2: Adding together the cost savings benefits with the revenue benefits and subtracting the cost to
implement gives a good representation of a BSM implementation's ROI.
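The relationship in Figure 6.2 can be written out as a short Python sketch. The net return function follows the figure directly; expressing that return as a ratio of the cost to implement is a common convention added here as an assumption, and all figures are hypothetical.

```python
def bsm_net_return(cost_savings: float, revenue_benefits: float, cost_to_implement: float) -> float:
    """Net return per Figure 6.2: cost savings plus revenue benefits, minus cost to implement."""
    return cost_savings + revenue_benefits - cost_to_implement


def bsm_roi_ratio(cost_savings: float, revenue_benefits: float, cost_to_implement: float) -> float:
    """Net return as a fraction of the cost to implement (an added convention, not from the text)."""
    net = bsm_net_return(cost_savings, revenue_benefits, cost_to_implement)
    return net / cost_to_implement


# Hypothetical figures, all measured over the same evaluation period.
net = bsm_net_return(cost_savings=150_000, revenue_benefits=250_000, cost_to_implement=300_000)
ratio = bsm_roi_ratio(cost_savings=150_000, revenue_benefits=250_000, cost_to_implement=300_000)
print(f"Net return: {net:,.0f}")
print(f"ROI ratio: {ratio:.1%}")
```

All three inputs need to cover the same time horizon for the result to be meaningful.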
Cost to Implement
Implementation costs for a BSM system relate in many ways to the cost of the software itself.
That cost includes the evaluation process, its installation, consulting and training services needed
to properly train internal staff on its operation, and hardware resources.
Of the three metrics that make up our ROI, those that relate to the cost to implement can be
considered the easiest to measure. They involve hard dollar expenditures needed to find the
best-fit BSM system and bring it in the door for the company. In addition to the elements noted
above, it is important when considering these numbers to include recurring costs associated with:
• Software maintenance. A good rule of thumb for the costs associated with software
maintenance is 18% of the initial purchase price. That estimate holds fairly consistently
across vendors as the expected annual expenditure necessary to keep the
software under maintenance.
• Technical support. Depending on the vendor, additional costs may be required for
technical support. An important contractual element that should be considered when
making a purchase is whether technical support is included as a component of
annual maintenance costs.
• Hardware refresh. BSM systems are intended to be long-lasting solutions. Thus, a proper
ROI should additionally include hardware refresh costs at intervals, usually three or five
years. This ensures that as technology changes, hardware is regularly purchased to keep
up.
• Service desk load. Concurrent with the reduction in problems is a reduction in case load
to the service desk. When fewer issues rise to the visibility of users, those users are less
likely to need the services of the service desk. Mature organizations know their
metrics for the cost associated with each service desk ticket, so reductions in load can be
directly related to cost.
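The recurring cost items above (software maintenance at roughly 18% of purchase price per year, and periodic hardware refresh) can be folded into a multi-year cost-to-implement figure. The sketch below is one hypothetical way to do so; the function name, the assumption that a refresh re-buys the original hardware at the same price, and the rule that a refresh is counted only when a full interval has elapsed before the final year are all illustrative choices.

```python
def total_cost_to_implement(
    software_price: float,
    hardware_price: float,
    services_price: float,
    years: int,
    maintenance_rate: float = 0.18,   # rule of thumb from the text
    refresh_interval_years: int = 3,  # the text suggests three or five years
) -> float:
    """Up-front costs plus recurring maintenance and hardware refresh over `years`."""
    upfront = software_price + hardware_price + services_price
    maintenance = software_price * maintenance_rate * years
    # Assumption: one refresh each time a full interval elapses before the final year.
    refresh_count = (years - 1) // refresh_interval_years
    refreshes = refresh_count * hardware_price
    return upfront + maintenance + refreshes


# Hypothetical five-year horizon.
five_year_cost = total_cost_to_implement(
    software_price=100_000, hardware_price=40_000, services_price=30_000, years=5
)
print(f"Five-year cost to implement: {five_year_cost:,.0f}")
```

Separately priced technical support, if the contract does not include it in maintenance, would be one more recurring term in the same sum.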
Another potential metric relates to the burden of systems management tools operational within the
environment. When the functionality of systems management tools can be aggregated, the number of
redundant tools in the environment can be reduced.
For many management tools, the highest cost of ownership relates to client management, or the
activities associated with managing clients on-system. When those tools can be reduced through the
implementation of a BSM system, this yields a cost savings to the organization.
Revenue Benefits
Revenue benefits associated with the implementation of a BSM system typically relate to the
quality of transactions within the monitored system. When transaction quality can be measured
and compared historically, this provides a basis on which added revenue can be estimated.
Improved transaction quality directly relates to the overall quality of the service itself. Below are
some metrics used in the calculation of that quality:
• Average revenue per successful transaction. This metric is the basis for many of the
calculations in this section. When revenue can be related per transaction, this
gives us the bar by which revenue loss or gain through improved
transaction quality can be measured.
• Number of unsuccessful transactions per day. This metric is doubly useful during the
implementation of a BSM system. Prior to the EUE monitoring that arrives with a
fully-realized BSM solution, it can be operationally challenging to measure the number of lost
transactions. When that monitoring is enabled, the organization gets a first look at how
many transactions are actually being lost. This first look can then be compared with
others over time as BSM drives improvements to the business service.
• Average time per transaction. This metric can be the primary measurement of transaction
quality when not related to a failure. The time elapsed to complete a transaction bears
directly on the user's ability to complete that transaction. When users are unable to
complete transactions within an appropriate amount of time, they may leave the system
rather than complete the transaction.
• User drop rate. Related to the above, when users grow frustrated with an un-optimized
system, they will eventually give up on their interface with it. BSM enhances revenue
when improvement activities related to its information reduce this metric.
It is helpful when calculating the ROI associated with these numbers to include a target improvement
rate associated with the BSM implementation. This target rate is the desired level of improvement the
organization wishes to achieve by implementing the system. When creating the ROI calculations, it is
helpful to use the target improvement rate as a lever for visualizing how its change relates to a
change in overall return.
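A minimal sketch of using the target improvement rate as a lever follows. It assumes, for illustration only, that the revenue benefit is the share of currently unsuccessful transactions that the organization expects to recover, multiplied by the average revenue per successful transaction; the function name and every figure are hypothetical.

```python
def recovered_revenue(
    unsuccessful_per_day: float,
    avg_revenue_per_transaction: float,
    target_improvement_rate: float,
    days: int = 365,
) -> float:
    """Revenue recovered if `target_improvement_rate` of failed transactions succeed instead."""
    recovered_per_day = unsuccessful_per_day * target_improvement_rate
    return recovered_per_day * avg_revenue_per_transaction * days


# Move the lever to see how the expected revenue benefit changes.
for rate in (0.10, 0.25, 0.50):
    benefit = recovered_revenue(
        unsuccessful_per_day=100,
        avg_revenue_per_transaction=50.0,
        target_improvement_rate=rate,
    )
    print(f"target improvement {rate:.0%}: annual revenue benefit {benefit:,.0f}")
```

Because the benefit is linear in the rate under this assumption, doubling the target doubles the projected return, which makes overly optimistic targets easy to spot.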
Management Visibility
Getting the most management value out of a BSM implementation also relates to the information
that system can provide. The information collected through traditional device and EUE
monitoring is only as good as its presentation to its consumers. Considering this, it is critical that
good dashboards be built that are suitable to the individuals that require their information. As we
discussed back in Chapter 1, one of BSM's central tenets involves the digestibility of the
information provided.
In the following sections, we'll discuss how management visibility is obtained through the
implementation of effective dashboards.
The dashboards shown in the remainder of this chapter are intended to be used as examples of how
dashboards can be configured. Depending on the BSM solution chosen, dashboards may look
different or use different widgets to display data. Those used in the following sections show a broad
sample of how dashboards from any BSM solution can be used.
Figure 6.3: An example of an IT management dashboard for a financial institution that shows system status
as well as business metrics (credit card transactions, costs).
What to Display
The hardest part in designing good dashboards is finding the best-fit quantity of data to include
as part of its main page. Dashboards typically run within an Internet browser window such as
Internet Explorer or Mozilla Firefox. So ensuring that dashboards are sized in a way that works
with those browsers is also critical.
Figure 6.3 shows an example of a dashboard of interest to a financial institution. This dashboard
is specifically tuned towards the executives of that institution. Here, EUE agents are looking at
transactions across multiple sites and aggregating that information into a single view. The
executive gains a single-screen view of the health of the environment, while at the same time
getting financial information that relates to the health of transactions going through that system.
You'll immediately see that most of the information in Figure 6.3 is graphically related. The
screen can be considered relatively "busy", as it is full of information. However, the graphic
nature of the information makes it easy for the consumer to follow over time. Important in
creating dashboards is finding the correct elements of information, and presenting them in a way
that naturally draws the eye towards information of interest. Green is a color typically used
to show health while red is a color used to show unhealthy elements. In the same vein, upward
trending data typically indicates improving health and revenue while downward trending data
indicates declining health and revenue.
The incorporation of widgets such as dials, heat charts, spread charts, and maps along with color
coding further helps the dashboard consumer.
The goal with any dashboard is to create a picture whereby its consumer does not need to look
closely to understand what is going on. Similar to how dashboards work in automobiles, its consumer
should be able to merely glance at the screen and immediately recognize health or problems.
Trend & Reaction Lines
Figure 6.4 continues our discussion by showing another representation, this one an executive
summary of various systems within an organization. Here, we also see additional widgets that
display historical information associated with the business service. In Figure 6.4, we see that
service quality over a period of time has trended to a down state. The current user impact
associated with that outage is shown on the lower-right.
Reaction lines are visual elements that let the consumer know when a situation has progressed to
the point where some action is required. It is possible using dashboard generation tools to
graphically represent the points at which those situations occur. By creating reaction lines with a
graphical representation, consumers do not need to monitor textual data for problems.
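The logic behind a reaction line can be sketched in Python as a simple threshold check over a trend series. The function name, the service quality readings, and the reaction level below are hypothetical; an actual dashboard tool would draw the line rather than return an index.

```python
from typing import Optional, Sequence


def first_reaction_point(values: Sequence[float], reaction_line: float) -> Optional[int]:
    """Return the index of the first reading that falls below the reaction line, if any."""
    for index, value in enumerate(values):
        if value < reaction_line:
            return index
    return None


# Hypothetical weekly service quality readings, in percent.
service_quality = [99.2, 98.7, 97.9, 96.4, 94.8]
week = first_reaction_point(service_quality, reaction_line=97.0)
if week is None:
    print("Service quality has stayed above the reaction line")
else:
    print(f"Action required: reaction line crossed at week index {week}")
```

For metrics where higher values are worse, such as user impact time, the comparison would be reversed.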
This dashboard can be an example of a first-level drill down screen. When problems occur,
consumers want to know what they are related to. This dashboard in the upper-right shows
metrics for SLA fulfillment as well as the trending of monthly quality. This dashboard can assist
an executive with ascertaining when problems occur and the impact associated with those
problems while not being deluged with the technical details associated with the problem.
Figure 6.4: This Executive Summary dashboard shows some trend lines based on user impact time and
service quality. Reaction lines notify dashboard consumers when a problem has hit a critical state and some
remedial action must be performed.
In Chapter 7, we'll review a comprehensive list of widgets that can be added into dashboards for
various reasons. These widgets are configured such that data feeds their actual positioning. Some
widgets work better in some situations than others. In Chapter 7, we'll talk about the best practices
associated with their use and in building dashboards in general.
Management Control
In addition to providing visualizations of the business service environment, dashboards enable an
improved sense of control. When management is empowered with information at their fingertips
they are given the ability to make more informed decisions about their business. Depending on
the need of the consumer, dashboards can be configured with data that gives the executive or
business analyst the power to change the environment.
Figure 6.5: An example of a control dashboard, this operations details view shows detailed information about
the state of various locations and business services. For each, more detailed information is provided, giving
the consumer a specific view of what areas may require attention.
Control Dashboards
Control dashboards can exist at the top level or be configured as drill-down elements. The idea
with control dashboards is to provide enough information to their consumers (for example, IT
and executive management) that they can make effective decisions regarding the operation of the
environment. Good control dashboards also help with augmentation decisions. As environments
grow they inevitably require purchases and upgrades to support the needs of their users. By
enabling the consumer with information regarding performance, activities, and behaviors, the
consumer can enact change to the environment as necessary.
Figure 6.5 shows an example of a second-level dashboard that presents more detailed
information about multiple business services over a spread of multiple locations. Service quality
for any particular service is listed in the dials in the center, while history and business calendar
information is presented in the upper-left. Important here is the inclusion of textual explanations
of situations occurring within each business service and/or location.
The presentation of this information provides its consumer a more holistic view of the details
associated with a failure condition. This enables them to make better decisions in terms of
problem resolution or customer relationship development.
What to Display
These types of dashboards typically include Key Performance Indicators (KPIs) that show the
health of services within the network. Whereas top-level visualizations are best served using
all-graphical representations, lower levels require the addition of textual information that validates
the images at the top level.
The typical consumer use case for these sorts of dashboards arises when business
services go out of specifications. When thresholds are breached, the top-level dashboard will
elevate an indicator showing the situation. At best, a dashboard's users need only a single glance
to recognize problems and start working toward their solutions. The consumer can then be given
the ability to drill down into that problem to see its cause, information about its resolution, and
any impacts that are occurring.
What Not to Display
Important to realize here is that the same drill-down linkages that occur from top-level to
secondary-level dashboards need not stop with the first level. At the point of secondary control,
the dashboard designer should often remand highly specific data to a third-level dashboard. This
allows the same dashboard to service multiple classes of users. Those with the technical
experience to understand and act upon specifics can drill down to third-level information.
Those without the experience or the job-related responsibilities can remain at the level of detail
of use to them.
Management Impact on Operations
The elevation of information to the level of business management provides transparency between
business management and IT operations. In organizations with technical components, business
leaders often suffer from a technology gap, where their experience with business concepts doesn't
align with the level of technology being used in service of their customers. This gap in
knowledge and experience can be especially problematic when executives are unaware of the
activities within their business's technical employee base. They may make decisions that don't
make sense from a technical perspective.
Reconfiguring traditionally technical information into revenue targets and rates
understandable by the non-technical executive goes far in aligning the goals of IT and the
business. That alignment is a central tenet of Business Service Management.
SLA Measurement & Fulfillment
One specific type of dashboard useful for both management and IT is associated with Service
Level Agreement measurement and fulfillment. Back in Chapter 2 we talked about how
immature IT organizations have a tendency to set SLAs that relate only to individual device
health rather than the overall status of the business system. Immature IT organizations also tend
to set SLAs that are complicated or operationally unfeasible to quantitatively measure.
BSM's data gathering and calculation tools allow measurable SLAs to be assigned to IT and
outside organizations. Most specifically, BSM allows for real-time collection of
data. When SLA counters can be collected and reported on in real-time, this allows for a much
better recognition of fulfillment.
Figure 6.6: A dashboard widget, showing SLA measurements and their targets.
Consider the situation where an immature IT organization has put SLAs in place with the
business. When those SLAs are only measured at month's end or at the end of each quarter, it is
operationally challenging for IT to meet their goals. When goals are not met, long periods of
time must elapse between measurements. This inability for IT to see their status in relation to
their goals makes the process of meeting those goals difficult. It is impossible to see how
individual activities improve or worsen progress toward that goal.
Only by providing regular updates through interfaces like dashboards can the completion of
remediation activities be easily related to the ultimate goal.
Figure 6.6 shows a representation of a dashboard widget that shows four specific SLA
measurements and the targets associated with each. These SLAs relate to availability targets,
and as the example shows, three of the four goals have not been met for the period. The
visualization shown in this widget provides IT with a real-time rather than a monthly or
quarterly measurement of its success or failure in meeting its required SLA goals.
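As a minimal sketch of the calculation behind such a widget, assume availability is measured as uptime divided by the time elapsed so far in the period. The service names, targets, and downtime figures below are hypothetical and are not taken from Figure 6.6.

```python
from dataclasses import dataclass

@dataclass
class SlaStatus:
    service: str
    target_pct: float     # agreed availability target, e.g. 99.5
    elapsed_min: float    # minutes elapsed in the measurement period so far
    downtime_min: float   # minutes of recorded downtime so far

    @property
    def availability_pct(self) -> float:
        # Assumed definition: share of elapsed time the service was up.
        return 100.0 * (self.elapsed_min - self.downtime_min) / self.elapsed_min

    @property
    def met(self) -> bool:
        return self.availability_pct >= self.target_pct

# Hypothetical mid-period readings (10 days = 14,400 minutes).
readings = [
    SlaStatus("Order Entry", 99.9, 14_400, 10),
    SlaStatus("Billing", 99.5, 14_400, 120),
    SlaStatus("Email", 99.0, 14_400, 200),
    SlaStatus("Intranet", 98.0, 14_400, 400),
]

for r in readings:
    flag = "met" if r.met else "NOT met"
    print(f"{r.service:12s} target {r.target_pct:5.1f}%  "
          f"actual {r.availability_pct:6.2f}%  {flag}")
```

Because the figures are recomputed from whatever downtime has been recorded so far, the same code can run every few minutes rather than once at month's end.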
Figure 6.7: Dashboards can also be used to show the utility and failure rates of assets. This information
provides insight into the need for future purchases or upgrades.
These metrics dovetail into those discussed above relating to transaction health. When
transaction health can be related to the inability of assets to keep up with the load, this is a key
indicator that additional purchases may be necessary. This quantitative information in the hands
of management helps justify new purchases. It can reduce the cycle time associated with
purchase requests, as purchases are made when they are required.
Even more useful, when predictive analyses are made against existing transaction and health
trend lines, it is possible for managers to begin the procurement process before failure situations
occur. Because asset procurement lead times can be long, predicting the need for additional assets
before they are required allows for graceful scaling of existing services without the costly
downtime associated with overuse.
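One simple way to make such a prediction is to fit a straight line to recent utilization samples and project when it will cross a capacity threshold. The sketch below assumes one sample per day and a linear trend. The function name, sample values, threshold, and lead time are all hypothetical.

```python
def days_until_threshold(samples, threshold):
    """Fit a least-squares line to daily samples and return how many days
    after the last sample the trend reaches the threshold.
    Returns None if the trend is flat or falling."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - (n - 1))

# Hypothetical daily utilization (%) of a storage pool.
utilization = [50, 52, 54, 56, 58]
days_left = days_until_threshold(utilization, threshold=80)
procurement_lead_days = 30  # assumed vendor lead time

if days_left is not None and days_left <= procurement_lead_days:
    print(f"Threshold reached in about {days_left:.0f} days: start procurement now")
```

A straight line is the crudest possible model. It is used here only to show how a trend line turns into a procurement trigger.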
Process Integration
Each of the above topics relates to the iterative improvement processes that are enabled through
the visualization of necessary information. Immature IT environments exist in that mode
primarily due to the lack of information at their fingertips that shows them where bottlenecks and
other problems exist within their environments.
Process improvement frameworks such as Six Sigma and ITIL assist with this activity. But
alone, these frameworks are little more than instruction sets. Data is required to choose the
correct improvement actions within an environment. That data can come from the elements and
transactions monitored through an EUE and/or BSM system.
Figure 6.8: A dashboard widget showing individual business services and their level of deviation from
thresholds. This information can be used along with process improvement frameworks like 6 Sigma or ITIL in
improving technical and personnel processes.
Figure 6.8 shows an example of how data can feed into a process improvement framework such
as Six Sigma. In this example, the dashboard shows the status of individual business services and
major components of those business services. For each of these, a sigma value is assigned to the
service. That sigma relates to the amount of deviation from desired values that is present within
the system.
When the level of sigma for transaction performance goes beyond established thresholds, as is
the case with the Bad Debit subsystem of the Financial Planning service, a Cost of Poor Quality
value is assigned. In this case, the cost associated with going out of specifications is $258K. This
heads-up display provides process engineers and business analysts a view into the transactions
within a system, and helps them identify where deviation impacts corporate revenue.
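The text does not say how the sigma or the Cost of Poor Quality is calculated, so the following is only an illustrative sketch under two assumptions. The first is that the deviation is expressed as the distance of the observed mean from a target, in units of the sample standard deviation. The second is that each out-of-specification transaction carries a fixed cost. All names and numbers are hypothetical.

```python
from statistics import mean, stdev

def sigma_deviation(observed, target):
    """Distance of the observed mean from the target, in units of the
    sample standard deviation (an assumed definition, for illustration)."""
    return abs(mean(observed) - target) / stdev(observed)

def cost_of_poor_quality(observed, upper_limit, cost_per_failure):
    """Assumed model: every transaction above the limit costs a fixed amount."""
    failures = sum(1 for value in observed if value > upper_limit)
    return failures * cost_per_failure

# Hypothetical transaction response times in seconds.
response_times = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

deviation = sigma_deviation(response_times, target=3.0)
copq = cost_of_poor_quality(response_times, upper_limit=6.0, cost_per_failure=1500)
print(f"Deviation from target: {deviation:.2f} sigma, cost of poor quality: ${copq:,}")
```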
Technical. Technical end-users are often people who sit within the organization, but
are users of the business systems under management. These "insider" personnel often
have a high requirement for transaction information in order to perform their jobs.
Transaction information can be factored in ways that enable them to see trends in use
and environment states.
As an example of each of these, think first of a large mortgage brokerage. The consumer of their
loan origination system is likely highly technical. They likely will want rich information about
the status of mortgage metrics, their location, their processing status, and information about the
industry as a whole.
Conversely, an example of a non-technical individual can be the customer of an external B2C
system. If an individual wants to purchase products from a company's web site, they don't
require industry and trending data. But they may want information about the status of that web
site. The scope of data they require is less than in the example of the mortgage broker. To them,
simple status information involving their individual order and the status of the web site is what is
useful.
Figure 6.9: An end-user dashboard for a technical consumer. This visualization aggregates end-user
transaction data for an example government system.
System Status
As you can see in Figure 6.9, end-user consumers are predominantly interested in status
information about the services they consume. The visualizations there show sample data
associated with the level of repair of city elements, housing data, and rate payments. End users of
this system in this example are less interested in the quality of transactions going through the
system. Instead, they're interested in the transactions themselves.
One specific example of service quality that is of interest to end-users is the ultimate availability
of the system in total. BSM solutions are better than static "The System is Down" screens for
end-users because they can provide more accurate information about the status of the outage and
the expected time for services to be returned. These types of data can be categorized into:
Projected time to repair. When problems occur with a business system that impacts the
end user, those users more than anything are interested in the quantity of time the system
is going to be down. If a system they rely on for regular transactions goes down, knowing
that it is expected to return in 60 minutes is more valuable to them than attempting to retry accessing the system over and over again until it returns.
Scheduled outages. Systems also typically have known outage windows. Those windows
can occur during slow periods for one time zone, but the global nature of the Internet
means that one time zone's slow period is the middle of the day for another. When
businesses work globally, informing users of scheduled outages allows users to change
their use patterns so they are not impacted by the outage.
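As a small illustration of a scheduled-outage notice for a global user base, the sketch below renders one outage window in each user's local time. It uses fixed UTC offsets to stay self-contained. A real system would more likely use named time zones so that daylight saving time is handled. The function name, date, and offsets are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def outage_notice(start_utc, duration_min, user_offsets):
    """Return one notice line per region, showing the outage window
    converted to that region's local time."""
    end_utc = start_utc + timedelta(minutes=duration_min)
    lines = []
    for region, offset_hours in user_offsets.items():
        tz = timezone(timedelta(hours=offset_hours))
        local_start = start_utc.astimezone(tz)
        local_end = end_utc.astimezone(tz)
        lines.append(f"{region}: {local_start:%Y-%m-%d %H:%M} to "
                     f"{local_end:%H:%M} local time")
    return lines

# Hypothetical 60-minute maintenance window starting at 02:00 UTC.
start = datetime(2024, 6, 15, 2, 0, tzinfo=timezone.utc)
notices = outage_notice(start, 60, {"UTC-5": -5, "UTC+0": 0, "UTC+9": 9})
for line in notices:
    print(line)
```

The window that falls in the small hours for one region lands at 11:00 in the morning for another, which is the point the text makes about global operations.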
Figure 6.10: A dashboard widget that shows critical Operational Level Agreements and Underpinning
Contracts for an outsourcer. This information helps the outsourcer adjust resources in real time as contract
conditions change.
Enterprise IT
Enterprise IT is yet another consumer of the information provided by BSM. Although much of
our conversation to this point has to do with the movement away from a device-centric approach
to monitoring, IT still has the job of care and feeding of individual devices and applications. As
BSM's data collection tools have the ability to ride atop existing traditional device monitoring
systems, the information gathered by those systems can be brought into BSM for IT
consumption.
This setup has the advantage of unifying the tools employed by all branches of the organization
to manage their business systems. Enterprise IT can make use of the same suite of visualization
tools used by executives, application developers, and business analysts. This allows all groups to
speak the same language and leverage the same toolset in identifying problems, finding
solutions, and ultimately managing the environment.
Figure 6.11: An example BSM dashboard for business and IT executives illustrating each business function
area for a bank (e.g., claims handling, telephone banking service, e-commerce). The traffic light colors
show the quality of performance for the services supporting these banking areas.
Chapter 7
Both men grab their lunch and sit down at the tables close to the cafeteria line. John makes sure
to position both men so that they can see the television showing the news of the day.
"The system we in IT put together to relay that information to the televisions is pretty
extensible," John says, pointing to the television. "All we need to provide it is a network location
of something we want to display (a PowerPoint deck, a Web site, or even a video stream) and
it'll play it in any order that we want. Most of the time we're just rotating the PowerPoints HR
puts together with the news of the day. They're set to update every 6 seconds."
"OK," Dan comments, "that's why we bought the system. What's your idea?"
John sits back in his chair, still with that light-bulb-going-off look on his face. "Here's my idea.
That BSM system that we've been using for a while, we've pretty much got the configuration of
that system down pat. We're using it as a tool in IT not only for monitoring but also as a central
location for many of the otherwise disparate management toolsets we used to use. You and the
other executives and financial types are using its data to keep you up-to-date on our financials.
Even our customers are now getting to see parts of it, what with the new status and outage
notification screens that it automatically drops onto the Web site when we're having issues."
"Go on," Dan urges.
"Well, with these new TVs, we've got a new tool whereby we can keep the whole employee base
informed about the status of our company. What if we started providing up-to-the-second info on
our financial status from BSM? How well we're meeting our goals. How well we're doing with
sales. Those sorts of things." On fire now, John continues, "Kind of like a tool for keeping
morale up when we're doing well financially. When we're not doing so well, it can serve as a
reminder that we need to buckle down. Fiscal transparency to the employees, and all that."
John continues, "All that data is already in BSM in real time. All we need is to create a new
dashboard to display it. We'd have to be careful about providing too much info so we're not
giving away any secrets. But we could put together some dials and heat charts with generic
fiscal health info. We'd just rotate that Web site with HR's info and the special-of-the-day.
What do you think?"
"Interesting. I think I owe you lunch," says an impressed Dan as he picks up the check.
On the right side of Figure 7.1, we see the output of BSM's calculations. These are a series of
visualizations that can be used to validate system health, understand the financial impact of IT
systems, and ultimately make decisions based on data that has been formatted into a digestible
format.
[Figure 7.1 diagram labels: Raw Data, Connector Data, System Information, Service Status, Information, Actionable Data, Visualizations, Dashboards]
Figure 7.1: In many ways, BSM's internal computational logic is like a "black box." Raw data from connected
systems goes in one end. Visualizations of that information in digestible formats are output on the other end.
Our chapter example involves a story whereby FCG is using the internal financial logic within
their BSM installation in new and unique ways to provide value to their general employee base.
As you'll see in this chapter and hopefully throughout this guide, BSM's ability to gather and
integrate data is comprehensive and covers many areas within IT. The only limitation is in your
own imagination to develop dashboards and other heads-up displays that are useful to their
consumers.
In this chapter, we'll discover some of the ways in which added return can be gained through the
implementation of a BSM system. Much of that return comes through the reduction of
operational expenditures on the part of any dashboard consumer. As Figure 7.1 shows, we'll
focus our attention in this chapter on the ingest and output portions of a BSM system, and how
BSM's involvement with those linkages enhances its return to the business. Specifically, we'll
talk about BSM's capabilities to operate as a management "umbrella," consolidating many
traditional management consoles under its unified interface. We'll then discuss BSM's
visualizations and how their extensibility allows them to be used for many different classes of
users. We'll conclude this chapter with a look at the various data blocks that can be made part of
a dashboard.
Reducing Operational Expenses (OPEX)
As is explained in the narrative that makes up our chapter example earlier, BSM's "black box"
makes it highly useful for the formatting of raw data into formats that make sense to multiple
classes of users. Executives and financially based users can leverage the financial information to
gain a real-time understanding of the role of IT-based income to the business's bottom line.
At the same time, other users can benefit from this information as well. As we'll see in a second,
BSM's functionality can become a management "umbrella," under which many common
management tools can be unified. BSM includes the ability for IT service measurement and
reporting, both from the perspective of IT as well as the customer and the business. Unifying
disparate management tools and adding a business-oriented layer to IT service management
reduces operational costs, especially around problem resolution processes and the impact of
problems on the business.
Also possible in consideration of this black box is the extensibility of the information that can be
provided to users. With the horizontal scaling of a BSM system's user base, like what was done
with the televisions in our chapter example, BSM's toolsets have the ability to reduce the
overall operational expenses associated with managing IT systems.
The systems monitoring system ensures that systems are running properly and verifies that
servers and networks are running to pre-established baselines.
The systems management system augments all of these by enabling the baseline itself,
providing for policy-based and centrally controlled changes to the environment as well as
the maintenance of system configuration.
[Figure 7.2 diagram labels: Presentation Layer / Dashboards, Service Model, Instrumentation Data, CMDB, Service Desk]
Figure 7.2: The information from and activities associated with disparate management toolsets can be
centralized through the BSM implementation where business rules are applied to make sense of all the data.
BSM's "black box" allows for the aggregation of instrumentation data from each of these systems
into a single location. More importantly, its calculation engine allows for the relation of
information between individual management systems. As BSM comes equipped with its own
suite of tools for acting on the information it receives, it uses the service model to apply business
rules, taking data from disparate sources and turning it into information that is meaningful to the
business. It is possible to use the BSM system as the overarching umbrella for the management
of many facets of an IT environment. In the next few sections, let's take a look at how this can be
done through the connection of BSM to the other management systems in an environment.
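To make the idea of a service model applying business rules to instrumentation data more concrete, here is a minimal sketch. It assumes a single, very simple rule: a business service is only as healthy as its worst component. The service names, component names, and statuses are hypothetical, and a real service model would carry far richer rules.

```python
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}

# Hypothetical service model: business service -> components it depends on.
service_model = {
    "Online Banking": ["web-01", "db-01", "core-router"],
    "Claims Handling": ["app-02", "db-01"],
    "Telephone Banking": ["ivr-01"],
}

# Hypothetical readings, each of which might come from a different tool.
instrumentation = {
    "web-01": "ok",        # systems monitoring
    "db-01": "warning",    # database monitoring
    "core-router": "ok",   # network monitoring
    "app-02": "critical",  # application monitoring
    "ivr-01": "ok",        # telephony monitoring
}

def service_status(model, readings):
    """Assumed business rule: a service takes the status of its worst component."""
    return {
        service: max((readings[c] for c in components), key=lambda s: SEVERITY[s])
        for service, components in model.items()
    }

status = service_status(service_model, instrumentation)
for service, state in status.items():
    print(f"{service}: {state}")
```

Note that the shared database component affects two services at once. Only a model that relates components to services can surface that.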
Management tools. Most organizations have several management and monitoring tools
already in place. While a BSM system doesn't replace these existing tools, it makes it
easy to integrate data from other IT infrastructure management products, service desk
software, configuration management databases and other applications across multiple
platforms. BSM complements management and monitoring tools by filtering out the
noise and turning a multitude of IT data into information that makes sense to the business.
Notification. In the same vein as with management toolsets, notification elements can
also be segregated. Network element notification can be enabled through one protocol
and service while server and application notification are enabled through an entirely
different one. When notification systems are segregated from each other, it becomes
challenging to identify root causes in an environment, because notifications associated with one
part of the environment originate in one location while other notifications originate
elsewhere. When BSM ties into segregated notification systems, it centralizes the point
of notification and eases the pain of discovery.
Figure 7.3: An example of how BSM calculations can be entered into the system. Here, the calculations are
used to validate network SLA compliance. The information gathered to fulfill these calculations can come
from many different sources.
Operational Visibility
BSM also provides return in terms of the overall visibility and control into IT systems. As IT
systems are based on intercommunication between hundreds or thousands of disparate elements
all residing on a common network, it is not possible to see the environment in a physical sense.
Instruments are necessary to do the seeing for the environment's operators. These instruments, provided by the
point monitoring tools and the data aggregated into the BSM solution, provide a human-readable
representation of the health and operation of services on the network. This representation
provides value to the business in a few key ways:
Situational Awareness. When instruments are provided with the best possible data
with which to perform their calculations, the users of an environment get the best possible
understanding of the function of that environment. Situational awareness refers to the
ability of a system's users to recognize what is going on within the network. BSM
provides this through the translation and reduction of huge quantities of data into levels
that are consumable by its users. Later on in this chapter we'll talk about some of the
individual data blocks that are used in BSM visualizations to enable this.
Characterization and Resolution of Problems. Along with the point above is the proper
understanding of the problem itself. BSM's incorporation of data from multiple types of
monitoring systems as well as its own EUE instrumentation gives the troubleshooter the
necessary awareness of the problem itself.
Identifying Fault Domain & Root Cause. Because problems at the outset might be
masked by other factors, the largest time component in problem resolution in IT is
typically spent in problem identification. Finding the location where the problem exists
(the fault domain) as well as the actual source of the problem itself (the root cause) is
challenging without proper situational awareness. Lacking these capabilities, IT can
spend too much time tracking down a problem in the wrong location. BSM's toolsets,
especially its agent-based and agentless EUE monitoring, go far toward identifying
the individual transactions related to the issue. Reducing the time spent in problem
identification mode can significantly reduce the operational costs of that problem, as its
time-to-restore metrics are greatly improved.
For Management
Management and executives find value in BSM's summarization of the transactions being made
within monitored business services. This group of people is incentivized to ensure the current
and future profitability of the company. When companies make use of business services and IT
as a function for bringing income into the company, any situational awareness associated with
the rate and movement of that income is of value to this group.
Moreover, as the level of resolution increases for the data provided to this group, they become
increasingly capable of making decisions about the products transacted through the
business system. They can make business decisions regarding changing those products. As an
example, they can alter a product's presentation, or market it in different ways. When those
events occur, this group can see immediately, through the data presented to them, how those
activities relate to the rate of sale or other factors of importance.
It's important to mention too that products are not the only focus for BSM's financial and management
visualizations. If BSM is tied instead into services managed by the business system, the same kinds
of monitoring and visualizations can be provided to management.
If you take another look through the example visualizations shown in Chapter 6, you'll see that many
of the sample implementations there relate to service-based industries and the continual
improvement associated with their activities.
KPIs. Key Performance Indicators are a central and critical meter for this group of
people. At the management level, KPIs often measure the performance of the business as
a financial unit. Without delving into specific kinds of KPIs, in traditionally manual
systems these valuable indicators may be presented to management only at intervals.
Shortening these intervals increases the resolution of the
data. It increases the quality of the information being provided to management. BSM
does this by measuring and reporting on KPIs in real or near-real time.
Overall Service Quality. Related to business metrics and KPIs is the overall measurement
of service quality as a whole. This single metric provides to management a single-glimpse
understanding of the functionality of the business system of interest. As we'll see
in the data blocks later on, visualizations associated with overall service quality can be
created in ways to make it very obvious to management the exact point when service
becomes degraded and when it again returns to acceptable service. Knowing this
information assists them with performing their management duties.
Business Impact. Related not only to the quality of the service being provided to a
system, but also to the area in which problems are resolved is the idea of business impact.
When overall service quality is reduced for a service below acceptable levels, some
activity or element is the cause of that change. Often, multiple problems are present
simultaneously on a system. It is important that available resources be assigned to fix
those problems with greatest business impact first. The measurement of business impact
provides management with critical information to this end.
For IT
IT's needs can be much different than those of business management. Whereas business
management concerns itself with the viability of the company as a whole, IT's
responsibilities are scoped towards management of the computing environment. As such, IT will
be interested in validating the health and functionality of systems that drive business services. In
stating this, it is important to recognize that the same types of ingested data that fulfill the needs
of management are often used to populate metrics for IT. In terms of providing return back to IT,
some of the following metrics are valuable to IT's daily activities and long-term planning:
Service Level Statistics and Compliance. Mature IT organizations should have mature
SLAs in place for managing their relationship with the business. As stated in previous
chapters, the problem with many SLAs is in determining the quantitative measurement of
their fulfillment. As a manual task, this can be highly time-consuming and provides
results only at intervals. The calculations within a fully-realized BSM instance can do this
automatically and at regular intervals. By providing proactive SLA information at more
regular intervals, IT can better gauge how changes to the environment directly impact
service quality.
Mean Time to Restore. Problems within IT environments are a fact of life. Issues with
complex computing equipment happen all the time. Managing how those problems are
resolved is one of the major tasks of IT management. Tracking this metric helps
IT understand how well positioned its resources are. It also helps IT reposition
resources to better fulfill problem resolution.
Affected Users. As discussed above, identifying the affected users and the level of
affected users assists in positioning troubleshooting resources in the best way possible.
When multiple problems occur simultaneously, relating the problems to the number of
users being affected by the problem means that higher-impact problems are resolved first.
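The last two metrics lend themselves to a short worked example. The sketch below assumes that Mean Time to Restore is the average time between an incident being opened and service being restored, and that open problems are worked in order of affected users. The incident records and user counts are hypothetical.

```python
from datetime import datetime

# Hypothetical resolved incidents: (service, opened, restored).
resolved = [
    ("Email", datetime(2024, 6, 3, 9, 0), datetime(2024, 6, 3, 9, 30)),
    ("Payroll", datetime(2024, 6, 4, 10, 0), datetime(2024, 6, 4, 11, 30)),
    ("Intranet", datetime(2024, 6, 5, 13, 0), datetime(2024, 6, 5, 14, 0)),
]

def mean_time_to_restore_minutes(incidents):
    """Average minutes between an incident opening and service being restored."""
    durations = [(restored - opened).total_seconds() / 60
                 for _, opened, restored in incidents]
    return sum(durations) / len(durations)

# Hypothetical problems open right now: service -> number of affected users.
open_problems = {"Payroll": 50, "Email": 400, "Intranet": 1200}

def work_order(problems):
    """Order open problems so those affecting the most users come first."""
    return sorted(problems, key=problems.get, reverse=True)

mttr = mean_time_to_restore_minutes(resolved)
queue = work_order(open_problems)
print(f"MTTR: {mttr:.0f} minutes")
print("Work order:", ", ".join(queue))
```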
For Customers
Customers are a different group entirely than the other two discussed in this section. As external
customers are typically non-trusted or semi-trusted members of the computing environment, the
level and type of data provided to these people should be much less than that provided to
internal employees. As discussed in our chapter example, any data released to televisions around
the corporation will likely need to be highly scrutinized to avoid disclosure of sensitive
information.
That being said, customers of a business system are often outside the organization. Thus, as the
ultimate end-user of the system, they are most likely to want to know information about overall
system status. Three good metrics are helpful to end-user customers that provide this level of
information:
Scheduled Outages. When a BSM system is integrated with a notification system for
scheduled outages, this enables end-users to get rich alerting. Consider the problem of
being an end-user when this information is not available. When a scheduled outage
causes the system to be unresponsive to the end-user, if they have no information about
the timing of the outage, they are forced to re-attempt entry at regular intervals until the
system is again responsive. This involves a time cost for the end-user. By providing them
with a notification that shows when the system can be expected to be again available,
they can attend to other tasks until the expected return-to-service time.
Outage Notification. Outage notifications are similar in concept to the metric above, but
also include unscheduled outages. Similar to the problem outlined above, when a
business system experiences an unscheduled outage due to a problem, being able to
provide users with a notification about the problem lets them determine their next course
of action. Ultimately, providing more information of this type to the user means greater
user satisfaction during non-nominal periods.
Infrastructure Status. When simple outage notifications are augmented with additional
data regarding individual components of the business system, this status information can
also be of value. Providing information about individual system status gives more
technical users better explanations of the activities they're currently seeing on the system.
With rare exception, any data provided to end users regarding the status of their experience helps
increase their level of satisfaction with the system.
Availability, in terms of either user or service availability, measures the quantity of time over a
period in which a service can be used by its consumers. Depending on whether we're measuring
that based on the service itself or the ability of its users to make use of that service, we may
want to include two metrics.
The charts above measure this timing over a period of time, in this case 24 hours. By providing
this measurement, the viewer can see immediately the overall health of the system.
Control Charts
Control charts are general tools for relating numerical information. They are used to measure the
value of a metric over a period of time. The metric in the chart above is irrelevant to our
discussion. What is important is that any metric of interest can be measured over a preconfigured period of time using these types of charts.
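As a rough illustration of the arithmetic behind a control chart, the sketch below derives control limits from a baseline period and flags later points that fall outside them. It assumes the common simplification of placing the limits three sample standard deviations either side of the baseline mean. The metric values are hypothetical.

```python
from statistics import mean, stdev

def control_limits(baseline, width=3.0):
    """Return (lower, center, upper) limits from a baseline period,
    assuming limits at center +/- width * sample standard deviation."""
    center = mean(baseline)
    spread = width * stdev(baseline)
    return center - spread, center, center + spread

def out_of_control(points, lower, upper):
    """Return (index, value) pairs for points outside the control limits."""
    return [(i, v) for i, v in enumerate(points) if v < lower or v > upper]

# Hypothetical baseline readings of some metric, then new observations.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
lower, center, upper = control_limits(baseline)
flagged = out_of_control([101, 107, 95, 93], lower, upper)

print(f"Limits: {lower:.1f} to {upper:.1f} around {center:.1f}")
print("Out of control:", flagged)
```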
Dial Charts
A generalization of the chart used above to measure availability is the dial chart itself. Dial
charts are handy for both financial as well as IT-based metrics because of the very obvious way
in which they relate their data. Typically, "bad" levels associated with metrics are placed on
the left side of the dial. "Good" levels are put on the right. We say "good" and "bad" here rather
than "low" and "high" because with some metrics the high value may represent a bad
condition. As you can see in the example above, the metric for service value is used in this
chart, but any bounded metric can be used with this type of data block.
Metrics Charts
In some cases, actual values may be of interest to the consumer rather than a graphical
representation. In these cases, metrics charts can provide actual numerical values associated with
the metrics being gathered by the system. In the case above, we are measuring the "Mean Time
Between Failures" and "Mean Time to Restore" metrics for a series of elements. For each, a
value indicating whether the SLA was met is also shown.
Metrics charts are generic in that any values that relate to each other can be used in the same
chart. Also handy is the addition of rules to metrics charts for particular columns. As an example,
in the chart above the value of the SLA column can be configured not as a direct measurement
but instead as a test based on the values of the other columns. This feature allows the data block
author to provide text values for numerical measurements when added clarity is needed.
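The rule-driven column described above can be sketched as follows. The SLA column is not measured directly. It is computed as a test on the other two columns and rendered as text. The element names, values, and thresholds are hypothetical.

```python
# Hypothetical rows of a metrics chart (hours).
rows = [
    {"element": "web-01", "mtbf_h": 720, "mttr_h": 1.5},
    {"element": "db-01", "mtbf_h": 300, "mttr_h": 0.5},
    {"element": "app-02", "mtbf_h": 900, "mttr_h": 6.0},
]

def sla_rule(row, min_mtbf_h=500, max_mttr_h=4.0):
    """Assumed rule: the SLA is met only if failures are rare enough
    and restoration is fast enough."""
    ok = row["mtbf_h"] >= min_mtbf_h and row["mttr_h"] <= max_mttr_h
    return "Met" if ok else "Missed"

# Add the derived text column to each row.
table = [{**row, "sla": sla_rule(row)} for row in rows]

print(f"{'Element':10s}{'MTBF (h)':>10s}{'MTTR (h)':>10s}  SLA")
for r in table:
    print(f"{r['element']:10s}{r['mtbf_h']:>10}{r['mttr_h']:>10}  {r['sla']}")
```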
Pareto Charts
Pareto charts are a type of bar chart that measures multiple values and plots them in descending
order from left to right. In the case above, the bar charts starting from the bottom are read left to
right with the instance of highest value on the left. Values are often related to percentages, but
this is not required.
Pareto charts also show for each value the cumulative percentage associated with the metric
measured. In the case above, the actual value for IT approximates 47 to 48 units of outages. That
integer value represents 40% of the total quantity measured across all elements in the chart. As
the line graph moves from left to right it represents the cumulative percentage associated with
each element's bar and those to its left. This depicts the top sources causing problems to the
overall service with IT being the first, intranet being the second, and so on.
Pareto charts are most often used to measure quality. Their main purpose is to highlight the most
important factors among a set of factors.
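The cumulative line on a Pareto chart is simple to compute. In the sketch below the counts are hypothetical. Only the IT value of 48 units, and its 40% share of the total, follow the example in the text.

```python
def pareto(counts):
    """Sort categories by count, largest first, and attach the cumulative
    percentage of the total reached at each bar."""
    total = sum(counts.values())
    running = 0
    rows = []
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, count, 100 * running / total))
    return rows

# Hypothetical outage counts by source, in no particular order.
outages = {"Email": 24, "IT": 48, "Payroll": 6, "Intranet": 30, "HR": 12}

chart = pareto(outages)
for name, count, cumulative in chart:
    print(f"{name:10s}{count:4d}  {cumulative:5.1f}%")
```

Reading down the cumulative column shows how few sources account for most of the problem. In this made-up data, the first three sources account for 85% of outages.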
6 Sigma Charts
When success and failure of processes can be measured quantitatively, the use of 6 Sigma charts
can be helpful in providing a visual notification of process quality. We'll talk more about 6
Sigma in Chapter 9, but for now know that these charts show the level of successful and failed
process instantiations over a period of time. They are handy in finding areas in which
processes are failing at inappropriate levels.
When outages occur, it is important to learn quickly the number and class of users being affected
by that outage. Internal calculations based on user count and revenue per user can augment
outage impact charts with rich levels of data. In the chart above, we see that three of the four
services in our example are currently down. But of those three services, the highest impact is
currently being felt by the HR service. Thus, resolving problems there will bring the greatest
number of users back to service. These charts help with the prioritization of resources during
outage events.
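A minimal sketch of such an impact calculation follows. It assumes that impact is ranked by the number of affected users and that revenue at risk is users multiplied by revenue per user per hour. The services, user counts, and rates are hypothetical.

```python
# Hypothetical services: (name, is_down, affected_users, revenue_per_user_hour)
services = [
    ("HR", True, 900, 2.00),
    ("Email", True, 400, 1.00),
    ("Intranet", True, 250, 0.50),
    ("Payroll", False, 300, 5.00),
]

def outage_impact(rows):
    """Return (name, users, revenue_at_risk_per_hour) for each service that
    is down, sorted so the largest user impact comes first."""
    down = [(name, users, users * rate)
            for name, is_down, users, rate in rows if is_down]
    return sorted(down, key=lambda row: row[1], reverse=True)

impact = outage_impact(services)
for name, users, revenue in impact:
    print(f"{name:10s}{users:6d} users  ${revenue:,.2f}/hour at risk")
```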
Along with the charts above, and similar to the metrics chart shown previously, service statistics
charts are valuable when actual data values are of interest to a consumer. For our example above,
we are showing actual values associated with downtime for a sample service. These charts are
particularly handy as drill-down elements. This allows the consumer to visually see a problem
through a more graphical element, and later drill down to actual values when desired.
Stoplight Charts
Stoplight charts provide notification to the user similar to how stoplights notify drivers when
they are required to stop or allowed to proceed through an intersection. With stoplight charts,
however, the red color indicates poor performance of the metric in the column for the service in
the row. Green indicates acceptable performance, while yellow indicates some measurement inbetween. The power of stoplight charts comes in the ability to identify specifically what each
color means. Thus, for different charts, the measurement for green and red can and is likely
to be different.
The value of this abstraction becomes clear when the business later decides to change the values
that it considers good versus bad. The chart and its notifications need not change; only the
data that drives the chart changes on the back end.
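
One way to picture this separation is to keep the thresholds as data, apart from the logic that colors the chart. The sketch below is illustrative only: the metric names, the threshold values, and the `stoplight_color` and `render_row` helpers are assumptions made for this example, not part of any particular BSM product.

```python
def stoplight_color(value, green_at, red_at):
    """Map a metric value to a stoplight color using that metric's thresholds.

    If green_at > red_at, higher values are better (for example, availability).
    Otherwise lower values are better (for example, response time).
    """
    if green_at > red_at:
        if value >= green_at:
            return "green"
        if value <= red_at:
            return "red"
    else:
        if value <= green_at:
            return "green"
        if value >= red_at:
            return "red"
    return "yellow"


# Hypothetical thresholds; the business can change these without touching the chart code.
THRESHOLDS = {
    "availability_pct": {"green_at": 99.5, "red_at": 98.0},
    "response_time_ms": {"green_at": 500, "red_at": 2000},
}


def render_row(metrics, thresholds=THRESHOLDS):
    """Return the color for each metric of a single service (one row of the chart)."""
    return {name: stoplight_color(value, **thresholds[name]) for name, value in metrics.items()}


row = render_row({"availability_pct": 99.0, "response_time_ms": 300})

# The business later tightens what it considers acceptable availability.
tightened = dict(THRESHOLDS, availability_pct={"green_at": 99.9, "red_at": 99.0})
row_after_change = render_row({"availability_pct": 99.0, "response_time_ms": 300}, tightened)
```

The same measurements produce a different color once the thresholds change, while the rendering code stays untouched.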
Heat Charts
Heat charts are a particular type of stoplight chart that tracks status over time. As you can see in the
chart above, the Intranet service experienced a period of middling performance between the
hours of 05:00 and 09:00, followed by a drop into bad. These charts are more powerful
than stoplight charts because they show an extra axis of data, namely the value of the
stoplight color over time.
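
A heat chart row can be thought of as a series of stoplight colors, one per time slot, collapsed into runs. The following sketch assumes hourly quality scores on a 0 to 100 scale and hypothetical thresholds of 90 and 70; the sample scores are invented to resemble the Intranet example, not taken from the chart.

```python
from itertools import groupby


def color_for(score, green_at=90, red_at=70):
    """Classify a quality score; the thresholds are hypothetical."""
    if score >= green_at:
        return "green"
    if score <= red_at:
        return "red"
    return "yellow"


def heat_row(hourly_scores):
    """Collapse (hour, score) samples into (start_hour, end_hour, color) runs.

    Assumes the samples are sorted by hour and contiguous (one per hour).
    """
    colored = [(hour, color_for(score)) for hour, score in hourly_scores]
    runs = []
    for color, group in groupby(colored, key=lambda item: item[1]):
        hours = [hour for hour, _ in group]
        runs.append((hours[0], hours[-1] + 1, color))
    return runs


# Invented hourly scores for one service.
intranet = [(3, 95), (4, 93), (5, 85), (6, 80), (7, 78), (8, 82), (9, 60), (10, 55)]
runs = heat_row(intranet)
```

Each run corresponds to one colored band on the time axis, so a charting layer only needs the start hour, end hour, and color of every band.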
Business Calendars
When businesses move to global operations, the spread of time zones between sites and business
services adds a layer of complexity to scheduling outages and providing peak levels of service.
Business calendars assist the consumer with identifying the peak and non-peak levels of service
when calculated across all time zones. This information comes in handy in identifying the best
times of day to perform activities on the system.
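
To illustrate the idea, the sketch below works out, for every hour of the day in UTC, which sites are inside local business hours, and then picks the hours with the fewest active sites. The site names, the fixed UTC offsets (daylight saving time is ignored), and the 08:00 to 18:00 business day are all assumptions made for this example.

```python
# Hypothetical sites with fixed UTC offsets in hours; daylight saving time is ignored.
SITES = {"New York": -5, "London": 0, "Singapore": 8}

# Assumed local business hours: 08:00 up to, but not including, 18:00.
BUSINESS_HOURS = range(8, 18)


def sites_in_business_hours(utc_hour, sites=SITES):
    """Return the sites whose local time falls inside business hours at this UTC hour."""
    return [
        name
        for name, offset in sites.items()
        if (utc_hour + offset) % 24 in BUSINESS_HOURS
    ]


def business_calendar(sites=SITES):
    """Map each UTC hour (0 to 23) to the list of sites that are in business hours."""
    return {hour: sites_in_business_hours(hour, sites) for hour in range(24)}


def quietest_hours(sites=SITES):
    """Return the UTC hours during which the fewest sites are in business hours."""
    calendar = business_calendar(sites)
    fewest = min(len(active) for active in calendar.values())
    return [hour for hour, active in calendar.items() if len(active) == fewest]


calendar = business_calendar()
maintenance_candidates = quietest_hours()
```

With these three hypothetical sites, only one UTC hour has no site in business hours. A real calendar would also need to account for daylight saving time, regional holidays, and differing peak periods per service.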
Service Quality (Real Time) Charts
BSM is all about service quality that directly impacts the business, and these charts
show service quality at a single moment in time. The chart
above shows how well each of the measured services is performing. As
with any of the other charts, the power of these charts is the ability to change the validation
logic in the background as the business identifies new or updated thresholds for what is
considered quality service.
Service Quality (History) Charts
Similar in function to the chart that Figure 7.16 shows, but adding the extra axis of time,
historical service quality charts provide historical quality information to the consumer. Just as
heat charts extend stoplight charts, these charts give the consumer information about
when a particular service may have regressed into poor quality. In the example above, the
measured service is performing very poorly, with two movements into the completely down
state.
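
The "movements into the down state" that such a chart reveals can also be counted directly from the underlying history. This is a small sketch under assumed names: the state labels and the sample history are invented for illustration, and a series that begins in the target state is counted as one movement.

```python
def count_entries_into(states, target="down"):
    """Count how many times the series moves from any other state into the target state."""
    count = 0
    previous = None
    for state in states:
        if state == target and previous != target:
            count += 1
        previous = state
    return count


# Invented quality history for one service, one sample per interval.
history = ["poor", "poor", "down", "down", "poor", "fair", "poor", "down", "poor"]
movements = count_entries_into(history)
```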
Image Maps
With any of the charts we've talked about thus far, it is occasionally useful to plot them against
an image of some form. That image enhances the visual notification associated with the metric.
Most often, these image maps are area maps or geographical maps, but they can relate to any
image that makes sense to the user and the metrics chosen. As you can see in the image above,
we are relating status (red versus green) to particular areas on the globe.
Drill-Down Reports
With any of these elements, the ability to create hyperlinks from data block to data block
gives the user an added level of information. Here, drill-down reports provide specific
information that explains why a graphic is represented the way it is. In the case above,
clicking on a representative image drills down to specifics about individual services and their
status.
Service Trees
Service trees relate service quality and drill-downs in another fashion. These
charts give the user a single look at multiple metrics in a tree view. Here, we can see the
rollup status of various services. By clicking on the plus sign next to any service, we drill down
to the dependent services below the major header. Creating multiple levels, often based on the
service model itself, provides the user an at-a-glance view of the entire environment. Linking
service tree data blocks to drill-down reports provides the consumer with a holistic way of
determining exactly what is causing problems in the environment.
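
A rollup status can be sketched as a small recursive structure. The example below assumes a "worst status wins" rule, in which a parent shows the most severe color found among itself and its dependent services; the text does not specify the rollup rule, so that rule, the `ServiceNode` class, and the dependent service names are all hypothetical.

```python
from dataclasses import dataclass, field

# Higher number means more severe.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class ServiceNode:
    name: str
    status: str = "green"  # the node's own status before rollup
    children: list = field(default_factory=list)

    def rollup(self):
        """Return the worst status among this node and all of its descendants."""
        statuses = [self.status] + [child.rollup() for child in self.children]
        return max(statuses, key=SEVERITY.get)


def render(node, depth=0):
    """Return the tree as indented text lines; '+' marks a node that can be expanded."""
    marker = "+" if node.children else " "
    lines = [f"{'  ' * depth}{marker} {node.name}: {node.rollup()}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical service model: HR depends on a web front end and a database.
hr = ServiceNode("HR", children=[
    ServiceNode("Web front end", "green"),
    ServiceNode("Database", "red"),
])
intranet = ServiceNode("Intranet", children=[ServiceNode("Web server", "yellow")])

tree_lines = render(hr) + render(intranet)
```

Because the rollup is computed from the leaves upward, a red dependent service surfaces at the top level, and expanding the node shows which dependency is responsible.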