
Introduction

Introduction to Realtimepublishers
by Don Jones, Series Editor

For several years now, Realtime has produced dozens and dozens of high-quality books that just
happen to be delivered in electronic format—at no cost to you, the reader. We’ve made this
unique publishing model work through the generous support and cooperation of our sponsors,
who agree to bear each book’s production expenses for the benefit of our readers.
Although we’ve always offered our publications to you for free, don’t think for a moment that
quality is anything less than our top priority. My job is to make sure that our books are as good
as—and in most cases better than—any printed book that would cost you $40 or more. Our
electronic publishing model offers several advantages over printed books: You receive chapters
literally as fast as our authors produce them (hence the “realtime” aspect of our model), and we
can update chapters to reflect the latest changes in technology.
I want to point out that our books are by no means paid advertisements or white papers. We’re an
independent publishing company, and an important aspect of my job is to make sure that our
authors are free to voice their expertise and opinions without reservation or restriction. We
maintain complete editorial control of our publications, and I’m proud that we’ve produced so
many quality books over the past years.
I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if
you’ve received this publication from a friend or colleague. We have a wide variety of additional
books on a range of topics, and you’re sure to find something that’s of interest to you—and it
won’t cost you a thing. We hope you’ll continue to come to Realtime for your educational needs
far into the future.
Until then, enjoy.
Don Jones

Table of Contents

Introduction to Realtimepublishers.................................................................................................. i
Chapter 1: The State of Systems Management ................................................................................1
Overview..........................................................................................................................................1
Goals of Systems Management........................................................................................................2
Business Alignment .............................................................................................................3
Coherent Business Strategy .....................................................................................3
Multiple Business Objectives ..................................................................................4
Dynamic Requirements............................................................................................4
Technical Integrity ...............................................................................................................5
Malfunctioning Applications ...................................................................................6
Malicious Software ..................................................................................................7
System Configuration Vulnerabilities......................................................................8
Improperly Managed Access Controls...................................................................11
System Availability............................................................................................................12
Compliance ........................................................................................................................13
Spectrum of Systems Management Practices ................................................................................15
Ad Hoc Systems Management...........................................................................................15
Ad Hoc Systems Management in “Practice” .........................................................15
Effects of Ad Hoc Systems Management ..............................................................16
Controlled Systems Management ......................................................................................17
Continuous Improvement...................................................................................................19
Rationalizing Systems Management: SOM ...................................................................................21
Elements of SOM...............................................................................................................21
Unified Management Framework ..........................................................................22
Modular Services ...................................................................................................22
Open Architecture..................................................................................................22
Benefits of Service-Oriented Systems Management .........................................................23
Summary ........................................................................................................................................24
Chapter 2: Core Processes in Systems Management .....................................................................25
Aligning Business Objectives and IT Operations ..........................................................25
Ad Hoc Growth of IT Infrastructure..................................................................................26
Managing IT to the Big Picture .........................................................................................27
Planning and Risk Management in IT ...........................................................................................27
Basics of IT Planning.........................................................................................................27
Planning Technical Architecture............................................................................28
Organizational Structure ........................................................................................29
Budget and Staff Management...............................................................................30
Communications ....................................................................................................30
Risk Management in IT .....................................................................................................31
Prioritizing Business Objectives ............................................................................31
Assessing Risks and Impacts .................................................................................31
Mitigating Risks.....................................................................................................32
Business Continuity .......................................................................................................................33
Maintaining Security and Ensuring Compliance ...........................................................................33
Regulations and Compliance .............................................................................................34
Privacy and Confidentiality ...................................................................................34
Information Integrity..............................................................................................35
Information Security ..........................................................................................................36
Threat Assessment .................................................................................................36
Vulnerability Management ....................................................................................37
Change Control ......................................................................................................38
Auditing for Security and Systems Management ..............................................................40
System Events........................................................................................................40
Application-Level Auditing ...................................................................................41
User Auditing.........................................................................................................41
Incident Response ..............................................................................................................41
Capacity Planning and Asset Management....................................................................................42
Capacity Planning ..............................................................................................................42
Asset Management.............................................................................................................42
Acquiring Assets....................................................................................................43
Deploying and Configuring Assets........................................................................43
Maintaining and Retiring Assets............................................................................44
Service Delivery.............................................................................................................................44
Service Level Management................................................................................................45
Financial Management of IT Services ...............................................................................45
Capacity and Availability Management.............................................................................45
IT Service Continuity Management...................................................................................46
Summary ........................................................................................................................................46
Chapter 3: Industry Standard Practices and Service-Oriented Management.................................47
Organizing IT Operations Around SOM .......................................................................................48
Overview of Best Practice Frameworks ........................................................................................49
Best Practice Principle 1: IT Services Have Much in Common........................................49
Best Practice Principle 2: IT Services Are Interdependent................................................50
Best Practice Principle 3: Measure IT Services.................................................................50
KPIs........................................................................................................................51
Best Practice Principle 4: Utilize Repeatable Processes ...................................................55
Best Practice Principle 5: Leverage Broadly Applicable Models......................................56
Frameworks and SOM .......................................................................................................56
Best Practice Frameworks and SOM .............................................................................................57
Technology Management and ITIL ...................................................................................58
Service Delivery within ITIL.................................................................................58
Service Support within ITIL ..................................................................................59
Planning to Implement Service Management ........................................................60
Security Management ............................................................................................60
Infrastructure Management....................................................................................60
Release Management .............................................................................................60
Other ITIL Disciplines...........................................................................................61
COBIT................................................................................................................................61
Planning and Acquiring .........................................................................................62
Acquiring and Implementing .................................................................................62
Delivering and Supporting.....................................................................................63
Monitoring and Evaluating ....................................................................................63
Information Security and ISO 17799.................................................................................64
Risk Management and NIST Guide for Technology Systems...........................................65
Risk Mitigation ......................................................................................................66
Risk Evaluation and Assessment ...........................................................................67
Leveraging SOM to Support Frameworks and Standards .............................................................67
Summary ........................................................................................................................................68
Chapter 4: Moving to a Service-Oriented System Management Model........................................69
Building a Foundation for Enterprise IT Systems Management ...................................................70
Asset Tracking ...............................................................................................................................71
Inventory Management ......................................................................................................72
Controlling Change with Standardization..............................................................73
Fine-Grained Inventory Controls...........................................................................74
Patch Management.............................................................................................................74
Assessing the Relevance of Patches ......................................................................75
Testing Patches ......................................................................................................76
Scheduling Patch Installations ...............................................................................77
Deploying Patches .................................................................................................77
Distributing Patches ...............................................................................................78
Verifying Patches...................................................................................................78
Cataloging Patches.................................................................................................79
System Security .................................................................................................................79
Protecting Confidentiality of Information .............................................................79
Ensuring System Integrity......................................................................................80
Maintaining Availability........................................................................................80
The Role of Systems Management in Protecting Confidentiality, Integrity, and
Availability ............................................................................................................80
Risk Management ..............................................................................................................81
Licensing............................................................................................................................82
Service Delivery.................................................................................................................84
Structure of CMDBs ......................................................................................................................84
Definitive Software Library...............................................................................................85
Configuration and Status Data Repository ........................................................................86
Client and Server Configuration and Status Data ...................................................87
Network Management Devices..............................................................................87
Event Monitoring Applications..............................................................................88
Beyond Silos: Integrating Data..............................................................................88
CMDB and Asset Life Cycle .........................................................................................................88
Summary ........................................................................................................................................89
Chapter 5: Implementing System Management Services, Part 1: Deploying Service Support .....92
Elements of Service Support..........................................................................................................93
Interdependent Service Support Processes ........................................................................93
Automated Configuration Management ............................................................................94
Data Collection Procedures....................................................................................94
Centralized Data Repository ..................................................................................95
Process Flow Support ............................................................................................95
Information Retrieval.............................................................................................96
Incident Management.....................................................................................................................98
Characteristics of Incidents................................................................................................98
Severity ..................................................................................................................98
Assets .....................................................................................................................98
Personnel................................................................................................................99
Resolution Method.................................................................................................99
Incident Types....................................................................................................................99
Resolving Incidents..........................................................................................................101
Problem Management ..................................................................................................................102
Trend Analysis .................................................................................................................103
Configuration Management .........................................................................................................103
Planning ...........................................................................................................................103
Identification ....................................................................................................................104
Control .............................................................................................................................105
Status Accounting ............................................................................................................105
Verification and Audit .....................................................................................................105
Change Management ...................................................................................................................106
Ripple Effects of Change .................................................................................................106
Change Controls...............................................................................................................107
Requests for Change ............................................................................................107
Change Advisory Board.......................................................................................108
Release Management ...................................................................................................................108
Planning Releases ............................................................................................................109
Testing and Verifying Releases .......................................................................................109
Software Testing ..................................................................................................110
Data Migration Testing ........................................................................................110
Integration Testing ...............................................................................................110
Software Distributions .....................................................................................................111
Communications and Training.........................................................................................112
Summary ......................................................................................................................................112
Chapter 6: Implementing Systems Management Services, Part 2: Managing Service Delivery.113
Service-Level Management .........................................................................................................113
Application Functionality ................................................................................................114
Training............................................................................................................................115
Backup and Recovery ......................................................................................................115
Recovery Time Objectives...................................................................................115
Recovery Point Objectives...................................................................................116
Availability ..........................................................................................................117
Access Controls ...............................................................................................................118
Identification and Identity Management..............................................................119
Authentication......................................................................................................120
Authorization .......................................................................................................121
Service Catalog and Satisfaction Metrics ........................................................................121
Financial Management for IT Services........................................................................................122
Cost Accounting...............................................................................................................122
Competing Requirements.....................................................................................123
Cost Allocation ....................................................................................................123
Implementing Charge Backs................................................................................124
Forecasting.......................................................................................................................124
Forecasting at the Appropriate Level...................................................................124
Differing Patterns of Cost Growth.......................................................................125
Accounting for Cash Flow ...................................................................................125
Capital Expenditure Analysis ..........................................................................................126
NPV......................................................................................................................126
ROI.......................................................................................................................127
IRR.......................................................................................................................129
Operations and Project Financial Management ...............................................................130
Operational Management Issues ..........................................................................130
Project Management ............................................................................................131
Capacity Management .................................................................................................................133
Performance Management ...............................................................................................133
Workload Management....................................................................................................134
Application Sizing and Modeling ....................................................................................134
Availability and Continuity Management....................................................................................135
Availability and SLAs......................................................................................................135
Continuity Management...................................................................................................135
Summary ......................................................................................................................................136
Chapter 7: Implementing Systems Management Services, Part 3: Managing Applications and
Assets ...........................................................................................................................................137
Application Life Cycles ...............................................................................................................137
Business Justification.......................................................................................................139
Requirements Phase .........................................................................................................140
Functional Requirements .....................................................................................140
Security Requirements .........................................................................................141
Integration Requirements.....................................................................................142
Non-Functional Requirements .............................................................................143
Analysis and Design ........................................................................................................145
Solution Frameworks ...........................................................................................146
Buy vs. Build .......................................................................................................149
Detailed Design....................................................................................................150
Development ....................................................................................................................151
Source Code Management ...................................................................................151
System Builds ......................................................................................................151
Regression Testing...............................................................................................152
Software Testing ..............................................................................................................153
Software Deployment ......................................................................................................154
Software Maintenance .....................................................................................................155
Role of Application Development Life Cycle in Systems Management .........................155
Managing Application Dependencies ..........................................................................................156
Data Dependencies...........................................................................................................156
Time Dependencies..........................................................................................................157
Software Dependencies....................................................................................................157
Hardware Dependencies ..................................................................................................157
Application Asset Management...................................................................................................158
Acquiring Assets..............................................................................................................158
Deploying, Managing, and Decommissioning Applications ...........................................159
Summary ......................................................................................................................................159
Chapter 8: Leveraging Systems Management Processes for IT Governance ..............................160
What Is Governance?...................................................................................................................160
Governance: An Example ............................................................................................................161
Planning a VoIP Implementation.....................................................................................161
Implementing a VoIP Solution ........................................................................................162
Maintaining and Servicing the VoIP Service...................................................................163
Monitoring Operations.....................................................................................................164
Governing IT Services .................................................................................................................164
Planning and Organization...............................................................................................164
Defining the IT Strategic Plan .............................................................................165
Defining IT Architecture......................................................................................165
Defining IT Processes and Organization .............................................................167
Managing IT Investments ....................................................................................168
Managing Human Resources and Projects ..........................................................168
Managing IT Risks...............................................................................................169
Acquisition and Implementation......................................................................................169
Evaluating and Selecting Solutions .....................................................................169
Acquiring and Maintaining Systems....................................................................170
Enabling Operation and Use ................................................................................171
Managing Change ................................................................................................171
Delivery and Support .......................................................................................................172
Managing Service Levels.....................................................................................172
Maintaining Performance and Capacity Levels...................................................173
Ensuring Security of Systems ..............................................................................173
Managing Budgets and Resources .......................................................................174
Providing Training ...............................................................................................175
Providing Service Support ...................................................................................176
Managing Data.....................................................................................................176
Managing the Physical Infrastructure ..................................................................177
Monitoring and Evaluating IT Management....................................................................177
Governance and Maturity Models ...............................................................................................178

Examples of Varying Levels of Capability Maturity.......................................................178
Capability Maturity Models.............................................................................................179
Summary ......................................................................................................................................180
Chapter 9: Supporting Security with Systems Management .......................................................181
Network Security .........................................................................................................................182
Host Security................................................................................................................................183
Personal Firewalls............................................................................................................183
Anti-Malware...................................................................................................................184
Viruses and Worms..............................................................................................184
Keyloggers and Video Frame Grabbers...............................................................185
Trojan Horses.......................................................................................................186
Remote Control and Botnets................................................................................186
Hiding Malware with Rootkits ............................................................................186
Managing Security Vulnerabilities ..............................................................................................188
Configuration and Patch Management.........................................................................................189
Configuration Management .............................................................................................189
Patch Management...........................................................................................................189
Controlling Access.......................................................................................................................190
Identity Management, Authentication, and Authorization ..............................................190
File and Disk Encryption .................................................................................................191
VPNs and Secure Remote Access....................................................................................192
Security Information Management ..............................................................................................192
Security Policies...............................................................................................................193
Compliance ......................................................................................................................194
Security Management and Asset Management................................................................194
Hardware and Software Asset Management........................................................195
Information Classification ...................................................................................196
Security Auditing and Monitoring ...............................................................................................197
Audit Controls..................................................................................................................197
Security Monitoring .........................................................................................................198
Security Management and Risk Assessment ...................................................................198
Security Management and Business Continuity Management ........................................199
Incident Response ........................................................................................................................200

Incident Response Procedures .........................................................................................200
Training and Incident Response...........................................................................200
Separation of Duties.............................................................................................201
Response Evaluation............................................................................................201
Summary ......................................................................................................................................201
Chapter 10: Managing Risk in Information Systems...................................................................202
The Practice of Risk Analysis......................................................................................................202
Identify Information Assets and Threats..........................................................................202
Servers and Client Devices ..................................................................................203
Network Devices..................................................................................................204
Databases .............................................................................................................205
Application Code .................................................................................................206
Systems Documentation.......................................................................................207
Intellectual Property.............................................................................................208
Determine Impact of Risks ..............................................................................................209
Types of Costs......................................................................................................209
Determine Costs...................................................................................................210
Determine the Likelihood of Threats...............................................................................211
Calculating Risk Measures ..............................................................................................211
Exposure Factors..................................................................................................212
Annualized Rate of Occurrence ...........................................................................212
Annualized Loss Expectancy...............................................................................213
Qualitative Risk Assessment............................................................................................214
Risk Analysis Steps..........................................................................................................215
Understanding Business Impact of Risks ....................................................................................216
Operational Impacts .........................................................................................................217
Compliance Impact ..........................................................................................................219
Business Relationship Impact ..........................................................................................220
Customer Relationship Impact.........................................................................................221
Summary ......................................................................................................................................221
Chapter 11: Benefits of Mature Systems Management Processes...............................................222
Controlling IT Costs ....................................................................................................................222
Labor Costs ......................................................................................................................223

Automation of Manual Processes ........................................................224
Cross-Functional Skills........................................................................................230
Improved Support Services..................................................................................232
Capital Expenditures........................................................................................................233
Improved Asset Management ..............................................................................233
Decision Support Reporting.................................................................................235
Operating Costs................................................................................................................235
Improved Management Reporting .......................................................................236
Improved Allocation of Resources ......................................................................237
Improved Predictability of Operations.................................................................239
Improved License Management...........................................................................240
Improved Security Posture...................................................................................240
Cost of Not Controlling IT...........................................................................................................242
Compliance ......................................................................................................................242
Loss of System Integrity and Availability .......................................................................243
Loss of Confidential and Private Information .................................................................243
Business Disruption .........................................................................................................244
Summary ......................................................................................................................................244
Chapter 12: Roadmap to Implementing Service-Oriented Systems Management Services........245
Limits of Traditional Management Models and Emerging Challenges.......................................246
Demise of Device-Centric Systems Management ...........................................................246
Example Benefits of Service-Oriented Management ......................................................247
Roadmap Step 1: Assessing the Current Status of IT Practices...................................................248
Overall Business Strategy and Goals...............................................................................249
IT Alignment and Business Strategy ...............................................................................249
Risk Tolerance of the Organization .................................................................................250
Financial Constraints ...........................................................................................251
Time Constraints..................................................................................................251
Unknown Factors .................................................................................................252
Unknown Frequencies of Risks ...........................................................................252
Technical Limitations ..........................................................................................252
Resource Constraints ...........................................................................................253
Responding to Risks ............................................................................................253

Roadmap Step 2: Planning Transition to Mature Service Model ................................................254
Prioritizing Needs ............................................................................................................255
New Business.......................................................................................................256
Post-Merger Organization....................................................................................257
Highly Regulated Organization ...........................................................................257
Building a Central Management Foundation...................................................................259
Types of Information in Centralized Repository .................................................259
Methods for Collecting and Verifying Configuration Item Data.........................260
Optimizing Policies and Procedures ................................................................................260
Roadmap Step 3: Implementing a Service Model for Systems Management .............................262
Adapting Best Practices ...................................................................................................262
Measuring Operations......................................................................................................264
Adapting to Changing Business Requirements................................................................264
Summary ......................................................................................................................................265

Copyright Statement
© 2007 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that
have been created, developed, or commissioned by, and published with the permission
of, Realtimepublishers.com, Inc. (the “Materials”) and this site and any such Materials are
protected by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web
site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be
held liable for technical or editorial errors or omissions contained in the Materials,
including without limitation, for any direct, indirect, incidental, special, exemplary or
consequential damages whatsoever resulting from the use of any information contained
in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal, non-
commercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, services marks and logos that are the property of
third parties. You are not permitted to use these trademarks, services marks or logos
without prior written consent of such third parties.
Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent
& Trademark Office. All other product or service names are the property of their
respective owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtimepublishers.com, please contact us via e-mail at
info@realtimepublishers.com.


[Editor's Note: This eBook was downloaded from Realtime Nexus—The Digital Library. All
leading technology guides from Realtimepublishers can be found at
http://nexus.realtimepublishers.com.]

Chapter 1: The State of Systems Management


The information technology (IT) in an organization is a dynamic resource that is constantly
adapted to meet changing needs. The rational practice of controlling that change is known as
systems management. As with many other organizational practices, systems management has
evolved from informal ad hoc responses to immediate needs to a well-understood, formalized
practice. This guide examines best practices for systems management with an emphasis on a
modularized approach known as service-oriented management (SOM).

Overview
The book consists of twelve chapters that begin with background on systems management
practices, then describe SOM in terms of several well-known frameworks for systems
management and related areas, and finally move on to a detailed discussion of how to
implement SOM. Specifically, the chapters will address:
• Chapter 1 discusses the goals of systems management, typical implementation styles, and
the need for a rationalized process, such as SOM.
• Chapter 2 describes essential parts of systems management, including aligning with
business objectives, managing assets, delivering services, and maintaining compliance.
• Chapter 3 discusses SOM in terms of well-known frameworks such as ITIL, COBIT, and
ISO-17799.
• Chapter 4 describes the infrastructure required to implement a rational, efficient systems
management environment, including a configuration management database.
• Chapter 5 examines the elements of service support, such as incident, configuration, and
change management.
• Chapter 6 explores how to address financial issues, capacity planning, and availability
management issues in SOM.
• Chapter 7 discusses application life cycles, software asset management, and managing
hardware elements of IT infrastructure.
• Chapter 8 looks at systems management as a tool for supporting control objectives and
management guidelines that govern IT operations.
• Chapter 9 examines the role of systems management in threat assessments, vulnerability
management, incident response, and other aspects of security management.
• Chapter 10 describes the practice of risk management and shows how identifying risks,
prioritizing assets, and mitigating risks serve both risk management and systems
management objectives.
• Chapter 11 examines the business case for SOM with particular attention paid to the cost
of not adequately managing systems.
• Chapter 12 describes how to assess the current state of an organization’s systems
management practice and how to plan the transition to a SOM model as well as provides
guidance on how to implement a mature systems management practice.


The responsibilities of systems administrators and IT managers are growing in complexity.
Supporting a growing number of systems with increasing dependencies between those
systems, meeting rising quality of service (QoS) expectations, and preparing for constant
security threats are just a few of the challenges systems managers face. Fortunately, as the
demands have expanded so too have the tools and practices for meeting those demands. The
purpose of this guide is to help managers and administrators apply these practices and tools to
their specific systems management challenges.
This chapter presents a high-level overview of the nature of systems management with a
discussion of three aspects of the discipline:
• The goals of systems management
• The spectrum of systems management practices
• Rationalizing systems management with SOM
Let’s begin with a fundamental issue for all IT operations—aligning with business objectives.

Goals of Systems Management


Systems management serves many purposes. To a systems administrator, systems management is
about keeping servers up and running, keeping databases responsive to user queries, and
ensuring the network continues to function. To IT managers, systems management is the means
to another set of ends, including meeting service level agreements (SLAs), controlling operations
costs, and meeting production schedules. To executives, systems management is an area for
controlling the integrity of information and maintaining compliance with any number of
regulations that may apply to the business.
There is quite a bit of overlap between the different perspectives on systems management, and
the key purposes include four fundamental objectives:
• Business alignment
• Technical integrity
• System availability
• Compliance
These objectives provide the measures and criteria to which the systems are maintained and
controlled. These are the goals of systems management.


Business Alignment
The objective of business alignment is to ensure that the information processing needs of lines of
business (LOB) are met by IT applications and infrastructures. This sounds so logical that it
seems like common sense—and it is. Unfortunately, the constraints of real-world organizations
present significant challenges to realizing business alignment.
The challenge with business alignment is not in understanding the need for it or even convincing
business or IT personnel about its importance; the challenge is with the execution. Three
common problems encountered with business strategy execution are:
• Formulating a coherent business strategy
• Addressing multiple objectives
• Meeting dynamic requirements
In many cases, IT managers have to address one or more of these in the course of their systems
management work.

Coherent Business Strategy


The first challenge is articulating a coherent business strategy. Again, what seems simple at first
glance is actually quite involved. Businesses have the general goal of maximizing revenue. They
do so by implementing strategies related to product development, acquiring market share,
forming partnerships, and so on. Large and even midsized organizations constantly run into the
problem of multiple perspectives on a single strategy.
Consider an example of an insurance company selling auto insurance. The company’s strategy
may include building market share in the northeast and mid-Atlantic regions of the United States.
Marketers in corporate headquarters might want to initiate a set of campaigns to build market
share that may conflict with the objectives of executives responsible for third-party brokers in
that region. Similarly, the Underwriting department may want to exclude a certain demographic
group, such as male drivers under 32, even though that is one of the demographic groups that
Marketing would like to target. Add to these issues the fact that the business case for a particular
strategy is based on revenue projections that include measures such as profit margins that are
calculated differently by different LOB. Before IT operations can align with business strategy,
the strategy must be well defined.


Multiple Business Objectives


A second difficulty with business alignment is meeting complementary needs with a minimum
number of applications. Continuing with the insurance example, both internal sales staff and
third-party agents may need to use the same policy origination system to sell insurance. Internal
sales staff may use direct mail and telemarketing as the primary means to generate sales. They
work in company offices and prefer a client/server model for the policy origination system.
Third-party agents, however, sell products from multiple insurers and have no interest in having
a client application installed on their computers from each of their partners. Web-based
applications are preferable to them because applications can be downloaded on demand, do not
require permanent installations, and entail fewer maintenance issues for users. Application
developers are now faced with either maintaining two interfaces, one client/server and one Web,
or trying to implement full client/server functionality in a Web application. The application
interface is just one example of juggling multiple requirements.
In addition to the transaction-oriented operations that involve large numbers of relatively simple
read-and-write operations, such as creating policies and submitting claims, users need
management reporting to understand the overall state of business. Claims adjusters may want to
understand trends in claims by policy type, by characteristics of the policy holder, and by
geographic region. This type of aggregate reporting is common but functions best under a
different set of database design models than transaction-oriented systems. Management
reporting, or business intelligence reporting, works best when done with operational data stores
for short-term integrated reporting and data marts and data warehouses for historical reporting
and trend analysis. Here again, IT is left to create, maintain, and manage another set of systems.

Dynamic Requirements
Businesses must constantly respond to changes in the marketplace. Some of these changes are
relatively slow, such as the move from strictly internal combustion-powered cars to hybrids;
some are moderately paced, like the shift from buying music on discs to downloading it; and
still others are rapid, such as price spikes in the cost of petroleum products.
How effectively an organization can respond to these changes is dictated, in part, by the
organization’s ability to change its IT systems. Consider some of the roles IT applications play in
adapting to market changes:
• As partnerships are formed with other businesses, financial systems must be changed to
accommodate new compensation models
• Following mergers, IT infrastructures must be integrated to accommodate combined
business operations
• Downward price pressures drive process re-engineering and the adoption of greater
automation
• Outsourcing of operations may require changes to network infrastructure, security
policies and practices, and hardware configurations
In addition to these common business dynamics, there are the expected but unpredictable events,
such as natural disasters, that can disrupt operations and shift priorities.


There will always be a time delay between changing a business objective and making the
necessary changes to implement that in IT infrastructure and procedures. The length of the delay,
though, can have a qualitative impact on the ability to execute the new business operations. As
Figure 1.1 shows, IT responses may take so long that not long after they are implemented, new
changes are required. In the worst case, the modifications are not finished before the next round
of changes is defined.
Aligning IT operations with long-term business strategies and shorter-term objectives is a
process; it is not a static state that is ever reached. The goal of IT should be to minimize the time
it takes to align with business objectives, and well-developed systems management practices are
fundamental to reaching that goal.

Figure 1.1: Time delays between a change in business strategy and IT’s ability to implement it slow an
organization’s ability to adapt to changing market conditions.

IT and business alignment is based on a number of assumptions, including the technical integrity
of the IT infrastructure.

Technical Integrity
Technical integrity is the quality of information systems that ensures data is accurate, reliable,
and not subject to malicious or accidental changes. Like business alignment, technical integrity is
one of those characteristics that seem so obvious that we take it for granted. Systems administrators do
not take it for granted, though. Consider some of the challenges to maintaining technical
integrity:
• Malfunctioning applications
• Malicious software
• System configuration vulnerabilities
• Improperly managed access controls
Each of these categories of threats can create substantial disruption to IT operations.


Malfunctioning Applications
Let’s face it, software has bugs. Complex software is difficult to build and programmers know
this all too well; an old quip among programmers is that “if builders built buildings the way
programmers built programs, one woodpecker would destroy civilization.” Although this
statement is an obvious exaggeration, the sentiment reflects the frustration even software
developers have with the practice of programming.

Programmers and software engineers do not just decry the state of programming; much work has
been done to improve software development practices. See for example, software development
maturity models developed by the Software Engineering Institute at http://www.sei.cmu.edu/, spiral
development methodology at
http://ieeexplore.ieee.org/iel1/2/6/00000059.pdf?tp=&arnumber=59&isnumber=6&htry=2, and Agile
development methodology at http://zsiie.icis.pcz.pl/ksiazki/Agile%20Software%20Development.pdf.

Using software development best practices can increase the quality of software, but even with
these methodologies, the impact of tracking and eliminating all bugs would make any reasonably
complex program either too expensive or available too late to be of practical use. As a result,
users, systems managers, and developers have all learned to manage less-than-perfect
applications.
One practice used to manage software deficiencies is patch management. Software developers
release corrections to applications, known as patches, which are applied by systems
administrators or through an automated process to correct known errors in software. Patching is
not a trivial process and several factors should be considered when patching:
• How to implement user acceptance testing (UAT)
• How to roll back a patch if problems occur
• Whether any of the problems corrected by the patch will have an impact on operations
• How to distribute the patch to all systems that require it
• How to track the version and patch level of all applications
In addition to these basic considerations, a patch can create conflicts between applications
that share a component. For example, a relational
database may be used to support two applications. One application does not function because of
a bug that can be patched but the patch breaks another function required by the second
application. Should the database administrator patch the system anyway? Keep the current
version and work around the bug? Install another instance of the database, patch one instance,
leave the other instance un-patched, then run the two applications on their respective instances of
the database? To find the right answer, systems administrators and database administrators have
to weigh the costs and benefits. Just as business objectives can create competing demands when
addressing IT and business alignment, patch management can leave systems administrators to
choose between equally undesirable options.

6
Chapter 1

Malicious Software
Malicious software, commonly known as malware, includes software that ranges from annoying
to destructive. Some of the best known forms are:
• Viruses—Programs that replicate through other programs or user interaction and carry
code that performs malicious actions. Viruses consist of at least replication code and a
payload but may also include encryption or code-morphing routines to improve their
chances of avoiding detection.
• Worms—Similar to viruses in payload and obfuscation techniques, but worms are self-
replicating and spread without the aid of host programs or user interaction.
• Spyware—Software that installs on devices without user consent and collects private
information, such as usernames, passwords, and Web sites visited.
• Trojan horses—Programs that purport to do one thing but perform malicious activities.
For example, a Trojan horse might advertise itself as a program for synchronizing
computer clocks to atomic clocks, but it may deploy a remote control program that listens
on a specific chat room or Internet Relay Chat (IRC) channel for commands from the
malware developer.
• Keyloggers—Programs that intercept operating system (OS) messages sending keystroke
data to applications. The more sophisticated versions of these programs filter most
activity to focus on usernames, passwords, and other identifying information.
• Frame grabbers—Software that copies the contents of video buffers, which store data
about the contents displayed on the computer’s screen.
The list of malicious software categories is intimidating. To make matters worse, malware
writers and anti-malware developers seem locked in a constant cycle: each new countermeasure
is answered with a new trick from the malware writers, and on and on. On the positive side,
systems administrators are able to keep malware at bay with effective anti-malware
countermeasures.
The use of anti-malware software, such as desktop antivirus software, and network appliances,
such as content filters, can provide adequate protection for most needs. The need for these
countermeasures introduces additional responsibilities for systems administrators. In addition to
the desktop productivity applications, application servers, databases, routers, Web servers, and
all the other business-specific and supporting tools that must be managed in an IT environment,
systems administrators must now manage a large class of mission-critical security applications
and devices. Malicious software often takes advantage of poorly secured configurations.


System Configuration Vulnerabilities


Another threat to system integrity is configuration vulnerabilities. It has been a common practice
to ship complex software with a standard configuration that works “out of the box” on most
systems. The goal is to make software as easy to install and use as possible. This goal is
understandable; we do not expect to have to master the intricacies of a new application just to
install it. The problem with this approach is that to make software functional with minimal user
intervention on a wide variety of machines, applications tend to install services that are not
always needed, highly privileged accounts, or other potential security vulnerabilities:
• A mini-computer OS commonly used in the 1980s and 1990s installed with three
privileged accounts with commonly known passwords.
• Some older database systems install with well-known privileged accounts with known
passwords.
• Selecting predefined OS configurations, such as a server configuration, can install
programs and services, such as FTP, that may not be needed and may introduce security
vulnerabilities.
Software vendors are getting better about installing more secure default configurations, but as
systems grow in complexity, the potential for insecure configurations grows as well.
Another aspect of configuration vulnerabilities is that newly installed systems may require patching.
Such is especially the case when an OS is first installed; security patches created since the
software was released should be installed immediately. This task can be relatively easy to
automate, or it may require significant effort on the part of the systems administrator.
Microsoft, for example, provides the Microsoft Update service for Windows OSs, which can be
configured to automatically download and install critical patches (see Figure 1.2).


Figure 1.2: The Microsoft Update service can determine necessary critical patches and install them on
Windows OSs.

For more thorough vulnerability scanning, the Microsoft Baseline Security Analyzer (MBSA)
can detect configuration vulnerabilities as well as missing patches, and MBSA allows systems
administrators to scan multiple systems in a single session.


Figure 1.3: The Microsoft Baseline Security Analyzer scans systems for vulnerabilities as well as missing
patches.

In other cases, systems administrators must keep abreast of critical patches by subscribing to
mailing lists or checking vendors’ support Web sites to find patches. Even when patches are
released, they should not be installed in production without first testing on a quality control
platform. Although installing a critical security patch for Windows can create unanticipated
problems, in most cases, the risks are far outweighed by the benefits. Nonetheless, systems
administrators should have a contingency plan in place for restoring the original configuration if
unanticipated problems occur.

Two useful security-oriented mailing lists are NTBugtraq (http://www.ntbugtraq.com/) and Secunia
(http://secunia.com/).


It should be noted that even with security analysis tools, one type of malware—rootkits—is
particularly difficult to control once it has compromised a system. Rootkits hide their presence
and activities by changing registry settings or other system parameters, hiding files, erasing log
entries, and using other techniques. Once a rootkit is installed, it is difficult to guarantee that it has
been removed without performing a full reinstall of system software.

Even with updated patches, securely configured servers and desktops, and malware
countermeasures, systems managers must contend with yet another threat: inappropriate access
controls.

Improperly Managed Access Controls


Access controls grant users privileges to employ information resources, typically based on the
users’ roles in the organization. A CFO is likely to have full read access to any data in the
financial system, while an accounts payable clerk would have no access or highly restricted
access to accounts receivable information. This variance in access based on roles reflects one of
the goals of security administrators: to ensure that users and processes have the least
privilege required to accomplish their functions. Although the CFO and accounts payable clerk
examples are relatively simple, determining least privilege can become more complex. For
example:
• What rights to an inventory system should a third-party partner have? Should he or she be
able to reserve inventory or just check the status of products?
• Should a manager in the southwest region have data warehouse reports on the northeast
region?
• Should an average user searching an enterprise portal for “salary” see a listing for a file
called Current_Salary.xls even though the file can only be read by HR personnel?
• If a network administrator is terminated, how can management be sure all of his rights to
systems are revoked?
Managing access controls is a complicated and, in some cases, time-consuming process. One of
the objectives of systems management is to ensure that access controls are synchronized with
organizational policies at all times. These policies should include descriptions of who is allowed
administrator access to systems, password strength and term of use, and audit procedures.
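The least-privilege principle behind these policies is often implemented as role-based access control: privileges attach to roles, and users acquire only the privileges of the roles they hold. The sketch below is a deliberately simplified illustration in Python; the role names and permission strings are invented for this example.

```python
# Simplified role-based access control (RBAC) sketch.
# Permissions attach to roles; users hold roles and nothing more.

role_permissions = {
    "cfo":      {"financials:read"},
    "ap_clerk": {"accounts_payable:read", "accounts_payable:write"},
}

user_roles = {
    "alice": {"cfo"},
    "bob":   {"ap_clerk"},
}

def is_allowed(user, permission):
    """A user is allowed an action only if some role he or she holds grants it."""
    return any(permission in role_permissions.get(role, set())
               for role in user_roles.get(user, set()))

print(is_allowed("alice", "financials:read"))   # True
print(is_allowed("bob", "financials:read"))     # False: least privilege holds
```

A side benefit of this structure is that revoking a terminated administrator's access reduces to deleting one entry from `user_roles`, rather than hunting down permissions scattered across systems.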
Maintaining the technical integrity of multiple servers, desktops, mobile devices, and network
components requires a thorough understanding of application and OS configurations,
countermeasures to threats from malicious software, and effective access controls and other
security measures. Closely related to the issue of technical integrity is systems availability.


System Availability
Disasters happen. Some are natural, such as hurricanes, and some are technical, such as the SQL
Slammer worm that effectively shut down large segments of the Internet in 2003. In both cases,
businesses and organizations lose some degree of access to their systems. The practice of
business continuity planning has evolved to address these kinds of disasters as well as other less
dramatic events that can nonetheless have an impact on systems availability.
Systems administrators play a key role in continuity planning because of their knowledge of how
systems are organized, the dependencies between systems, and the processes that operate on those
systems. As IT infrastructure becomes more complex, business continuity planning becomes more
difficult. Information about the state of IT systems is needed to adequately prioritize services (for
example, in the event of a service disruption, restore payroll systems and then customer service
systems, leaving other production systems for later) and ensure that all necessary systems and
processes are accounted for. A centralized and up-to-date database with information on the IT
infrastructure is required for cost-effective business continuity planning and execution. Figure
1.4 shows an example of how IT assets can be centrally managed in relation to other assets and
organizational structure.

Figure 1.4: Centralized information about IT assets is a key enabler of effective business continuity planning.


For more information about the structure and function of configuration management databases, see
Chapter 4.

Compliance
The term compliance is getting a lot of press these days, perhaps to the point where we’ve
stopped paying attention to it, but doing so would be a mistake. The importance of maintaining
the privacy and accuracy of information is becoming more broadly recognized. Some of the best
known regulations make that clear:
• The Health Insurance Portability and Accountability Act (HIPAA) defines categories of
“protected healthcare information” and strict rules governing how that information is
gathered, stored, used, and shared. The act also defines stiff penalties for violating these
rules.
• The Sarbanes-Oxley Act, enacted in the wake of Enron, WorldCom, and similar
corporate scandals, raises the bar on ensuring accuracy in corporate reporting. CEOs and
CFOs now have to sign off on the accuracy of the information or face penalties.
• California’s Senate Bill 1386 was passed in response to fears of the growing threat
of identity theft. Under this law, if identifying information of a California resident is
stolen or released in an unauthorized manner, the resident must be notified of the
disclosure.
A host of other IT-related regulations have been enacted by governments around the world. In
addition, non-governmental or quasi-governmental bodies have adopted standards and
frameworks related to financial reporting and security best practices. Table 1.1 lists some
relevant but less well-known regulations and frameworks that apply to particular countries or
industries.


• BASEL II: Regulates credit and risk reporting for banks (http://www.bis.org/publ/bcbsca.htm)
• 21 CFR Part 11: Regulates pharmaceutical information management (http://www.fda.gov/ora/compliance_ref/part11/)
• Computer Fraud and Abuse Act: Makes unauthorized access to information in federal and financial institutions illegal (http://cio.doe.gov/Documents/CFA.HTM)
• Electronic Signatures in Global and National Commerce Act: Recognizes the use of electronic signatures in commerce (http://www.ecloz.com/ecloz/Electronic%20Signatures%20in%20Global%20&%20National%20Commerce%20Act-%20H_R_%201714.htm)
• FFIEC Business Continuity Planning: Regulates business continuity planning in financial institutions (http://www.ffiec.gov/ffiecinfobase/html_pages/bcp_book_frame.htm)
• Gramm-Leach-Bliley Act: Regulates consumer privacy in banking (http://banking.senate.gov/conf/grmleach.htm)
• FISMA: Regulates information security planning in federal agencies (http://csrc.nist.gov/sec-cert/index.html)
• EU Directive 95/46/EC (Data Protection) and EU Directive 2002/58/EC (Directive on Privacy): Control the use and distribution of personal information of EU citizens (http://europa.eu.int/comm/internal_market/privacy/law_en.htm and http://europa.eu.int/eurlex/pri/en/oj/dat/2002/l_201/l_20120020731en00370047.pdf)
• Canada’s PIPEDA: Protects personal information of Canadian citizens (http://www.privcom.gc.ca/legislation/02_06_01_01_e.asp)
• Australian Federal Privacy Act: Enacts Australian privacy principles (http://www.privacy.gov.au/publications/npps01.html)

Table 1.1: Additional regulations related to IT.

These and similar regulations are placing new demands on IT managers and systems
administrators to not only comply with these regulations but also demonstrate that they are in
compliance. As with business continuity planning, compliance requires a centralized
management view of all information assets to meet these demands efficiently.
The goals of systems management range from maintaining technical integrity and system
availability to achieving compliance with regulations and aligning with the strategic plans of the
business. These are demanding goals and reaching them is not guaranteed, especially if systems
management practices are not sufficient for the task.


Spectrum of Systems Management Practices


Systems management, like other areas of IT management, has a range of management
philosophies. Some are loosely coupled procedures that are created—and evolve—as needed in
response to immediate needs, while others are highly structured and formalized around written
policies. For the purposes of this guide, it helps to examine three high-level categories of systems
management practices:
• Ad hoc systems management
• Controlled systems management
• Continuously improving systems management
This list is not meant to be exhaustive and there are nuances within each category that will not be
examined. The goal is to understand that not all systems management practices are the same—
some are more viable than others—and that the level of effort required of and effectiveness
realized from different practices can vary widely.

Ad Hoc Systems Management


At one end of the systems management spectrum is the ad hoc style, which is characterized by a
lack of formal policies and procedures. This approach to systems management responds to
requirements, incidents and problems as they arise without the benefit of planned responses. The
results are suboptimal, at best.

Ad Hoc Systems Management in “Practice”


Consider scenarios that can arise when ad hoc systems management is used:
• A user needs a utility for managing ftp transfers, so he downloads several evaluation
versions from the Web. After some time, he decides to purchase one while a colleague,
with a similar problem, decides she likes a different program. Both purchase the
programs outside their departmental budgets with no review or approval from IT. When
problems arise with the programs, the users call the IT department, which is unaware the
programs are used in the organization.
• Policies are not in place governing email use. Old messages are not automatically
archived, content is not filtered for inappropriate material, and rules are not defined
regarding appropriate use of company email. As email servers run out of disk storage,
more is purchased; all message folders are backed up and backups are taking longer and
longer to run. Email server software is patched when a systems administrator reads about
a new threat in weekly trade magazines.
• Users dictate the means of collaboration, which typically involves sharing local folders and
creating common-use folders on the network server. Folders are shared whenever a user needs
to share files, and owners can change directory permissions at will. There is no
centralized repository of information about who has access to which directories.
Employees who left the company several months ago still have permissions to sensitive
directories and files.


• A department decides that the reporting from the financial system is insufficient for their
needs and installs a database and reporting tool on a high-end desktop computer running
in their office. One of the staff in the department just read a book on data marts and
decides to implement one. The department uses an extraction, transformation, and load
tool that came with the database to pull data from the financial system every night. This
task puts additional load on the financial system at the same time it runs close-of-day
batch jobs and delays the generation of morning financial reports. The reports generated
from the data mart use different calculations, so performance measures from the financial
system do not agree with those from the data mart.
Although these examples are fictitious, the consequences described will probably sound familiar
to many IT professionals. A lack of central planning, uncoordinated decision making, and the
willingness to make changes to IT infrastructure to meet an immediate need without concern for
the ripple effects on the rest of the organization are the hallmarks of ad hoc systems management
practices. The consequences are predictable.

Effects of Ad Hoc Systems Management


The lure of ad hoc systems management is that it appears responsive and unencumbered by
unnecessary bureaucracy. If a user needs a program, he gets one. When an analyst needs better
reporting, she develops her own Microsoft Access database. If the server runs low on space, the
administrator buys another disk. Decisions are made quickly and executed just as rapidly. The
consequences from these decisions, like the effects of a degenerative disease, accumulate over
time.

Poor Management Reporting


One of the first noticeable impacts of ad hoc management is a lack of overall understanding of
the state of the IT infrastructure. There is no place a manager can go to find a list of assets, their
patch levels, the licenses associated with them, the state of their backups, or the applications
running on particular devices.

Lack of Compliance
Auditors would quickly point out that a lack of well-defined policies and procedures leaves the
company potentially in violation of regulations governing information integrity (for example, the
Sarbanes-Oxley Act) and privacy (such as HIPAA).

Inefficient Allocation of Resources


Poor management increases costs as well. Rather than manage storage, some administrators
simply “throw another disk” at the problem, adding the immediate cost of new hardware and the
ongoing cost of supporting the additional hardware to IT spending. When infrastructure grows
without an overall plan, a phenomenon known as “server sprawl,” each new hardware
configuration may require a new software configuration, which only adds to the management
headaches of administrators. And because the organization takes an ad hoc approach to
management, tools such as a configuration management database are not available to shoulder the
burden of additional configurations. Other costs creep in because of poor use of resources. For
example, instead of sharing a disk array among multiple departments, each department may buy
its own direct-attached storage with each server, along with backup devices for each
department. Without centralized planning, there is little opportunity to leverage economies of
scale.


Poor Security
Security suffers because of poor management practices. Malicious software, information theft,
and other threats are a constant problem for systems administrators and IT managers. At the very
least, basic information security management requires:
• Comprehensive inventory of hardware and software in use on a network
• Configuration details on all servers, desktops, mobile devices, and network hardware
• Detailed information on users and access controls protecting assets
• The ability to audit and monitor system and network activity and to identify anomalous
events
• The ability to deploy patches and critical updates rapidly to all vulnerable devices
Clearly, the lack of centralized management information and ineffectual or poorly implemented
procedures that characterize ad hoc management undermine even the most basic security
requirements.
Lack of management controls, poor use of resources, lack of compliance, and the potential of
security threats should be motivation enough to move beyond ad hoc management to a well-
defined and centrally controlled management model.

Controlled Systems Management


As IT organizations grow and mature, the need for planning and control becomes more pressing.
The disadvantages of ad hoc management are apparent; the advantages of well-defined policies
and procedures are equally obvious. Industry best practices for systems management have been
defined in two frameworks: the IT Information Library (ITIL) and Control Objectives for
Information and Related Technologies (COBIT).

For more information about ISACA’s COBIT, see http://www.isaca.org/cobit/, and ITIL at
http://www.itil.co.uk/. These topics are also covered in more detail in Chapter 3.


In a controlled systems management environment, the core IT operations are performed
according to a series of policies that attempt to realize broad business goals by providing reliable
and responsive computing and network resources at the most efficient cost. The core processes
within IT that should be governed by policies are:
• Acquiring infrastructure, such as hardware, software, and networking services
• Developing, customizing, and configuring applications
• Implementing security controls
• Managing service providers
• Monitoring performance and capacity planning
• Ensuring system availability
• Managing end user support
• Training, both IT and non-IT staff
• Maintaining regulatory compliance
The policies addressing each of these areas define the purpose of the operation, identify the
scope of the process, and offer guidelines for implementing the policy. For example, the purpose
of implementing security controls is to ensure the integrity, confidentiality, and availability of
information services. The scope includes areas such as physical security, access controls,
telecommunications, and network security, as well as auditing and system monitoring. The
guidelines surrounding security might include a discussion of the need to balance security with
ease of use and the risk level acceptable to the organization.
Procedures are a series of specific tasks that implement a policy. For example, the procedure for
acquiring software might include a review of the business requirements and deficiencies in
existing software and an assessment of buying instead of building a solution. Next, the procedure
might call for a review of the proposed solution. Which OSs does the application run on? What
kind of server is required? What changes to the network configuration are required? Will the
application be accessible from beyond the firewalls? Does it depend on other services, such as a
Web server, on the network? After these steps are performed, the final stage may be a review by
a change control committee made up of application owners, network administrators, and systems
managers who make one last review for potential problems with the solution.
To some, these steps may sound like bureaucratic overkill, but it is better than the alternative
presented by ad hoc management. Controlled systems management builds on policies and
procedures that recognize that IT operations consist of a series of services, such as client
management, network management, application management, and so on—not just a collection of
independent, unrelated devices and programs as understood within the context of ad hoc
management (see Figure 1.5).


Figure 1.5: Controlled systems management depends on well-defined policies and procedures that address
each of the key services provided in an IT environment.

It is not the goal of controlled management to slow IT operations or impose arbitrary
bureaucratic overhead. In fact, controlled management improves responsiveness and
effectiveness in the long run even though it may seem to slow changes or acquisitions in the
short term.
There are two ways to streamline controlled management, and they dovetail well. First,
centralize management information and automate routine tasks. A centralized configuration
management system is the foundation for this method and will be discussed in detail in Chapter
4. The second is to leverage the information in the centralized configuration management system
(along with data about how procedures are implemented) to improve the way you implement the
tasks of systems management.

Continuous Improvement
Much has been written in the popular business press about quality and improvement. These
topics do not garner the press they once did, but the principles of quality improvement and
innovation are still relevant, even in systems management. Perhaps the best-known and most
well-established approach to realizing continuous improvement is Six Sigma, a data-centric
quality approach; another practice, Management by Fact (MBF), also emphasizes the importance
of managing by using measurements of performance.
Once an IT group has implemented controlled systems management practices, the group will
have information about assets, business use of those assets, and the changes those assets
undergo. In essence, the organization will have procedures for effectively managing assets as
well as data about the performance of those assets and related operation. With this, IT managers
and systems administrators can find ways to improve on operations.


For example, by measuring the time from identifying the need for a new application to deploying
the solution, as well as key milestones in between, the organization can better understand the
average time to deploy, common bottlenecks in the process, and shared characteristics of failed
efforts. These and other key performance indicators (KPIs) form the foundation for measuring
improvement.
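Computing such KPIs is straightforward once milestone dates are recorded consistently. The sketch below assumes a hypothetical set of deployment records with three milestones (need identified, solution approved, solution deployed); both the record layout and the `average_days` helper are illustrative inventions.

```python
from datetime import date

# Hypothetical deployment records: need identified -> solution deployed,
# with an intermediate milestone (approval) to expose bottlenecks.
deployments = [
    {"identified": date(2006, 1, 3), "approved": date(2006, 2, 1),
     "deployed": date(2006, 2, 20)},
    {"identified": date(2006, 3, 6), "approved": date(2006, 4, 18),
     "deployed": date(2006, 5, 2)},
]

def average_days(records, start, end):
    """Average elapsed days between two named milestones across records."""
    spans = [(r[end] - r[start]).days for r in records]
    return sum(spans) / len(spans)

total = average_days(deployments, "identified", "deployed")
waiting = average_days(deployments, "identified", "approved")
print(f"average time to deploy: {total:.1f} days")
print(f"of which elapsed before approval: {waiting:.1f} days")
```

Comparing the two averages shows where the time goes: if most of the deployment interval is consumed before approval, the bottleneck is the review process rather than the technical rollout.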
Another example relates to security. If a security breach occurs, a well-managed environment
will have audit trails and logs to help diagnose the breach as well as recovery procedures for
getting operations back online and data restored to its correct state. Configuration management
information, along with audit trails and logs, can help identify both specific and general
vulnerabilities in the current environment, which can be addressed to prevent future breaches of
the same sort.
Exceptionally well-run IT operations are not the result of one or two geniuses formulating a
perfect solution; they are instead the product of disciplined policies and procedures tightly linked
to business objectives. Like the business objectives themselves, the policies and procedures are
not static but are subject to innovation. As the well-respected management researcher and writer
Peter Drucker noted,
The purposeful innovation resulting from analysis, system and hard work is all that can
be discussed and presented as the practice of innovation. But this is all that need be
presented since it surely covers at least 90 percent of all effective innovations. And the
extraordinary performer in innovation, as in every other area, will be effective only if
grounded in the discipline and master of it (Source: Peter Drucker, “Principles of
Innovation,” in The Essential Drucker, Harper Business, 2001).
The discipline of systems management can be mastered and the practice of systems management
can be adapted and improved to meet the specific needs of different organizations.
The spectrum of systems management ranges from the reactive, uncontrolled ad hoc approach,
through a controlled, procedure-guided method to an adaptive model built on well-defined
controls that use performance measures to improve operations. Although there are many ways to
organize systems management operations, one of the most promising for the complex,
heterogeneous IT environments of today is the service-oriented management (SOM) model.


Rationalizing Systems Management: SOM


Systems management is a discipline that can be mastered and perfected. To get to high-
performance levels in systems management, organizations must implement policies and
procedures that systematically define and control how changes are made, how new infrastructure
is introduced, how systems are maintained, and a number of other areas. The diverse nature of
systems management requires a classification scheme that allows you to organize the discipline
into domains that can then be analyzed and optimized. For example, systems management
includes both client management and security management. There is clearly overlap between
the areas, but treating them as distinct operations will prevent you from being overwhelmed by
the complexity of dealing with all the issues in both domains at once. One way of controlling the
complexity is to think of systems management as a series of services that need to be delivered.

Elements of SOM
The domains of systems management can be viewed as services provided to users, applications,
and the organization as a whole. These services are managed within an umbrella framework that
is both modular and open. Some of the most important are:
• Service level management
• Financial management for IT services
• Capacity management
• Change management
• Availability management
• IT service continuity management
• Application management
• Software and hardware asset management
At first glance, these domains seem unrelated—such as financial management and change
management—but they are all required for effective systems management and therefore must be
included in any framework that purports to support the full breadth of demands in systems
management. The details of these domains are beyond the scope of this chapter; instead, this
chapter will examine the defining characteristics of a service-oriented architecture:
• Unified management framework
• Modular services
• Open architecture

The details of how these services are managed are addressed in Chapters 4 through 8.


Unified Management Framework


A unified management framework is a common set of tools and services that support a variety of
systems management operations. The centralized configuration management database is the best
example of a constituent component of the unified management framework. Others include
communication protocols, reporting systems, and client interfaces. As Figure 1.4 showed earlier,
aspects of different services are available from a single Web interface when a unified platform is
used. Within the unified management framework, a series of modular services are available.

Modular Services
It is important to treat domains within systems management as distinct areas with their own set
of requirements. For example, change management requires information about the state of
software and hardware configurations throughout the enterprise. Before a port is closed on a
firewall, a network administrator needs to know whether an application is using that port. This
type of information is not required to manage the financial aspects of systems management and
should be isolated from financial functions. At the same time, however, some of the
configuration information has a definite impact on financial matters. For example, knowing the
number and versions of OSs running within the organization is essential to managing licensing
costs.
In addition to isolating information complexity, a modularized approach to systems management
enables the framework to incorporate new services and management models as needed. A
midsized company might not need capacity planning services initially, but as the company grows
and the complexity of the IT infrastructure increases, manual methods for capacity planning may
no longer be efficient or sufficient. For this reason, it is critical that a SOM framework be open.

Open Architecture
An open architecture is one that uses common, well-known protocols that are not proprietary to
any one vendor or organization. In the world of systems management, an open architecture lends
itself to incorporating multiple modules from a single vendor as well as leveraging services
available from third parties. For example, a router vendor may provide data on router
performance through the Simple Network Management Protocol (SNMP), which is collected in
the centralized configuration management database, then integrated with other data collected
from other network devices used in management reports generated by the systems management
reporting module. By combining the benefits of a unified management framework, modular
services, and an open architecture, organizations can realize the benefits of service-oriented
systems management.
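As an illustration of how open protocols feed a unified framework, the sketch below stores device performance samples in a simple configuration management table. The `poll_router` function is a hypothetical stand-in for an SNMP query (a real implementation would use an SNMP library and the device's OIDs), and the one-table schema is illustrative only, not a prescribed CMDB design.

```python
import sqlite3

def poll_router(device_name):
    """Hypothetical stand-in for an SNMP GET against a router's counters."""
    return [("ifInOctets", 1_204_512), ("ifOutOctets", 987_330)]

def record_samples(conn, device_name):
    # Store each polled metric in the central configuration management table.
    for metric, value in poll_router(device_name):
        conn.execute(
            "INSERT INTO perf_samples (device, metric, value) VALUES (?, ?, ?)",
            (device_name, metric, value),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE perf_samples (device TEXT, metric TEXT, value INTEGER)")
record_samples(conn, "edge-router-1")

# A reporting module can later join these rows with data from other devices.
rows = conn.execute(
    "SELECT device, metric, value FROM perf_samples ORDER BY metric"
).fetchall()
print(rows)
```

Because the table and the polling interface are decoupled, metrics from any vendor's devices can land in the same store, which is the point of an open architecture.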


Benefits of Service-Oriented Systems Management


There are several benefits of service-oriented systems management, beginning with improved
understanding of the state of the IT infrastructure and better control of that infrastructure. From
there, organizations can realize improved cost effectiveness and improved QoS by standardizing
infrastructure and procedures. With a centralized configuration management system in place,
there are opportunities for automating manual processes—another area of potential cost savings.
The benefits are not limited to just operational efficiencies.
Better systems management and standardized infrastructure and procedures lend themselves to
improved security. When systems administrators can automate routine tasks—such as checking
audit logs, distributing security patches, detecting which laptops are not running personal
firewalls, and other mundane, time-consuming, but essential tasks—they have more time to
spend on high-level security issues, such as risk management and compliance. As Figure 1.6
shows, the basic benefits enable organizations to meet higher-level needs as well as day-to-day
operational requirements.

Figure 1.6: Service-oriented systems management enables organizations to meet both operational and
strategic objectives.


Summary
Any organization with IT systems practices some form of systems management; how well it
does so varies. The goal of systems management ultimately is to meet the strategic objectives of
the organization, which include aligning with business operations, preserving the integrity of
systems and information, and adapting to the changing needs of users. Systems management is a
broad discipline with many domains; some of the domains are similar, some are less so.
Underlying the entire practice, though, is a common set of information, processes, and
procedures that are best managed as a unified whole. At the same time, the complexity of
systems management requires a modularized approach to enable cost-effective and manageable
solutions.
SOM builds on the best practices of systems management, information security, governance, and
related areas. The remaining chapters of this guide will describe in detail the elements of these
best practices, the tools needed to implement the best practices, and the organizational direction
and policies needed to realize the benefits of SOM.


Chapter 2: Core Processes in Systems Management


Systems management is a multifaceted practice. The responsibilities of this domain range from
ensuring servers are up and running to planning for future growth, which requires meeting the
needs of business within the constraints of IT budgets and resources. This chapter examines the
core processes entailed in enterprise systems management including:
• Aligning business objectives and IT
• Planning and risk management
• Business continuity and operational integrity
• Security and compliance
• Capacity planning
• Asset management
• Service delivery
These areas do, of course, overlap. For example, one cannot align IT operations with business
objectives without planning for growth and potential risks. At the same time, these processes can
be treated as distinct because best practices have emerged for each of these processes. In fact,
much of this guide is devoted to elucidating the fundamental elements of these processes and
describing the best practices that provide for effective and efficient implementation of those
processes.
The best place to start a discussion about systems management is with its reason for being:
leveraging IT to support business or organizational objectives.

Throughout this guide the words “business” and “organization” are both used to describe enterprises
that implement systems management practices. Even when the word business is used, the
discussion can equally apply to government departments, agencies, and non-profit organizations.

Aligning Business Objectives and IT Operations


IT is a means to an end for most organizations. IT is employed to increase productivity, improve
communications, increase the reliability and reach of services, advance quality, and a host of
other objectives. These objectives are what prompt businesses to deploy the collections of
servers, desktops, mobile devices, and specialized network equipment that make up a
contemporary IT infrastructure.


Ad Hoc Growth of IT Infrastructure


A common problem arises as organizations grow and shift their business focus: the IT
infrastructure does not always change with the change in business objectives. Consider a simple
case. A medical device manufacturer begins in business building a limited range of specialized
products. The salient characteristics of the company are:
• It has a small sales force, and each salesperson tracks leads with a contact
management program installed locally on his or her laptop. Because the sales force is
assigned to different markets, there is no overlap between territories and no need to
share sales information.
• A central office manages order fulfillment, inventory, accounts payable, and accounts
receivable using a small and midsized business financials package installed on a local
area network (LAN) server.
• An email server is hosted in-house on the LAN.
• The operations manager for the manufacturing process has installed and configured a
database system to track production operations and track information needed to remain in
compliance with government regulations.
• A Web site with basic company and product information is maintained by a local Web
hosting company.
This infrastructure illustrates a typical small business IT scenario, and it may work well for many
businesses—at least until they start to grow. This type of organization can confront problems
with:
• Management reporting—How will sales managers generate consolidated leads reports
when their sales staff use standalone databases that are not integrated?
• Information sharing—Is data from the operations database re-keyed into the financials
package to complete orders?
• System maintenance and troubleshooting—Who is responsible for fixing problems with
the operational database?
• Systems administration—What are security policies regarding email use and antivirus
scanning?
• Leveraging IT to expand business instead of simply reacting to immediate needs—How
can a Web site be updated to offer online ordering and technical support for customers?
Often, hardware and applications will be procured and deployed to address a narrow problem. If
the email server runs out of storage, buy more disk space. If the sales staff need the latest price
sheets, export a list from the financials package to a spreadsheet and email it to all the sales staff.
These kinds of ad hoc solutions work in the short term but create an environment that is highly
brittle and difficult to maintain. The problem is not that the IT staff is not providing a solution
but that they are providing a solution to a series of small problems rather than providing a
solution to one over-arching problem: IT operations are not aligned with business objectives.


Managing IT to the Big Picture


The quality of any decision is a product of the quality of the information used to make that
decision. If an IT manager is asked to set up a Web site and is provided with the content for that
site, the manager might deploy the site in a way that meets only those requirements. However, if the
manager were aware of a competitor with an online ordering system, the manager could plan
for a Web site that includes applications as well as content. This simple example highlights this
idea for a small organization—think of the complexity of this idea applied within large
enterprises.
In organizations with hundreds or thousands of employees, multiple divisions and departments,
a diverse range of product offerings, and geographically dispersed staff and customers, the need
to align IT and business objectives is even more pressing. To effectively deploy IT in large
organizations, IT managers must have a complete understanding of the current infrastructure,
technical limits of existing systems, unmet business requirements, and business plans for future
operations. IT cannot be a department that is kept out of the information loop when planning
strategic initiatives. IT provides services to other departments and lines of business and must
have adequate information to plan for change.

Planning and Risk Management in IT


Planning is a central function of IT. Many of us think of planning as it relates to new
acquisitions: new servers, additional applications, extra disk storage, and so on. This is certainly
a major part of the planning process, but it is not the whole picture. Planning of this nature works
under the assumption that other parts of the IT operation and the business are functioning as they
should, but such is not always the case. Business disruptions can result from many causes and
planning for those events, known as risk management, is a core IT process.

Basics of IT Planning
Once an IT department understands the objectives of the enterprise and has aligned the strategic
plan of IT with those of lines of business, the planning phase can begin. The planning process
entails a number of areas, including:
• Technical architecture
• Organizational structure
• Budget and staff management
• Communications
Of these, the technical architecture is the one most often addressed in IT planning.


Planning Technical Architecture


The technical architecture of an IT organization encompasses the hardware, software, data
models, and network platforms that comprise the infrastructure. To successfully develop a
technical architecture, one should begin with an enterprise data model.

Enterprise Data Models


Enterprise data models are logical descriptions of the data that is used to conduct the business of
an organization. This model can include customer information, order data, employee records,
sales information, performance measures, and other pieces of information that describe the state
of operations. An enterprise data model should include entities and descriptions of processes that
correlate with entities and processes in strategic plans. For example, if an objective of the
strategic plan is to increase market share in a particular region, the enterprise model should
include products, sales volumes, time periods, channel partners, distribution channels, and other
entities involved in meeting the objective.
The data model will also have to include descriptions of how information flows throughout the
organization. Continuing with the same example, the data model would need to describe where
data originates (for example, in an order entry system), when and where it flows to other systems
(for example, to an operational reporting system or data warehouse), how it is backed up and
archived, along with any other changes to it or uses for it during the data’s life cycle.
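To make the idea concrete, the sketch below models a few entities from the market-share example as plain data classes, along with one roll-up that a reporting system might compute. The entity names, fields, and figures are illustrative assumptions, not a prescribed enterprise model.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative entities tied to the strategic objective of growing
# regional market share; a real enterprise data model would define many more.

@dataclass
class Product:
    sku: str
    name: str

@dataclass
class ChannelPartner:
    partner_id: str
    region: str

@dataclass
class Sale:
    product: Product
    partner: ChannelPartner
    units: int
    period_end: date

def regional_volume(sales, region):
    """Roll up unit volume for one region, as a market-share report might."""
    return sum(s.units for s in sales if s.partner.region == region)

widget = Product("W-100", "Widget")
east = ChannelPartner("CP-1", "east")
west = ChannelPartner("CP-2", "west")
sales = [
    Sale(widget, east, 120, date(2006, 3, 31)),
    Sale(widget, west, 80, date(2006, 3, 31)),
]
print(regional_volume(sales, "east"))  # prints 120
```

The value of writing the model down, even this informally, is that every entity in a strategic objective (product, partner, region, period) has an agreed-upon definition before systems are built around it.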

Perhaps one of the most complex data reference models is the U.S. Federal Enterprise Architecture
Data Reference Model designed for cross-agency information sharing and analysis. For more details,
see http://xml.coverpages.org/ni2005-12-28-a.html.

The other part of planning technical architecture focuses on the systems that manipulate
enterprise data.

Hardware, Software, and Network Components


Today, most IT departments deploy distributed systems based on open standards. This is a great
advantage for IT managers as well as designers and developers, as they are no longer restricted
to a single vendor’s hardware platform, operating system (OS), or application offerings. It is
common to mix mainframe hardware from IBM with Sun Microsystems and HP UNIX servers
as well as Dell servers running a variety of Windows and Linux OSs. Applications may be
custom built or purchased from ERP vendors, such as SAP and Oracle, or a plethora of other
software vendors that provide specialized applications.
Key enablers of this flexibility in mixing-and-matching components include:
• The widespread adoption of Internet protocols, such as IP, TCP, UDP, HTTP, and others
• The general use of just two types of OSs: the UNIX/Linux family and the Windows
family
• The emergence of distributed applications based on Java 2 Platform, Enterprise Edition
(J2EE) and .NET frameworks
• The use of standard communication protocols and data exchange formats based on XML


The OASIS organization coordinates a large number of XML standards in a wide range of areas, from
e-government and financial services to printing and plumbing. For more information see
http://www.oasis-open.org/home/index.php.

Planning enterprise architecture is no longer a matter of committing to a particular vendor's
product line; rather, it is a process of adopting one or more frameworks for organizing a collection
of hardware and software components.

Organizational Structure
Planning around organizational structure is about answering questions related to who is
responsible for parts of IT infrastructure and services. Common assignments include:
• Help desk support
• Network management
• Server and storage management
• Training
• Security and compliance
• Application and database administration
• Auditing
One goal of organizational structure planning is to ensure that all critical functions are identified
and clearly assigned to a business unit. This does not necessarily mean there is a department or
group within IT dedicated solely to a single task, but that all tasks are covered. For example,
auditing may be assigned to the same group as security and compliance, while Help desk support
and training are managed by the same staff.
Another goal of organizational planning is operational efficiency. For example, it is far more cost
effective if a single group evaluates anti-malware systems and selects applications best suited to
the organization than if every department purchases its own antivirus software. Similarly,
allowing disparate lines of business to install different database systems will increase
development and support costs as well as introduce application dependencies that can drive up
cost long after the initial purchase.
Clearly demarcating lines of authority and responsibility is essential for efficient and effective IT
resource management. With an overall organizational structure in place, the next step is to
address budgeting and staffing.


Budget and Staff Management


Budgeting and staff management is one of the most difficult areas of systems management to
discuss in general terms. Obviously, resources should be allocated according to availability and
priority of objectives served, but finding the ideal balance is rarely easy. Consider the following
issues in the planning process.
First, budgets will be allocated to staff and to tools (or labor and capital, in economists' parlance).
From a cost management perspective, IT managers do not want to have a 1:1 increase in staff as
the size of their infrastructure grows. Enabling staff to manage larger infrastructures requires
appropriate tools to:
• Monitor hardware and software systems
• Resolve issues remotely
• Apply patches and upgrades systematically from a central source
• Receive and manage support requests effectively
• Collect data on current uses and capacities to aid in other planning operations
A second consideration is cross-training and rotation of duties. This serves two purposes: it
enables backup staff to take over in the event a primary support person is unable to meet
demands, and it reduces the likelihood of a successful internal security breach, such as fraud.

Communications
Communication across lines of business and operational units is sometimes difficult. Each part
of the organization has its own priorities, and they are not always in sync. What is important to
one department is a marginal issue to another. At the same time, vertical communication up and
down the organizational structure is an important aspect of keeping IT operations aligned with
business objectives. By formally planning and implementing a communication plan, IT systems
managers can keep executives informed of the status of operations and projects and keep lines of
business apprised of service changes, development backlogs, and dependencies on systems
that can impact their performance.
Communications across the organization must include more than technical details, project plans,
and delivery schedules. Understanding and planning for risks is a major factor in IT planning.


Risk Management in IT
Risk management is the process of identifying and assessing potential loss to an organization.
This process includes three main steps:
• Prioritizing business objectives
• Assessing risks and impact
• Mitigating risks
Together, these provide the means to identify risks as well as options for dealing with them.

Prioritizing Business Objectives


The first step in risk management is prioritizing business objectives. This has nothing to do with
hardware, software, or network infrastructure. The driving question here is, “What are the core
operations of the business, and what is the order of priority for protecting those?” An example
prioritized list might include:
• Ability of customers to place orders
• Ability of customers to check order status
• Order fulfillment
• Customer support
• Operational reporting
• Management reporting
A prioritized list such as this focuses attention on the most important operations. The next
question is, what threatens these operations?

Assessing Risks and Impacts


Risks to IT operations come from a number of sources and include damage to physical
infrastructure, operator error, hardware and software failure, and security breaches by hackers or
malware. Physical infrastructure can be damaged by fire, flood, earthquakes, hurricanes, and
other natural disasters. It can also be damaged by failures of other systems; for example, a
spike in the electric grid that overloads power conditioners and surge suppressors can do serious
harm. The impact of physical risk can range from moderate to total loss.


Operator errors are less likely to cause the loss of physical assets but more likely to result in the
loss of information. For example, an operator might accidentally overwrite a backup tape that
contains necessary data, or a data entry clerk might accidentally delete records from a transaction
processing system. In the first case, the information may be permanently lost or recoverable from
other backup tapes. In the case of a data entry error, the lost data might be recovered from
database redo logs if caught before changes are committed or from backups in other cases. The
recovery methods range from quick and inexpensive to slow and costly procedures.
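The first recovery path mentioned above, catching a data entry error before the change is committed, can be sketched with a transaction rollback. The example uses an in-memory SQLite database for brevity; production database systems expose the same commit and rollback semantics.

```python
import sqlite3

# Set up a small orders table with committed data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

# An operator accidentally deletes every record...
conn.execute("DELETE FROM orders")
# ...but the error is caught before the transaction is committed,
# so a rollback restores the table to its last committed state.
conn.rollback()

remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(remaining)  # prints 3
```

Once a transaction has been committed, this cheap recovery path is gone, and the slower, costlier options the text describes (redo logs, backup restores) are all that remain.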
Hardware and software failures as well as security breaches can range from annoyances to
significant disruptions. When assessing the impact of these types of failures, one should address
both the direct consequences—for example, an order entry system is down—as well as
dependencies, such as the data warehouse cannot be updated and management reports cannot be
generated because of the delay in getting operational data. Clearly, the range of impacts is broad;
mitigation strategies should be selected based on that range.

Mitigating Risks
Risk mitigation is a balancing act. Formally speaking, risk mitigation strategies should not cost
more than the value of the lost resource multiplied by the probability that loss will occur.
Unfortunately, quantifiable measures are only available for a small set of risks. For example,
hardware manufacturers can cite mean time between failures statistics for a device, but there are
no good statistics on the mean time between significant bugs in an ERP system, the likelihood
of a Denial of Service (DoS) attack, or the chances an operator will accidentally corrupt
a backup script that then fails to execute the backups properly. Often, risk mitigation strategies
are based on best guesses and past experience.
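The cost rule stated above can be written out directly: the expected loss is the value of the resource multiplied by the probability of losing it, and mitigation spending should stay below that figure. The asset values and probabilities below are made-up illustrations, not benchmarks.

```python
def expected_loss(asset_value, loss_probability):
    """Expected loss: value of the resource times the probability of loss."""
    return asset_value * loss_probability

def mitigation_justified(mitigation_cost, asset_value, loss_probability):
    """Formally, a mitigation strategy should not cost more than the expected loss."""
    return mitigation_cost < expected_loss(asset_value, loss_probability)

# A $200,000 server with an estimated 5% chance of loss: expected loss is $10,000.
print(mitigation_justified(8_000, 200_000, 0.05))   # True: below the expected loss
print(mitigation_justified(15_000, 200_000, 0.05))  # False: costs more than it protects
```

As the text notes, the hard part in practice is not the arithmetic but obtaining a defensible loss probability for anything other than well-measured hardware failures.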
Risk mitigation strategies, therefore, tend to fall into general approaches that address a number of
different risks. Typical examples include:
• Multiple, overlapping backups of critical data
• Failover servers in the event of a hardware failure
• Off-site storage of backups and alternative servers in case of physical damage
• Preventive measures, such as firewalls, intrusion prevention systems (IPSs), and content-
filtering applications to prevent breaches and the introduction of malware
• Application user interfaces (UIs) designed to prevent accidental destruction of data
• Database integrity constraints to prevent accidental loss of information—for example,
deleting a customer record when the customer has open orders in the database
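The last bullet can be demonstrated with a foreign key constraint. The sketch below uses SQLite, where enforcement must be switched on per connection; the two-table schema is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1);
    INSERT INTO orders VALUES (100, 1);  -- customer 1 has an open order
""")

try:
    conn.execute("DELETE FROM customers WHERE customer_id = 1")
    deleted = True
except sqlite3.IntegrityError:
    deleted = False  # the constraint rejected the accidental delete
print(deleted)  # prints False
```

The database itself, rather than every application that touches it, becomes the last line of defense against this class of operator error.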
Understanding the types of risks that confront IT operations is a fundamental part of the planning
process. It is also closely related to another core IT process: business continuity planning.


Business Continuity
The goal of business continuity planning and management is to minimize the chance of a
business disruption. The risks outlined earlier can lead to an outage of business services. Although the
risk management planning process tries to minimize the chance of these risks actually disrupting
operations, business continuity addresses what to do when those risks are realized.
Business continuity planning creates policies and procedures that dictate what to do in the event
of a business disruption. These plans leverage the resources put in place as part of the risk
mitigation strategy. For example, offsite backups can be restored to a backup server at a remote
site in the event the primary site is destroyed by fire. To be effective, these plans must be:
• Detailed, so that application managers and network administrators do not have to think of
undocumented but necessary steps to restore operation (for example, updating a DNS
record to point to a backup instead of a primary server)
• Tested to ensure the procedures accomplish the prescribed goals
• Rehearsed so that staff are not executing these procedures for the first time during a
disruptive event
Business continuity is not an isolated set of tasks that are done at one time, documented, and put
on the shelf until the next audit. These plans are tightly linked to the risk management aspects of
IT planning as well as to the security operations of an IT organization.

Maintaining Security and Ensuring Compliance


The fundamental objective of information security is to ensure confidentiality and integrity of
information while maintaining the availability of the systems and applications that manage and
process that information. Complying with government regulations is largely a matter of meeting
the first two objectives—confidentiality and integrity of information.
These objectives are defined as follows:
• Integrity is a property of information that ensures that information is not changed, either
intentionally or unintentionally by unauthorized users, or unintentionally by authorized
users. In addition, different copies of the same information are consistent.
• Confidentiality is a property of information that ensures that information is not disclosed
to unauthorized users.
• Availability is the property of systems that ensures that they are functioning and
accessible to users according to Quality of Service (QoS) requirements established for the
system.
Because the goals of the two are so close, with some variation in specific requirements, it is often
helpful to consider security and compliance together. Consider some of the regulations targeted
to maintaining individuals’ privacy and others designed to ensure integrity in public reporting.


Figure 2.1: Information security and compliance share the common goals of information integrity and
confidentiality.

Regulations and Compliance


Regulations related to privacy and information integrity have been established by a wide range of
governing bodies. Many of these are formal regulations with the force of law; others are
established frameworks within an industry or discipline that are largely accepted and expected to
be followed by members of the industry.
Regardless of the source of the regulation or framework, they all require a governance process
within IT to ensure full compliance. This is a significant challenge, but fortunately a widely
recognized set of best practices, known as Control Objectives for Information and Related
Technology (COBIT), provides a set of controls, activities, and measures for governing IT
operations. When supported with comprehensive information about the operational state of an IT
infrastructure, such as provided by a configuration management database, COBIT is an effective
means for maintaining compliance.

COBIT is discussed in detail in Chapter 3.

Privacy and Confidentiality


In the case of government regulations, there are many that address the privacy and
confidentiality of information:
• Health Insurance Portability and Accountability Act (HIPAA), which addresses protected
healthcare information
• California S.B. 1386, which defines procedures that must be followed if personally
identifying information about California residents is compromised
• European Privacy Directives (95/46/EC and 2002/58/EC), which define minimum
standards for protecting private information about EU citizens
• Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA) and
the Australian Federal Privacy Act, which protect citizens of those countries


It is worth noting that the United States has adopted a decentralized approach to privacy,
protecting, for example, healthcare information at the federal level while leaving general privacy
regulations to states. Unlike the U.S., many other countries, including the European Union
members, Australia, and Canada have adopted comprehensive privacy legislation at the national
and transnational levels.

Information Integrity
Maintaining the integrity of business and government information is essential to maintaining the
trust of markets, constituents, and others outside those organizations. This reality became
abundantly clear with the fiscal reporting scandals that occurred at Enron, WorldCom, Tyco, and
other large businesses just a few short years ago.
In response to the growing awareness of the importance of maintaining the integrity of publicly
reported information, governments passed a number of regulations to minimize the chance of any
further corporate accounting debacles. The most well-known legislation is probably the
Sarbanes-Oxley Act (SOX), which defines responsibilities for maintaining and reporting
accurate information on publicly traded companies in the United States.
In addition to SOX, some less well-known integrity measures include:
• Computer Fraud and Abuse Act
• Electronic Signatures in Global and National Commerce Act
• Gramm-Leach-Bliley Act
Like privacy protections, the movement to preserve accurate business reporting is a transnational
undertaking. For example, the Bank for International Settlements established the Basel II
standards to ensure that banks accurately report risks associated with their investments.
Information integrity regulations have also targeted other industries. The U.S. Food and Drug
Administration (FDA), for example, has established policies governing the recording, reporting,
and storing of information related to the production of pharmaceutical products in the 21 CFR
Part 11 regulations.

For more information about compliance from an IT perspective, see the IT Compliance Institute at
http://www.itcinstitute.com/.

With so many regulations, it is easy to become overwhelmed. Fortunately, many of these
regulations seek the same objectives: preserve the integrity of information that others
depend on and protect personal privacy. The steps that are needed to do so are encompassed by
practices in place to ensure a secure information infrastructure. Thus, if one has a well-managed
IT environment, it is probably secure and close to, if not already, in compliance with a number of
regulations.


Information Security
Of all the areas that comprise systems management, information security is the largest and most
difficult. It is the most difficult because there are adversaries who are trying to compromise
security measures. It is the largest because there are so many areas that have to be addressed;
virtually every aspect of IT is touched by security issues or plays a role in security maintenance.
The areas of information security most closely associated with systems management include:
• Threat assessment
• Vulnerability management
• Managing countermeasures
• Auditing
• Incident response
• Change control
• Information security management
These domains within information security require individual planning and management yet
depend on each other to be effective.

Threat Assessment
Threats to IT seem ubiquitous since the widespread adoption of the Internet. A threat is a person,
program, or process that can compromise the confidentiality, integrity, or availability of
information or systems.

Threats should not be confused with vulnerabilities, which are weaknesses, deficiencies, or errors in
applications, OSs, network devices, or procedures that can be exploited by a threat. Vulnerabilities
are addressed in the next section.

Threat assessment is the practice of determining who and what can damage an IT system. Of
course, a human is ultimately responsible for all threats, but direct actions carried out by a hacker
trying to break into a system require different responses than those for a malware writer who
unleashes a virus to delete randomly selected files from victims' hard drives. For this reason, it is useful to
think in terms of categories of threats, such as:
• Information theft, a threat to confidentiality
• Information tampering, a threat to integrity
• DoS attacks, a threat to availability
• Viruses, worms, and other malware, potential threats to confidentiality, integrity, and
availability
• Spam, a threat to availability
• Phishing attacks, a threat to confidentiality
• Spyware and other potentially unwanted programs (PUPs), a threat to confidentiality and
availability
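To illustrate, the categories above can be represented as a simple lookup table that records which security properties each threat endangers; the shorthand category names used here are illustrative, not a formal taxonomy.

```python
# Hypothetical mapping of the threat categories above to the security
# properties (confidentiality, integrity, availability) each one endangers.
THREAT_IMPACT = {
    "information theft": {"confidentiality"},
    "information tampering": {"integrity"},
    "denial of service": {"availability"},
    "malware": {"confidentiality", "integrity", "availability"},
    "spam": {"availability"},
    "phishing": {"confidentiality"},
    "spyware": {"confidentiality", "availability"},
}

def threats_against(prop):
    """Return the threat categories that endanger a given security property."""
    return sorted(t for t, props in THREAT_IMPACT.items() if prop in props)
```

A table like this makes it easy to answer questions such as "which threats put availability at risk?" when prioritizing countermeasures.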

With an understanding of the broad category of threats, the next step is to understand how these
threats are executed. For example, information theft can occur when a hacker compromises a
database server and steals credit card information; it can also occur when a disgruntled employee
uses legitimate access rights to collect data for unauthorized purposes. In the case of malware,
a virus can be downloaded along with email through an organization’s email server; infection can also
occur when a laptop user browses a compromised Web site from a poorly secured network at
home.
Threat assessment is the practice of discovering potential threats and understanding the motives
for those threats. In general, you cannot prevent threats—they exist outside of your control. You
can, however, minimize the chances that a threat can successfully compromise your
infrastructure. This is the role of vulnerability management.

Vulnerability Management
Vulnerability management is the practice of identifying and compensating for weaknesses in
systems, applications, and procedures that can be exploited by threats to breach a system. Like
threats, there are a variety of types of vulnerabilities, including:
• Misconfigured network software that allows hackers to use those programs to gain access
to protected resources
• Errors in OS software that allow malware writers to gain elevated privileges and execute
destructive programs on a compromised host
• Poorly designed programs that do not validate input parameters, resulting in a
commonly exploited condition known as a buffer overflow
• Organizational policies and procedures that do not account for the potential for attacks or
thefts from internal personnel—for example, not rotating duties of employees in critical
functions
There are several ways to combat vulnerabilities. First, keep OSs, applications, and network
software up to date with security patches. Some systems, such as Microsoft Windows, make it
relatively easy for single users or small organizations by offering tools such as Windows Update.
As the size of an enterprise increases, more sophisticated tools are required that include
centralized management and rollback capabilities. Of course, not all applications have tools for
automatically downloading patches from a vendor site. For example, updating a database
typically requires a manual download of a patch, which is then applied by a database
administrator.
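As a sketch of the comparison logic behind such patch-management tools, the following checks an inventory of installed package versions against minimum patched levels; the inventory structure, host names, and package names are hypothetical, and a real deployment would pull this data from a package manager or patch-management agent.

```python
def parse_version(v):
    """Turn a dotted version string such as '2.4.10' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def outdated_hosts(inventory, minimum_patched):
    """Report hosts running a package version below the minimum patched level.

    inventory: {host: {package: version}}
    minimum_patched: {package: minimum acceptable version}
    """
    findings = []
    for host, packages in inventory.items():
        for package, version in packages.items():
            required = minimum_patched.get(package)
            if required and parse_version(version) < parse_version(required):
                findings.append((host, package, version, required))
    return sorted(findings)
```

Centralized tools add scheduling, reporting, and rollback on top of essentially this comparison.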

Applying patches to production systems can introduce as well as resolve problems. See the section
on change management for more details and caveats.

Second, employ code reviews and software analysis tools to check for common vulnerabilities in
custom developed software. This is more within the realm of software engineering than systems
management, but systems administrators should be confident that reasonable and prudent
measures have been taken to ensure the quality and safety of any application before they deploy
it on their networks.

Third, implement organizational policies and procedures that minimize the chance of a breach or
theft by an internal staff member. Unfortunately, these crimes are more common and serious than
you might expect. For example, a Florida man who was the controlling owner of an Internet
advertising company was recently convicted and sentenced to 8 years in federal prison for
stealing more than 1 billion records of personal information, such as names, physical
addresses, and email addresses, from Acxiom Corporation, a personal information repository
and distributor (details at http://www.cybercrime.gov/levineSent.htm).

For more examples of internal-based breaches, see the U.S. Department of Justice Cybercrime site
at http://www.cybercrime.gov/cccases.html.

Finally, understand that vendors are not always the first to detect a vulnerability in their
software. Researchers, developers, systems managers, and others may discover and report
vulnerabilities to the public through one of the large, public repositories of system
vulnerabilities.
Tracking Vulnerabilities
A number of public databases and related tools are available to systems managers in addition to
information provided by vendors. These include:
● The National Vulnerability Database (http://nvd.nist.gov/) is a government-sponsored database of
all publicly known vulnerabilities. It contains tens of thousands of vulnerabilities as well as a
number of cybersecurity alerts cross-referenced from the United States Computer Emergency
Readiness Team (US-CERT) at http://www.us-cert.gov/cas/techalerts/.
● The Open Source Vulnerability Database (OSVDB—http://www.osvdb.org/) project also maintains
a database of known vulnerabilities. The OSVDB includes support for exporting entries to XML
files for importing into other databases.
● The Common Vulnerabilities and Exposures (CVE) dictionary (http://cve.mitre.org/) is a standard naming
convention for identifying vulnerabilities. It is not a separate database of vulnerabilities but a tool
for sharing information across vulnerability databases and making it easier for systems
administrators, developers, and other users to query those databases.
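Because CVE names are the common key across these databases, scripts that correlate entries often begin by validating and parsing identifiers, which follow the pattern CVE-&lt;year&gt;-&lt;sequence number&gt;:

```python
import re

# A CVE identifier is "CVE-", a four-digit year, and a sequence number of at
# least four digits. This validator normalizes case and extracts both parts.
CVE_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")

def parse_cve(identifier):
    """Return (year, sequence) for a valid CVE identifier, else None."""
    match = CVE_PATTERN.match(identifier.strip().upper())
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))
```

With identifiers normalized this way, entries exported from one database (for example, OSVDB's XML) can be matched against another by CVE name.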

Once a vulnerability is found, it should be addressed by either patching the vulnerable code or
deploying a workaround. This part of vulnerability management overlaps with some of the tasks
associated with change control.

Change Control
Change control in IT is like maintaining a plane while it is in flight. Too often, systems
administrators do not have the luxury of shutting down systems and keeping them offline to
update software and hardware, test it thoroughly, and bring back users in a controlled manner.
Instead, software patches, upgrades, and software installations have to be done with minimal
disruption to operational systems.

Information for Change Control


To effectively manage changes to software, hardware, and configurations, systems
administrators should have:
• Detailed configurations of deployed systems, including desktop, servers, and network
systems
• Information about dependencies between applications, OSs, and hardware components
• An understanding of operational patterns—for example, the frequency and duration of
large data loads, batch reports, peak usage of servers, expected growth patterns in storage
space, and so on.
• Troubleshooting history for applications, OSs, and hardware. (It is one thing to know
how a system is supposed to work; it is another to know how it actually is working.)
The diversity of this kind of information makes it difficult to track in an ad hoc manner. Ideally,
systems managers would have a centralized configuration management database that maintains
relevant details about the IT infrastructure as a basis for managing change.
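A minimal sketch of the kind of record such a database might hold, with illustrative field names, is:

```python
from dataclasses import dataclass, field

@dataclass
class ConfigurationItem:
    """One configuration item (CI): a server, application, database, etc."""
    name: str
    ci_type: str                                    # e.g., "server", "application"
    attributes: dict = field(default_factory=dict)  # versions, owners, locations...
    depends_on: list = field(default_factory=list)  # names of other CIs

class Cmdb:
    """A toy in-memory configuration management database."""
    def __init__(self):
        self._items = {}

    def add(self, item):
        self._items[item.name] = item

    def get(self, name):
        return self._items.get(name)

    def dependencies(self, name):
        """Direct dependencies recorded for a CI."""
        item = self._items.get(name)
        return list(item.depends_on) if item else []
```

Real products add history, discovery, and reconciliation, but the core is this inventory of items plus their relationships.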

Accounting for Dependencies


Although it is easy to track an inventory of devices and applications, it is much more challenging
to track dependencies between these components. Some dependencies are clear, such as when an
application states that it must run on Windows Server 2003 (WS2K3) or Red Hat Linux AS.
Other dependencies are more difficult to discern. Consider a financial software management
package that runs on an Oracle database on a UNIX server. The database requires a particular
version of a C library. This may or may not be documented, and the database administrator may
discover requirements the hard way during installation. Dependencies like this are notorious for
throwing off the best-laid plans and schedules.
Other dependencies are even more difficult to work around. For example, a server may run two
different applications that both use the same database management system. However, a patch to
the database may correct a problem encountered by one of the applications but introduce a bug
that breaks the other application. There are no good options in this case. One could continue
without the patch and tolerate the vulnerability, bug, or missing feature that the patch corrected,
or the systems administrator could run two instances of the database, one patched and one
unpatched. Although the latter option solves an immediate problem, it introduces another
application instance that must be maintained and managed.
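The first question in scenarios like this is which systems a change would touch. Given a dependency map such as a configuration management database could supply, the set of affected items can be computed by walking the dependencies in reverse; the component names below are hypothetical.

```python
def impacted_by(component, depends_on):
    """Return every item that directly or transitively depends on `component`.

    depends_on maps each item to the components it relies on.
    """
    # Invert the dependency map: component -> items that rely on it.
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(item)

    impacted, frontier = set(), [component]
    while frontier:
        current = frontier.pop()
        for item in dependents.get(current, ()):
            if item not in impacted:
                impacted.add(item)
                frontier.append(item)
    return sorted(impacted)
```

Running this before applying a patch turns "what might break?" from guesswork into a concrete list to test.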
Change management is an especially vexing challenge in systems management and will be
addressed in depth throughout this guide.

Configuration management databases are essential to efficient change management. See Chapter 4
for more information about this topic.

Although change management tools help to plan for infrastructure-level changes, auditing helps
understand what is happening within those systems now.

Auditing for Security and Systems Management


Auditing is the process of reviewing significant events at various levels, including:
• System
• Application
• User levels
The goal of auditing is to ensure that systems perform as expected, policies are enforced, and
unusual and potentially disruptive activities are detected.

System Events
System events occur within OSs. Three of the most important types of events are access control
events, configuration change events, and performance measurements. Access control events
include:
• Successful and failed logins
• User lockouts due to multiple failed attempts
• Failed file access due to access control violations
When these events occur, the identity of the user as well as the time and device (for example, IP
address) should be tracked.
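As a sketch of how lockout events are derived from login failures, the following flags users whose failed attempts cluster within a time window; the threshold and window are illustrative policy values, not recommendations.

```python
from datetime import datetime, timedelta

def users_to_lock(failed_logins, threshold=3, window_minutes=10):
    """Flag users whose failed logins within a sliding window reach the threshold.

    failed_logins: iterable of (username, datetime) pairs.
    """
    window = timedelta(minutes=window_minutes)
    by_user = {}
    for user, when in failed_logins:
        by_user.setdefault(user, []).append(when)

    locked = set()
    for user, times in by_user.items():
        times.sort()
        # Check every run of `threshold` consecutive failures for tightness.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                locked.add(user)
                break
    return sorted(locked)
```

The same sliding-window idea underlies most account-lockout and brute-force detection logic.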
Configuration change events occur when, in the case of Windows, a registry setting is changed
or, in UNIX OSs, when configuration files are changed. The identity of the user making the
change, the old and new values of the change, and the time and the device from which the
change is made are some of the characteristics that may be tracked.
Performance measurements indicate levels of system activity. There are a wide variety of
performance measurements that may be collected, including:
• Disk I/O rates
• Page fault rates
• Percent of CPU time in different modes
• Number of files open
• Number of network connections established and connected
• Network segments received per second
These measures are specific to OS and network performance; individual applications may be
monitored as well.
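Once collected, such measurements are typically compared against configured limits. A minimal sketch follows, with illustrative metric names and thresholds; real values would come from an OS monitoring agent and a tuning exercise.

```python
# Hypothetical per-metric limits; exceeding a limit raises an alert.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "page_faults_per_sec": 500.0,
    "open_files": 1000,
}

def check_sample(sample, thresholds=THRESHOLDS):
    """Return the metrics in a sample that exceed their configured limit."""
    return sorted(
        metric for metric, value in sample.items()
        if metric in thresholds and value > thresholds[metric]
    )
```

Metrics without a configured limit are simply recorded, not alerted on, which is how most monitoring systems distinguish collection from alerting.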

Application-Level Auditing
The type and volume of audit information tracked by applications varies widely. Some
applications will log details of startup and shutdown processes, error events, database accesses,
files opened, and other details of normal operations.
In addition, some applications support a detailed, debugging level of auditing that provides much
more detail than is normally recorded in audit logs. Debugging detail is designed to log
information about the execution path of a program, indicating which modules are executed,
conditions of key variables at the time of execution, and other details that help programmers and
support personnel identify problems. This level of detail is not normally needed for application
monitoring, only for problem resolution.

User Auditing
In some especially secure environments, it is important to have a record of user activity. This
record can include login attempts, use of various resources (including files and applications), and
programs executed. It may also record details of commands issued. For example, if someone
attempts to copy a file from a secure server to another server using ftp, the file name, the target
ftp site, and the date, time, and user identity should be recorded.
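A record like that is easiest to analyze later if it is written in a structured form. The following sketch emits one audit event per line as JSON; the field names are illustrative.

```python
import json
from datetime import datetime, timezone

def audit_event(user, action, resource, source_ip, **details):
    """Build one structured audit record as a JSON line.

    Captures who did what, to which resource, from where, and when;
    extra keyword arguments carry action-specific details.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "source_ip": source_ip,
    }
    record.update(details)
    return json.dumps(record, sort_keys=True)
```

For the FTP example above, the call might look like `audit_event("jdoe", "ftp_put", "/secure/report.xls", "10.0.0.5", target="ftp.example.com")`, appended to a write-once log for later forensic analysis.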
Auditing information is useful for systems management as well as for security purposes. It can
be especially useful for incident response.

Incident Response
The purpose of incident response is to limit the damage caused by a security breach. Ideally,
organizations will have incident response plans in place that dictate how IT staff and
management should respond to a security incident. Depending on the type of incident (for
example, a virus infection, a database break-in, or a DoS attack), the incident response plan
should describe the steps to mitigate the risks of damage. These steps can include:
• Removing a compromised server from the network
• Blocking traffic at a firewall
• Monitoring user activity if an unauthorized action is underway
• Notifying management
• Securing audit logs for forensic analysis
Like so many other security and systems management activities, incident response is most
effective when a comprehensive set of information is available about servers, applications, and
other devices within the IT infrastructure. An accurate and up-to-date centralized configuration
database is as important to enterprise security management as it is to operational systems
management.

Capacity Planning and Asset Management


Capacity management deals with the problem of having enough resources to accomplish a given
task in the required amount of time. Asset management is closely related but deals more with the
particulars of acquisition, configuration management, and asset life cycles.

Capacity Planning
Capacity planning is one of the better examples of a systems management domain that leverages
the information and practices of other domains. To accurately gauge how much storage space,
how many CPUs, or how much bandwidth will be required to support operations at some point in
the future requires information about:
• Current loads on servers and the network, which is gathered during performance
monitoring
• Growth in application loads, which in part, is determined when aligning IT operations
with business strategy
• Dependencies between existing systems and proposed additions to infrastructure, which
uses data from change management practices
• Trends in security issues, such as the rate of growth in spam and malware targeted to the
enterprise network
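Several of these inputs lend themselves to simple trend models. As an illustration, historical storage usage can be extrapolated with a least-squares fit to estimate when capacity will be exhausted; the figures in the test below are hypothetical, and real planning would also weigh business changes that history cannot show.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until storage is exhausted from (day, used_gb) samples.

    Fits a least-squares line to historical usage and extrapolates to the
    day at which the line crosses the capacity limit.
    """
    n = len(samples)
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in samples)
             / sum((x - mean_x) ** 2 for x, _ in samples))
    if slope <= 0:
        return None  # usage flat or shrinking; no exhaustion forecast
    intercept = mean_y - slope * mean_x
    return (capacity_gb - intercept) / slope
```

The same backward-looking fit applies to bandwidth, CPU, or any other metered resource; the forward-looking adjustments come from the business-alignment inputs above.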
Capacity planning requires a combination of looking backward for data and looking forward to
anticipated changes. It also requires a firm understanding of existing resources and their levels of
use. This is one of the elements of asset management.

Asset Management
Assets are hardware and software components that provide for particular services within the IT
infrastructure. Servers, desktops, routers, firewalls, databases, ERP applications, LDAP directory
servers, and a range of other devices and applications fall into this category. The scope of asset
management, at a minimum, includes:
• Acquiring assets
• Deploying assets
• Configuring assets
• Maintaining assets
• Retiring assets
The specific details of each of these will vary with the type of asset, but some general principles
hold for all.

Acquiring Assets
The acquisition of assets is closely tied to capacity planning. During capacity planning, when a
finding is made that additional resources are required, the acquisition process is initiated.
Requirements are defined, designs are formulated, configurations are determined, and the
necessary assets are purchased. Also during this phase, dependencies are analyzed to determine
how the introduction of the new asset will impact other parts of the infrastructure.
This process is especially important with assets that serve multiple business services. For
example, firewalls provide a core network service and could potentially affect every other
service and device on the network. A single-user desktop application, however, would have
limited impact on others in the organization and could be introduced with less thorough
planning.

Deploying and Configuring Assets


Once an asset is acquired, putting it into place may sound relatively straightforward, but that is
not necessarily the case—especially when the asset is deployed to a production environment. It
may be useful to note at this point that many organizations use development, testing, and
production environments for their software development efforts. The development environment
is used to create or configure new systems and, after passing basic unit and integration testing,
they move to the test environment for user acceptance testing. At that point, representative end
users work with the system to determine whether it works as expected and will adequately meet
their needs. Only after passing that acceptance testing is the application moved to production.
Moving an asset into production can be a challenge, especially when the time windows for such
operations are minimal or when rolling back in the case of problems is difficult. For example, if
a new version of a database is to be deployed to production, database administrators and systems
managers may have to:
• Replicate the operational database to a secondary, mirror database that will continue to
operate during the deployment steps
• Take the operational database offline and create a backup
• Install the new version of the database
• Validate the installation
• Restore the backup to the new version of the database
• Replicate changes made to the secondary database while the primary was down
• Place the primary database back online
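Scripted runbooks for sequences like this often pair each step with a compensating action so that a failure partway through can be unwound in reverse order. The following is a sketch of that pattern, with placeholder step functions standing in for real deployment actions.

```python
def run_with_rollback(steps):
    """Execute (name, do, undo) steps in order; on failure, undo in reverse.

    Returns (succeeded, log_of_actions) so operators can see exactly how
    far the deployment got and what was rolled back.
    """
    log, done = [], []
    for name, do, undo in steps:
        try:
            do()
            log.append(f"done: {name}")
            done.append((name, undo))
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            for undone_name, undo_fn in reversed(done):
                undo_fn()
                log.append(f"rolled back: {undone_name}")
            return False, log
    return True, log
```

Pairing every step with its undo at planning time, rather than improvising a rollback under pressure, is the discipline that release management frameworks formalize.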
Testing and releasing an asset into production should be a highly structured process. The release
management processes described in the IT Infrastructure Library (ITIL) define a method for
controlling this process. As with the COBIT framework for governance, the successful
implementation of ITIL processes is dependent on adequate operational information such as that
found in a configuration management database.

For more information about ITIL, see Chapter 3.

If testing was thorough, the configuration of the new asset should function in the production
environment, but there is always the potential for overlooking a configuration parameter or
missing a dependency, and configuration changes may be needed after an asset is deployed in
production. These steps begin to border on maintenance.

Maintaining and Retiring Assets


Although an asset may not change functions during the course of its use, other assets that either
depend on that device or that the device depends on may change. Maintenance is a routine part of
systems management. Sometimes maintenance is driven by outside factors, such as the release of
a security patch, or by internal factors, such as the need for additional capacity or a change in
network architecture.
Even if a device does not change and its related assets do not force changes, there are still tasks
that must be attended to. License management is a prime example. Regardless of what the device
is doing, if the software or hardware is licensed from a third party, organizations must ensure that
they are in compliance with license renewals and other terms of their agreements.
A short list of aspects of asset management that systems managers must attend to includes:
• Application life cycles
• Dependency management
• User management and security
• Asset acquisition
• Asset deployment and management
• Asset decommissioning
• License management
• Leases
• Warranties
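Some of these tasks reduce to date arithmetic over asset records. For example, license renewals can be flagged ahead of their expiration dates; the asset names and the 60-day lead time below are illustrative.

```python
from datetime import date, timedelta

def licenses_expiring(assets, as_of, within_days=60):
    """List assets whose license lapses within the given renewal horizon.

    assets: iterable of (asset_name, expiration_date) pairs.
    """
    horizon = as_of + timedelta(days=within_days)
    return sorted(name for name, expires in assets if as_of <= expires <= horizon)
```

The same check, run against lease end dates or warranty terms, covers the last three items in the list above.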
Again, the details necessary to effectively manage this diverse array of tasks are best managed in
a centralized configuration management database. The functions of systems management clearly
require a broad range of overlapping and multi-functional information. That same information is
useful for one other area of systems management: service delivery.

Service Delivery
Service delivery is the process of ensuring that the functions and resources needed by the
organization are provided in a reliable and cost-effective manner. As is common in systems
management, there is some overlap with other core processes. The main components of service
delivery are:
• Service level management
• Financial management for IT services
• Capacity management
• Availability management
• IT service continuity management

Service Level Management


Service level management ensures that business units have the amount of resources needed. For
example, the e-commerce group in a retail company will need peak network bandwidth and
server response time during the holiday shopping season. The financial management group might
need significant network bandwidth late in the night to move large data files to the data
warehouse, and the customer call center will need the lowest query response time from the
database during normal business hours. Balancing the needs of different groups and ensuring
that each receives the service levels and quality of service (QoS) required to do its job is the
responsibility of service level management.

Financial Management of IT Services


IT is a business within a business. It has both significant labor and capital costs that must be
managed. As IT has both operational and project-oriented work, the financial management of the
group must support both. Some of the key elements of financial management in IT include:
• Labor, recruiting, and retention
• Support service contracts
• Procurement management
• Consulting and staff augmentation
• Project management
• Capital investment
• Financial risk management
Financial management does not occur in isolation. Many of these tasks, such as project
management and risk management, are closely tied to other areas of systems management.

Capacity and Availability Management


Capacity is the amount of a resource available for performing an operation and meeting a need;
availability is having that resource operational when it is needed. A core operation of IT systems
management is planning for the future by understanding current usage, trends in growth, and
significant changes in business operations that will impact the need for IT resources.
Availability management ensures that resources are functional and providing the level of service
required. Clearly, this is closely related to service level management; the distinction lies in the
focus. In the case of availability management, the focus is on the minute-to-minute continuity of
basic service. For example, if Internet access is functional and the network is operational,
networking services are available; if the network latency is too long or there is not enough
bandwidth to meet the peak demands, that is an issue addressed as a service delivery problem.
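Availability is commonly expressed as the percentage of scheduled time a service was usable, which also determines how much downtime a given target permits; the 30-day month used below is an illustrative reporting period.

```python
def availability_percent(uptime_minutes, downtime_minutes):
    """Availability as the percentage of scheduled time the service was usable."""
    total = uptime_minutes + downtime_minutes
    return 100.0 * uptime_minutes / total

def allowed_downtime_minutes(target_percent, period_minutes=43_200):
    """Downtime budget a target permits over a period (default: 30-day month)."""
    return period_minutes * (100.0 - target_percent) / 100.0
```

For example, a 99.9 percent monthly target leaves roughly 43 minutes of downtime per 30-day month, which is why availability targets translate directly into maintenance-window constraints.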
Closely related to availability management is continuity management.

IT Service Continuity Management


IT service continuity management deals with potential business disruptions. There are many
causes of business disruptions, from power failures to natural disasters. Businesses plan for such
disruptions, and one of the major elements of that planning is how to continue IT operations. The
solution entails a number of factors, including:
• Ensuring off-site facilities are available outside the geographic area affected by the
disruption
• Bringing backup servers and other hardware online
• Restoring backups or ensuring replicated data is up to date at the off-site facility
• Getting key staff to the off-site facility
• Switching services, such as e-commerce servers, to the backup facility
Continuity management must also assess the disruption and determine when it is feasible and
prudent to return to the main site as well as plan for the transition back from the backup facility.

Summary
The core operations of systems management are designed to support the strategic objectives of
an organization. That is the starting point for the core services of systems management. With a
clear and well-defined alignment of business objectives, IT professionals can plan for the
capacity needs of the organization, weigh potential risks and mitigate them appropriately, and ensure
the continuity and operational integrity of IT operations. Systems management professionals have
always had significant responsibility in the area of systems security, and those responsibilities
have expanded to support organizational efforts to remain in compliance with a host of
government regulations. Other areas of systems management attend to the needs for capacity
planning, asset management, and service delivery. As this chapter has demonstrated, the range of
systems management is broad and extends beyond the boundaries of the traditional IT
department into the business units that it serves.

Chapter 3: Industry Standard Practices and Service-Oriented Management
Civilizations advance by preserving, passing on, and building upon existing knowledge. If we
had not leveraged the advances of previous generations, our world would be a far different place.
In a similar fashion, although on a far less expansive scale, IT practitioners have developed,
formalized, and documented best practices in several areas related to managing IT services,
particularly in the following arenas:
• Technology management
• Governance
• Security
• Risk management
Starting with best practices saves us from “reinventing the wheel” with commonly required
procedures and methodologies. Many of us would never think of building computers from
scratch when we can buy them off the shelf, perhaps with some customization. In the same way,
we can take a number of best practices and adapt them to our organizational requirements
without reasoning from first principles to determine the best way to accommodate change, plan
for capital spending, secure the infrastructure, and a host of other tasks demanded of IT
managers and systems administrators.
This chapter will begin with a brief discussion of the need for organizational frameworks and
standards. It then provides an overview of four broadly applicable standards:
• IT Infrastructure Library (ITIL)
• Control Objectives for Information and related Technology (COBIT)
• ISO 17799 security standards
• National Institute of Standards and Technology (NIST) Risk Management Guide for
Information Technology Systems
The chapter concludes with a discussion on the contribution these frameworks make to the
practice of service-oriented management (SOM).

Organizing IT Operations Around SOM


The skills required to manage IT services are a mosaic of technical, business, and organizational
talents that can take years to acquire, develop, and hone. Some can be taught; some we learn the
hard way. Regardless of how one learns how to manage IT services, to successfully control and
protect complex IT operations, we have to know a few things:
• First, what factors within IT service operations need to be managed?
• Second, what controls must be in place to ensure those factors are managed?
• Third, how do you balance competing needs, such as the need for accessible yet secure
systems?
One of the first steps to getting a handle on the complexity of managing IT services is to
organize the constituent processes of IT operations into a comprehensive model. Although there
are many ways to organize the constituent processes, the one that is used in this guide is SOM.
Within the SOM model, IT operations are divided into several distinct services:
• Network management
• Server and application management
• Client management
• Incident response
• Change control
• Monitoring and event management
• Asset management
• Application development
These divisions will appear familiar to many in IT because organizations often structure their IT
departments along similar lines. For example, large organizations often employ distinct groups
that manage the network, servers, and client hardware. There are also cross-functional teams to
address issues that span the organizational structure. For example, change management and
incident response will require expertise in all areas of IT, not just a single one.
In addition to following common organizational structures, the SOM model fits well with best
practice frameworks in use within IT. ITIL and COBIT address the breadth of topics covered by
SOM. Other more specialized frameworks—such as ISO 17799 security standards and the NIST
Risk Management Guide for Information Technology Systems—provide a focused set of best
practices for subdivisions of IT operations.
When organizing IT operations around the SOM model, it makes sense to leverage best practices
that fit with that model. The next section will examine common characteristics of several IT
management frameworks; this will be followed by discussions of the individual frameworks.

Overview of Best Practice Frameworks


As discussed in Chapter 1, there are various approaches for determining what works and what
does not work in information management. Three approaches were identified: ad hoc systems
management, controlled systems management, and continuously improving systems
management. In theory, one could start from scratch with controlled systems management and
continuously improving systems management. (In practice, ad hoc approaches to system
management always start from scratch). A better approach is to build on what has already been
developed.

For more information about the three IT management methods, see Chapter 1.

You can take much from what others have learned if you keep in mind several principles about
the use of best practices as they apply to SOM:
• IT services have much in common
• IT services are interdependent
• IT services can and should be measured
• IT services are repeatable processes
• IT services are broadly applicable
These principles speak to the management of IT services within as well as across organizations.
They are also embodied in the four frameworks described in the following sections.

Best Practice Principle 1: IT Services Have Much in Common


No matter how different your business or organization may be from others, it surely also has
much in common with them. Whether an IT group is supporting a startup professional services
business, a long-established manufacturer, or a government agency, there are IT requirements
common to all:
• The need to define, procure, and manage hardware and software
• The need to manage changes to infrastructure and applications
• The need to maintain a secure and dependable environment
• The need to plan for future needs
• The need to manage the financial aspects, including risks, of IT operations
None of these requirements change depending on the type of operating system (OS) run on your
servers, the volume of data pushed through your network, or the kinds of applications used by
your employees.

Best Practice Principle 2: IT Services Are Interdependent


Large IT organizations are often divided into specialized groups: one group supports network
operations, another manages database administration, a third group supports desktop
applications, and still another group is responsible for software development. This division of
labor is essential to having the depth of knowledge required to master the different disciplines
within IT.
For example, a network administrator might spend a fair amount of time learning a network
monitoring tool that allows the administrator to analyze traffic at the packet level. A database
administrator is the person that understands the details of database listeners that support
communications between database instances and client applications. A desktop support specialist
may be the first point of contact for an employee with a problematic application. So, when a user
gets a message that his or her application cannot connect to the database, who is responsible?
They all are.
The desktop support specialist can probably quickly isolate the problem as having to do with the
application configuration, a network problem, or an issue with the database server. If the
configuration files and registry settings appear correct and basic connectivity with the database
server is available, it may be time to call the database administrator.
The database administrator, in turn, can verify whether the necessary database processes are
running, the user has proper authorization on the database application, and the proper protocols
are configured on both the client and the server. If the database seems to be functioning properly
and the client still cannot connect, it is time to dig deeper and bring in the network manager.
At this point, the network manager might want to monitor traffic between the client and the
database. Are all the protocols that should be running actually in use? Are there any problems
with firewalls or routers between the client and the server? Together, an application support
specialist, a database administrator, and a network manager can diagnose problems that span
multiple domains more effectively than if they worked in isolation. Coordinating among
different areas is a crucial factor in the successful delivery of IT services.

Best Practice Principle 3: Measure IT Services


Regardless of the type of IT service being provided—whether it is network bandwidth or
financial and project management services—the service can and should be measured.
Measurements have several characteristics:
• Key performance indicators (KPIs)
• Baselines
• Trends
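To make these characteristics concrete, here is a minimal Python sketch of computing a baseline and a trend for a single KPI series. The sample data and the choice of a least-squares slope are illustrative assumptions, not prescribed by any framework.

```python
from statistics import mean

def baseline(samples):
    """Baseline: the typical value of a KPI under normal conditions."""
    return mean(samples)

def trend(samples):
    """Trend: least-squares slope of the KPI over time (units per sample)."""
    n = len(samples)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Hypothetical daily CPU-utilization KPI (percent)
cpu = [42, 44, 43, 47, 49, 51, 54]
print(round(baseline(cpu), 1))  # typical utilization
print(round(trend(cpu), 2))     # growth per day
```

A baseline far from recent samples, or a persistent upward trend, signals that a service is drifting away from its normal operating range and may soon breach its service levels.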


KPIs
KPIs are events or attributes that are measurable and correspond to the level of service delivered.
There are several types of KPIs with varying characteristics:
• Technical
• Financial
• Organizational
Best practices for a particular area might emphasize some of these types more than others, but
the most comprehensive best practices address all three.

Technical KPIs
Some KPIs are easily identified, especially technical ones, such as megabytes of data transmitted
over a network segment in a given period of time, the latency on a network, the storage utilized
on a disk array, and the percent of available CPU time utilized for application processing. By
their very nature, technical KPIs are easily quantified. They are also easily gathered, relatively
speaking. Applications, OSs, and dedicated appliances can generate large amounts of data about
performance and capacity.
The ease with which data on technical measures is generated is both an advantage and a
disadvantage; information overload is a constant problem when managing the technical
elements of IT services. Thus, the goal of measuring IT services is not to measure all services or
every dimension of an operation but to focus on a small number of key measures that are
indicative of the overall performance of the service.
As Figure 3.1 shows, even simple operations, such as measuring CPU and disk activity, can
generate too much data to allow for quick assessments of the state of an operation. KPIs for
server performance might include:
• Percent of non-idle CPU time
• Disk reads and writes per second
• Total bytes received and sent per second from a network interface
• Number of page faults per second
This set of measurements provides one measure per major functional area of a server (CPU, disk,
network, and memory) and can be monitored nearly continuously or polled at longer intervals
with the data aggregated to provide a performance measure for a specific period of time.
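This polling-and-aggregation approach can be sketched as follows. The sample values and field names are hypothetical; in practice the samples would come from an OS performance-counter interface rather than literals.

```python
from statistics import mean

# Hypothetical polled samples; in practice these would come from an OS
# performance-counter API (e.g., Windows Performance Counters or /proc).
samples = [
    {"cpu_busy_pct": 35, "disk_io_per_s": 120, "net_bytes_per_s": 8e5, "page_faults_per_s": 40},
    {"cpu_busy_pct": 55, "disk_io_per_s": 180, "net_bytes_per_s": 9e5, "page_faults_per_s": 65},
    {"cpu_busy_pct": 45, "disk_io_per_s": 150, "net_bytes_per_s": 7e5, "page_faults_per_s": 50},
]

def aggregate(samples):
    """Collapse many polled samples into one KPI value per subsystem."""
    keys = samples[0]
    return {k: round(mean(s[k] for s in samples), 1) for k in keys}

print(aggregate(samples))
```

The result is one number per major functional area for the polling interval, rather than a stream of raw counter readings.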


Figure 3.1: Information overload is a common problem when measuring technical performance.

As important as technical measures are, they do not provide a complete picture of the state of IT
operations. Financial measures are another critical component of IT operations management.

Financial KPIs
Financial KPIs allow managers to assess the value of specific IT operations and services relative
to their costs. Unlike technical measures, financial measures do not generate the massive
volumes of data associated with machine-generated technical measures.
Financial measures tend to focus on the cost of labor and equipment, the return on investment
(ROI) of proposed purchases, and financial management issues, such as budgeting and cash flow.
These tasks are well understood and documented elsewhere; the focus here is on topics that are
too often overlooked or under-addressed in textbook discussions of IT management.

For information about other aspects of IT financial management, see resources such as
ComputerWorld’s IT Management Knowledge Center at
http://www.computerworld.com/managementtopics/management, and CIO Magazine’s CIO Resource
Center at http://www.cio.com/leadership/itvalue/.


When formulating financial measures, be sure to understand the scope of the measure. For
example, the “cost” of a server may be stated as $20,000, when in fact that is the cost to purchase
the server from the vendor. The full cost of introducing that server into the organization would
have to include at least the vendor invoice amount, plus:
• Labor costs to install and configure the server and its OS
• Staff time dedicated to change management operations, including plan review for the
server
• Information security staff time spent locking down the server and auditing it as needed
• Compliance management staff time spent understanding implications of the use of the
server—for example, will confidential financial information be stored on the server?
• Network services support time spent updating routers, firewall, intrusion prevention
systems (IPSs), and other services that must be aware of the presence of new devices
• Server support staff time required to add the server to the backup and disaster recovery
process
• Application support time required to install and configure packaged or custom
applications running on the server
• Additional software licenses incurred because of the new server
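A simple worked example of this fuller accounting follows; every figure is purely illustrative, not a benchmark for real server costs.

```python
# Hypothetical cost components for one new server; all figures illustrative.
vendor_invoice = 20_000
labor_costs = {
    "install and configure server/OS": 1_800,
    "change management review": 600,
    "security lockdown and audit": 1_200,
    "compliance review": 400,
    "network services updates": 500,
    "backup/DR integration": 700,
    "application install and config": 1_500,
}
software_licenses = 2_500

full_cost = vendor_invoice + sum(labor_costs.values()) + software_licenses
print(full_cost)  # the full cost, well above the invoice amount
```

Even with modest labor estimates, the full cost exceeds the vendor invoice by nearly half, which is exactly the scoping error the text warns against.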
Accurate financial measures are often difficult to formulate and, in reality, we often settle for
estimates. In addition to understanding the breadth of costs related to IT, it is important to avoid
unintentionally equivocating about the meaning of terms.
Related to identifying the scope of terms appropriately, you also must use terms precisely. Too
often within an organization, a single term will take on multiple meanings, depending on the
context. For example, to the sales department, the cost of goods sold may include the price paid
for a good, shipping costs, and storage and inventory management costs; the finance department
may include all those factors as well as the sales commission paid to the salesperson who made
the sale. It is not the case that one group is wrong and another is right. The problem lies in
multiple uses of the same term. Using distinct terms, such as pre-sales cost of goods sold and
post-sales cost of goods sold, can help avoid this confusion. As difficult as financial measures are
to formulate precisely, they are not as challenging as organizational KPIs.


Organizational KPIs
Organizational KPIs are soft measures; they do not have obvious quantifiable aspects, as
technical and financial measures do. Technical measures are relatively easy to grasp. The
problem with them tends to be too much information. In the case of financial measures, you must
define terms precisely and with appropriate scope to accurately reflect the costs and benefits of
investments. Just defining organizational KPIs is difficult. Some of the areas that are included in
organizational KPIs are:
• Training level of staff
• Ability to incorporate emerging technologies into existing infrastructure
• Ability to execute new organizational models, such as partnering and outsourcing
• Ability of IT to meet needs and expectations of business units
• Level of overall compliance with government regulations
Although difficult to quantify, organizational measures reflect the ability of an organization to
execute strategies and perform operations.
Another aspect of these different types of KPIs is that they are not independent of each other.
The ability to effectively provide key technical services depends upon the ability to fund the staff
and equipment needed; having a well-trained staff that understands change management
procedures and executes them appropriately is an organizational KPI that has a direct impact on
technical operations.

Figure 3.2: The three types of factors that are measured by KPIs interact and influence each other.

Measurement is a key process in SOM, particularly in the frameworks and best practices that
support it. Another characteristic of these best practices is the ability to leverage repeatable
processes.


Best Practice Principle 4: Utilize Repeatable Processes


Processes, as an object of organizational study, have received quite a bit of press in the last
decade. “Business process reengineering” was one of the techniques advocated by leaders in
organizational management and corporate strategy as a way to shift corporations away from
internally directed behaviors to more customer-focused activities that leverage information
technologies. Reengineering entered the IT lexicon primarily as a process that other parts of the
organization, especially front-line business units, had to understand, implement, and manage. IT
was playing a supporting role in the beginning, but that has changed.

For more information about organizational reengineering, see Michael Porter’s Competitive
Advantage: Creating and Sustaining Superior Performance (New York: The Free Press, 1985), Peter
Drucker’s “The Coming of the New Organization” (Harvard Business Review, Jan-Feb. 1988), and M.
Hammer and S.A. Stanton’s The Reengineering Revolution: A Handbook (New York, Harper
Business, 1995).

Process reengineering has had its counterpart in IT with the widespread adoption of standard
process management policies and procedures. The goal is typically to improve consistency and
quality of services while controlling costs. Many of the frameworks described in this chapter
emphasize specific processes, including service level management, change management, disaster
recovery, capacity planning, security management, and a host of other essential IT services.
The focus on processes within IT has been driven by several advantages provided by their
adoption:
• Ability to deliver consistent and predictable performance—from simple tasks, such as
granting users access rights to an application, to more complex processes, such as
incident response
• Ability to measure performance and compare results—With consistent, repeatable
processes, KPIs can be identified and measured
• Ability to improve procedures—Again, with consistent procedures, organizations can
measure performance, analyze performance data, and identify weak areas in those
processes
• Ability to justify budgetary needs—With hard numbers on system capacity and trends in
growth of users and applications, IT managers can more effectively defend their requests
for appropriate funding
Processes are common to virtually all IT operations, so it is not surprising to find them featured
prominently in best practice frameworks, especially those so closely associated with SOM
practices. This fact highlights another aspect of these frameworks—that is, they leverage broadly
applicable models across industries.


Best Practice Principle 5: Leverage Broadly Applicable Models


The practices of SOM and related frameworks are not specific to any one industry. Certainly,
some areas of a framework may receive more emphasis in some industries than others.
For example, the role of business continuity will have a high priority in financial services and
healthcare operations; consulting businesses, although concerned with business continuity, are
less prone to centralized business disruptions because of the distributed nature of their
operations. Similarly, government agencies managing sensitive information will implement
security measures most of us would consider cumbersome and unnecessary for routine business
operations.
SOM is not an industry-specific model but a general model for understanding and managing core
IT services. It also recognizes the shared goals found in many IT organizations:
• Improved visibility and control of operations
• A standardized IT infrastructure and supporting services
• Improved automation and quality control
• Improved security
• Attaining and remaining in compliance with government regulations
SOM recognizes that the common goals of IT operations and best practice frameworks provide
guidance on policies, procedures, and processes needed to attain those goals.

Frameworks and SOM


Although SOM defines what should be done in IT management, the best practice frameworks
described in this chapter provide information about how to effectively implement a SOM
approach. As Table 3.1 shows, each of the best practice frameworks has much to offer in the way
of SOM guidance.
Although there is much overlap—for example, all the frameworks have something to say about
network management—some frameworks provide more detail than others: COBIT addresses
incident response, but ISO 17799, with its focus on security, has much more to say about this
critical area of information security management.


Service Oriented Management Best Practice Area
Frameworks: ITIL, COBIT, ISO 17799, NIST Guide for Technology Systems

Network management                  X X X X
Server and application management   X X X
Client management                   X X X
Incident response                   X X
Change control                      X X
Monitoring and event management     X X X X
Asset management                    X X X X
Application development             X X X
Table 3.1: Best practice frameworks address multiple areas of SOM.

The following sections examine the particulars of each of these best practice frameworks.

Best Practice Frameworks and SOM


There are many best practice guidelines and frameworks within IT. Many are focused on narrow
aspects of systems management, application development, or a particular type of operation.
Although useful in some situations, those frameworks will not be addressed in this section,
which focuses on broadly applicable guidelines:
• ITIL
• COBIT
• ISO 17799 security standards
• NIST Guide for Technology Systems related to risk management
ITIL is particularly relevant to technology management. COBIT is as well but also addresses
many essential aspects of governance and compliance. ISO 17799 is more narrowly focused on
security, but that topic is so wide-ranging that it warrants inclusion in this set of broad
frameworks. Finally, the NIST Guide for Technology Systems addresses risk management,
which, like security, spans all other IT management operations.


Technology Management and ITIL


ITIL was defined under the auspices of the Office of Government Commerce within the British
government. The practices defined within ITIL have been codified within the ISO standard, ISO
20000. ITIL defines several disciplines within IT management:
• Service delivery
• Service support
• Planning to implement service management
• Security management
• Infrastructure management
• Business perspective
• Application management
Two of these disciplines, service delivery and service support, are perhaps the most widely
used.

ITIL is an open standard, so it can be freely adopted by organizations. The content of the ITIL
references is copyright protected, however. To purchase ITIL framework books, see
https://securewsch01.websitecomplete.com/itilsurvival/shop/showDept.asp?dept=17. Community
support is available at http://www.15000.net/.

Service Delivery within ITIL


Service delivery topics within ITIL focus on elements of IT processes that are needed to ensure
services are available and meet the needs of IT customers. Service level agreements (SLAs) play
a central role in service delivery. They constitute the set of requirements that IT must meet.
SLAs depend upon adequate measurements (discussed earlier) to determine whether agreements
are met. These measurements are also used to assess the consequences of changes in the IT
environment to the quality and level of service. For example, if the network services group has
an SLA to provide a set level of network bandwidth availability to one business unit, it cannot
then enter into an agreement to provide additional bandwidth to another business unit without
first determining the impact on the first customer.
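The bandwidth example can be sketched as a pre-commitment check. The 20 percent headroom policy is an assumption for illustration, not an ITIL requirement.

```python
def can_commit(link_capacity_mbps, existing_slas_mbps, requested_mbps, headroom=0.2):
    """Check whether a new bandwidth SLA fits without endangering existing
    commitments; 'headroom' reserves capacity for bursts (assumed policy)."""
    committed = sum(existing_slas_mbps) + requested_mbps
    return committed <= link_capacity_mbps * (1 - headroom)

# Hypothetical 1 Gbps link with 600 Mbps already committed to other SLAs
print(can_commit(1000, [400, 200], 150))  # True: 750 <= 800
print(can_commit(1000, [400, 200], 250))  # False: 850 > 800
```

The point of the check is that a new agreement is evaluated against all existing commitments, not against raw link capacity alone.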
Similarly, service delivery must address capacity planning. Capacity of resources spans
computational resources, network resources, storage services, and applications provided.
Measuring current utilization as well as planning for future needs are both part of capacity
planning.
Continuity management and availability management are also elements of service delivery. The
focus of these areas is the ability to continue to provide IT services in the event of a business
disruption, such as a natural disaster. This entails planning, monitoring, testing, and execution of
business continuity plans.


The final element of service delivery addressed in ITIL is financial management with an
emphasis on understanding the total cost of ownership (TCO) of IT resources. As described
earlier in the section on financial KPIs, comprehensive measures that take into account all
costs are fundamental to financial management.
Although service delivery tends to address longer-term planning challenges in IT, the service
support discipline of ITIL concentrates on shorter-term needs and issues.

Service Support within ITIL


Within the ITIL framework, a single point of contact is provided for end users. This point of
contact, commonly called a service desk, coordinates multiple activities for users:
• Help desk support
• Problem escalation
• Change management
• Status reporting
The benefit of integrated service support for end users is the single point of contact for all IT-
related issues. In addition, this support helps IT professionals through the service desk’s broad
perspective. For example, if a user calls with an application problem, the service desk contact
would have information about recent changes to application servers, reports on network
performance, and information about other events within the IT environment that could impact the
user’s application.
Contrast that with typical Help desk interactions in which users are asked for their user IDs,
application names and versions, and a host of other information that should be readily available
to the support personnel. Help desk staff often have limited access to information about the
current state of operations and must depend on libraries of past incidents to solve new
problems.
Service support within ITIL shifts the focus from narrowly defined Help desk-like problem
resolution to a more comprehensive approach to customer support.

This shift from a narrow, problem-centric approach to a more comprehensive view only works when
service support staff has comprehensive information. A central aspect of SOM is the use of a
centralized repository of information in the form of the configuration management database (CMDB).
Without a CMDB or similar database, service support reverts to a less-effective silo-based problem
management practice.

Chapter 4 will provide details about CMDBs and their role in SOM.

A centralized approach to information sharing supports other areas of service support, including
problem management, configuration management, change management, and release
management.


Planning to Implement Service Management


The third discipline within ITIL, planning to implement service management, addresses business
alignment. This topic examines the need to:
• Understand the strategic plan of the organization, and IT’s role within that strategic plan
• Assess the current state of IT services
• Establish objectives for meeting strategic needs
• Implement the processes, policies, and procedures
• Measure performance relative to objectives
Again, the topics addressed within ITIL dovetail well with SOM, which is driven, in part, by
fundamental business objectives, including business alignment.

Chapter 1 includes a more detailed discussion of business alignment with a discussion of coherent
business strategies, managing multiple objectives, and dynamic requirements.

Security Management
ITIL has adopted ISO 17799 as a basis for security management. That framework is discussed in
more detail in a bit.

Infrastructure Management
ITIL’s section on infrastructure management addresses four elements: design and planning,
deployment, operations, and technical support. The design and planning part of infrastructure
management spans business requirements to technical and architectural issues surrounding the
development of IT infrastructure. Tasks include developing business cases for plans, conducting
feasibility studies, and designing architectures. The deployment operations include project
management and release management procedures to improve the likelihood of a successful
rollout of new hardware and applications. Operations management addresses the day-to-day
activities that keep an IT infrastructure operational. These include system monitoring, log
review, job scheduling, backup and restore operations, and utilization monitoring. Technical
support encompasses a number of services, including documentation, specialist support for
problem resolution, and support for technical planning.

Release Management
Once software components have been acquired or developed, and tested in a quality assurance
environment, they are ready for production release. Release management is the practice of
moving software components into operation; this entails several steps, including:
• Adding software to a definitive software library
• Analyzing dependencies in the production environment and ensuring that the new
software is configured to function properly
• Scheduling resources to install and configure software
• Coordinating with training, Help desk, and other support personnel
Release management is a bridge process that moves software from project to operational status.
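The steps above can be sketched as a simple release gate. The checklist mechanism and step names are illustrative, not part of ITIL itself.

```python
# A minimal release gate; step names paraphrase the list above, and the
# checklist mechanism is an assumption for illustration.
REQUIRED_STEPS = [
    "added to definitive software library",
    "production dependencies analyzed",
    "install resources scheduled",
    "support personnel coordinated",
]

def ready_for_release(completed_steps):
    """A release may proceed only when every required step is done."""
    return all(step in completed_steps for step in REQUIRED_STEPS)

print(ready_for_release(REQUIRED_STEPS))       # True: all steps complete
print(ready_for_release(REQUIRED_STEPS[:-1]))  # False: support not coordinated
```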


Other ITIL Disciplines


Other disciplines in ITIL are equally linked to SOM operations. The business perspective
discipline, for example, includes continuity planning and change management. It also extends
beyond the scope of SOM to cover topics such as outsourcing. (Of course, outsourcing would
have an impact on operations governed by SOM, but SOM does not address business structures,
such as outsourcing, directly). Application management within ITIL discusses software
development methods and practices. ITIL is a broadly adopted framework for IT management;
another similar framework is COBIT.

COBIT
Governance has grown in importance along with increasing demands for compliance with
government regulations. For publicly traded companies and government agencies in particular,
ad hoc management procedures are no longer sufficient. Well-defined policies and practices that
support specific objectives defined in regulations are demanded of IT professionals.
COBIT was developed by the Information Systems and Audit Control Association (ISACA) as a
framework for controlling IT operations. Although there is less emphasis on execution than ITIL
offers, much of COBIT can help improve operations. COBIT is well designed to support
governance and complements ITIL’s focus on operational processes.
COBIT is a process-centric framework with four broad subdivisions:
• Planning and organizing
• Acquiring and implementing
• Delivering and supporting
• Monitoring and evaluating
Like the disciplines in ITIL, these processes are common to IT operations regardless of size or
industry. Within the COBIT framework, these processes are managed through a series of
controls. Each control includes an objective that is to be achieved, a method for achieving it, and
metrics for measuring the success of the control objective.
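A COBIT-style control can be modeled as a small data structure pairing an objective with its method and metrics. The field names and example content here are illustrative, loosely based on COBIT's strategic-planning objective.

```python
from dataclasses import dataclass, field

@dataclass
class Control:
    """One COBIT-style control: an objective, a method for achieving it,
    and metrics for measuring success (field names are illustrative)."""
    objective: str
    method: str
    metrics: list = field(default_factory=list)

strategic_plan = Control(
    objective="Define a strategic IT plan",
    method="Document IT strategy and derive tactical plans from it",
    metrics=[
        "Delay between business-strategy change and IT-strategy update",
        "Percent of IT projects linked to tactical plans",
    ],
)
print(len(strategic_plan.metrics))
```

Structuring controls this way makes the objective-method-metric relationship explicit, so each control can be reviewed and measured consistently.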
As the name implies, controls are in place to ensure objectives are met and processes can be
improved. These controls help to define the operational tasks that must be performed to maintain
compliance with both internal and external process requirements. Although COBIT is not
designed for a particular regulation, the breadth and focus of the framework makes it well suited
for meeting the demands of many regulations.

For details on COBIT, see the ISACA Web site’s COBIT offerings at
http://www.isaca.org/Template.cfm?Section=COBIT6&Template=/TaggedPage/TaggedPageDisplay.c
fm&TPLID=55&ContentID=7981.


Planning and Organizing


The planning and organizing area of COBIT includes business- and organization-oriented control
objectives. As noted earlier in this chapter in the discussion of KPIs, organizational objectives
are sometimes difficult to quantify. COBIT accommodates that challenge by defining a set of
high-level control objectives that are further refined into more quantifiable objectives. The key
areas of planning and organizing processes include:
• Defining a strategic plan and determining technical direction
• Defining an information architecture and the processes and organization that support it
• Managing IT investments
• Communicating aims and managing human resources
• Managing quality, risks, and projects
Within these areas of planning and organizing, each control objective includes a set of goals,
activities and metrics for measuring KPIs. Goals establish what is to be done, activities define
how to accomplish those goals, and the metrics are used to understand the effectiveness of the
activities. For example, within the objective of establishing a strategic IT plan, there are several
activities, including identifying critical dependencies, documenting IT strategic plans, and
building tactical plans.
These activities are measured with KPIs. Again, using the strategic planning process as an
example, the KPIs include the delay between modifying business strategy and updating IT
strategy, the percent of IT projects directly linked to IT tactical plans, and the degree of
compliance with business and government regulations.

Acquiring and Implementing


Acquiring and implementing processes focus on bringing technology into the organization and
enabling its use through proper change management and installation procedures. The key control
objectives within this area include:
• Identifying IT solutions and maintaining the associated software and hardware
• Ensuring documentation and training are available to enable the use of procured systems
• Managing the changes to infrastructure and operations
• Validating and accrediting the installation of new systems
COBIT defines procedures focused on controlling these processes to ensure that they follow
established procedures. For example, before one can accurately identify needed software, one
must first define the business requirements for the application. Similarly, to validate and accredit
a hardware installation, the new system must be tested in an appropriately configured test
environment.
Metrics in this area focus on the delivery of functional systems and include the number of
emergency change requests, the availability and accuracy of documentation, and the percent of
requirements met by acquired systems.


Delivering and Supporting


Delivering and supporting processes have the most control objectives of all COBIT activities.
This is not surprising because delivering and supporting constitute the bulk of IT activities. The
control objectives defined by COBIT for this activity include:
• Defining and managing service levels as well as managing outside service providers
• Ensuring continuous operations with appropriate levels of security and adequate capacity
• Providing technical support for users and configuration management for infrastructure
• Managing data as well as the physical environment
• Managing day-to-day operations, such as job scheduling and output generation
The goals within this activity are similar to service areas discussed earlier in the ITIL section.
They include capacity planning, system monitoring, defining security plans, and establishing and
managing financial controls. Key metrics in delivery and support include percent of assets
included in capacity planning operations, the frequency of business disruptions due to
unavailable IT services, and the time between recognizing the need for training and the delivery
of that training.

Monitoring and Evaluating


Monitoring and evaluating is the fourth activity area within COBIT. The focus of these activities
is four closely related objectives:
• IT performance
• Internal control
• Compliance
• Governance
In each of these control objectives, the goal is to ensure that the service levels, standards, and
other requirements on IT operations are actually met. Internal controls and compliance focus on
ensuring IT activities meet audit requirements as well as government regulations. IT
performance and governance objectives control the alignment of IT with business objectives and
ensure that IT operations remain synchronized with the goals and objectives of the organization’s
leadership. Key metrics include the percent of critical processes that are actively monitored, the
time between identifying a process deficiency and the time it is corrected, and the time between
the issuance of a regulation and the time IT comes into compliance.
COBIT is a comprehensive framework that can be used with SOM models to implement controls
over IT services. The focus of SOM is to identify key services and enable their efficient and
effective management. COBIT is a framework that helps to meet that goal by defining thorough
and detailed control objectives. COBIT is structured with a clear definition of goals, activities,
and KPIs for the breadth of IT activities.
In addition to frameworks, such as ITIL and COBIT, that address the breadth of IT operations,
other useful frameworks focus on targeted areas within IT. The ISO 17799 security standard is
one such framework.


Information Security and ISO 17799


The ISO 17799 standard, also known as the Code of Practice for Information Security
Management, is a set of control measures focused on preserving information security in a wide
range of organizations. It includes several subdivisions, each of which is composed of a series of
controls for preserving information security.

For more information about ISO 17799, including training material, articles, and compliant policies,
see the ISO 17799 Information Security Portal at http://www.computersecuritynow.com/. Two user-
supported sites provide additional information—the ISO 17799 Guide at http://iso-
17799.safemode.org/ and the ISO 17799 Community Portal at http://www.17799.com/. The full
standard can be purchased and downloaded from http://17799.standardsdirect.org/.

The main subdivisions are:


• Security policy
• Security organization
• Asset classification and control
• Personnel security
• Physical and environmental security
• Communications and operations management
• Access control
• System development and maintenance
• Business continuity management
• Compliance
Security policy, security organization, and asset classification and control address the need for
well-defined policies and procedures that protect the confidentiality and integrity of information
and the availability of infrastructure. Personnel security addresses the need for user awareness
training, and the physical and environmental security section covers the protection needs of
physical assets and safeguards for ensuring their integrity. Communications and operations
management deals with network security, and access control covers areas such as identity
management, authentication, and authorization of users.
System development and maintenance focuses on the security needs of software development,
particularly those related to ensuring systems are developed with minimal risk of introducing
vulnerabilities when the system is deployed.
Business continuity and compliance address the same areas as their counterparts in ITIL and
COBIT—namely, ensuring that businesses will continue to operate despite disruptions and will
operate in compliance with relevant regulations.
The parallels between ISO 17799 and SOM are obvious. SOM and ISO 17799 share a number of
common areas, including network management, asset management, application development,
and server and client management. Although SOM applies to other aspects of IT management in
addition to security, the practices and information gathered during SOM operations are relevant
to security management. Again, as with other best practices described in this chapter, ISO 17799
can help guide management processes to realize the greatest benefit from a SOM model.

64
Chapter 3

Risk Management and NIST Guide for Technology Systems


The risk management guide published by the United States NIST is a set of best practices for
protecting organizations from IT-related risks. As the guide clearly notes, “The risk management
process should not be treated as primarily a technical function carried out by the IT experts who
operate and manage IT systems, but as an essential management function of the organization”
(“Risk Management Guide for Information Technology Systems,” NIST Special Publication
800-30, p. 1). This directive is derived from the same perspective common to all four best
practice frameworks discussed in this chapter; that is, IT operations and services must be
aligned with the business, and business drivers must drive IT.

The full Risk Management Guide for Information Technology Systems is freely available at
http://csrc.nist.gov/publications/nistpubs/800-30/sp800-30.pdf.

While recognizing the need to align business and technical objectives of risk management, the
guide defines three processes in risk management:
• Risk assessment
• Risk mitigation
• Evaluation and assessment
The first process, risk assessment, comprises seven steps:
• System characterization, which defines the scope of the risk management effort and
identifies the assets and organizational units (OUs) involved in the effort.
• Threat assessment, during which threats, or potential agents of disruption, are identified
along with their sources.
• Vulnerability assessment, which discovers weaknesses in existing infrastructure that leave
the system predisposed to disruption by threats.
• Control analysis, the fourth step, examines the controls, or countermeasures, in place or
planned for deployment that mitigate the potential for disruption by threats.
• Likelihood determination tries to pin down the probability of disruption given a set of
threats and vulnerabilities. This process takes into account motivation and capabilities of
the potential perpetrators, the nature of system vulnerabilities, and the effectiveness of
existing controls.
• Impact analysis determines the cost of disruptions caused by a threat being exercised
against an organization.
• Risk determination, which combines the impact of a threat with its likelihood to arrive at
the risk that threat poses to the organization.
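The final step turns the likelihood and impact ratings into a single risk level. As a rough sketch of how the two combine, in the spirit of the guide's risk-level matrix (the numeric weights below are simplified stand-ins, not normative values from SP 800-30):

```python
# Combining likelihood and impact ratings into a risk level, in the
# spirit of the SP 800-30 risk-level matrix. The numeric weights are
# simplified stand-ins, not normative values from the guide.
LIKELIHOOD = {"high": 1.0, "medium": 0.5, "low": 0.1}
IMPACT = {"high": 100, "medium": 50, "low": 10}

def risk_score(likelihood, impact):
    """Weight an impact rating by how likely the threat is to occur."""
    return LIKELIHOOD[likelihood] * IMPACT[impact]

def risk_level(score):
    """Map a numeric score back to a qualitative risk level."""
    if score > 50:
        return "high"
    if score > 10:
        return "medium"
    return "low"

# A high-impact threat with only medium likelihood still rates as medium risk.
print(risk_level(risk_score("medium", "high")))  # medium
```

The banding matters more than the arithmetic: a high-impact threat does not automatically translate into high risk unless its likelihood is also high.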
With the outcome of the risk determination phase, an organization can move to the next stage,
risk mitigation.


Risk Mitigation
During the risk mitigation phase, information learned in the risk assessment phase is used to
determine appropriate measures for reducing risk for the least cost and with the least disruptive
impact on the organization. The risk mitigation phase has several components:
• Understanding risk mitigation options
• Developing and implementing risk mitigation strategy
• Conducting cost benefit analysis and dealing with residual risk
There are several risk mitigation options outlined in the NIST guide:
• Risk assumption, which essentially accepts the risks or provides for some controls to
reduce the risk
• Risk avoidance, which requires steps to remove the cause of the risk
• Risk limitation, which lessens the impact of a risk by use of preventive controls
• Risk planning, which introduces prioritized controls
• Research, which entails investigating the risk in an effort to discover new controls
• Risk transfer, which entails purchasing insurance to transfer the risk to a third party
The guide provides several rules of thumb for risk mitigation strategies. First, if a risk does exist,
try to reduce the likelihood the vulnerability will be exercised by applying layered protection and
other architectural devices and administrative controls. Second, increase the cost to the potential
perpetrator so that the attack’s cost exceeds the value of the information stolen. Finally, when the
potential loss is great, purchase insurance to transfer the risk.
The risk mitigation strategy is implemented through a series of technical, management, and
operational controls. Technical controls contain some element of hardware, software, or
architectural countermeasure to mitigate risks. Management controls focus on policies,
procedures, and guidelines that work in conjunction with other types of controls to mitigate risks.
Operational controls focus on the governance of security measures and the identification of
weaknesses in the existing security posture of an organization.
Cost benefit analysis studies help to identify the set of controls in place, their cost, and their
impact on reducing risk. The purpose of conducting a cost benefit analysis is to find the best
combination of controls that mitigate the greatest risks for the least cost. However, even with
properly implemented controls and solid governance processes, risks may still remain. These are
known as residual risks.
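The search for the best combination of controls can be framed, very loosely, as a budgeted selection problem. The sketch below ranks controls by risk reduction per unit cost; the control names, costs, and reduction figures are hypothetical, and real cost benefit studies weigh control interactions and residual risk that this toy model ignores:

```python
# A toy budgeted-selection view of cost-benefit analysis: rank controls by
# risk reduction per unit cost and buy down the list until the budget runs
# out. The control names, costs, and reduction figures are hypothetical.
def select_controls(controls, budget):
    """Greedily choose the controls with the best reduction-per-cost ratio."""
    ranked = sorted(controls,
                    key=lambda c: c["risk_reduction"] / c["cost"],
                    reverse=True)
    chosen, spent = [], 0
    for c in ranked:
        if spent + c["cost"] <= budget:
            chosen.append(c["name"])
            spent += c["cost"]
    return chosen

controls = [
    {"name": "patch automation", "cost": 30, "risk_reduction": 40},
    {"name": "network firewall", "cost": 50, "risk_reduction": 45},
    {"name": "user training",    "cost": 20, "risk_reduction": 15},
]
print(select_controls(controls, budget=60))  # ['patch automation', 'user training']
```

Whatever risk the chosen controls do not cover is, by definition, the residual risk the organization must accept or insure against.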


Risk Evaluation and Assessment


The risk evaluation and assessment component of the NIST risk guide focuses on two areas:
good security practices and keys to successful risk management. The recommended good
security practices include:
• Integrating risk mitigation into the software development life cycle
• Developing a schedule for assessing and mitigating risks
• Conducting risk mitigation studies when there are major changes in the IT infrastructure
or when there are major changes to policies
The guide also identifies key success factors, many of which are common to other best practices
and IT methodologies. These include:
• Commitment by executive and IT management
• Knowledgeable risk management team familiar with the particular IT environment as
well as risk management methodologies
• Cooperation of users
• Ongoing evaluation and assessment practices
The Risk Management Guide for Information Technology Systems is a framework for
addressing the problem of risk in IT systems. Unlike ITIL and COBIT, this framework is
narrowly focused on one management process within IT. It does, however, demonstrate that
specialized frameworks have much to offer IT management.

Leveraging SOM to Support Frameworks and Standards


The frameworks and best practices described in this chapter have distinct benefits and
advantages. ITIL emphasizes the improvement of executing IT operations. COBIT tackles the
problems of governance and control within IT. The ISO 17799 security standard and the NIST
risk management guide focus on particular processes within IT. As diverse as these frameworks
are, they have common characteristics and requirements.
These frameworks define repeatable processes that are broadly applicable to IT operations across
industries. They define controls, goals, and metrics for implementing and measuring the
effectiveness of those controls. They also leave implementation details to IT practitioners—and
this is where SOM makes its contribution.
These frameworks are guides for running IT operations, but they depend on raw information
about IT assets and processes. The ability to gather, analyze, and leverage that information is an
outcome of SOM. SOM structures include a centralized change management database and a set
of operationally oriented processes that parallel many of the tasks outlined in the frameworks
described in this chapter.


Summary
There is no need to reinvent the wheel of IT management. Best practice frameworks, ranging
from broad frameworks covering all major areas of IT management to more targeted guidelines,
have been developed and are readily available for adoption by IT practitioners. These guidelines
provide details about what should be done. The next chapter begins to analyze how to implement
these practices using the tools of SOM.


Chapter 4: Moving to a Service-Oriented System Management Model
IT infrastructures are like ecosystems: they grow incrementally and in response to changing
conditions. Usually, but not always, IT infrastructures grow in response to an emerging business
or organizational need. Consider some typical scenarios:
• Is there an opportunity to expand into another geographical area? Remote offices, new
staff, and expanded network services will be needed.
• Is the company growing through acquisitions? How does an acquiring company know the
true value of the company being acquired? Can IT make the acquisition more seamless?
The financial industry is a perfect example of this growth model.
• Is the company downsizing and realigning divisions in response to maturing market
conditions? Hardware resources will have to be reassigned, software licenses re-allocated
and retired, and access control and other security policies revised to account for changes
in the organizational structure.
• Will a number of agency departments merge with another agency? The assets allocated to
those departments must be inventoried, software licenses reassigned, hardware moved,
and inventories updated.
• Has an audit discovered shortfalls in IT practices? New policies and procedures may be
implemented, additional security countermeasures might need to be deployed, and a new
monitoring process may need to be established.
In each of these scenarios, changes are made to an existing infrastructure that must continue to
function and provide services during the transition period. In fact, many IT organizations are in a
constant state of transition. This chapter will address the question: What methods and resources
are required to execute and manage those transitions in an efficient and effective manner? This
chapter lays the groundwork for implementing service-oriented architectures—a topic that is
addressed in detail in following chapters.


Building a Foundation for Enterprise IT Systems Management


IT systems management depends on two fundamental principles. First, IT serves broader
organizational strategies, and IT policies define how IT operations will support those objectives.
Second, keeping IT operations in compliance with policies requires constant attention because of
an almost inevitable tendency of IT systems to change.
How to keep IT in alignment with organizational objectives is a broad and challenging question
that is well beyond the scope of this chapter. For the purposes of this guide, we will have to
assume that fundamental principle is met. Thus, assuming the direction of IT is synchronized
with broader business objectives, you can focus on more tangible objectives; specifically:
• Defining policies to direct IT operations
• Implementing procedures and practices to enforce those policies
With these two pieces in place, you have the foundations for enterprise-scale IT systems
management.

Figure 4.1: System management depends upon tools for implementing policies and procedures that are
based on an IT strategy that is aligned, along with other division strategies, with the overall organizational
strategy.

The policies, procedures, and related management tools are links from strategy to
implementation. Today, common application development models are distributed and service
oriented. The procedures and tools for managing those applications should be as well.


The remainder of this chapter will begin a detailed examination of what is required to move to a
service-oriented management model and will address the following areas:
• Asset tracking
• Structure of configuration management databases (CMDB)
• CMDBs and asset life cycles
• CMDBs and service-oriented architecture
Together, these describe the fundamental components and processes that will support other
aspects of service-oriented systems management.

Asset Tracking
IT management has dual aspects by nature: it is both a process-oriented system closely aligned
with business objectives and an asset-centric system of highly interdependent devices that
are often in a state of change. Although the process-oriented aspects of IT are critical to the
successful use of IT, this chapter will focus on asset tracking. Asset tracking is the discovery and
tracking process of asset management; asset management also includes the management of the
financial and contractual details of the assets in the IT environment. Asset tracking can be
divided into several sub-topics (with some overlap with other areas of systems management,
which will be addressed in future chapters):
• Inventory management
• Patch management
• System security
• Risk management
• Licensing
• Service delivery
Together, these areas constitute the fundamentals of asset management.


Inventory Management
Inventory management sounds like a pretty basic operation. After all, how difficult can it be to
count a bunch of PCs, servers, and peripherals? If only it were that easy. Inventory management
is as complex as the devices encompassed in the process and as diverse as the allocation of
assets. The term inventory management might conjure up images of a warehouse stocked with
boxes, clearly labeled and bar-coded, organized in an optimal way to make the most efficient use
of storage. In the world of manufacturing or distribution, this might be the norm, but IT
inventory is more varied.
Consider a desktop workstation. What needs to be tracked to ensure the device is properly
inventoried? This is essentially the question of which physical and electronic units must be
tracked. A workstation is not a single monolithic device; it consists of multiple devices that
could be tracked:
• Monitor
• Keyboard and mouse
• Case, motherboard, and CPU
• Memory
• Hard drives, CD drives, and DVD drives
• Peripherals such as speakers, Web cams, and similar devices
And that is just the hardware. A workstation will also have software, including:
• Operating system (OS)
• Productivity software, such as word processors, spreadsheet, and presentation
applications
• Utility software, such as file transfer programs
• Security software, such as antivirus and personal firewalls
• Custom applications
All of these components can be tracked as part of a workstation or may be tracked individually
depending on your needs. It is difficult to imagine, though, a realistic case in which all these
items could be successfully managed by grouping them into a single inventoried entity.
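One way to accommodate both views, the workstation as a unit and its components as individually tracked items, is a nested asset record. A minimal sketch (the field names are illustrative, not drawn from any particular inventory product):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Component:
    kind: str           # e.g., "monitor", "hard drive", "antivirus"
    model: str
    serial: str = ""    # hardware serial number or software license key

@dataclass
class Workstation:
    asset_tag: str
    hardware: List[Component] = field(default_factory=list)
    software: List[Component] = field(default_factory=list)

    def components(self) -> List[Component]:
        """All individually trackable units on this workstation."""
        return self.hardware + self.software

ws = Workstation("WS-0042")
ws.hardware.append(Component("hard drive", "80GB SATA", "HD-123"))
ws.software.append(Component("operating system", "Windows XP SP2"))
print(len(ws.components()))  # 2
```

The workstation remains a single asset for reporting purposes, while each component can be inventoried, replaced, or licensed on its own schedule.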


Controlling Change with Standardization


You could, for example, imagine a scenario in which every salesperson in a company is given an
identically configured notebook. Because the notebooks are exactly the same, they can be
inventoried at the notebook level without concern for tracking details that are identical across all
machines. That argument will hold true for a while, but in organizations, time has a way of
introducing change.
For example, after the initial distribution of notebooks, how will IT ensure that new hires will
have the identical configuration 3, 6, or 9 months later? Of course, they might purchase the same
model from the same manufacturer and, with some luck, the manufacturer will use the same
components as used in the initial batch. The OS will be the same, or at least the same major
release. However, notebooks purchased 6 months after the initial set might incorporate an
additional service pack that had been released in the meantime. Should all the older systems be
upgraded to include the new service pack or should the service pack be removed from the new
systems? We are starting to introduce drift from the “one and only one” configuration.
Consider another example in the same scenario. Suppose that the notebooks are preconfigured
with antivirus software and a 1-year license for updates to the antivirus software. At the end of
that year, the organization might face a choice of upgrading to a new version of the antivirus
software or continuing with the current version and just renewing the update service. The
notebooks purchased 6 months after the initial batch are still under license for updates and no
action needs to be taken on those. What should the IT manager do?
If the manager upgrades the original set of notebooks, he or she will have to manage two
different configurations. If the manager does not upgrade, he or she might forfeit useful new
features and more effective versions of existing features. You could argue in favor of each
option, but the implications are that even in relatively simple scenarios, the pace of change and
the lack of synchronization among changes will create the tendency to veer from a single,
standard configuration.

Figure 4.2: Over time, the cumulative effects of small changes in configurations can result in compounded
changes and great variation.


Fine-Grained Inventory Controls


It might not be what you want, but the trend toward greater complexity in inventory management
seems to be a ubiquitous influence on IT operations. You can leverage some advantages of
standardization to prevent completely ad hoc inventory growth and configuration management,
but even with standards, there will be variations within a standard that will need to be tracked.
For this reason, a fine-grained inventory management model is required.
With fine-grained inventory management, you make minimal assumptions about the
configuration of hardware and software. Similar workstations might have different hard drives,
two routers of the same model may have different configuration parameters, and application
servers might run different versions of the same OS. Rather than manage IT operations with rules
of thumb, like “All notebooks in Sales are configured with Windows 2000 SP3,” operations can
be managed with data.
In fact, fine-grained inventory control works only when broad and in-depth data about devices
and configurations is available. This, in turn, depends on the ability to collect, store, and query
this data cost-effectively. To implement fine-grained inventory controls, moderate to large IT
organizations will have to implement:
• Automated collection of device information
• Robust storage mechanism for the collected information
• Reporting tools to provide administrators and managers with the necessary information
• Data integration to provide a comprehensive view of inventory and configuration data
In addition to knowing what assets are deployed, it is important to be able to keep those assets up
to date with the latest software releases and patches.
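Managing "with data" rather than rules of thumb amounts to querying the collected inventory. A small sketch, using hypothetical device records and field names:

```python
# Querying collected inventory data instead of relying on rules of thumb.
# The records and field names here are hypothetical.
inventory = [
    {"device": "nb-sales-01", "dept": "Sales", "os": "Windows 2000", "sp": 3},
    {"device": "nb-sales-02", "dept": "Sales", "os": "Windows 2000", "sp": 4},
    {"device": "ws-eng-07",   "dept": "Eng",   "os": "Windows XP",   "sp": 2},
]

def find(records, **criteria):
    """Return the records matching every given field/value pair."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# Rather than assuming "all Sales notebooks run SP4," ask the data which
# machines are behind:
behind = [r["device"] for r in find(inventory, dept="Sales") if r["sp"] < 4]
print(behind)  # ['nb-sales-01']
```

The same query pattern, run against automatically collected data, replaces the blanket assumption with an answer that stays correct as configurations drift.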

Patch Management
Enterprise software—such as application servers, databases, and OSs—is regularly updated.
Patches (relatively small amounts of application code and configuration data) are released by
developers and vendors to correct flaws and vulnerabilities in deployed software. Unlike
upgrades, patches do not generally contain significant new functionality. To effectively manage
patches, there are several steps systems administrators should conduct as part of the patch
management process. These steps include:
• Assessing the relevance of patches
• Testing patches
• Scheduling patch installations
• Implementing change control procedures for patches
• Deploying patches
Figure 4.3 shows a high-level flowchart describing the decision points and sequence of steps in
the patch management process.


Figure 4.3: This patch management decision flowchart includes several decision points related to patch
relevance and the organization’s tolerance for partial functionality.

Assessing the Relevance of Patches


The first step in the patch management process is assessing the relevance of patches. Patches are
released to solve specific, known problems. When patches are targeted to specific applications
and flaws with the functionality of features, the assessment can be relatively straightforward. For
example, a Linux OS patch might correct a flaw in a file system that caused problems when
directories are shared through Samba with Windows users. If your environment does not use
Samba, there is no point in installing the patch. Other scenarios are not as clear cut.


Many patches are released to address vulnerabilities in applications and OSs. Without these
patches, vulnerabilities can be exploited resulting in disclosure of data, tampering with data, and
a compromise of system control. When security patches apply to an application that is not used,
there is no need to install the patch. Systems administrators must be careful to distinguish
policies that dictate certain services are not used (for example, “Windows desktops will not use
Trivial File Transfer Protocol”) and what is actually done on devices (for example, TFTP was
mistakenly not removed from a desktop, and a user, unaware of the policy, uses it to transfer
files).

The fact that discrepancies can exist between policies and implementations is an argument for
centralized management of assets. Although organizations can have well-defined policies, without
auditing and enforcement, they can experience drift between policies and implementations.
Automated collection of asset information and a centralized repository of integrated information
are essential for cost-effective systems management.
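Reconciling policy with what automated collection actually finds can be sketched as a simple set comparison (the device and service names here are hypothetical):

```python
# Reconciling policy (services that must not be present) with what
# automated collection actually observed. Device and service names are
# hypothetical.
prohibited = {"tftp"}

observed = {
    "desktop-01": {"word processor", "antivirus"},
    "desktop-02": {"word processor", "tftp"},   # drift from policy
}

def policy_violations(observed, prohibited):
    """Map each device to the prohibited services it actually runs."""
    return {dev: sorted(services & prohibited)
            for dev, services in observed.items() if services & prohibited}

print(policy_violations(observed, prohibited))  # {'desktop-02': ['tftp']}
```

A device flagged this way needs both remediation and, for patch assessment, treatment as if the prohibited service were in use.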

Testing Patches
Patches should be tested before deployment, even when applying security patches. Never forget
that patches are software components and, like all other software, may contain errors. The errors
may be within the patches themselves—that is, bugs in the program—or they may emerge at the
system level when a patch replaces code on which other applications depend.
Dependencies are not always obvious. For example, an OS patch might replace a library routine
with a more robust version but in the process change the behavior of some of the undocumented
routines within that library. The original routines, for instance, may have silently tolerated errors
in applications that call them; the patched version may detect those errors, and now an application
that has worked for months is suddenly broken. What should
be done? Systems and applications managers have basically three options:
• Install the patch and tolerate the loss of functionality in the dependent application
• Install the OS patch and, assuming it exists, a patch for the broken application
• Do not install the patch and tolerate the security vulnerability that prompted the release of
the patch
The first option is appropriate when the security vulnerability is so serious it outweighs the cost
of disrupting the other application and there is no time to investigate patches for the application.
An example of such a case would be when a fast-spreading piece of malware threatens to disrupt
services across the organization.
The second option highlights the chain reaction that patching can sometimes trigger. In today’s
complex array of distributed applications and service-oriented programming models, applications
cannot be considered monolithic, self-contained units. Applications are clusters of modules and
services that depend on other programs, so testing is essential to maintain the integrity of
distributed applications. Scheduling patch installations is also affected by the nature of
distributed applications.


Scheduling Patch Installations


Scheduling sounds like a straightforward, almost trivial task, but it is not. The same
characteristics of IT environments that compel you to perform comprehensive testing on patches
also require you to carefully schedule patch installations. The need to test patches is driven, in
part, by the structural interdependence of software components. Scheduling is driven by the
functional interdependence of software.
Just as you would not want to change a jet engine part while a plane was in flight, you would not
want to patch an application while it is in use. Patches should be scheduled around peak load
times and regularly scheduled batch jobs; the available times are known as “scheduled outage
windows” and “scheduled maintenance windows.” Consider some typical scenarios:
• A retailer needs its point-of-sale system online during store hours.
• The same retailer needs applications online during the end-of-day closeout when the
financial records are updated.
• Business intelligence reporting is based on a data warehouse that is updated nightly with
extracts from the financial system.
Clearly, the interactive and batch loads on an application will influence when a system may be
patched as well as the time window for completing the patch.
Another factor that must be taken into account is the dependencies on other patches being in
place. For example, a customer support Web portal may need to be patched, but first the
relational database storing metadata and other portal content must be upgraded. That database
instance may support multiple applications, so any changes to it will require coordination with
other applications.
As with the testing of patches, it is imperative that systems managers be able to quickly
determine dependencies and assess the requirements of dependent systems and operations.
Again, a centralized repository of configuration information provides the foundation for tracking
and analyzing those dependencies.
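Ordering patches so that prerequisites such as the database upgrade land first is a topological sort over the recorded dependencies. A sketch using Python's standard library (the system names are hypothetical):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each patch maps to the set of patches that must be installed first.
# The system names are hypothetical.
prerequisites = {
    "portal patch":     {"database upgrade"},
    "report service":   {"database upgrade"},
    "database upgrade": set(),
}

# static_order() yields every patch only after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
print(order[0])  # database upgrade
```

With dependencies held in a central repository, the installation order falls out of the data instead of being reconstructed from memory before each maintenance window.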

Deploying Patches
Once the patches have been identified, tested, and scheduled, the last step is deployment. At this
point, the bulk of the work shifts from analyzing the patch and other software to actually getting
the patch where it is needed. The key goals of this stage are to ensure:
• All devices that require the patch receive it
• Patches install correctly
• Software is cataloged in the organization’s definitive software library
A CMDB can provide a list of devices that should receive the patch based on characteristics of
those devices. For example, the listing could be based on the OS, service pack level, browser
version installed on the device, combination of applications running on the device, and so on.
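Such a CMDB query can be sketched as a filter over device records (the field names and the selection rule are illustrative):

```python
# Selecting patch targets from a CMDB-style device list. The field names
# and the selection rule are illustrative.
devices = [
    {"name": "srv-01", "os": "Windows 2003", "sp": 1, "apps": ["IIS"]},
    {"name": "srv-02", "os": "Windows 2003", "sp": 2, "apps": ["SQL Server"]},
    {"name": "ws-11",  "os": "Windows XP",   "sp": 2, "apps": []},
]

def patch_targets(devices, os_name, max_sp):
    """Devices running the given OS at or below a service pack level."""
    return [d["name"] for d in devices
            if d["os"] == os_name and d["sp"] <= max_sp]

print(patch_targets(devices, "Windows 2003", max_sp=1))  # ['srv-01']
```

The target list produced this way feeds directly into the distribution system, so only devices that actually need the patch receive it.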


Distributing Patches
In relatively small environments, manual distribution of patches may be feasible, but in any but
the smallest of organizations, an automated software distribution system is generally preferable.
Automated systems have a number of advantages:
• An automated tool will apply the same logic to every distribution of the patch, reducing
the chance of inconsistent implementations.
• Automated tools can respond to unexpected events or conditions in a predefined manner.
For example, if a patch cannot be installed because a device is powered down, the
distribution server could reschedule the patch delivery for the next day. If a patch cannot
be installed because the disk is full, an error alert can be sent to the systems
administrator.
• Automated tools can log the installation of the patch. Such logs can be an important part
of the organization’s compliance regimen.
• Automated tools can deploy patches much more rapidly than manual distribution. Not
only does this reduce costs, but it can also improve security. Deploying patches in
hours instead of days could conceivably prevent the exploitation of a known
vulnerability.

Do not underestimate the speed at which malware or distributed attacks can spread; the SQL
Slammer worm spread through large segments of the Internet in less than 15 minutes. See Paul
Boutin’s article “Slammed: An Inside View of the Worm that Crashed the Internet in 15 Minutes” at
http://www.wired.com/wired/archive/11.07/slammer.html.

Getting a patch to a device is the first step of deployment. Ensuring it installed correctly is the
next.

Verifying Patches
During the installation process, status checks should be done to ensure patches are installed
correctly. There are many things that can go wrong during a patch installation:
• The process installing the patch does not have sufficient privileges
• The patch process depends on a network service that is not available
• The configuration of the target device does not match the data in the configuration
database
• The device runs out of disk space
• The device is rebooted during the installation process
If a technician were installing the patch, these problems could be addressed immediately or
avoided altogether; however, given the need for automated patch distribution, the distribution
process will need sufficient logic to check for these and other failure conditions.
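The failure conditions listed above translate naturally into preflight checks that the distribution process runs before attempting an installation. A sketch with illustrative device attributes and an illustrative disk threshold:

```python
# Preflight checks mirroring the failure conditions above. The device
# attributes and the disk threshold are illustrative stand-ins.
def preflight(device, required_mb=100):
    """Return the reasons, if any, a patch cannot be installed right now."""
    problems = []
    if not device.get("admin_rights"):
        problems.append("insufficient privileges")
    if not device.get("network_up"):
        problems.append("required network service unavailable")
    if device.get("free_disk_mb", 0) < required_mb:
        problems.append("insufficient disk space")
    return problems

device = {"admin_rights": True, "network_up": True, "free_disk_mb": 40}
print(preflight(device))  # ['insufficient disk space'] -> alert, reschedule
```

An empty result clears the device for installation; a non-empty one routes to the appropriate response, such as alerting an administrator or rescheduling delivery.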


Cataloging Patches
The end of the patching process occurs when the patch is checked into the definitive software
library. The patch is now an asset of the organization; systems and operations depend on it. The
definitive software library can provide version control, reporting services, and most importantly,
a secure copy of the patch should it be needed again. The patch life cycle sometimes intersects
with an area of asset tracking: system security.

System Security
System security is driven by three goals:
• Protecting the confidentiality of information
• Ensuring the integrity of systems and operations
• Maintaining the availability of applications and services
Asset management plays a significant role in security efforts to realize these three goals.

Protecting Confidentiality of Information


When discussing confidentiality with respect to system security, many will think of encryption,
digital signatures, and public key infrastructures (PKIs). Asset management is not necessarily the
first thing that comes to mind, but it is an essential part of protecting confidentiality.
The euphemisms “data spills” and “information leakage” have come into IT parlance as a way to
describe what is becoming a familiar event: poorly secured servers and related devices are
compromised by attackers, personal information is stolen, and organizations contact customers,
clients, and constituents to inform them of the possible disclosure of their personal information.
Banks, credit card companies, and government agencies that lose data are especially likely to
make headlines. The loss of confidentiality can usually be traced to one or more weaknesses,
either in asset management or with security policies.

Not all losses of confidential information are due to poor system configuration; sometimes it is a lack
of policy enforcement. One of the worst cases on record was the theft of names, Social Security
numbers, and dates of birth of 26.5 million veterans and some spouses from the U.S. Veterans
Administration (VA). An employee had, against VA policy, taken home a notebook containing the
records. The employee’s house was burglarized and the notebook stolen. See “Department of
Veterans Affairs Statement Announcing the Loss of Veteran’s Personal Information May 22, 2006” at
http://www.va.gov/opa/data/docs/initann.doc#May22Statement.

Information can be lost in a number of ways, ranging from social engineering techniques that get
people to reveal details about systems and accounts, to probing for known vulnerabilities in OSs
and applications.


Ensuring System Integrity


Applications and OSs must be protected along with data. Systems can be compromised by a
variety of malicious software, such as viruses, worms, Trojan Horses, key loggers, and rootkits.
Systems administrators have a number of countermeasures at their disposal, including antivirus
software, personal firewalls, and content filtering systems. However, to be effective, these
countermeasures must be properly configured and kept up to date. This task is especially
important with signature-based detection systems used in antivirus and anti-spyware
applications.

Maintaining Availability
The best system is of no use when it is unavailable. Managing assets and their configurations is
especially important for maintaining availability. An improperly configured router or firewall
might not respond properly to a Denial of Service (DoS) attack. An intrusion prevention system
with out-of-date attack signatures may miss a new type of attack. As with preserving
confidentiality and protecting the integrity of systems, maintaining system availability is
dependent, in part, upon asset management practices.

The Role of Systems Management in Protecting Confidentiality, Integrity, and Availability
Systems management practices can help reduce exposure by allowing system managers to track
their own vulnerabilities. Tracking vulnerabilities is a two-step process. First, systems managers
must keep updated on the latest vulnerabilities. A number of databases are available to help with
this task:
• The National Vulnerability Database at http://nvd.nist.gov/nvd.cfm
• U.S. CERT Vulnerability Notes Database at http://www.kb.cert.org/vuls/
• Microsoft TechNet Security Bulletins at
http://www.microsoft.com/technet/security/current.aspx
• Open Source Vulnerability Database at http://www.osvdb.org/
In addition, many vendors support mailing lists and forums for exchanging information about
vulnerabilities.
The second step in vulnerability tracking is identifying whether a given vulnerability exists in your
IT infrastructure. Tools are available for vulnerability assessment and remediation; combined
with security configuration policies, these tools can help ensure that all devices possess secure
configurations. This goal is achieved through multiple functions working together: endpoint
security, vulnerability assessment, and security configuration management. Together, they provide
one of the fastest ways to enforce system security.
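As a minimal sketch of this second step, the following Python fragment checks a software inventory against a vulnerability feed. The package versions and host names are hypothetical, invented purely for illustration:

```python
# Hypothetical check: which deployed packages appear in a vulnerability feed?
vulnerable = {("openssl", "0.9.7"), ("httpd", "2.0.52")}
inventory = {"web01": [("openssl", "0.9.7"), ("httpd", "2.2.0")],
             "db01": [("openssl", "0.9.8")]}

def exposed_hosts(inventory, vulnerable):
    """List hosts running at least one package/version known to be vulnerable."""
    return sorted(host for host, pkgs in inventory.items()
                  if any(pkg in vulnerable for pkg in pkgs))

print(exposed_hosts(inventory, vulnerable))  # ['web01']
```

Real tools automate both halves of this comparison: pulling the external feed and collecting the internal inventory.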


Figure 4.4: Both external and internal information is needed to properly assess the threat of system
vulnerabilities.

The practice of security management identifies vulnerabilities and countermeasures to protect
against those vulnerabilities. Unfortunately, there are often more vulnerabilities than an
organization can address adequately—enter risk management.

Risk Management
Risk management techniques are used to determine the appropriate response to risks:
• The risk of having a large number of desktops infected with a virus
• The risk of a DoS attack shutting down a customer support site
• The risk of natural disaster shutting down operations
• The risk of an attacker stealing trade secrets and proprietary designs


Risk management practices take into account the cost if a risk is realized (for example, if a virus
infects a large number of desktops and disrupts operations), the cost of countermeasures, such
as antivirus software, and the likelihood that a risk will be realized.
Configuration management and asset management information is essential raw data for risk
assessments. Evaluators need to know how many devices may be vulnerable to a particular risk.
What is the value of those assets? What other systems could act as backup or temporary
replacements in the event of a local disaster at one site? Questions such as these can be answered
but only with up-to-date and comprehensive information about the state of IT assets.
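These cost and likelihood factors are often combined in a simple annualized loss expectancy (ALE) calculation, a standard risk-assessment technique. The Python sketch below uses made-up figures purely for illustration:

```python
def annualized_loss_expectancy(asset_value, exposure_factor, annual_rate):
    """ALE = single loss expectancy (asset value x exposure factor)
    multiplied by the annualized rate of occurrence."""
    sle = asset_value * exposure_factor
    return sle * annual_rate

# Hypothetical example: a virus outbreak affecting a fleet of desktops,
# expected twice a year, damaging 10% of a $500,000 asset base each time
ale = annualized_loss_expectancy(asset_value=500_000,
                                 exposure_factor=0.1,
                                 annual_rate=2)
print(ale)  # 100000.0
```

An ALE like this gives a ceiling for countermeasure spending: paying more than the expected annual loss to prevent it rarely makes sense.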

For more information about risk management, see the Software Engineering Institute's Risk
Management information at http://www.sei.cmu.edu/risk/index.html and the National
Institute of Standards and Technology's Risk Management Guide for IT Systems at
http://csrc.nist.gov/publications/nistpubs/800-30/sp800-30.pdf.

Licensing
Software licenses are a special type of asset—one that can easily be overlooked. Licenses are not
concrete assets, so you do not see them as you walk through the office or into the data center.
Sometimes, the physical manifestation of a license is little more than a contract filed away
somewhere. The way the paperwork is often managed belies the complexities of license
management.
To begin with, there is no single, standardized form of licenses. Software vendors have
developed a number of licensing models in response to market demands and their own quest to
maximize revenues. Example licensing models include:
• Licensing by the number of concurrent users
• Licensing by the number of named users
• Licensing by site
• Enterprise licenses
• Leased licenses
• Licensing by CPU
• Pay-per-use
• Feature-based licenses
• Evaluation licenses
Given the variety of licensing models and the sometimes high rates of change within IT
environments, it is easy to imagine how quickly license management can become a management
burden. The license management life cycle adds another dimension of complexity to license
management. As Figure 4.5 shows, once software is procured, it can pass through multiple states
before it is finally retired.


Figure 4.5: The multiple paths through the software license life cycle compound the complexity of licensing
models to make license management especially challenging.

Like other aspects of asset tracking, successfully and cost-effectively managing software
licenses requires a detailed database of information about licenses, their deployment, and how
they are being used, if at all. Understanding how licenses are being used allows IT and
procurement to know how many licenses need to be procured and whether unused licenses can be
harvested and reallocated to get the most out of the organization's investments. Organizations
should have neither too few licenses, putting them out of compliance with their software
vendors, nor too many, incurring unnecessary expenses.
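A license reconciliation of this kind can be sketched in a few lines of Python. The product names and counts below are hypothetical:

```python
from collections import Counter

def reconcile_licenses(purchased, installed):
    """Compare purchased license counts with installed copies per product.
    A positive surplus means unused licenses that might be harvested;
    a negative surplus indicates a compliance gap."""
    installs = Counter(installed)
    return {product: purchased.get(product, 0) - installs.get(product, 0)
            for product in sorted(set(purchased) | set(installs))}

# Hypothetical inventory data: purchase records vs. discovered installations
purchased = {"OfficeSuite": 100, "CADTool": 10}
installed = ["OfficeSuite"] * 80 + ["CADTool"] * 12
print(reconcile_licenses(purchased, installed))
# {'CADTool': -2, 'OfficeSuite': 20}
```

The installation counts on the right side of this comparison are exactly what an asset inventory feeds into the CMDB.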
As licenses often move with other assets—for example, a Web server is moved from one
department to another—it makes sense to manage the hardware and software together. This is yet
another example of a common systems management problem that can be effectively addressed
by a comprehensive asset and configuration management database. Another area of asset
management that benefits from configuration management information is service delivery.


Service Delivery
Service delivery entails a broad range of tasks related to managing IT operations, including
service level management, capacity management, contingency planning, and financial
management. Knowing the state of assets and their deployment is part of each of these
operations. For example, to maintain service levels at expected rates of growth, systems
managers will need information about the assets deployed for a particular service and their
current utilization as well as the utilization of similar assets that may be redeployed in service to
another function. Asset management also benefits from accurate and up-to-date reporting about
the allocation of assets, especially when IT departments use a charge-back model to bill
departments and lines of business for IT services.
Asset management is a broad topic. A common requirement of the tasks within asset
management is the need for comprehensive information about the state of hardware, software,
and license assets. A CMDB serves that purpose. By no means is a CMDB a panacea for IT
management problems, but it is a fundamental tool for effective systems management. As
applications become more distributed and service oriented, it is only appropriate that system
management practices align themselves in a service-oriented model. The CMDB is central to that
model.

Structure of CMDBs
The purpose of a CMDB is to store and integrate information about IT assets—known as
configuration items (CIs)—and their configuration status as well as process-related information.
The CMDB model consists of four logical layers: the service layer, such as Human Resources
services; the system layer, which is the system, such as SAP, that implements a service; the
subsystem layer, or components of the system; and finally the physical layer, which consists of
IT assets that support the service (see Figure 4.6).
In addition to supporting the technical aspects of asset management, the CMDB model supports
financial management. Most CIs are shared resources, making it difficult to allocate costs
directly to business units, cost centers, or departments. If an IT organization is able to model its
services, it can then allocate costs to departments based on the services, not the assets.
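One way to picture the four layers is as a nested mapping from services down to physical assets. This Python sketch uses invented service, system, and asset names and walks the layers to list the assets behind a service:

```python
# Hypothetical four-layer model: service -> system -> subsystem -> assets
cmdb = {
    "Human Resources": {                      # service layer
        "SAP HR": {                           # system layer
            "payroll-module": ["srv-001", "db-007"],     # subsystem -> assets
            "recruiting-module": ["srv-002", "db-007"],
        }
    }
}

def assets_for_service(model, service):
    """Walk the layers to find every physical asset supporting a service."""
    assets = set()
    for system in model.get(service, {}).values():
        for physical in system.values():
            assets.update(physical)
    return sorted(assets)

print(assets_for_service(cmdb, "Human Resources"))
# ['db-007', 'srv-001', 'srv-002']
```

Note how the shared database server `db-007` appears only once in the result; layering is what lets costs be charged per service even when the underlying asset is shared.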


Figure 4.6: The CMDB is a multi-layered model that links configuration items to define what is included in a
service.

The database consists of two parts: a definitive software library and a configuration and status
data repository.

Definitive Software Library


The definitive software library stores the authorized, production version of all software running
within an organization. The definitive software library is a logical data store and may consist of
one or more databases as well as physical storage, for example, for offsite media.
All items in the definitive software library are under change control and are managed separately
from development and test versions of code. Only tested and validated code should be kept in the
definitive software library; only code that is in the definitive software library should be released
for production use.
The purpose of the definitive software library is to have a single repository for production
software. This may sound like an extra layer of bureaucracy to developers and project managers
who are used to informal release management practices. Systems managers who are used to
archiving software themselves may have a similar reaction. A definitive software library may be
an additional step in the software development and deployment process, but it is essential for
maintaining control.


Figure 4.7: The DSL is a logical construct with multiple physical instantiations.

Consider an analogy: Any company large enough to have multiple departments has a single
finance department that keeps official finance records. Individual departments do not keep their
own set of books (at least not an official set of books). Financial information of public
companies must adhere to strict standards relating to what kind of information is tracked, how it
is reported, and how it is audited. For financial reporting, there must be one set of books.
Similarly for software, there must be a single set of software that constitutes the body of
applications functioning in an organization. Without it, there would be no way to confidently
reconstruct the state of applications across an organization or confidently release code to
production.
The definitive software library is relatively static; it is updated only when code is ready for
production release. The configuration and status data repository, the second part of the CMDB,
is much more dynamic.
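One simple control a definitive software library can enforce is checksum verification, so that only the authorized build of an application is released to production. The following Python sketch illustrates the idea; the application names and build contents are hypothetical:

```python
import hashlib

def register_release(library, name, version, code_bytes):
    """Store the authorized build's checksum in the definitive software library."""
    library[(name, version)] = hashlib.sha256(code_bytes).hexdigest()

def verify_release(library, name, version, code_bytes):
    """Only code matching the DSL entry should be released for production use."""
    return library.get((name, version)) == hashlib.sha256(code_bytes).hexdigest()

dsl = {}
register_release(dsl, "payroll-app", "2.1", b"approved build")
print(verify_release(dsl, "payroll-app", "2.1", b"approved build"))   # True
print(verify_release(dsl, "payroll-app", "2.1", b"tampered build"))   # False
```

A real DSL stores the media itself under change control; the checksum check above is just the gatekeeping step at release time.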

Configuration and Status Data Repository


Configuration and status data is collected in real time or near real time. Data is collected from
multiple types of sources:
• Client devices
• Servers
• Network management devices
• Event monitoring applications
• Asset management data (financial and contractual details, SLAs, and so on)
• Human Resources information (user, owner, cost center, location, department)
• Security standards
This diverse array of devices and applications is collectively known as configuration items,
which share some common characteristics but have distinct attributes as well.


Clients and Server Configuration and Status Data


Client devices and servers have several common attributes that should be tracked, including OS
and patch levels, applications installed and their versions, hardware configurations, and disk
utilization. In addition, the CMDB should track relationships between configuration items, as
Figure 4.8 shows.

Figure 4.8: A CMDB includes relationships between configuration items.

Network Management Devices


Network management devices, like client devices and servers, have software version information
as well as detailed configuration information. For example, routers have
configuration settings such as DNS servers, domain suffixes, multicast routing settings, and
virtual LAN (VLAN) and tunneling settings. These settings are highly device dependent and will
vary with the type of network device. The CMDB can store and manage change-controlled
versions of these settings in addition to information about the software running on the network
device.


Event Monitoring Applications


Event monitoring applications can track a number of event types, ranging from day-to-day
system monitoring events to potential security threats. Many network devices will generate
detailed log files. The sheer volume of the raw data can render it almost useless unless tools are
used to filter and summarize data into manageable units, which can then trigger notification of
significant events. Such events can include:
• Low disk space on a database
• Excessive number of dropped packets on a router due to heavy traffic
• Unusually high volumes of network packets from external sources, which could indicate
a DoS attack
• Web servers that do not respond to ping requests
The breadth in types of events that can be tracked in a CMDB is limited only by the types of
events that occur on a network.
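A filter that reduces raw log volume to significant events can be sketched as follows; the metric names and thresholds below are invented for illustration:

```python
# Hypothetical notification thresholds for two of the event types above
THRESHOLDS = {"disk_free_pct": 10, "dropped_packets_per_min": 500}

def significant_events(events):
    """Keep only the events that cross a notification threshold."""
    alerts = []
    for event in events:
        limit = THRESHOLDS.get(event["metric"])
        if limit is None:
            continue  # no policy for this metric; ignore it
        if event["metric"] == "disk_free_pct" and event["value"] < limit:
            alerts.append(event)       # low disk space
        elif event["metric"] == "dropped_packets_per_min" and event["value"] > limit:
            alerts.append(event)       # excessive dropped packets
    return alerts

raw = [
    {"host": "db01", "metric": "disk_free_pct", "value": 4},
    {"host": "rt02", "metric": "dropped_packets_per_min", "value": 120},
]
print(significant_events(raw))  # only the db01 low-disk event survives
```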

Beyond Silos: Integrating Data


CMDBs must not only collect information but also integrate it. Although it is helpful to have
disparate information collected in one place, it is far more useful when it is structured in such a
way that users can query the database for data drawn from multiple sources. For example, no
single device on the network can list all devices running a particular version of software that
have triggered a particular event in the past 24 hours; a centralized and integrated CMDB could.
A CMDB ideally provides both a definitive software library and a repository for asset,
configuration, and process event tracking. In addition to supporting day-to-day management
activities, it helps to manage the asset life cycle.
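Such a cross-source query is straightforward once the data sits in one database. This Python/SQLite sketch uses a deliberately simplified, hypothetical schema to find devices running a particular software version that also raised a given event within 24 hours:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE installs(device TEXT, app TEXT, version TEXT);
    CREATE TABLE events(device TEXT, event TEXT, age_hours REAL);
    INSERT INTO installs VALUES ('pc1','browser','2.0'), ('pc2','browser','1.9');
    INSERT INTO events VALUES ('pc1','crash', 3.0), ('pc2','crash', 5.0);
""")

# Join inventory data with event data: impossible on any single device,
# trivial in an integrated repository
rows = con.execute("""
    SELECT DISTINCT i.device
    FROM installs i JOIN events e ON e.device = i.device
    WHERE i.app = 'browser' AND i.version = '2.0'
      AND e.event = 'crash' AND e.age_hours <= 24
""").fetchall()
print(rows)  # [('pc1',)]
```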

CMDB and Asset Life Cycle


Up to this point, the discussion of CMDB has focused on long-term storage of production
software and tracking the state of devices and network operations. CMDBs, however, also play a
role in tracking the assets through their life cycle.
Earlier, the chapter drew an analogy between financial recordkeeping and software
management. Like finances, software should be centrally managed according to sound best
practices. That discussion skirted a subtle but essential distinction between funds and IT assets:
money is fungible and IT assets are not. Although a dollar is a dollar, servers and PCs are, in
general, not interchangeable.


Assets should be individually tracked through their entire life cycles. Keeping the lineage of
devices is useful in a number of ways, including financial recordkeeping, compliance
reporting (for example, was a device with export-controlled security software ever used outside
the United States?), version control, and assessing the usefulness of the CI in service delivery.
The stages of the asset life cycle are similar to the software license life cycle depicted in Figure
4.5 and include:
• Procurement
• Deployment
• Transfer to another organizational unit (OU)
• Transfer outside the organization
• Decommissioned
In addition to the coarse-grained changes, finer-grained changes, such as software updates or
hardware upgrades, are tracked by other processes within asset management services. Asset
management is just one of the services that are both central to effective systems management and
supported by CMDBs.
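The life-cycle stages above can be modeled as a small state machine so that every transition is validated and recorded. The transition table below is one plausible interpretation of the stages, not a prescribed model:

```python
# Hypothetical asset life-cycle state machine mirroring the stages above
TRANSITIONS = {
    "procured": {"deployed"},
    "deployed": {"transferred_internal", "transferred_external", "decommissioned"},
    "transferred_internal": {"deployed", "decommissioned"},
    "transferred_external": set(),   # terminal: asset left the organization
    "decommissioned": set(),         # terminal
}

def move_asset(history, new_state):
    """Record a life-cycle transition, rejecting invalid ones so the
    asset's full lineage stays auditable."""
    current = history[-1]
    if new_state not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current} to {new_state}")
    history.append(new_state)
    return history

lineage = ["procured"]
move_asset(lineage, "deployed")
move_asset(lineage, "transferred_internal")
print(lineage)  # ['procured', 'deployed', 'transferred_internal']
```

The accumulated `lineage` list is exactly the kind of per-asset history a CMDB keeps for compliance and financial reporting.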

Summary
This chapter opened with the question: What methods and resources are required to execute and
manage transitions in an efficient and effective manner? The answer to that question is
multifaceted. The methods that are required include the various processes of asset management:
• Inventory management
• Patch management
• System security
• Risk management
• Licensing
• Service delivery


These methods are coupled with resources, such as the definitive software library and the
configuration and status repository of the configuration management database, to support systems
management. As first described in Chapter 1, systems management consists of several disciplines:
• Service level management
• Financial management for IT services
• Capacity management
• Change management
• Availability management
• IT service continuity management
• Application management
• Software and hardware asset management
Each of these services depends in some way on the data collected, integrated, and aggregated in
the CMDB. Although each of these disciplines presents different aspects of IT management, they
are all subject to common constraints, especially the rapid and persistent state of change that
characterizes many IT infrastructures.
Moving to a service-oriented management model depends on the effective use of policies that are
aligned with overall business strategy and mechanisms for ensuring those policies are implemented
and enforced. One of the first tasks that must be addressed in a service-oriented management
model is the ability to track assets, both their physical characteristics and their life cycles. Many
of the sub-disciplines of asset management depend upon a common base of information, which
should be stored and integrated in a CMDB.
CMDBs are not tied to a particular aspect of systems management. Rather, they are designed to
house information on the breadth of IT operations to allow for an integrated and service-oriented
view of operations. Without such a unified repository, organizations end up supporting the
various disciplines of systems management with individual applications that essentially become
silos of information.


Figure 4.9: CMDBs integrate the multiple facets of service-oriented management.

Service-oriented management is a mosaic of interconnecting parts. This chapter has introduced
the fundamental elements of service-oriented management and their linkages through the
CMDB. The next chapter turns attention to the elements of service support, such as incident,
configuration, and change management.


Chapter 5: Implementing System Management Services, Part 1: Deploying Service Support
Much of the work in systems management is service support—keeping devices and applications
functioning and ensuring that they continue to meet the changing needs of the organization. This
task entails managing changes as new assets are added and others are retired; reconfiguring
systems in response to changes in the infrastructure, such as growing demands for network
bandwidth; and releasing new versions of applications to geographically distributed users.
Service support is especially challenging because of the breadth of services that are typically
supported by IT operations and the depth of detailed information required for service support.
The breadth of operations, from upgrading operating systems (OSs) and reconfiguring routers to
planning software releases and responding to security incidents, can be labor intensive. For
example, upgrading the OS on one desktop computer might take one hour in a simple case.
Coordinating times to install the upgrade with users and dealing with unexpected consequences
of the change add to that time.
Ensuring the Quality of Service (QoS) delivery depends upon detailed information about the
state of devices and processes running on those devices. A systems manager cannot simply
install a new application or upgrade an existing application without understanding how the
system is currently used. For example, a Java application server may depend upon one version of
the Java runtime environment (JRE), but another application, about to be installed on the same
server, may require a different version of the same runtime environment. The systems manager
cannot uninstall one version of the runtime environment and replace it with another without
disrupting the application server's operations.
Clearly, to be effective and efficient, service delivery operations must be built on a foundation of
well-defined processes and, ideally, automated operations. The previous chapter introduced the
configuration management database (CMDB) as a central component of service-oriented
management. This chapter builds on that with a discussion of processes that leverage the CMDB.
In particular, this chapter will cover the common characteristics of the multiple elements of
service support, as well as details about:
• Incident management
• Problem management
• Configuration management
• Change management
• Release management
The chapter concludes with a discussion of the unifying elements of service-oriented
management with respect to service delivery.


Elements of Service Support


Service support is about responding to change. The needs of users change. Configurations
change. Unexpected events occur. The specific details of these changes will vary, but how
systems managers respond should not. A set of well-defined processes is at the core of service-oriented
systems management. Those processes—incident management, problem management,
configuration management, change management, and release management—are discussed in
detail later in this chapter. In this section, the focus is on the shared attributes and
interdependencies of these processes.

Interdependent Service Support Processes


Service support processes have different specific goals depending on the area of service delivery
they address (see Figure 5.1). For example, in incident management, something has disrupted the
normal flow of operations, and the goal is to restore normal services as soon as possible. Problem
management, in contrast, takes a more holistic approach and attempts to prevent incidents by
detecting patterns in incidents and identifying root causes. Even with their different goals,
service support processes are highly interdependent.
In some cases, those root causes could be the result of improperly configured devices, in which
case, configuration management processes must be examined. Were they applied properly? Is
there a deficiency in the process that allowed a flawed configuration to enter production service?
Perhaps the correct configuration specifications had been defined but they were not properly
installed; in that case, the release management procedures require review.

Figure 5.1: Service support processes are interdependent and can support or trigger each other.


It is clear from these simple examples that information relevant to one process can be vital to the
proper implementation of other processes. It is also clear that errors in one set of procedures can
have ripple effects that cause other processes to be activated. For both of these reasons,
automated configuration management services can improve service support.

Automated Configuration Management


The objective of automated configuration management is twofold. First, it is to obtain and
maintain information about the state of devices and applications deployed throughout the IT
infrastructure. The second objective is to support multiple IT operations, especially service
support.

Automated configuration management is a mechanism that supports all parts of service support, not
just traditional configuration management. This mechanism should not be confused with early
configuration management tools that provided limited information and little support for related
service support operations.

To meet these objectives, automated configuration management applications use several
modules, including:
• Agents for collecting configuration and other status information
• Centralized data repository
• Process flow support
• Information retrieval
Together, these modules provide the core services of automated configuration management.

Data Collection Procedures


Data is gathered from devices using agents, or applications that collect information locally and
transmit it to a central repository. These agents should be relatively lightweight and autonomous;
once installed and configured, they should require little systems manager intervention.
The configuration processes entail setting a number of characteristics, including:
• Data collection policies
• Frequency of data collection
• Data transmission information
• Authentication mechanism
The data collection policies define what information is gathered and how frequently it is
transmitted. The information gathered can include local security policy settings, storage
utilization, significant system events, and other audit-related details.
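An agent's collection policy might look something like the following Python sketch. The metric names, interval, and endpoint URL are placeholders, not a real product's configuration:

```python
from dataclasses import dataclass, field

@dataclass
class CollectionPolicy:
    # Which data points to gather (hypothetical metric names)
    metrics: list = field(default_factory=lambda: ["disk_utilization", "patch_level"])
    interval_minutes: int = 60                          # frequency of data collection
    repository_url: str = "https://cmdb.example.com/ingest"  # transmission target
    auth_token: str = "change-me"                       # placeholder credential

def build_payload(policy, readings):
    """Select only the metrics the policy asks for before transmitting."""
    return {m: readings[m] for m in policy.metrics if m in readings}

policy = CollectionPolicy()
readings = {"disk_utilization": 0.72, "patch_level": "KB99999", "cpu_temp": 61}
print(build_payload(policy, readings))
# {'disk_utilization': 0.72, 'patch_level': 'KB99999'}
```

Filtering at the agent, as `build_payload` does, is one way to manage the tradeoff between data freshness and the load collection places on the device.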


The frequency of data collection will determine how often the agent sends information to the
central repository. There is a tradeoff with this setting. Devices that frequently update the central
repository are less likely to have outdated data, but the data collection process places additional
demands on the device that can adversely impact the performance of other applications.
When depending on agents, it is important for the central repository to accept data only from
authenticated agents. Distributed applications such as these are vulnerable to spoofing—that is,
an attacker or an attacker’s program pretending to be the real agent. An attacker, for example,
might want to cover his or her tracks by sending false information about failed login attempts or
the amount of disk space in use. By using cryptographic techniques, such as digitally signing all
transmissions, the repository can significantly reduce the chance of attacks. (See the sidebar,
Digital Signatures, for details about this security measure).
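As an illustration of authenticated transmissions, the sketch below uses an HMAC with a pre-shared key. A production system would more likely use true public-key digital signatures, but the accept/reject pattern at the repository is the same:

```python
import hmac
import hashlib

SHARED_KEY = b"agent-secret"  # hypothetical pre-shared key

def sign(message: bytes) -> str:
    """Compute a keyed message authentication code over the agent's report."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()

def accept(message: bytes, signature: str) -> bool:
    """The repository accepts data only when the signature verifies."""
    return hmac.compare_digest(sign(message), signature)

report = b'{"host": "pc1", "failed_logins": 0}'
sig = sign(report)
print(accept(report, sig))                      # True
print(accept(b'{"failed_logins": 999}', sig))   # False: spoofed report rejected
```

The constant-time comparison (`compare_digest`) matters here; a naive `==` comparison can leak timing information to an attacker.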

Centralized Data Repository


A centralized data repository for configuration management supports multiple functions,
including managing configurations, which, in turn, support
both service delivery and security enforcement. Although the basic role of the repository is to
answer queries about the state of devices, it must be designed to support queries from multiple
domains. For example, from an incident management perspective, the database might be queried
about the software installed on a particular device and the dependencies between those
applications. This is useful in cases in which a newly installed application is not working
properly but works correctly on other similarly configured clients. In such a case, one of the first
questions to answer is: What are the differences between clients with a working installation of
the application and clients without?
In the case of problem management, support personnel may discover that a particular version of
a browser add-in causes parts of an application interface to fail. They may also find that rolling
back to an earlier version of the add-in resolves the problem. In this case, the configuration
database could be used to determine all devices that have the problematic plug-in and need to
run the thin-client application. After the correct plug-in has been deployed, the database can be
queried to verify installation (assuming agents have updated the repository). These are relatively
simple examples; other more complex issues may require multi-step procedures.

Process Flow Support


Configuring IT devices often entails dependencies between components. Mechanisms for
supporting process flow can help control procedures that must be aware of those dependencies.
Consider requirements for rolling out a Web-based application that uses Java-based technologies
in clients’ browsers. In addition to updating client browsers with the latest security patches, the
release requires that the JRE is installed to a particular revision level. Once the browser is
patched and the JRE installed, a plug-in must be installed within the browser as well. Each of
these steps must be done in sequence and if one step fails, the succeeding steps should not occur.
The results of the installation must be verifiable.


A process flow engine within a configuration management system could meet these requirements
if it supports:
• Ordered deployment of modules
• Tests for success of each step
• Conditional processing—for example, if the browser does not contain a particular patch,
it is installed; otherwise, it is not
• Detail logs of each step
Logged information about the deployment process should be available by querying the CMDB.
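A minimal process flow engine meeting these requirements, ordered steps, conditional skipping, per-step logging, and stop-on-failure, can be sketched as follows. The step names mirror the browser/JRE example above; the state keys are hypothetical:

```python
def run_workflow(steps, state, log):
    """Execute steps in order; stop at the first failure.
    Each step is (name, condition, action). Conditional processing:
    a step whose condition is already satisfied is skipped."""
    for name, satisfied, action in steps:
        if satisfied(state):
            log.append(f"{name}: skipped (already satisfied)")
            continue
        ok = action(state)
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False  # succeeding steps must not run
    return True

state = {"browser_patched": True, "jre": None, "plugin": False}
steps = [
    ("patch browser", lambda s: s["browser_patched"],
     lambda s: s.update(browser_patched=True) or True),
    ("install JRE", lambda s: s["jre"] == "1.5",
     lambda s: s.update(jre="1.5") or True),
    ("install plug-in", lambda s: s["plugin"],
     lambda s: s.update(plugin=True) or True),
]
log = []
print(run_workflow(steps, state, log), log)
```

The accumulated `log` list is the detail log the requirements call for, and it is what would be written back to the CMDB for later querying.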

Information Retrieval
Information retrieval sounds trivial—you simply want to display data that is stored in a database.
What is not trivial is precisely specifying what data it is that you want displayed. At one end of
the information retrieval spectrum, there are query languages used by database developers and
the occasional power user. Even for relatively simple queries, this is not a reasonable tool for
most users. Consider the following query: a systems manager wants to list all resource
associations, the associated resource type, the name of the resource, and a brief description,
sorted by resource type. The corresponding database query would look something like (the
details depend on the database structure, but the example holds for a typical normalized
relational database):
SELECT
ra.resource_assoc_name,
rt.resource_type_name,
r.resource_name,
r.resource_descr
FROM
resources r,
resource_type rt,
resource_associations ra
WHERE
r.resource_id = ra.resource_id
AND
r.resource_type_id = rt.resource_type_id
ORDER BY
rt.resource_type_name
Query languages are not practical tools for working with CMDBs—they require an
understanding of the underlying data model and knowledge of the database query language,
typically a variation on ANSI standard SQL. However, query languages are quite flexible and
with the right query, one can find anything that is in the database.
Static reports lie at the other end of the information retrieval spectrum. They require no
knowledge of the implementation details of the database, but they are limited in their usefulness.
Static reports provide information about a limited amount of data and typically represent
designers and developers’ best guess at what information a systems manager will need.

96
Chapter 5

Between the two extremes lie parameterized reports. They provide some of the flexibility of
query languages along with some of the ease of use of static reports. Properly configured, these
reports can help guide users to the information they need (see Figure 5.2 for an example).
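The contrast with raw query languages can be made concrete: in a parameterized report, the designer fixes the query shape and the user supplies only a value. A small Python/SQLite sketch with a hypothetical table and data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE resources(name TEXT, type TEXT)")
con.executemany("INSERT INTO resources VALUES (?, ?)",
                [("srv-001", "server"), ("pc-044", "desktop"), ("srv-002", "server")])

def resources_by_type(resource_type):
    """A parameterized report: the query shape is fixed by the designer;
    the user supplies only the parameter value."""
    return con.execute(
        "SELECT name FROM resources WHERE type = ? ORDER BY name",
        (resource_type,)).fetchall()

print(resources_by_type("server"))  # [('srv-001',), ('srv-002',)]
```

The user never sees the SQL or the schema, yet still controls which slice of the data the report returns.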

Figure 5.2: Information retrieval from complex data structures should use a combination of search and
guided querying.

Automated configuration management tools provide several mechanisms important for efficient
service support, including a centralized data repository, automated data collection, support for
process flow, and flexible reporting. The following sections describe how automated
configuration management can support the particular requirements of several service support
areas.


Incident Management
Incidents are events outside of normal operations that disrupt operational processes. An
incident can be a relatively minor event, such as running out of disk space on a desktop machine,
or a major disruption, such as a breach of database security and the loss of private and
confidential customer information. Incident management is a set of policies and processes for
responding to incidents, the goals of which are to:
• Restore normal operations as quickly as possible
• Track information about incidents for further analysis
• Support problem management by analyzing patterns of incidents
Incident management begins with defining what constitutes an incident, categorizing those
incidents, and measuring their occurrence.

Characteristics of Incidents
Something as generalized as “any event outside of normal operations” covers quite a large space
of possible events. By focusing on just those that are so disruptive that they cause a call to the
Help desk or other IT support services, you can limit the discussion to a manageable domain.
Within this domain of incidents, you can categorize incidents by several characteristics:
• Cause of problem
• Severity
• Asset or assets causing the incident
• Role of personnel experiencing disruption
• Resolution method
The causes of problems span the widest range of topics; they are examined under Incident Types
later in this section, after the other characteristics.

Severity
Incidents should be categorized by severity; at the very least a three-point scale of minor,
moderately severe, and severe should be used. For each level of severity, IT organizations should
define acceptable resolution times, escalation procedures, and reporting procedures. For
example, minor incidents, such as password resets, should not consume too much time or
resources from the Help desk. A security breach, however, should immediately escalate, trigger
reporting to management and executives, and require rapid resolution.
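One simple way to encode such a policy is a lookup table keyed by severity. The level names, time limits, and escalation targets below are hypothetical illustrations, not prescriptions:

```python
# Hypothetical severity policy: resolution targets and escalation paths.
SEVERITY_POLICY = {
    "minor":    {"max_resolution_hours": 8, "escalate_to": None,         "report_to": "help desk log"},
    "moderate": {"max_resolution_hours": 4, "escalate_to": "IT manager", "report_to": "daily report"},
    "severe":   {"max_resolution_hours": 1, "escalate_to": "CISO",       "report_to": "executives"},
}

def response_policy(severity):
    """Look up the response policy for a severity level; reject unknown levels."""
    try:
        return SEVERITY_POLICY[severity]
    except KeyError:
        raise ValueError(f"unknown severity level: {severity!r}")
```

Keeping the policy as data rather than scattering it through Help desk scripts makes it easy to report on and to revise when escalation procedures change.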

Assets
The asset or assets causing an incident are important dimensions for tracking incident trends. If a
particular version of a desktop application is causing an inordinate number of support calls, IT
managers should be able to detect this during problem management procedures. (There is more
information about problem management later in this chapter.)

Personnel
Just as assets involved in incidents should be tracked, so should the users encountering the
disruptions. If a large number of personnel from a single department are generating a large
number of Help desk calls, there might be a problem with training or an application specific to
that department.

Resolution Method
The method for resolving an incident should also be tracked. This data can help determine
guidelines for selecting the appropriate response to an incident. For example, suppose data about
resolution methods reveals that most OS problems that take more than 2 hours to solve eventually
require reinstallation. Given that finding, the support desk might institute a policy that OS
errors not resolved within 2 hours be addressed by formatting the OS drive and restoring it from
an image backup. All of these characteristics are especially useful when measuring incident rates
and analyzing trends.

Incident Types
Defining the cause of a problem can be more difficult than it seems at first because there are
sometimes multiple pre-conditions that must be in place for an incident to occur. Consider a few
examples. Password resets are one of the most common incidents reported to Help desks. The
causes of this type of incident include users allowing passwords to expire and forgetting
passwords—especially when users are expected to remember passwords to multiple systems
while not re-using passwords. All of these causes can factor in a single password reset incident.
In another example, an employee is saving a document to a network drive when the save
operation fails. An error message is displayed stating the network drive cannot be found.
Because the employee had been saving the document regularly, something must have occurred
since the last save operation. After the user has contacted the service desk, the service desk
technician tests several possible causes and determines that the problem is a failed network
interface device. In this case, determining the exact cause of the failure is not relevant unless the
problem occurs repeatedly; hardware has well-known mean times between failures (MTBFs) and
further root cause analysis is not likely to help reduce these types of incidents.

The final example is more complex. A security breach results in a large number of customer
account and credit card numbers being exposed to attackers. The causes could include:
• Improperly configured firewalls that allow traffic on a port that should have been closed
• An un-patched database listener (a program that accepts requests to connect to the
database) that is vulnerable to known attacks
• Access controls within the database that do not adequately limit read access to sensitive
data
• Vulnerability in a database management system that allows for escalation of privileges
• Lax OS privileges that allow execute privileges on database administration tools
• Poorly designed applications that use over privileged database accounts
A database breach is a case in which a series of vulnerabilities must be in place for a successful
attack to occur. Had one of the vulnerabilities been compensated for with adequate
countermeasures, the attack would not have occurred as it did. For example, had the access
controls on database tables and views been sufficiently restrictive, the attacker could not query
the sensitive data even though he or she had made it through network, OS, and database
authentication security measures.

Information security is often compromised by a weak link; however, one effective countermeasure
can stop an attack that has exploited a number of other weaknesses. A best practice in security,
known as defense in depth, deploys multiple countermeasures to protect assets even if, in theory,
only one is needed. Security practitioners know that no single measure is perfect and multiple
countermeasures are needed to reduce the risk of information theft and other threats.

The general categories of incident causes that cut across these examples include:
• Improper documentation
• Insufficient user training
• Configuration errors
• Previously unknown bug
• Known but un-patched vulnerabilities
• Unexpected changes in operating loads
• User error
Determining the cause of incidents is essential to understand both how to resolve the problems
and how many resources to commit to reduce the likelihood of those problems in the future.

Resolving Incidents
Of all the topics in service support, the most time could be spent on resolving incidents; in fact, it
could be the topic of a very long book. The problem with resolving incidents is that there are so
many types and each can require a customized response. In some ways, resolving incidents is
like cooking—there is a different recipe for every dish, and there is a different response to every
incident. At the same time, general principles can be found that apply to a broad range of
challenges, whether culinary or technical.
The general principles for resolving incidents include:
• The time, effort, and resources committed to incident resolution must be commensurate
with the impact of the incident.
• Responses should be formalized with well-defined procedures that are more frameworks
than strict, precise sets of steps; formulating precise step-by-step procedures for every
type of incident would be too time-consuming to be practical.
• All incidents and the response should be documented. In some cases, this can be as trivial
as incrementing a count of simple incidents, such as password resets, or as complex as a
detailed report describing a security breach.
• As with other service support operations, coordinate incident resolution information with
other asset information.
Consider examples from the extremes of resolving incidents: Password resets are one of the
simplest types of incidents to resolve. Many organizations now use self-service methods to
address them. One could attempt to drive down the number of password resets, but after a certain
point, the economics do not justify the effort to do so because the marginal cost of resetting a
password with a self-service system is small. As the next section on trend analysis will show,
password vulnerabilities could become a factor in broader security management issues in which
the costs of poor password management grow much higher.
Security incidents are some of the most costly. According to the FBI/Computer Security Institute
(CSI) Computer Crime and Security Survey, 639 respondents reported a total loss of almost $43
million due to virus attacks and more than $31 million due to unauthorized access. Individual
incidents can be extremely costly. For example, 40 million credit card accounts were
compromised at CardSystems Solutions, a credit card processor, causing it to lose major credit
card customers.

For more information about the CardSystems Solutions breach, see Clint Boulton’s “MasterCard: 40M
Credit Card Accounts Exposed” at http://www.crime-research.org/news/28.06.2005/1321/. The
FBI/Computer Security Institute 2005 Computer Crime and Security Survey is available at
http://i.cmpnet.com/gocsi/db_area/pdfs/fbi/FBI2005.pdf.

Resolving incidents requires detailed information, whether one is dealing with password resets or
security breaches. A centralized repository of configuration information is especially helpful
when the incident is caused, in part, by hardware, software, or system configurations.

Problem Management
Problem management is focused on reducing incidents and their impact on an organization’s
operations. Problem management and incident management, although tightly coupled, differ in
several ways:
• A problem is the underlying cause for multiple disruptions; an incident is one of those
disruptions.
• Problem management addresses the underlying cause of multiple incidents; incident
management entails responding to an instance of disrupted operations caused by a
problem.
• Problem management attempts to detect and address root causes of problems; incident
management attempts to restore normal operating functions, possibly without fully
correcting the underlying cause.
Problem management depends on data from multiple incidents, so a CMDB and incident
repository can support the investigation and analysis of root causes. For example, if an end user
application repeatedly crashes on some but not all client devices, the CMDB can be used to
determine what the affected systems have in common that are not found in the unaffected
devices.
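This kind of comparison can be sketched with simple set operations over device attributes pulled from the CMDB; the attribute names and values below are hypothetical:

```python
def distinguishing_attributes(affected, unaffected):
    """Attribute/value pairs shared by every affected device but found on no
    unaffected device; each device is a dict of CMDB attributes."""
    common = set(affected[0].items())
    for device in affected[1:]:
        common &= set(device.items())      # keep only attributes all affected share
    for device in unaffected:
        common -= set(device.items())      # drop anything an unaffected device also has
    return dict(common)

affected = [
    {"os": "XP SP1", "app_version": "2.0", "ram_mb": 512},
    {"os": "XP SP1", "app_version": "2.0", "ram_mb": 1024},
]
unaffected = [{"os": "XP SP2", "app_version": "2.0", "ram_mb": 512}]
print(distinguishing_attributes(affected, unaffected))  # {'os': 'XP SP1'}
```

The surviving attributes are candidate root causes (here, the OS patch level), narrowing the investigation before any device is examined by hand.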

Figure 5.3: CMDBs can help to rapidly identify common characteristics of devices affected by an incident,
thus supporting root cause analysis and problem management.

Once the cause of a problem is identified and a solution developed, the problem and solution
should be documented for future reference. Even if the identical problem is not likely to occur
again—for example, all servers are patched for a known vulnerability—the solution description
may help to solve other somewhat similar problems.

Trend Analysis
Another part of problem management, and closely related to incident management, is trend
analysis. The function of trend analysis is to determine the frequency of particular types of
problems and determine which, if any, incident types are increasing. Trend analysis can lead to
introducing new methods or devices. For example:
• The increasing number of password resets, coupled with the cost of staffing Help desks,
can create a cost justification for a self-service password reset.
• Rapid growth in email storage requirements may justify the use of a network appliance to
filter spam.
• Discovery of an increasing number of conflicts between newly deployed applications and
legacy applications can lead to changes in software testing methodology.
Trend analysis in itself does not solve problems but identifies categories of problems that are
growing in severity or frequency. A general problem that can have ripple effects throughout an
IT infrastructure is errors in configuration management.
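A minimal trend check along these lines compares incident counts between two periods and flags categories that grew faster than some factor; the category names and the 1.5x threshold are illustrative:

```python
from collections import Counter

def growing_categories(previous, current, factor=1.5):
    """Incident categories whose count grew by at least `factor` between periods."""
    prev, curr = Counter(previous), Counter(current)
    return sorted(cat for cat in curr if curr[cat] >= factor * max(prev[cat], 1))

last_month = ["password reset"] * 40 + ["disk full"] * 10
this_month = ["password reset"] * 65 + ["disk full"] * 11
print(growing_categories(last_month, this_month))  # ['password reset']
```

A real implementation would draw the category lists from the incident repository and trend over many periods, but the core comparison is this simple.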

Configuration Management
Configuration management is the process of controlling changes to device configurations in an
IT environment. There are five basic operations in configuration management:
• Planning
• Identification
• Control
• Status accounting
• Verification and audit
Together, these operations provide the means to control the establishment and maintenance of
device configurations.

Planning
Planning within configuration management is similar to other IT operations; that is, the focus is
on setting an overall strategy, defining the policies and procedures necessary to implement that
strategy, and identifying configuration items that should be tracked within the CMDB. The
configuration management strategy defines the scope and objectives of the configuration
management process. For example, the scope of a typical plan includes all managed devices
within an organization; the objectives include maintaining the availability and integrity of
devices, ensuring efficient use of resources, and minimizing maintenance and training costs.

Managed devices are those that are under the control of an organization and function within the IT
infrastructure; unmanaged devices function within the IT infrastructure but are uncontrolled by the IT
department. Examples of unmanaged devices include servers used by business partners and
desktops used by customers to access online services.

The planning process also defines roles and responsibilities. A single device may be maintained
by several roles. A server, for example, may be the responsibility of a systems manager who is
responsible for the OS and access controls, a network administrator who is responsible for
configuring network hardware and protocols, and an application administrator who maintains
services provided by the server. The CMDB is used across service support operations, but its
function and maintenance fall under the scope of configuration management planning.

It should be noted that configuration management planning is not a one-time event. These plans are
typically subject to change as business and organizational requirements change. A comprehensive
review of configuration management plans is recommended twice a year.

Although the planning process focuses on the overall configuration management process, the
identification process addresses the details of the operation.

Identification
Any entity tracked by configuration management is known as a configuration item (CI). Several
characteristics of configuration items are recorded:
• Name and description
• Owner of item
• Relationships to other items
• Versions
• Unique identifiers
It is important to identify CIs to the level of independent change. For example, if laptops are
treated as a single unit and hard drives are not moved among laptops, there is no need to track the
hard drives independently of the laptop. However, optical drives used for backups and moved
among servers should be managed as distinct devices.
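A minimal CI record covering the characteristics above might look like the following sketch; the field names are illustrative, not a prescribed CMDB schema:

```python
from dataclasses import dataclass, field

@dataclass
class ConfigurationItem:
    """One tracked entity in the CMDB."""
    ci_id: str                     # unique identifier
    name: str
    description: str
    owner: str
    version: str
    related_to: list = field(default_factory=list)  # ci_ids of related items

server = ConfigurationItem("CI-1001", "backup server", "Nightly backup host", "ops", "rev B")
drive = ConfigurationItem("CI-2002", "optical drive", "Shared backup drive", "ops", "1.0",
                          related_to=["CI-1001"])   # tracked separately: moved among servers
```

The optical drive gets its own record because it changes independently of any one server, while a laptop's fixed hard drive would simply be an attribute of the laptop's record.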

Control
The control process ensures that all configuration items are properly identified, their information
is recorded in the CMDB, and any changes are done in accordance with change management
procedures. (Change control is discussed in detail later.)

Status Accounting
Status accounting is the process of recording state changes to a configuration item. The most
common states are:
• On order
• Received, pending testing
• Under test
• Installed to production
• Under repair
• Disposed
All state changes should be recorded so that the CMDB always has an accurate representation of
the IT infrastructure. This information is also useful for problem management, especially for
detecting devices with high incidences of repair or long repair periods.
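Status accounting can be enforced with a transition table so that only legal state changes reach the CMDB. The sketch below shortens the state names and is a simplification, not a complete implementation:

```python
# Allowed next states for each configuration-item state (names shortened).
TRANSITIONS = {
    "on_order": {"received"},
    "received": {"under_test"},
    "under_test": {"installed", "under_repair"},
    "installed": {"under_repair", "disposed"},
    "under_repair": {"installed", "disposed"},
    "disposed": set(),
}

def change_state(ci_states, ci_id, new_state):
    """Record a CI state change, rejecting transitions the policy does not allow."""
    current = ci_states.get(ci_id, "on_order")
    if new_state not in TRANSITIONS[current]:
        raise ValueError(f"{ci_id}: cannot move from {current} to {new_state}")
    ci_states[ci_id] = new_state
```

Each accepted call is also the natural point to append a timestamped history record, which supplies the repair-frequency and repair-duration data problem management needs.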

Verification and Audit
During verification and audit, the contents of the CMDB are compared with the physical
configuration items to ensure that information about them is correctly recorded. Documentation
about changes to configuration items should also be verified during audits.
Configuration management is an ongoing operation. Some processes, such as planning and
verification, occur at regular intervals. The other processes are continuous.

The Configuration Management II Community at http://www.cmcommunity.com/ provides resources,
forums, and other material related to configuration management.

Change Management
Change management is the process of controlling modifications to configuration items so as to
minimize incidents that disrupt normal operations. The reason change management is so
important is that one change can have ripple effects through multiple other assets (see Figure
5.4).

Figure 5.4: Changes in one configuration item can have ripple effects through other items.

Ripple Effects of Change
Consider a change to a hypothetical management reporting system. Until now, managers have
received static reports in Adobe PDF format once a week summarizing the business activity of
their units. The IT department is deploying an interactive reporting system. Here are some of the
changes related to the new system:
• Client software must be installed on power users’ desktops and laptops; basic users will
use a Web-based system to retrieve reports.
• The new client software for power users will have to be added to the patch management
process to ensure clients receive all relevant security updates.
• Middleware software, including a downloadable applet, will have to be installed on Web
servers.
• Firewall rules will be modified to allow traffic on a newly opened port to the Web
component of the reporting system.
• New groups and roles will be added to the database access control system to allow
managers to retrieve information using their network login ID.
• Additional database servers may be required to support the additional load of ad hoc
querying.
• Failover servers may be added to ensure managers access to the database if a Web server
should fail.

Clearly, what appears at first to be a software change can quickly propagate ripple effects to
other software components, hardware devices, and network settings.

Large numbers of emergency change requests are an indication of failures in other processes, such as
planning, testing, patch management, and security management.

Change Controls
Formal change control procedures are one way to ensure that the effects of a change are
understood before the change is implemented. Formal methods are often based on a standardized
change request mechanism and a change review board.

Requests for Change
A request for change is a documented description of a change to the IT infrastructure. A change
can range from installing a new version of a word processor on a user’s laptop to deploying a
new application for hundreds of users. Regardless of the complexity, several characteristics of
changes should be documented:
• Originator of request for change
• Configuration item to change
• Implementation plans
• Back out plans
• Type of change, such as a change in requirement, bug workaround, additional hardware
for increased demand, and so on
• Reason for the change, such as compliance, policy change, defect, new business
requirement
• Priority, which can include emergency, urgent, or routine
• Detailed description of change
• Estimated time and resources required to implement
• Impact analysis for complex changes
• Approvals
Some of these items, such as the change description and the reason for change, are defined by the
user or department making the request; others, such as the time and resource estimate and impact
analysis, are determined by technical staff. The approvals required will depend on the priority
and the complexity of the change. A relatively simple change, such as a desktop application
installation, may require line manager approval, but a major software release will require
approval from several managers, both on the business and technical sides of the organization.
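A request for change can be captured as a structured record so that nothing on the list above is omitted. The field names and the simple approval check below are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass, field

VALID_PRIORITIES = {"emergency", "urgent", "routine"}

@dataclass
class RequestForChange:
    originator: str
    ci_id: str              # configuration item to change
    change_type: str
    reason: str
    priority: str
    description: str
    estimated_hours: float
    implementation_plan: str = ""
    backout_plan: str = ""
    approvals: list = field(default_factory=list)  # roles that have signed off

    def __post_init__(self):
        if self.priority not in VALID_PRIORITIES:
            raise ValueError(f"priority must be one of {sorted(VALID_PRIORITIES)}")

    def is_approved(self, required_roles):
        """Approved once every required role has signed off."""
        return set(required_roles) <= set(self.approvals)
```

A routine desktop install might require only a line manager's sign-off, while a major release would pass a `required_roles` set spanning business and technical managers, mirroring the approval rules described above.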

Change Advisory Board
A change advisory board (CAB), sometimes called a change control board, is responsible for
reviewing complex requests for changes. A CAB should include managers responsible for
business operations as well as technical staff familiar with systems administration, network
operations, security, and software development and management. The purpose of the CAB is to
provide visibility to changes before they are made—not to slow the change process.
CABs are sometimes seen as bureaucratic roadblocks to getting work done—they should not be.
Their role is to provide a checkpoint to ensure that the implications of changes are thought
through as completely as possible. Correcting a problem with a proposed change is much easier
and less expensive when done at the planning stages than after the modification has been made.
In organizations with mature, functional IT operations, change is managed as a formal process.
There is no avoiding change, but you can determine how you deal with it.
After changes have been reviewed, revised, and approved, the next step is implementation. This
domain is addressed by release management practices.

Release Management
Release management is a demanding operation. The goal of release management is to preserve
the integrity and availability of production systems while deploying new software and hardware.
Several processes are included under the umbrella of release management:
• Planning software and hardware releases
• Testing releases
• Developing software distribution procedures
• Coordinating communications and training about releases
Release management is the bridge that moves assets from development into production.

Figure 5.5: Release management is the bridge between two high-level IT life cycles: development and
production.

Planning Releases
Planning releases is often the most time-consuming area of release management because there
are so many factors that must be taken into consideration. For example, when deploying a new
sales support system, the release managers must address:
• How to distribute client software to all current users
• How to migrate data from the current applications database to the new database with
minimal disruption to database access
• How to verify the correct migration of data
• How to uninstall and decommission the applications replaced by the new system
• How to verify that all change control approvals are secured
Each of these issues breaks down into a series of more granular tasks. Consider distributing
client software. Release managers must account for variations in OSs and patch levels of client
devices, the need for administrative rights to update the registry if software is installed, and the
possibility of conflicts or missing software on clients.

One of the often-discussed advantages of Web-based applications is that client software does not
need to be permanently installed. This is true for the most part, but some software is still required to
support Web applications, including browsers, browser helper objects (BHOs), plug-ins, and, in some
cases, a JRE. The supporting software is subject to some of the same constraints and limitations as
client/server software—they sometimes require administrative privileges to install, they must be
patched as needed to maintain security, and they are subject to their own upgrade life cycles. Web-
based applications may ease some of the burdens associated with release management, but they do
not eliminate them.

Testing and Verifying Releases
Release managers can play an important part in the testing phase of the software development
life cycle; the key areas for testing and verification are:
• Software testing
• Data migration testing
• Integration testing
In each case, the testing process should constitute the final test and primary verification that the
newly deployed applications operate as expected in the production environment.

Software Testing
It goes without saying that software should be thoroughly tested before it is released. In the ideal
world, software developers work in the development environment and deploy their code to a
testing and quality assurance environment that is identical to the production environment. It is in
the test environment that integrated module testing and client acceptance testing is performed.
This is not always possible. Large production environments may not be duplicated in test
environments because of cost and other practical limitations. It is especially important in these
cases that release managers work closely with software developers.
With responsibility for deploying software, release managers can provide valuable
implementation details about the production environment that developers should test. For
example, release managers will have information about the types of client devices and the types
of network connectivity supported as well as other applications that may coexist with the system
under development. Release managers may need to address data migration issues as well.

Data Migration Testing
In addition to supporting software developers on application testing and quality assurance
processes, release managers may also have to support database administrators who are
responsible for migrating data between applications. When the release of a new application
entails shutting down one system, exporting data, transforming it to a new data model, and
importing it into the new system, release managers will share responsibility for ensuring the data
is extracted, transformed, and loaded correctly. Again, this process should be thoroughly tested
prior to release, but realistic data loads are not always possible in test environments.

Integration Testing
Integration testing is the process of testing the flow of processing across different applications
that support an operation. For example, an order processing system may send data to business
partners’ order fulfillment system, which then sends data to a billing system and an inventory
management system. Certainly these would have been tested prior to deployment, but real-world
conditions can vary and uncommon events can cause problems. For example, spikes in network
traffic can increase the number of server response timeouts forcing an unacceptable number of
transactions to roll back. In this case, it is not that the systems have a bug that is disrupting
operations, but that the expected QoS levels are not maintained. Testing and verifying software
functions, data migration, and integrated services can be easily overlooked as “someone else’s
job,” but release managers have to share some of this responsibility.

Software Distributions
Software distribution entails several planning steps. At the aggregate level, release managers
must determine whether a phased release is warranted, and if so, which users will be included in
each phase. Phases can be based on several characteristics, including:
• Organizational unit
• Geographic distribution
• Role within the organization
• Target device
When deploying new software or major upgrades, a pilot group often receives the software first.
This deployment method limits the risks associated with the release. (Even with extensive testing
and evaluation, unexpected complications can occur—especially with end users’ response to a
new application).
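Phase planning of this sort can be sketched by grouping a client inventory, pilot group first; the inventory fields and the grouping key are hypothetical:

```python
def plan_phases(clients, pilot_ids, group_key="department"):
    """Assign clients to rollout phases: the pilot group first, then one
    phase per value of `group_key` (here, organizational unit)."""
    phases = {"pilot": []}
    for client in clients:
        if client["id"] in pilot_ids:
            phases["pilot"].append(client["id"])
        else:
            phases.setdefault(client[group_key], []).append(client["id"])
    return phases

clients = [
    {"id": "pc-01", "department": "sales"},
    {"id": "pc-02", "department": "sales"},
    {"id": "pc-03", "department": "finance"},
]
print(plan_phases(clients, pilot_ids={"pc-01"}))
# {'pilot': ['pc-01'], 'sales': ['pc-02'], 'finance': ['pc-03']}
```

Swapping the grouping key for geography, role, or target device covers the other phasing characteristics listed above without changing the logic.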
When distributing software, several factors must be considered:
• Will all clients receive the same version of the application? Slightly different versions
may be required for Windows 2000 (Win2K) clients and Windows XP clients.
• Will all clients receive the same set of modules? If a new business intelligence
application is to be deployed, power users may need full functionality of an ad hoc query
tool and analysis application, while managers and executives may require only summary
reports and basic drill-down capability.
• How will installation recover from errors or failure? Downloads can fail and need to be
restarted. There may be a power failure during the installation process. Disk drives can
run out of space. In some cases, the process can restart without administrator intervention
(for example, when the power is restored) but not in other cases (such as when disk space
must be freed).
• How will the installation be verified? Depending on the regulations and policies
governing IT operations, differing levels of verification may be required. At the very
least, the CMDB must be updated with basic information about the changes.
Software distribution is the heart of release management, but the ancillary processes of
communication and training are also important.

Communications and Training
The goal of communication in release management is to make clear to all stakeholders the
schedule and impact of releases. This is the responsibility of release managers.
Training users and support personnel on the released system is not the responsibility of release
managers, but training managers and release managers should coordinate their activities. When
training occurs too far in advance or too late following the release of software, it may be of little
use; the users may forget what they are taught or have already learned the basics by the time
training occurs.
Release management is a bridge from project-oriented software development and application
selection to production operations. Although testing can be a well-defined and well-executed part
of the development life cycle, release management still maintains a level of testing and
verification responsibilities. In addition, the core operations of planning, software distribution,
and communications constitute the bulk of what is generally considered release management.

Summary
Service delivery depends on a mosaic of interdependent processes, including incident
management, problem management, configuration management, change management, and
release management. These processes constitute core operations within the SOM model.
The focus here, as with other SOM elements, is to define management tasks in terms of generic
operations that apply to a wide range of assets and can be adapted to new technologies as they
emerge. The center of management is not desktops, servers, and network hardware but the
operations that deploy, maintain, and secure them.
This chapter has introduced the first part of systems management services. Chapter 6 will
continue the discussion by examining management issues in service delivery, including service
level management, financial management of IT resources, capacity management, and availability
management.

Chapter 6: Implementing Systems Management Services, Part 2: Managing Service Delivery
Service delivery is a complex mosaic of multiple processes and procedures that are required to
introduce, manage, and develop information services. The previous chapter examined how
service delivery is deployed with processes such as incident management, configuration
management, change management, and release management. This chapter continues with service
delivery, but turns your attention to management.
The deployment step focuses primarily on executing procedures to keep IT operations running
smoothly and adapting to the changing needs of users. Management is more about planning,
monitoring, and adjusting. In particular, this chapter will address:
• Service-level management
• Financial services management
• Capacity management
• Availability and continuity management
These aspects of service delivery have a common characteristic: they address the long-term IT
needs of an organization. The deployment operations discussed in the previous
chapter are performed to ensure the proper day-to-day function of IT systems. If those activities
were not practiced, the consequences would be seen rather quickly. Poor management, however,
can continue for some time before the full effects are noticed. Nonetheless, proper systems
management must address both the short-term and long-term needs of IT services.

Service-Level Management
How much service-level management is necessary? If you had to distill service-level
management to its most essential form and present it as a question, that would be it. IT managers
must understand how much storage, computing resources, network bandwidth, training, and time
from developers, quality control specialists, and a host of other IT services are needed by users.
In addition, IT managers must know when these resources are needed. For example, will the data
warehouse extraction, transformation, and load process run during normal business hours or at
night? If it is during the day, the application hosting the data will need to accommodate its
normal workload plus export potentially large amounts of data. The network must also
accommodate the additional load. If the data warehouse is loaded in the middle of the night, the
demands on both the application and the network would be less. Something as simple as when a
process runs can have a major impact on the performance of that process.

113
Chapter 6

Throughout this guide and in best practices and control frameworks, such as those documented
in ITIL and COBIT, there is a major emphasis on formalizing processes and procedures. This
idea applies to service-level management as well. The mechanism most commonly used in
service-level management is the service level agreement. An SLA is essentially a contract
between business units and IT service providers, such as in-house IT departments or outsourced
service providers. These agreements typically define the scope and levels of service provided.
Service-level requirements define the functional requirements that the business needs in order to
carry out its functions. They also entail, although this is not necessarily explicit, the need for
communications between business units and service providers. Business unit requirements are
rarely static and even in the best situations, requirements may not capture all nuances of a
business unit’s needs. Success for both business units and service providers requires that
communication does not stop once requirements are defined.
Requirements will vary according to business objectives, but several topic areas are common to
most business applications:
• Application functions
• Training
• Backup and recovery
• Availability
• Access controls
• Service catalog and satisfaction metrics
Each of these areas should be documented in service-level requirements.

Application Functionality
Within the section on application functionality, the project sponsors should define what the
system is to do. It is important to avoid becoming mired in implementation details at this point.
The goal is to define what the system should do—not how it should be done. For example, if an
application must be accessible from both traditional Web clients and mobile device clients, state
that purpose; there is no need to include design considerations, such as whether to use Handheld
Device Markup Language (HDML) or Wireless Markup Language (WML).
It is important to think of application functionality in terms of business tasks, such as:
• Providing customer support
• Verifying inventory
• Reporting on the status of operations
• Confirming customer orders
The specific functions might cover a broad range of options and they should be as inclusive as
possible when dealing with service requirement agreements, especially if part or all of the
service will be outsourced.


Training
Training should address both service use and service administration. User training is relevant
when an application as well as network and hardware infrastructure are included as part of the
provided service. For example, if an outsourcing firm is providing a CRM service that has never
been used by the customer, end user training should be included in the scope of the requirements.
Administrator training is almost always required, even when most of the systems infrastructure
will be managed by the IT department or an outsourcing firm. Application administrators are
often responsible for implementing and maintaining users, roles, and access controls as well as
organization-specific configurations related to the application’s functions.

Backup and Recovery


Backup and recovery is a crucial element of service requirements. Applications and hardware
fail. Sometimes an error in an application will corrupt information in a database; in other cases,
a disk will fail and data must be restored from backups. When
defining backup requirements, include the following criteria:
• Recovery time objectives
• Recovery point objectives

Recovery Time Objectives


Recovery time objectives define the acceptable length of time that an application or service can
be down. Mission-critical applications, such as a CRM system, might have very
short recovery time objectives, such as a few minutes. In such cases, failover servers or servers
with redundant subsystems are typically used. In other cases, such as a data warehouse and
reporting application, the system could be down for as long as a day without adversely impacting
operations.
A general rule to keep in mind is that the shorter the recovery time objective, the higher the cost.
For example, backups can be restored more quickly from online disk arrays than from offline
optical disks, but the cost of disk arrays is higher than that of the offline storage media.


Figure 6.1: Although implementation details are not part of service requirements, the cost of different options
can be a factor.

The time to recover is only one aspect of recovery criteria; another is specifying what it is that
will be recovered.

Recovery Point Objectives


If a server fails at 11:00am on a busy business day and backup files and a standby server are
available, operations could be back online by early afternoon. But what will be restored? Will all
of the changes made up to 10:59am be recovered? Will only changes made before the prior day
be recovered? The answer to these questions is determined by the recovery point objective.
Formally, a recovery point objective is the recoverable state of a system at some point in time
before a failure. The goal of the recovery time objective is to define the maximum time that a
service is unavailable—the recovery point objective defines the maximum amount of data that
can be lost due to a failure. Again, as with recovery times, the tighter the recovery point
objective, the higher the cost.
For example, consider a CRM application that manages customer account data using only
daily backups. If there were a failure at 11:00am and a backup had been performed at 3:00am,
the recovery point is effectively the previous business day. Any work done between the 3:00am
backup and the failure would be lost. Similarly, if the failure occurred at 4:59pm, nearly a full
day’s work would be lost. This is not suitable for many situations.


Figure 6.2: Backups without further availability measures can leave work performed since the backup
vulnerable to system failures.

Fortunately, there are availability procedures (discussed in more detail later) that can provide
recovery up to the point of failure. These tend to require more complex software, but they are
often used in applications designed for midsized and large enterprises.

Availability
Availability criteria answer the question “What is the tolerance for downtime with this service?”
The answer is obviously closely related to requirements for backup and recovery but also focuses
on the tolerance for downtime. Although backup and recovery procedures are designed for
particular recovery times and recovery points, availability addresses the question of how
frequently the business is willing to tolerate downtime.
For example, a server might go down at 11:00am and be back by 1:00pm the same day and still
meet backup and recovery requirements. If the same server goes down every day, it might still
meet the recovery objectives, but the business users are not likely to tolerate a system that is
down 2 hours of the day. The key questions with regard to availability in service requirements
are:
• How long can the system be down?
• How frequently can the system go down?
The length of time a system can be down is expressed in minutes, hours, or days. The amount of
disruption in the ideal world is virtually none, but in reality, the cost of countermeasures to
prevent downtime must be balanced with the benefits.


The rational choice is to allocate resources to availability measures until the cost of those
measures exceeds the expected cost of the corresponding downtime. For example, if a
high-availability solution is available for $50,000 and promises to keep downtime to less than 5
minutes, and another solution is available for $5,000 but reduces downtime to 1 hour, which
solution is better? The answer depends on the lost revenue or cost of being down. If, for
example, the business would lose $10,000 if the system were down for 1 hour, the less
expensive solution is a better choice.
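This trade-off can be checked with simple arithmetic. A sketch using the hypothetical figures from the example:

```python
def total_cost(solution_cost: float, downtime_hours: float, loss_per_hour: float) -> float:
    """Cost of a solution plus the expected cost of its residual downtime."""
    return solution_cost + downtime_hours * loss_per_hour

loss_per_hour = 10_000  # revenue lost per hour of downtime

high_availability = total_cost(50_000, 5 / 60, loss_per_hour)  # 5 minutes of downtime
basic = total_cost(5_000, 1.0, loss_per_hour)                  # 1 hour of downtime

# With a $10,000/hour loss, the $5,000 solution comes out far cheaper overall
print(high_availability, basic)
```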
The frequency is usually expressed as a percentage of total uptime. For example, if a system
should be available 24 hours a day, 7 days a week, and the requirement is 99 percent uptime, the
system could be down 87.6 hours, or more than 3 days per year. Table 6.1 shows the amount of
downtime allowed under several requirements.
Availability Rates (Total Hours per Year: 8,760)

Availability Requirement    Hours Down per Year
98.00%                      175.20
99.00%                       87.60
99.50%                       43.80
99.90%                        8.76
99.95%                        4.38
99.99%                        0.88

Table 6.1: System availability requirements are often expressed as a percentage of total possible hours a
service could be available.
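The figures in Table 6.1 follow directly from the availability percentage. A quick sketch of the arithmetic:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def hours_down_per_year(availability_pct: float) -> float:
    """Maximum allowed downtime, in hours per year, for a given availability requirement."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (98.0, 99.0, 99.5, 99.9, 99.95, 99.99):
    print(f"{pct:6.2f}%  {hours_down_per_year(pct):7.2f} hours down per year")
```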

Additional areas that should be addressed in service requirements are security and access
controls.

Access Controls
Access controls dictate who can do what with information assets. When developing service
agreements, access controls tend to be high level, unlike application-specific access controls,
which can be detailed and fine-grained. Access controls are dependent on three mechanisms:
• Identification and identity management
• Authentication
• Authorization


Identification and Identity Management


The purpose of the identification phase is to indicate to an access control system who a user
claims to be. A username, a device such as a smart card, or a biometric measure can be used for
identification. Identification does not necessarily provide evidence for who you are; for that, you
depend upon authentication mechanisms.
An identity record is associated with each user of a system. In some cases, these are relatively
simple structures with little more than a username. For example, a basic UNIX password file
includes the username, encrypted password, user number, group number, home directory, and the
name of the shell program to run when the user logs in. In other cases, the structures are much
more elaborate.

Figure 6.3: LDAP directories maintain identity and organization information that can be leveraged for access
control management.


Active Directory (AD) can be used to store detailed information about users, including
organizational role, phone numbers, email addresses, public keys (when a public key
infrastructure—PKI—is in use) and other identifying information. AD and other types of
network directories can store information about other structures and assets on a network:
• Organizations and organizational units (OUs)
• Organizational role
• Groups of users
• Devices
• Applications
An advantage of directory-based identity management is that applications do not need to
maintain separate databases of user information. Centrally managing basic user information still
allows applications to tailor authentication rules to their particular needs.

Authentication
Authentication is the process of proving one’s identity. Passwords are commonly used for this
purpose, but with all the well-known limitations of passwords, other techniques have become
more popular. Some other methods for authenticating to systems are:
• Smart cards
• Fingerprints
• Palm scans
• Hand geometry
• Retina scan
• Iris scan
• Signature dynamics
• Keyboard dynamics
• Voice print
• Facial scan
• Token devices
The biometric methods also serve as identification methods. The objective of authentication is to
grant access to a system only to legitimate users. Because a single method, such as a password,
can be compromised, systems with high security requirements may use multi-factor
authentication.


With multi-factor authentication, two or more authentication methods are used to verify a user’s
identity. This method often combines multiple types of mechanisms, relying on, for example,
something the user knows (such as a password), something the user has (such as a smart card),
and something the user is (such as a unique fingerprint). Once a user has been identified and
authenticated, the user is granted access to the system. What the user is able to do with that
system is dictated by the authorization rules defined for that user.

Authorization
Authorizations are sets of rules applied to users and resources describing how the user may
access and use the resource. For example, users may be able to log into a network and access
their own directories as well as directories shared by all users in their department. The following
list highlights considerations for defining authorization requirements with regards to SLAs:
• Who are legitimate users of the system or network?
• How is their identity information maintained?
• How are users grouped into roles?
• How are privilege assignments to roles managed?
• Will the auditing capabilities of access control systems meet the audit requirements of the
customer?
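The questions above about grouping users into roles and assigning privileges to roles amount to a role-based access control model. A minimal, illustrative sketch; the role and permission names are hypothetical:

```python
# Map users to roles, and roles to permissions; a user is authorized
# for an action if any of their roles grants the permission.
roles = {
    "analyst": {"report:read"},
    "dba": {"report:read", "db:backup", "db:restore"},
}
user_roles = {"alice": {"analyst"}, "bob": {"dba"}}

def is_authorized(user: str, permission: str) -> bool:
    return any(permission in roles[r] for r in user_roles.get(user, set()))

print(is_authorized("alice", "db:backup"))  # False
print(is_authorized("bob", "db:backup"))    # True
```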
As a rule, service level requirements should focus on what a service should provide, not how it is
provided—but access controls can be an exception to that rule. For example, if an organization
has invested in an identity management system, with constituent LDAP or other directories,
single sign-on (SSO) services, and authorization services, then an SLA can, and should, dictate
the use of that system. Sometimes you cannot avoid having to manage multiple access control
systems; in those cases, you should at least try to minimize their number. Another aspect of
service level management that spans multiple areas of IT is maintaining a catalog of IT services
and service metrics.

Service Catalog and Satisfaction Metrics


The service catalog is a list of all services provided by an IT organization. This list can include
both in-house and outsourced services. The catalog should include details about each service,
such as:
• Applications used to support the service
• Devices used to support the services
• Description of availability of the service
• Costs of the services
• Dependencies or other restrictions on use of the service


In addition to keeping track of what services are provided, service management best practices
dictate that you measure how well services are provided. Some common measures include:
• Response time
• Time to resolution
• Number of incidents by category
• Unit costs, such as cost of service per user or number of users supported per device
• Direct customer satisfaction surveys
These metrics, especially when applied to a service desk, should be categorized by priority. A
security breach that leaves a database of customer information vulnerable is an urgent incident
that must be responded to immediately. A problem that delays or inconveniences without
disrupting core business operations might be categorized as normal and addressed on a first-
come first-served basis.
Service level management spans the range of IT services. It includes some elements of business
continuity planning, security services, and capacity planning. Successful service level
management begins with well-defined SLAs that identify user needs in several areas as well as
the level of service users can expect in each of those areas. In addition, service managers are
expected to measure performance and maintain and improve service. Of course, all this
management, along with the rest of IT resources, cost money.

Financial Management for IT Services


Financial management for IT services is challenging. Customers’ needs change, technologies are
constantly evolving, and there are often multiple ways to solve problems—each with their own
advantages and risks. In addition to managing today’s operations and projects, managers must
plan for future needs. Four common tasks in IT financial management are:
• Cost accounting
• Forecasting
• Capital expenditure analysis
• Operations and project financial management
These are fundamental financial management tasks and not limited to IT operations.

Cost Accounting
Cost accounting is the process of allocating the cost of providing service to the recipients of that
service. It sounds like a reasonable method—you pay for the services you use. When you are
buying relatively simple products, like a spindle of DVDs for backing up files, you can go to an
office supply store, pick out the right product for your needs, and pay the pre-set price. Why
can’t you do that for all IT services? The answer is, as it often seems to be, that the simplified
models of how things work start to break down when you get to real-world scenarios that are
more complex than example cases.


Competing Requirements
Consider the following scenario: An IT department provides a backup service. Some
departments have relatively simple backup and recovery requirements, while others are more
involved. The remote sales departments need their network file servers backed up at night, and
their backups should be kept for a week. After that, the backup media can be reused. A full
backup on the weekends and incremental backups during the week are sufficient. The total
amount of data backed up is in the hundreds of gigabytes. Another department has a terabyte-scale
customer management database that must be backed up every day. Audit requirements
necessitate keeping a month’s worth of data. Recovery time requirements are so tight that there
cannot be too many incremental backups between full backups (recovery from a single full backup
is faster than recovery from a full plus several incremental backups). To meet the requirements
of the department using the customer management system, the IT department has to buy a high-
end backup tape solution with robotic components and high-speed tape devices. How should the
costs be allocated?

Cost Allocation
This is where it gets complicated because there are a number of options. The remote office has
minimal requirements that could have been fulfilled without the high-end solution needed by the
customer management department. The remote office could be charged a rate competitive with
what it would pay an outside provider for the same service. In this model, the remote
office does not incur additional cost because of the needs of another department.
Another model allocates the cost based on units of service provided. If the customer management
department uses 95 percent of the backup storage and the remote office uses 5 percent, the
former is charged 95 percent of the total cost of the service and the latter is charged 5 percent. In
this case, the remote office is paying a premium for high-end hardware it does not need.
A third option is to use a graduated schedule of charges so that the customers using the least
amount of service pay less than the customer that forces the IT department to use high-end
solutions.
Yet another option is to have two backup solutions: one for low-end needs and one for high-end
needs. Each department would pay the full cost of its solution. Unfortunately, this could be the
most expensive option because two types of systems would have to be purchased and
maintained. This is the least rational solution for the organization as a whole.
In practice, the second option, allocating costs based on usage, is the easiest to implement. It
avoids the competitive analysis required by the first option, the political battles associated with a
graduated scale, and the extra expense of the two-solution option.
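Part of the appeal of the usage-based model is how simple it is to compute. A sketch using the hypothetical 95/5 backup-storage split described above; the total cost figure is an assumption for illustration:

```python
def allocate_by_usage(total_cost: float, usage: dict[str, float]) -> dict[str, float]:
    """Split a shared cost in proportion to each department's usage."""
    total_usage = sum(usage.values())
    return {dept: total_cost * share / total_usage for dept, share in usage.items()}

# 95% / 5% split of backup storage across two departments
charges = allocate_by_usage(200_000, {"customer_mgmt": 95, "remote_office": 5})
print(charges)  # {'customer_mgmt': 190000.0, 'remote_office': 10000.0}
```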


Implementing Charge Backs


Once costs have been allocated to users, the IT department can charge those departments for the
service. These costs to the customers are known as charge backs. Some IT organizations operate
as an internal service that recovers all their costs from customers. There are definitely advantages
to this setup. For example, when costs, including IT costs, are allocated properly, managers can
determine which products or services are profit makers and which hurt profits. However, when
costs are distorted, the true cost of a single product or service cannot be determined.
Care must be taken when using charge backs in profit calculations. If a department is not allowed
to seek competitive bids for a service, such as backups, should the department be held
responsible for having to pay the higher prices for an internal service? Cost accounting attempts
to provide accurate measures of the true costs of services; however, in practice, the complexities
of providing a service and the global considerations of the organization as a whole can introduce
distortions that are not accommodated by basic accounting methods. Another challenge facing
managers is trying to estimate future needs.

Forecasting
Forecasting is as much an art as a science. It is fundamentally about estimating the cost of future
services, which include several types of costs, such as:
• Labor, including both employees and contracted labor
• Capital expenditures
• Lease costs
• Service contracts, such as maintenance
• Consulting
Within the areas of forecasting that can be standardized, a few general observations can be made:

Forecasting at the Appropriate Level


First, forecasting should be detailed enough to take into account significant differences in costs
without encumbering the process with too many details. For example, when forecasting labor
costs, do not simply use a head count and an average cost per person; labor rates vary and the
forecasts should reflect that. For example, a service desk technician and a network operations
technician could be grouped into the same general pay level, and database administrators and
project managers into another. This way, if changes are needed to the forecast (for
example, another 10 service desk technicians are added), the forecast can more closely
reflect the actual costs.


Differing Patterns of Cost Growth


Keep in mind that some costs tend to grow gradually and continuously and others have jumps in
costs. Adding more disk space to a storage array will have a relatively linear growth rate; adding
one disk might cost X, adding two disks costs 2X, adding three disks costs 3X, and so on. Other
costs, especially labor costs, have growth rates that look more like steps rather than steady
inclines.

[Chart: forecasted labor and hardware costs over 12 months; hardware costs grow roughly linearly while labor costs rise in steps.]

Figure 6.4: Patterns of growth in cost can vary, some are continuous and others are more step-like.

For example, the number of servers can increase for a while before an additional systems
administrator will have to be added to the staff. However, the total cost of adding that one server
that necessitates hiring another administrator is far higher than the cost of adding the previous
server. This interaction among resource types must be accounted for when forecasting.

Accounting for Cash Flow


In some organizations, annual budgets are established and funds are available immediately for
expenditures. For example, in a government agency with an approved capital budget, the funds
are generally available for use once all the approvals are in place (assuming no further
restrictions on the funding). In other organizations, especially businesses, plans may be subject to
cash flows.


Cash flow, essentially the money coming into a business minus the funds going out, can vary
over time, and expenditures must be timed to occur after sufficient cash is on hand. For example,
if the IT department plans to purchase additional servers and hire a new systems administrator,
the business needs cash on hand to pay for the hardware and meet payroll. When forecasting,
consider the timing of cash flows in the business, especially if your business is subject to
seasonal variations.
When forecasting, it helps to distinguish types of costs and how their growth patterns vary. It is
especially important to watch for costs that introduce jumps in the total cost of a project or
operation as well as the timing of expenditures that should be staged according to expected cash
flows in the organization.
It should also be noted that forecasting for operational expenses, such as labor, leases, and small
equipment, requires a different type of analysis than major expenditures for equipment with
multi-year life spans. Those large expenditures warrant a more investment-oriented approach
known as capital expenditure analysis.

Capital Expenditure Analysis


Capital expenditure analysis focuses on the purchase of equipment with relatively long lifetimes
(in the IT world, this period seems to be about 3 years, give or take a year). These purchases are
essentially investments, and questions arise about these investments, just like any other. How
much is this investment worth in today’s dollars? What kind of return on investment (ROI) can
you expect? Which piece of equipment is the better investment, A or B? Fortunately, there are
well-established methods supporting capital expenditure analysis. Three commonly used
calculations are:
• Net present value (NPV)
• ROI
• Internal rate of return (IRR)
These measures can be used separately or together to help formulate a decision about a particular
investment.

NPV
The NPV of an investment is a measure, in today’s dollars, of the value of future savings or
returns due to an investment made today. To determine the value of future savings or returns,
you must take into account the time value of money. For example, if you were given the
choice of receiving $1,000 today or $1,100 one year from now, which choice would
maximize your return? That depends on the interest rate available in the open market. If
the interest rate is 5 percent per year, having $1,000 to invest today would yield $1,050 in
one year; the better choice would be to wait and receive the $1,100 in one year.
The interest rate used in this calculation is known as the discount rate. It is used to determine the
relative value of an investment. The NPV calculation takes into account the fact that savings or
returns accrue over time, and it uses the discount rate to account for changes in the value of
money over time.


Let’s look at an example to see how this works: The IT department is considering investing in a
new database server to replace two existing servers. The cost of the database server is $50,000.
The department estimates that it will save $15,000 per year in maintenance, service contracts,
and licensing costs. Will the investment in a new server save money in the long run?
To answer that question, you use the formula for calculating NPV. Assuming the useful life of
the database server is 3 years, the formula for NPV is:
NPV = Amount saved in Year 1 / (1 + Discount Rate) +
      Amount saved in Year 2 / (1 + Discount Rate)^2 +
      Amount saved in Year 3 / (1 + Discount Rate)^3
Assuming a 5 percent discount rate, the calculations are shown in Table 6.2.
NPV Calculations

Year    Amount Saved    Discount Rate    Value per Year
1       $15,000         0.05             $14,286
2       $15,000         0.05             $13,605
3       $15,000         0.05             $12,958

NPV: $40,849

Table 6.2: Example NPV calculations.
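The calculation in Table 6.2 can be reproduced in a few lines:

```python
def npv(savings_per_year: list[float], discount_rate: float) -> float:
    """Present value of a stream of future savings (investment cost not subtracted)."""
    return sum(amount / (1 + discount_rate) ** year
               for year, amount in enumerate(savings_per_year, start=1))

value = npv([15_000, 15_000, 15_000], 0.05)
print(round(value))  # 40849 -- less than the $50,000 cost, so the server is a poor investment
```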

In this example, the NPV of the investment is $40,849, less than the $50,000 investment. Unless
there are other reasons to make the investment, the organization would be better off keeping the
$50,000 in the bank than spending it on the server. Another commonly used measure in capital
expenditure analysis is ROI.

ROI
ROI is a commonly used measure for a number of reasons; ROI
• Takes into account the total cost and benefit of an investment
• Is expressed as a percentage, not a dollar amount, so it is easy to compare ROIs for
different investment options
• Is well known, perhaps in large part because of the first two reasons
ROI is a calculation that takes into account the present value of future savings (like the NPV
calculation), increased income generated by the investment, and the initial costs of the
investment.


In the NPV calculation, you started with the amount saved in a given year. With the ROI
calculation, you start with the net benefit of an investment for a given year. The formula for net
benefit is:
Net Benefit = Savings due to Investment + Increased Revenue due to Investment –
Recurring Costs
The net benefit fits into the ROI formula, which is similar to the NPV formula. For a
three-year period, the ROI formula is:
ROI = [ Net Benefit in Year 1 / (1 + Discount Rate) +
        Net Benefit in Year 2 / (1 + Discount Rate)^2 +
        Net Benefit in Year 3 / (1 + Discount Rate)^3 ] / Initial Costs
Let’s use the formulas in an example. An organization is considering an investment in a network
security appliance. The appliance will allow the IT department to retire or repurpose two servers
running content-filtering and antivirus software. The appliance requires less time to administer
than the two servers currently running security countermeasures, so there will be some savings
on labor. The appliance will also filter traffic faster, allowing for the rollout of new Web-based
services expected to generate additional revenue in the future. What is the ROI?
Start by calculating the net benefit for each of the next 3 years as shown in Table 6.3.
Year    Savings    Additional Revenue    Recurring Costs    Net Benefit
1       $10,000    $10,000               $0                 $20,000
2       $20,000    $40,000               $5,000             $55,000
3       $10,000    $80,000               $5,000             $85,000

Table 6.3: Net benefit calculations for security appliance investment.

The savings are due to expenses that would be incurred if the existing servers and applications
running on those servers are kept. Year 1 and year 3 consist of software license costs,
administration costs, and routine maintenance costs. Year 2 includes those as well as several
hardware upgrades or replacements expected based on mean time between failures (MTBF) of
several of the server components.
The additional revenue is due to the fact that a new service can be offered because of the higher
throughput available from the security appliance. The first year will consist primarily of a small
pilot program and initial marketing efforts. Projections for the plan estimate significant growth
starting in the second year and continuing into the third year. The recurring costs are the costs
associated with maintenance. These include minimal administration charges as well as
maintenance fees charged by the appliance vendor. The net benefit is calculated according to the
formula shown earlier.


You can now move on to the ROI formula. Assuming a 5 percent discount rate and an initial
cost of $25,000, the ROI is:
[ $20,000 / (1 + 0.05) +
  $55,000 / (1 + 0.05)^2 +
  $85,000 / (1 + 0.05)^3 ] / $25,000 = 569%
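The same arithmetic, scripted with the hypothetical figures from Table 6.3:

```python
def roi(net_benefits: list[float], discount_rate: float, initial_cost: float) -> float:
    """Discounted stream of net benefits divided by the initial investment."""
    present_value = sum(benefit / (1 + discount_rate) ** year
                        for year, benefit in enumerate(net_benefits, start=1))
    return present_value / initial_cost

result = roi([20_000, 55_000, 85_000], 0.05, 25_000)
print(f"{result:.0%}")  # 569%
```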
This is clearly a good investment option, due largely to the increased revenue enabled by the new
appliance and, to a lesser degree, to the savings of the expenditures in the second year to
maintain the existing servers. ROI is a broadly understood and easily used calculation. Another
capital expenditure calculation that allows for comparison between projects is IRR.

IRR
IRR is a percentage, and is thus similar to the ROI rate; however, IRR does not depend on
knowing, or estimating, a discount rate. Rather, IRR calculates the discount rate for which the
NPV of an investment is zero. The advantage of this approach is that it is easy to compare two
projects to determine which is a better investment regardless of the size of the investment.
IRR is an iterative calculation that starts with the initial cost of an investment and subsequent
revenues or savings generated by the investment. Microsoft Excel provides a built-in IRR
function as well as a modified version, MIRR, that addresses some of IRR’s shortcomings
in certain situations.
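Outside of a spreadsheet, IRR can be computed by searching for the discount rate at which NPV crosses zero, for example by bisection. A sketch using the cash flows from the security-appliance example, with the initial cost as a negative flow at year 0:

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """NPV of a series of cash flows, where cash_flows[0] occurs now (typically negative)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows: list[float], lo: float = 0.0, hi: float = 10.0) -> float:
    """Bisection search for the discount rate at which NPV equals zero."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid  # NPV still positive: the break-even rate must be higher
        else:
            hi = mid
    return (lo + hi) / 2

rate = irr([-25_000, 20_000, 55_000, 85_000])
print(f"{rate:.1%}")  # roughly 135% -- consistent with the very high ROI computed earlier
```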

Some financial analysts have questioned the use of IRR in capital expenditure analysis because, they
claim, it makes some poor investments look better than they actually are in some cases. See John C.
Kelleher and Justin J. MacCormack’s “Internal Rate of Return: A Cautionary Tale” at
http://www.cfo.com/article.cfm/3304945/1/c_3348836 for more information.

Capital expenditure analysis is an important part of IT financial management. Like forecasting, it
helps formulate plans and budgets for operations and projects, which, in turn, need to be
financially managed on a day-to-day basis.


Operations and Project Financial Management


Within the day-to-day operations of IT departments, the main activities are roughly divided into
two types: operations and projects. Each has particular needs when it comes to management.

Operational Management Issues


Operations management is focused on the standard operating procedures in place within an
organization. These can include:
• Performing backups on servers and client devices
• Monitoring network performance
• Reviewing audit logs
• Granting privileges to users and provisioning user accounts
• Populating the data warehouse and generating management reports
• Performing service desk support
• Responding to incidents such as hardware failures
• Installing patches and other software upgrades
These tasks are part of the relatively stable set of operations that constitute the IT support for
running an organization. The issues most relevant to managing operations are:
• Staffing and training
• Meeting service delivery and other performance measures
• Tracking budget expenditures
• Seeking additional funding for unanticipated costs
As these tasks are done repeatedly, they are often, or at least should be, defined within standard
operating procedures. There are, however, times when IT has to perform tasks outside of the
routines defined for day-to-day operations. When these tasks are substantial enough,
they are managed as projects.


Project Management
Project management is a well-defined practice for achieving a set of one-time objectives, such as
developing an application or migrating a service, such as email, from one platform to another.
Unlike operations, specific projects are usually not repeated. However, the nature of projects is
sufficiently similar to warrant a set of best practices. The following sections summarize the core
activities within project management, especially as they relate to financial management.

Project Management Tasks


The main project management tasks are as follows:
• Planning project work—This task entails determining the exact nature of project
deliverables (for example, a functioning program), the delivery schedule, and the
resources required to accomplish the work.
• Performing risk analysis—Like risk analysis for IT management in general, the objective
of this task is to identify risks and their likelihoods, and mitigate those risks when
possible.
• Estimating and allocating resources—Different skills are needed at different times of a
project; to operate efficiently, projects should staff only resources as they are needed.
Similarly, hardware resources should be allocated as needed but with sufficient time to
detect and adjust for unforeseen dependencies and incompatibilities.
• Assigning tasks and directing activities—These are the day-to-day items that project
managers tend to focus on. The objective is to keep on schedule and to address problems
early.
• Tracking and progress reporting—Another job of the project manager is to report the
status of the project to management and seek help when problems cannot be resolved by
just the project team; for example, when a business partner fails to meet a contractual
agreement, thus putting the project deliverable and schedule in jeopardy.
• Performing post-project analysis—The objective of this task is to learn from the projects,
especially from mistakes that may be avoided in future projects.
Over time, a number of project documents have emerged as part of project management best
practices that help to control and document the state of projects as they execute.


Project Management Documentation


As projects are one-time activities often requiring skill sets from groups or departments that
might not regularly work together, it is important to have a well-defined procedure for
communicating expectations about the goals of the project, controlling the execution of the
project, and reporting on its status. The following list highlights the core documents used for these
purposes:
• Project charter—This charter provides a definition of the scope of the project, its
objectives, and its lifespan.
• Business justification—This document provides the argument for pursuing the project. It
may include ROI or other investment analysis measures.
• Work breakdown structure—This document is a precursor to a project plan that lists the
tasks to be accomplished and the activities that will have to occur to accomplish each
task.
• Risk management plan—This plan documents the risks to the project, the likelihood of
each risk, and a mitigation strategy for each high-impact and highly likely risk.

Not all identified risks must have a mitigation strategy; it is best to focus on those that could be the
most disruptive.

• Project plan—This document is a combination of a resource management plan, a project
schedule (including Gantt charts), and task assignments.
• Status reports—These are often short (one page) summaries of the state of the project.
The reports note what was scheduled to be accomplished in the reporting period
(typically one week), what was actually accomplished, planned work for the current
reporting period, explanations for variations from the schedule, and a list of issues
requiring management attention.
Project management presents challenges not found in operations management. At the same time,
a well-developed set of best practices exist for managing projects. Anyone responsible for
project management is strongly encouraged to use them.

For more information about best practices in project management, see resources at the Project
Management Institute, http://www.pmi.org/.

Financial and service level management are major parts of service delivery management. More
narrowly focused but still essential areas of service level management are addressed in the
remainder of this chapter.


Capacity Management
Capacity management is the practice of understanding the demands for IT resources, such as
computing, memory, storage, and network bandwidth, and ensuring adequate resources are in
place when they are needed. This is done in three ways:
• Performance management
• Workload management
• Application sizing and modeling
These are closely related but address the needs of capacity management in slightly different
ways.

Performance Management
Performance management entails monitoring systems to ensure resources are used efficiently, to
detect trends in the growth or reduction in the needs for particular resources, and to identify
performance bottlenecks. When a performance problem occurs, the key to resolving the problem
is identifying the point in the process that is causing the slowdown.
Consider a customer management system that generates reports on sales activity. The company
has been growing and sales activity is increasing; at the same time, the report-generation process
is taking longer and longer, out of proportion with the growth in sales activities. The causes of
the problem could be:
• Insufficient memory in the database server
• Insufficient bandwidth on the network
• Poorly coded SQL within the database application that does not scale well
• Insufficient CPU capacity for the number and size of the reports
It is critical to identify the bottleneck. If the problem is poorly written SQL code, adding more
processors and memory may reduce the problem; however, this option will incur a significant
expense and will probably work only for a short time. If the problem is insufficient memory, the
CPU is probably not being used to capacity; adding faster or additional CPUs will not reduce the
problem. Still another possible solution is to adjust the overall load on the system.
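A simple way to surface the symptom described above is to track a normalized metric over
time; the monthly figures below are hypothetical:

```python
# Sketch (hypothetical data): flag months where report runtime grows out of
# proportion with sales volume, which signals a bottleneck worth investigating.

samples = [  # (month, sales transactions, report runtime in seconds)
    ("Jan", 10_000, 120),
    ("Feb", 12_000, 150),
    ("Mar", 15_000, 240),
    ("Apr", 17_000, 380),
]

baseline = samples[0][2] / samples[0][1]   # seconds per transaction in January
for month, sales, seconds in samples[1:]:
    per_txn = seconds / sales
    if per_txn > 1.5 * baseline:           # runtime growing faster than the workload
        print(f"{month}: {per_txn / baseline:.1f}x baseline -- investigate bottleneck")
```

The threshold (1.5x) is an arbitrary illustration; real monitoring would be tuned to the
system's normal variance.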


Workload Management
Workload management entails understanding the full set of processes that must be run, their
dependencies, and their resource requirements and scheduling jobs and resources to maximize
the use of computing resources. The first step to workload management is identifying the
resources needed by each job. Some will require large amounts of bandwidth but little CPU, such
as transferring data for a data warehouse load; others will be both disk and memory intensive,
such as sorting large data sets for reports.
The second step is to schedule complementary jobs so that the contention for a single resource is
minimized. Assuming there are no linear dependencies between the jobs (for example, job A
must finish before job B starts), processes with different resource requirements should be
scheduled together. Another rule is to schedule jobs early when there is a dependency on them
by multiple other jobs. This method maximizes scheduling options of the later jobs. The other
core process in workload management is monitoring. This should focus on both current
performance and trends in growth or reduction in the need for particular resources. Another area
of capacity management entails analyzing the needs of new applications.
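The "schedule jobs with many dependents early" rule can be sketched as a topological
ordering that breaks ties by dependent count; the job names are hypothetical, and
`graphlib` is in the Python standard library (3.9+):

```python
# Sketch (hypothetical jobs): order jobs so that dependencies are respected and,
# among jobs that are ready at the same time, those with more dependents run first.
from graphlib import TopologicalSorter

# job -> set of jobs it depends on
deps = {
    "extract": set(),
    "load_warehouse": {"extract"},
    "sales_report": {"load_warehouse"},
    "inventory_report": {"load_warehouse"},
    "archive": {"extract"},
}

# count how many jobs depend on each job
dependents = {job: 0 for job in deps}
for requires in deps.values():
    for job in requires:
        dependents[job] += 1

ts = TopologicalSorter(deps)
ts.prepare()
order = []
while ts.is_active():
    # among currently runnable jobs, prefer those that unblock the most others
    for job in sorted(ts.get_ready(), key=lambda j: -dependents[j]):
        order.append(job)
        ts.done(job)
print(order)
```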

Application Sizing and Modeling


The purpose of application sizing and modeling is to understand the capacity demands of an
application before it goes online. It can be far more cost effective to understand the computing,
storage, and bandwidth requirements before investing in hardware than after. The application
modeling process should take into account several factors:
• The types of processes in the applications, such as CPU-intensive calculations or data-
intensive database queries
• The relative frequency with which these different types of jobs will be executed
• The number of users and the times they will execute jobs on the system
• Outside constraints, such as time restrictions on when jobs are executed so that dependent
jobs can meet their service level schedules
Application sizing and modeling entails elements of forecasting, so it will not yield certain
results. When dealing with application sizing, it is best to plan for ranges of uses (for example,
10 to 100 users with moderate reporting demands or 2000 to 3000 users with high reporting
demands). This gives the business sponsors the option to invest for the level of capacity while
planning for future levels and to mitigate risks if the exact level of demand is not known.
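A back-of-the-envelope sizing model for such user ranges might look like the following;
the per-user workload figures and the utilization target are illustrative assumptions, not
measured values:

```python
# Sketch (hypothetical model): estimate CPU cores needed for a range of users,
# given assumed per-user reporting activity.

def size_estimate(users, reports_per_user_hour, cpu_sec_per_report):
    """Return the CPU cores needed to absorb the peak reporting load."""
    cpu_seconds_per_hour = users * reports_per_user_hour * cpu_sec_per_report
    utilization_target = 0.6          # plan with headroom rather than for 100% busy
    return cpu_seconds_per_hour / (3600 * utilization_target)

for low, high in [(10, 100), (2000, 3000)]:
    print(f"{low}-{high} users: "
          f"{size_estimate(low, 4, 30):.1f} to {size_estimate(high, 4, 30):.1f} cores")
```

Running the model over the low and high ends of each range gives the business sponsors
concrete capacity brackets to invest against.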


Availability and Continuity Management


The goal of availability and continuity management is to ensure that IT systems are available for
use when they are needed. Specific requirements should be detailed in SLAs. The requirements
will usually be defined in terms of access to the system and performance levels, not
necessarily in terms of the types of disruptions that can interrupt service. IT managers
responsible for availability and continuity management will have to understand and address a
variety of potentially disruptive problems.

Availability and SLAs


The objective of availability management is to meet service level requirements; this is done by
monitoring and responding to key metrics. These metrics can address several aspects of system
availability:
• Server availability—For example, does the server respond to a ping request?
• Acceptable performance—For example, do key database queries return results within a
predefined length of time?
• Security—Is data transmitted confidentially, is the server protected by anti-malware
programs, and so on?
When metrics indicate that SLAs are not met, a procedure should be in place to respond. For
example, if application response times are slowing because of an increase in the number of users,
secondary jobs can be shut down or their priority lowered to free CPU capacity for the critical
jobs.
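A minimal availability probe along these lines can be built from the standard library alone;
the host and thresholds below are hypothetical, and real monitoring would add
application-level queries on top of a reachability check:

```python
# Sketch: check that a server accepts TCP connections within a threshold.
import socket
import time

def probe(host, port, threshold_seconds=0.5):
    """Return (reachable, elapsed_seconds) for one TCP connection attempt."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=threshold_seconds):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start

# 192.0.2.1 is a documentation-only address (RFC 5737) standing in for a real host.
ok, elapsed = probe("192.0.2.1", 5432)
if not ok:
    print("availability SLA breach: server unreachable")
elif elapsed > 0.25:
    print(f"performance warning: connect took {elapsed:.3f}s")
```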
There is also the possibility of a catastrophic disruption that leaves entire systems unavailable.
For example, a natural disaster could destroy the primary data center housing servers and
network equipment supporting a customer management system. This level of disruption is
addressed by continuity management.

Continuity Management
The goal of continuity management is to ensure business operations are able to continue in case
of a significant disruption in multiple services. IT continuity planning should be done as part of a
broader exercise in business continuity management.


If there is a significant disruption in services, the business should determine which systems are
mission critical and the order in which they should be brought back online. There are also
financial considerations in continuity management. How much should be spent to ensure the
customer management application is available in the event of a disaster at the primary data
center? How long can the system be down before the business suffers adverse impacts? These
questions are best answered using a formal risk analysis procedure that includes:
• Identifying assets
• Assessing the value of assets
• Identifying potential threats to assets and the likelihood of their occurrence
• Prioritizing the allocation of resources based on asset value and threat level
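The prioritization step can be illustrated by ranking assets on expected annual loss (asset
value times threat likelihood); the assets and figures below are hypothetical:

```python
# Sketch (hypothetical figures): rank assets by expected annual loss to guide
# the allocation of continuity resources.

assets = [
    # (asset, value in dollars, annual likelihood of a damaging threat)
    ("customer database", 2_000_000, 0.02),
    ("public web site", 250_000, 0.10),
    ("internal wiki", 40_000, 0.25),
]

ranked = sorted(assets, key=lambda a: a[1] * a[2], reverse=True)
for name, value, likelihood in ranked:
    print(f"{name}: expected annual loss ${value * likelihood:,.0f}")
```

Note that a high-value asset with a modest threat likelihood can still rank above a
frequently threatened but low-value one, which is exactly the point of weighting by both
factors.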
The purpose of availability and continuity management is to keep systems up and running at a
level that meets SLAs. The tasks involved are varied because the problems addressed range from
the relatively minor (the system is sluggish for short periods of time) to major (operations need
to be relocated to a backup data center).

Summary
Service delivery entails many tasks, both operational and management. The previous chapter
examined the operational aspects; this chapter covered the management side of service delivery.
Although management issues are broad, they are dominated by service level management and
financial management. If these areas are well managed, three other key areas—capacity
management, availability management, and continuity management—are already well on their
way to being effectively implemented. The next chapter continues to examine elements of
systems management with a focus on application, software, and hardware management issues.


Chapter 7: Implementing Systems Management Services,
Part 3: Managing Applications and Assets
Networks, servers, and client devices alone do not address the information needs of an
organization—applications, and their associated data, customize the functions of an otherwise
generic infrastructure and allow IT to meet the information management requirements of
businesses, agencies, and other organizations. The ability to finely customize software to meet
particular needs makes it a key to aligning information services to business strategy. At the same
time, the flexibility introduces a wide variety of management challenges. These challenges have
by no means been completely mastered, and software developers continue to create and refine
new development methodologies. There are, however, common elements to application
management frameworks. This chapter will examine the challenges of application management
from the perspective of application life cycle management and software asset management.
Application life cycle management entails how applications are created and deployed. Once
constructed, or otherwise acquired, software applications are assets that must be managed as any
other information asset. Of course, applications do not exist in a vacuum, and dependencies
between applications must be understood to ensure they function properly. Another key to proper
functioning is adequate security to protect the integrity of the application as well as the integrity
and confidentiality of its related data. Finally, despite many differences with other kinds of
assets, applications are assets and must be managed as such.

Application Life Cycles


The application life cycle is the series of steps an application goes through from the time a need
arises for the application until the time the application is retired. The application life cycle is
often complex with multiple possible paths through it. In cases of large, complex applications,
different parts of the application may be in different stages of the life cycle. Without a doubt,
managing the development of a complex application is one of the most challenging tasks in IT.
Part of the process of controlling that complexity is breaking down the development of the
application into manageable, logical stages:
• Business justification
• Requirements gathering
• Analysis and design
• Development
• Testing
• Deployment
• Maintenance


Again, it is worth noting that although this list and the discussion that follows might give the
impression that the life cycle logically marches from one stage to the next, never veering from the
predefined sequence, that is often not the case. New questions may arise during the analysis and
design stage that trigger revisions to requirements. Testing might reveal unanticipated combinations
of conditions that force a redesign of a module. Of course, the business justification for an entire
project can change if there is a change in business conditions, leading management to scrap
everything developed to that point.
In addition, it should be noted that software developers use different methodologies for creating
applications. Most of these methodologies use the stages described in this chapter in one form or
another. The major differences in methodologies tend to focus on whether to use one or more passes
through these stages and how much to try to accomplish at each iteration through the stages. (See
the sidebar “Software Development Methodologies” for more information about this topic.)

Figure 7.1: The dominant progression through the life cycle follows the solid lines, but in practice, there are
many other paths through the life cycle as shown by the dashed lines.

The first step in the application life cycle is initiated by an organizational need.


Business Justification
Why would an organization commit resources—money, staff, and time—to developing or
acquiring an application? There must be some benefit that outweighs the cost, of course.
Sometimes an organization may make a decision to invest in the development of an application
because the organization believes it will be a key to strategic success. Small startups can work
like this. A few developers and managers with a vision for starting a new business can be
justification enough. In larger organizations—such as midsized and large corporations,
government agencies, educational institutions, and major non-profits—a more formal approach is
usually required.
A business justification is essentially an argument for developing or purchasing an application
because it will serve a need of the organization. These documents often include:
• A description of the current state of the business or organization and a missing service or
unrealized opportunity.
• An overview of the benefits of implementing the proposed application, such as improved
customer service, which leads to higher customer retention rates; reduced cost of
manufacturing a product by eliminating older, higher cost IT systems used to support the
current manufacturing process; or higher throughput of a transaction processing system,
which will lower the marginal cost of each transaction.
• A formal assessment of the costs of the proposed project. Cost analysis often includes
measures such as the return on investment (ROI) or the internal rate of return (IRR),
which quantify the financial impact of the project and aid in allocating resources among
multiple proposed projects.
• A discussion of the risks associated with a proposed project. Any projection, such as a
business justification, is based on assumptions. The risk discussion points out what could
go wrong and how those risks can be mitigated.
The business justification should also demonstrate how the proposed application will further
align IT services with the strategic objectives of the organization. There are probably many
applications that can be cost justified but still do not align with strategic objectives. The goal of
deploying IT applications is to further the objectives that have already been defined; it is not to
introduce side services that might generate revenue for the organization but distract from core
services. Once it has been demonstrated that an application will serve the broader business
objectives of an organization, the application project is formalized, and the requirements-
gathering phase begins.


Requirements Phase
The purpose of the requirements phase of an application project is to define what the application
will do. At this point, the question of how the application will operate is not addressed; that is
left for the analysis and design stage that follows. The key topics that should be addressed during
requirements gathering are:
• Functional requirements
• Security requirements
• Integration requirements
• Non-functional requirements
There is some overlap between these areas, and requirements in one area often entail
dependencies on requirements in other areas as well.

Functional Requirements
Functional requirements are composed of use cases and business rules. Functional requirements
begin with the development of use cases, which are scenarios for how an application may be
used. Use cases include descriptions of how users, known as actors, interact with the system to
carry out specific tasks.
Use cases typically include:
• A use case name and a version, such as “Analyze sales report, version 3”
• A summary briefly describing what the actor does with the system—for example, in the
“Analyze sales report” use case, the actor might authenticate to the application, enter date
and regional parameters, format data in tabular or graphical form, and sort and filter data
as needed
• Preconditions (conditions that must be true for the use case to be relevant)—for example,
a precondition of the “Analyze sales report” use case is that the data warehouse providing
the data has been updated with the relevant data
• Triggers are events that cause the actor to initiate the use case; an event such as needing
to calculate the distribution of inventory to regional warehouses will trigger the analysis
of most recent sales
• Primary and secondary sequences of events within the use case—for example, the
primary events sequence describes the typical steps to retrieving and displaying sales
data, and the secondary sequence describes what occurs when an exceptional event
occurs
• Post conditions describe the state of the system after a use case is executed; data may be
updated, other functions may be enabled, and other use cases may be triggered
The purpose of use cases is to describe, in high-level detail, specific functions. The finer-grained
details are captured by business rules.

An introduction to use cases and related modeling topics can be found at http://www-
128.ibm.com/developerworks/java/library/co-design5.html.


Business rules are formal statements that define several aspects of information processing:
• How functions are calculated
• How data is categorized
• Constraints on operations
Business rules are specified early in the application development process because so much
depends on them. For example, if a sales analysis system is proposed, it must be understood
early on how to calculate key measures such as gross revenue, marginal costs, and related
metrics. It must also be determined whether multiple definitions must be supported. Take
marginal cost calculations, for example. The division responsible for manufacturing a product
might include the cost of materials and equipment in the marginal cost calculation; whereas, the
finance department might include those costs plus the sales commission paid to sell the product.
This is an example of a single term meaning multiple things depending on the context.
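Making the context-dependent definitions explicit is precisely what business rules are for.
A minimal sketch of the marginal-cost example, with hypothetical figures:

```python
# Sketch: the same term, "marginal cost," bound to different rules depending on
# the department asking. All figures are hypothetical.

def marginal_cost_manufacturing(materials, equipment, commission):
    # Manufacturing counts only materials and equipment.
    return materials + equipment

def marginal_cost_finance(materials, equipment, commission):
    # Finance also counts the sales commission paid to sell the product.
    return materials + equipment + commission

RULES = {
    "manufacturing": marginal_cost_manufacturing,
    "finance": marginal_cost_finance,
}

cost = RULES["finance"](materials=12.50, equipment=3.75, commission=5.00)
print(f"finance marginal cost: ${cost:.2f}")  # $21.25
```

Cataloging rules this way forces the ambiguity to be resolved before design, rather than
discovered in testing.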

Like use cases, formalism has been developed to standardize the definitions of business rules. The
Business Rule Markup Language, http://xml.coverpages.org/brml.html, is an open standard for
incorporating business rules into applications.

Security Requirements
Security requirements for an application should be defined along with functional requirements.
Implicit in every functional requirement are the questions “Who should be able to use this
function?” and “When can this function be used?” These broad questions, in turn, are answered
by more precise, but not detailed, questions such as:
• In what roles will users be categorized? The privileges and rights to use the application
and functions within the application should be based on the roles granted to users.
• How is the data used in the system categorized according to sensitivity classifications? Is
it public data, sensitive information that should not be disclosed outside the organization,
or private information whose distribution is controlled by the person it is associated with?
Is it secret information, such as proprietary information, trade secrets, or comparable
information?
• How will the application be accessed? Will remote users have access? If so, is it through
a public Web site, a restricted Web site requiring a username and password, or through a
VPN?
• What authentication mechanism is required to access the program? Is a
username/password scheme secure enough? If so, what is the password policy? Is multi-
factor authentication required? If so, what types of authentication mechanisms are
required? These could include smart cards, challenge/response devices, or biometric
devices.


The scope and detail of security requirements are slightly different from functional requirements.
In the case of security, it is common to delve into the “how” instead of addressing only the
“what.” For example, the need for biometric authentication is really an implementation issue that
would not be specified if it were a functional requirement. However, security requirements may
be dictated by constraints outside the scope of the project. A financial services company, for
example, may decide that to remain in compliance with government regulations, biometric
security measures are required for all applications that reference customer account information.
The designers of the application will have no choice in the matter; if the application they are
developing accesses customer data, it is required to use biometric security measures. In cases
such as this, it is important to document these requirements before the analysis and design phase
begins.
Security requirements should also address:
• Compliance requirements
• Any restrictions on the time of access to the application
• How will identities be managed? For example, will all users be registered in an
organization-wide Active Directory (AD) or LDAP directory?
• The federation of identities (that is, relying on identity information managed by another
party) if third parties are granted access to the application
• Encryption requirements, including the strength of encryption
• Policies on the transfer of data from the application. For example, can users download
data and store it locally on their workstations? Can they store data on their notebook
computers or other mobile devices?
• Any security policies and procedures that are relevant to the application
Security requirements sometimes have to address how the application will operate with other
applications or data sources.

Integration Requirements
It is difficult to imagine an application that will not integrate with some other application or data
source. Rarely are today’s applications islands unto themselves. For this reason, it is helpful to
understand the ways in which applications share services and data among themselves.


In addition to security issues, integration also requires a coordination of:


• Data flows; for example, if Application A generates data used by Application B, what are
the rules governing how it will move from A to B? How often will data be moved? Will
an event in A trigger the movement of data to B, or vice versa?
• Data exchange protocols; for example, must the application under design use an existing
protocol to exchange data? If a detailed, application-specific protocol does not exist, is
there a required method, such as Simple Object Access Protocol (SOAP), that will be the
basis for integration?
• Support services required, such as those provided by a server OS, including Network
File System (NFS) and File Transfer Protocol (FTP).
• Database-level integration, such as access to source systems on a mainframe database,
relational database, or other repositories.
As with functional requirements, the goal of defining integration requirements is to identify what
is required, not specify how it will be accomplished.

Non-Functional Requirements
The term non-functional requirement is a catch-all used to describe requirements that are not
captured in the other categories. Some designers would include security and integration in the
non-functional category; however, their importance and complexity is often far greater than the
other non-functional requirements and therefore warrant a more detailed discussion. The
remaining categories of non-functional requirements include:
• Backup and recovery
• Performance levels
• Service availability
• Service continuity


Figure 7.2: Example integration of application with other servers and services.

Many of these non-functional requirements overlap with systems management responsibilities. See
Chapter 2 for more information about these topics.

Backup and Recovery


Backup and recovery requirements dictate the type of backups performed and their frequency.
Full backups, which include all data and files associated with an application, may be combined
with incremental backups, which back up changes since the last full backup or since the last
incremental backup. Full backups take longer and require more media than incremental backups,
but recovering from a full backup alone is faster than restoring a full backup plus a chain of
incrementals. In theory, a single full backup followed by an ongoing series of incremental
backups would allow a systems administrator to perform a full recovery; in practice, a cycle of
one full backup followed by several incremental backups is more typical.
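The restore logic implied by such a cycle can be sketched simply: find the most recent full
backup and apply every incremental taken after it. The backup inventory below is
hypothetical:

```python
# Sketch: select the backup sets needed for a full recovery -- the latest full
# backup plus all subsequent incrementals, in order.

backups = [  # (day, kind), in chronological order
    (1, "full"), (2, "incremental"), (3, "incremental"),
    (4, "full"), (5, "incremental"), (6, "incremental"),
]

last_full = max(i for i, (day, kind) in enumerate(backups) if kind == "full")
restore_chain = backups[last_full:]
print(restore_chain)  # the day-4 full plus the day-5 and day-6 incrementals
```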


Performance Levels
Performance levels define the expected response times to users and the number of users that can
be supported. This information is needed to size hardware appropriately. The number of CPUs,
memory, and network bandwidth required will depend, in part, on the expected performance
levels.

Service Availability
Service availability addresses the extent to which the application will be available. For example,
mission-critical applications may be expected to be up 24 hours a day, 7 days a week. In practice,
except for the most demanding applications, service windows are reserved for outside of peak
operational hours to attend to upgrades, patches, and other maintenance. When true 24 × 7
service is required, servers are configured in a cluster or failover configuration that improves
uptime and allows for a rotating maintenance schedule across the constituent servers.

Service Continuity
Service continuity requirements specify what is expected in the case of service disruption, such
as a natural disaster that shuts down a data center. These requirements are dictated by the need to
have the application available, the duration for which the application can be down without
impacting the organization’s operations, and, of course, the cost of equipping and maintaining an
off-site facility.
Gathering and defining requirements for applications is an essential step in the application life
cycle. Functional requirements define what an application is to do, security requirements specify
the level of confidentiality and integrity protection that is required, integration requirements deal with
how the application will function within the broader IT infrastructure, and finally the non-
functional requirements define the parameters needed to support several core systems
management services, such as backups and service continuity. Once the application requirements
are defined, the life cycle moves into the analysis and design phase.

Analysis and Design


The analysis and design phase of application development marks the transition from describing
what is to be done to defining how the application will accomplish its task. The steps in this
phase can be broken down into three broad categories:
• Creating a framework of a solution
• Making build-vs.-buy decisions
• Defining detailed design
The phase begins by mapping out an overall picture of how the application will function.


Solution Frameworks
A solution framework is a high-level design of an application that describes the major modules
within the application as well as the architecture that encompasses and integrates each of the
modules. Although applications have different architectures, an increasingly common model is
based on three or more tiers:
• Data services
• Application services
• Client services
Figure 7.3 shows a simple example of such a model.

Figure 7.3: The multi-tier model is a common framework for applications.

For simplicity, this diagram depicts a three-tier model. However, within the middle tier there may be
multiple levels of application services providing functionality for other modules within the application.


Data Services Tier


Data services are provided by one or more databases. The majority of databases in use today are
relational databases. The relational model has proven to be the most effective approach for a
wide range of applications, providing both a rich set of database management features as well as
performance that scales well to multi-terabyte data volumes. Some applications still depend on older
database models, including COBOL files and hierarchical databases, but these are usually
associated with legacy applications. These are not common choices for new application
development.
It is not uncommon for a single application to use multiple databases. For
example, data warehouses and other business intelligence applications often draw data from
multiple source databases. For performance and ease of integration, data warehouses often
depend upon copying data from source systems and storing it in a scheme more amenable to
high-performance reporting (see Figure 7.4).

Figure 7.4: In the data warehouse model, data is first integrated in a separate data store and then processed
by an application server.

Other systems, such as order processing systems, may use multiple, independent databases. For
example, a financial services company may allow customers to access their checking accounts,
mortgage statements, and credit card activity all from a single Web application. The data,
however, is stored on three different systems, each one dedicated to managing one type of
account (see Figure 7.5).


Figure 7.5: In many applications, multiple data sources are integrated directly in the application server.

During the framework modeling process, the source systems and how they will function together
are determined. These data service providers are used by application services that occupy the
middle tier of the architecture.

Application Services Tier


The application services tier is where the bulk of an application's work occurs. In any application
that is more than a simple data storage and retrieval application, the middle tier is responsible for
a wide range of functions, including:
• Integrating data drawn from multiple data sources
• Implementing workflows that dictate how data is processed
• Enforcing business rules, such as verifying credit status of a customer before completing
an order
• Triggering complex events, such as ordering items when stock drops below a predefined
threshold
• Providing an infrastructure to enable communications and coordination of multiple
services, such as Web service functionality


The middle tier depends on a variety of infrastructure applications including:


• Web servers
• Application servers
• Database federation systems
• Portal frameworks
• Business rule engines
• Workflow engines
It is these components that manage the data and services that are used by the client layer to
support interaction with users.

Client Tier
The client tier is responsible for rendering information provided by the data services and
application services tiers. The client tier is becoming more challenging to manage and develop
for as the options for clients expand.
Conventional workstations and notebooks are now complemented with PDAs and smart phones
as application clients. This reality requires systems designers to develop for multiple platforms
using multiple protocols. For example, HTML, the staple of Web application development, will
not necessarily meet the functional requirements of mobile clients such as PDAs and smart
phones; alternative methods are required.
Frameworks are skeleton designs of how an application is organized. It is at this point that
systems managers can start to see how the application will fit into the existing network and
server infrastructure, what additions will be needed to meet hardware requirements, and what
additional loads will be put on network services. This is also the point at which decisions are
made about which components of the application to build and which to buy.

Buy vs. Build


The buy-vs.-build question, as it is often framed, is something of a misnomer. The phrase implies
a binary decision: either custom build an application, or buy commercial off-the-shelf software
(COTS) or use open source software. In practice, this is not a black-or-white decision. Often
there is a mix of some purchased and some customized development. It is useful to distinguish
between different points along a buy-vs.-build continuum.
The combinations of buying and building include:
• Buying a turnkey system using commercial or open source software
• Buying a turnkey system based on commercial or open source software but customized by a
third party
• Buying a commercial package or using an open source package and customizing it in
house
• Buying components or using open source components to build a custom application in
house
• Using commercial or open source tools to build the application from scratch


In practice, few organizations outside of software development firms will start with tools and
build from scratch. Similarly, unless the required application provides a common, commodity
service, such as a backup and recovery program, few organizations will avoid at least some
configuration and customization of major applications.
The process of making the buy-vs.-build decision includes determining:
• The functional components of the application
• The communication protocols between the components
• The constraints on the application, such as the types of databases that will provide data to
the application
• The development skills available in house, or readily available
• The time to deliver the application
• Options in the commercial market and open source for components
• Availability of subject matter experts who can support design
Ideally, the end result is a balanced approach that leverages existing components while reserving
custom development for key components that add competitive advantage and cannot be
adequately implemented using existing systems.
With tools and components identified, the detailed design can begin. The more components or
existing packages are used, the less there is to design. At the very least, a detailed configuration
of a turnkey system should be in place before the system is deployed to a production
environment.

Detailed Design
The goal of the detailed design stage is to create a document suitable for programmers and
systems administrators familiar with the selected tools and components to build the application.
At this point, the requirements and overall architecture should be defined and the task is to
identify how the requirements will be met.
In practice, designers will discover elements of the application that were not considered during
requirements or find that requirements have changed (even with short development cycles,
requirements can change before detailed design is complete). These discoveries can trigger
review of functional requirements, non-functional requirements, and architectural design. These
discoveries are so common that they have prompted the creation of several design
methodologies. From a systems management perspective, this demonstrates that the supporting
infrastructure originally planned for a new application deployment may not be what is actually
deployed when the system design is finally completed. Once the design is in place, the
application life cycle can move to the development stage.


Development
Development entails building applications and application components. Many books have been
written on this subject and it is well beyond the scope of this chapter and this guide to try to
address the practice of software engineering. There are, however, three topics relevant to systems
management that are worth addressing:
• Source code management
• System builds
• Regression testing
Each of these entails software artifacts that, like other assets, require a structured management
regimen.

Source Code Management


Source code for an application is often developed by multiple programmers over long periods of
time. Source code developed for one project may be reused in another project, either as is or with
changes to suit the needs of the application. Source code is a challenge to manage for a number
of reasons, including:
• Software modules often depend on other modules that may be under development at the
same time.
• No two developers should be allowed to change the same resource at the same time;
check-in and check-out mechanisms are required to prevent lost development work.
• A single module may have multiple versions for different platforms; for example, an
interface designed for Windows may have to be re-written for a Linux platform.
Source code management systems are commonly used in software development efforts. These
systems address the challenges outlined earlier and, like configuration management systems,
become essential when systems reach a certain level of complexity.
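The check-in/check-out mechanism mentioned above can be sketched as follows. This is a toy model, not a real source code management system: file paths and developer names are illustrative, and real systems add versioning, merging, and history on top of this basic locking idea.

```python
class SourceControl:
    """Toy check-out/check-in model: a file checked out by one developer
    cannot be changed by another until it is checked back in."""
    def __init__(self):
        self.locks = {}  # path -> developer currently holding the lock

    def check_out(self, path, developer):
        if path in self.locks and self.locks[path] != developer:
            return False  # someone else holds the lock; work would be lost
        self.locks[path] = developer
        return True

    def check_in(self, path, developer):
        if self.locks.get(path) == developer:
            del self.locks[path]  # release the lock for other developers
            return True
        return False

sc = SourceControl()
print(sc.check_out("billing.c", "alice"))  # True
print(sc.check_out("billing.c", "bob"))    # False: alice holds the lock
```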

System Builds
A system build is the process of gathering the component modules under development and
creating an executable application. Once enough components have been developed to have even
the most basic functions, system builds are used to ensure development continues in such a way
as to not break (at least not too badly) previous work. A system build is a minimal test of the
code under development. If an application’s modules and libraries can be compiled into an
executable application, the specific functions of the system can be tested.
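A system build in this sense can be sketched as a compile-then-link pipeline (the module names and callbacks here are hypothetical stand-ins for real compiler and linker invocations): a failure at any module means the build, the most basic test of the code, fails.

```python
def system_build(modules, compile_module, link):
    """Minimal build sketch: compile every module, then link the results;
    a failure anywhere means the build fails."""
    objects = []
    for module in modules:
        obj = compile_module(module)
        if obj is None:
            return f"build failed: {module} did not compile"
        objects.append(obj)
    return link(objects)

# Stub compiler and linker standing in for real tools.
compile_ok = lambda m: m + ".o"
link = lambda objs: "app.exe"
print(system_build(["orders", "billing"], compile_ok, link))  # app.exe
```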


Regression Testing
Regression testing is the practice of testing applications or modules after small changes to ensure
that previously functioning components have not been broken by the introduction of bugs in
new code. Regression tests can be automated and the results compared with previous results.
This type of testing is not the full-scale system testing done prior to releasing a piece of software.
Regression testing is often done automatically after building an executable application. When
software is sufficiently constructed and tested by developers, it moves to the quality control–
focused level of testing, typically carried out by a testing team that does not include developers.
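The compare-against-previous-results idea behind automated regression testing can be sketched as follows (test names and outcomes are illustrative): each named test is re-run and its result compared with a recorded baseline, and any changed outcome is reported as a regression.

```python
def regression_check(run_suite, baseline):
    """Run each named test and compare its result with the recorded
    baseline; report every test whose outcome changed (a regression)."""
    regressions = []
    for name, expected in baseline.items():
        actual = run_suite(name)
        if actual != expected:
            regressions.append((name, expected, actual))
    return regressions

# Baseline from the previous build vs. results from the current build.
baseline = {"login": "pass", "export": "pass", "search": "pass"}
current = {"login": "pass", "export": "fail", "search": "pass"}
print(regression_check(lambda name: current[name], baseline))
# [('export', 'pass', 'fail')]
```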
Software Design Methodologies
Software design methodology is one of those topics that can trigger seemingly incessant debate among
software developers. Over the past decades, a number of methods have been proposed, all with some
variation on top-down or bottom-up design. Although there are a number of minor variations on the major
models, we will focus only on the major ones, which are:
● Waterfall model
● Spiral model
● Agile model
The waterfall model is a linear approach to software development. According to the waterfall model, one
starts by gathering requirements, then develops a high-level design followed by a detailed design, builds
the code according to the design, tests it and corrects bugs, and then deploys it. The advantage of this
model is that it is intellectually simple and easy to understand. The disadvantage is that it does not work
in most software development projects. The world does not proceed in the lock-step fashion assumed in
the waterfall model. Requirements change and this model does not adapt to that. The spiral method was
developed to avoid the fatal flaws of waterfall while maintaining the structured approach that does serve
the goals of software engineering.
Through the spiral approach, developers build software iteratively and assume that requirements will
change and that during the process of developing a system, new information is gleaned that will help in
the development of other parts of the system. Rather than build an application in one pass through the
structured stages, spiral methodologies build a set of functions in each iteration through the structured
stages.
In theory, proponents of waterfall methodology might argue that a skilled requirements gatherer could find
all the requirements early in the development cycle. Even if someone did have the mythic skills to
elucidate all the requirements in the precise detail needed, this does not account for the cost of gathering
those requirements. It is a well-known principle in economics that the cost of producing one more item
may not be the same as the cost of producing the previous item. In the case of gathering requirements,
the marginal cost, as it is known, of getting one more requirement begins to increase at some point. In
some cases, users may not know their requirements until they have had a chance to interact with the
application.
Agile software development methodologies take the spiral approach to an extreme and use very short
software development cycles, sometimes as short as several weeks. This allows for almost constant
evaluation and quick adaptation.


Software Testing
Software testing is a quality control measure that is designed to identify bugs in software (similar
to regression testing) and to ensure that all functional requirements defined in the earlier stages
are met by the software. The testing at this stage is integrated testing that exercises the full
functionality of the application. Unlike the testing done by developers, which is referred to as
unit testing, the goal with integrated software testing is to make sure the application’s
components function correctly together.
The artifacts used in integrated software testing are:
• Test plans
• Test scenarios
• Test procedures
• Test scripts
A test plan is a high-level document describing the scope of software testing and usually
includes:
• Functions to be tested
• References to requirements documenting functional requirements
• Known software risks
• Test criteria
• Staffing and resource requirements
• Schedule
The details of how functions are tested are included in test scenarios and test procedures. Test
scenarios describe use cases and specific features within those use cases to test. For example, a
scenario may describe a user retrieving a sales analysis report, entering search criteria for
filtering data, and exporting data to a spreadsheet. The test procedures define the steps carried
out by the tester to test each function. For example, to export the data to a spreadsheet, the tester
will select “Export” from the menu, enter file name “Test123.Xls,” save the file, then open the
file in a spreadsheet program and verify that the table headings, summary data, and formatting
are correct.
Testing can be a time-consuming and tedious task, especially when large numbers of functions
must be tested. Test scripts can be used to automate this process and a number of tools are
available.
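The export procedure described above can be turned into an automated test script along these lines. The application object here is a stub (its methods and return values are hypothetical); in a real tool it would drive the actual user interface or API.

```python
class ReportApp:
    """Stub standing in for the application under test."""
    def export(self, filename):
        # Pretend the export succeeds only for spreadsheet file names.
        return filename.lower().endswith(".xls")

    def open_file(self, filename):
        # Pretend to reopen the exported file and return its contents.
        return {"headings": ["Region", "Sales"], "rows": 42}

def export_procedure(app, filename):
    """Scripted version of the manual test procedure: export the report,
    reopen the file, and verify the table headings are correct."""
    if not app.export(filename):
        return "export failed"
    sheet = app.open_file(filename)
    if sheet["headings"] != ["Region", "Sales"]:
        return "headings incorrect"
    return "pass"

print(export_procedure(ReportApp(), "Test123.Xls"))  # pass
```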


In addition to testing basic functions, which presumably was done during unit testing in the
development stage, systems integration is also tested. Applications will depend upon other
applications and while application programming interfaces (API) may be well defined and used
properly by client applications, there is more to testing integration than simply making sure a
single API call works correctly. Integration testing should include testing:
• Scalability of calls from the client application
• Consistent security between the applications
• The ability to roll back a transaction across multiple applications
These are the types of non-functional requirements that are not tested in unit testing and must be
explicitly planned for in integration testing.

The IEEE standard for software testing documentation is available at
http://www.ruleworks.co.uk/testguide/IEEE-std-829-1998.htm.

As the application (or, in the case of large, multi-phase application developments, its modules) passes
integration testing, the application is moved to production through the deployment process.

Software Deployment
The process of software deployment is complex because of the dependencies between so many
aspects of information architectures. Release management, as the practice of controlled software
deployment is known, consists of a number of tasks, including:
• Coordinating with testers to ensure software is ready for deployment to particular
platforms
• Packaging software for installation on target platforms
• Determining dependencies for successful installation of software
• Receiving change control approval to deploy software
• Updating the configuration management database to reflect the new versions of the
software on particular platforms
• Planning the deployment so as not to disrupt operations or at the very least, to minimize
the impact of the deployment
In addition to coordinating the installation of software, the release management team must
coordinate with developers and trainers to ensure that end users, systems administrators, and
support personnel are all trained on the new software. The deployment phase in many ways
marks the final stage of the software development life cycle because, after it, the software is actually
in use. It is not truly a terminal stage, though, because maintenance is such an important factor in
the life cycle.


Software Maintenance
Software maintenance is the practice of making modifications to applications to ensure that they
continue to meet functional and non-functional requirements and do not present security
vulnerabilities that could compromise the integrity and confidentiality of information or the
availability of the system itself. Software maintenance usually comes in the form of patches and
upgrades.
Patches are usually small changes to code to correct a known problem. They do not provide
additional functionality. Upgrades, in contrast, are designed to enhance the functionality,
performance, or scalability of an application.
Another distinction between patches and upgrades is the timeframe for deploying them.
Patches may be provided by application developers as soon as a problem is discovered,
especially if the flaw results in a security vulnerability. In these situations, systems
administrators may have less time to test and apply a patch. For example, if fast-spreading
malware threatens an application and a newly released patch is available from the application
vendor, the systems administration team may deploy the patch with minimal testing. Upgrades
are usually well planned and both the application developers and application users have time to
properly plan their deployment.

Role of Application Development Life Cycle in Systems Management


In many ways, applications are like other assets managed and tracked by systems managers.
They have acquisition processes, they are deployed in a controlled manner, they are subject to
change control, and applications or their components are tracked as configuration items in the
configuration management database. At first glance, applications appear to be managed not all
that differently from other assets, but that is not the case.
Applications are subject to changing requirements that in turn are driven by changing business
conditions and strategies. Applications may be finely customized to the needs of a particular
organization to a far greater degree than other assets, such as networking devices or servers, can
be configured. The flexibility one has in designing software is one of its advantages. This
flexibility brings with it an added level of complexity not found elsewhere.
From a systems management perspective, one does not manage an application but manages an
application in multiple states at the same time. One also manages a host of secondary artifacts,
such as design documents, test plans, requirements, and patches that are all part of an
application. Applications are not just executable files and scripts residing on a server but include
the full range of activities and artifacts that support the application through its life cycle. Another
aspect of applications that is relevant to systems management but not tied to a single application
is the set of dependencies between applications.


Managing Application Dependencies


Applications exist in something akin to a software ecosystem. Applications use functionality
provided by OSs, network services, and other applications. This use of other components outside
the control of an application, or at least an application development group, leads to a number of
different types of dependencies, including:
• Data dependencies
• Time dependencies
• Software dependencies
• Hardware dependencies
Disruptions in any of these dependencies can cause ripple effects throughout an application.

Data Dependencies
Data dependencies occur when one application depends on another to provide specific data at
certain times. There are many facets of data dependencies to consider, but from a systems
management perspective, a key question is: at what point does an application's failure to meet its
requirement to provide data begin to adversely impact operations? Consider some examples:
• The enterprise resource planning system of a retailer with 200 stores nationwide
aggregates sales data from stores each night. All stores are expected to provide data by
midnight (headquarters time zone). If more than 5 percent of the stores fail to provide
data or three or more stores in the same region fail to upload data, summary reports
cannot be generated.
• An order processing system depends on an inventory system to check stock levels before
committing to a delivery date. If the inventory system is offline, the order processing
system estimates the delivery date based on the location of the customer and proceeds
with the order.
• A data warehouse draws data from several systems, integrates the data in an enterprise
data warehouse and then populates a series of data marts targeted to particular analysis
functions. In the event some data sources are down, the data warehouse load continues
but the data warehouse generates only those data marts for which all data has been
received.
Clearly, data dependencies are not “all or nothing” affairs. Well-designed applications degrade
gracefully. If partial data is available, then partial functionality and services should be available.
Systems managers should design and manage infrastructure in such a way to support data
dependencies; to do so they must have insight into not only the requirements but also the
capabilities of applications with respect to data dependencies.
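The retailer rule from the first example can be expressed directly in code. This sketch (store and region names are made up) blocks summary reports when more than 5 percent of stores are missing or when three or more stores in a single region failed to upload, and otherwise degrades gracefully by allowing the reports.

```python
def can_generate_summary(reporting, regions):
    """Retailer rule from the example: summary reports cannot be
    generated if more than 5% of all stores failed to provide data, or
    if three or more stores in any one region failed to upload."""
    total = sum(len(stores) for stores in regions.values())
    missing = total - len(reporting)
    if missing > 0.05 * total:
        return False
    for region, stores in regions.items():
        failed = [store for store in stores if store not in reporting]
        if len(failed) >= 3:
            return False
    return True

regions = {"east": [f"e{i}" for i in range(100)],
           "west": [f"w{i}" for i in range(100)]}
# 198 of 200 stores reported; the two missing stores are both in the west.
reporting = set(regions["east"]) | set(regions["west"][:98])
print(can_generate_summary(reporting, regions))  # True: 1% missing, 2 < 3 per region
```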


Time Dependencies
Time dependencies are an important factor in application management. In some cases, these are
essentially questions of scalability. For example, an online order processing system may be able
to take as many as 1000 orders per minute, but it depends on a service provided by a sales tax
computing Web service that can only process as many as 500 orders per minute. Systems
managers can work with developers to improve on this by dedicating additional servers to the
Web service once the dependency has been identified.
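The bottleneck in this example can be made concrete with a small calculation (the linear scaling of the tax service across servers is an assumption for illustration): end-to-end throughput is bounded by the slowest dependency.

```python
def max_order_rate(order_capacity, tax_capacity_per_server, servers):
    """End-to-end throughput is bounded by the slowest dependency;
    adding servers to the tax web service raises that bound (assuming
    the service scales linearly with server count)."""
    return min(order_capacity, tax_capacity_per_server * servers)

print(max_order_rate(1000, 500, 1))  # 500: the tax service is the bottleneck
print(max_order_rate(1000, 500, 2))  # 1000: a second server removes the bottleneck
```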
Another type of dependency is more difficult to work around. It is not uncommon for large,
centralized applications to do quite a bit of batch processing outside of business hours. Banks,
for example, will post transactions against accounts and process loan payments during off hours.
Because transaction processing systems are subject to heavy transaction loads during business
hours, off hours are also the ideal time to accomplish tasks that would put an inordinate load on
the application otherwise. Data extractions, for example, often occur at these times. The problem
is that there are often several data extraction jobs that need to run in a limited time window.
Understanding these requirements is important for systems managers so that they can schedule jobs
and allocate resources appropriately to meet these non-transaction-processing demands.
As with other performance measures, it is important for systems managers to track trends in non-
transaction processing. For example, if batch jobs are taking longer and longer to run, are some
critical processes running over into normal business hours and therefore potentially interfering
with core business operations? If so, how can the current configuration of hardware, software,
and batch jobs be reconfigured to eliminate the problem? The answer to this question requires
detailed information from a variety of sources including system logs and the configuration
management database.
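Detecting the batch-window overrun described above can be as simple as comparing job finish times against the start of the business day. A minimal sketch (job names, finish times, and the 8:00 opening hour are hypothetical):

```python
from datetime import time

def jobs_overrunning(jobs, business_open=time(8, 0)):
    """Flag batch jobs whose finish time spills past the start of the
    business day, where they could interfere with core operations."""
    return [name for name, end in jobs if end >= business_open]

# Finish times pulled from system logs for last night's batch run.
nightly = [("post-transactions", time(3, 15)),
           ("loan-payments", time(5, 40)),
           ("data-extract", time(8, 25))]
print(jobs_overrunning(nightly))  # ['data-extract']
```

Tracking this list over successive nights gives the trend information the text calls for: a job that appears more and more often is a candidate for rescheduling or additional resources.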

Software Dependencies
Software dependencies are another type of dependency that should be explicitly managed.
Successful change management procedures depend upon knowing the dependencies between
applications so that a functioning system is not inadvertently disrupted by a change in some
dependent code. Tracking dependencies explicitly in a configuration management database can
help to minimize the chances of that kind of mistake. This is just one of the reasons that software
should be managed like other assets.
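A change-impact check against explicitly tracked dependencies can be sketched as a graph walk (application names are illustrative, and a real configuration management database would supply the graph): given a changed component, it returns every application that directly or transitively depends on it.

```python
def impacted_by_change(dependencies, changed):
    """Walk a dependency graph (app -> list of apps/components it
    depends on) and return every application that directly or
    transitively depends on the changed component."""
    impacted = set()
    changed_set = {changed}
    grew = True
    while grew:  # keep sweeping until no new applications are impacted
        grew = False
        for app, deps in dependencies.items():
            if app not in impacted and (changed_set & set(deps) or impacted & set(deps)):
                impacted.add(app)
                grew = True
    return impacted

deps = {"orders": ["inventory", "tax-lib"],
        "billing": ["tax-lib"],
        "reports": ["orders"]}
print(sorted(impacted_by_change(deps, "tax-lib")))  # ['billing', 'orders', 'reports']
```

Running such a check before approving a change is one way to avoid inadvertently disrupting a functioning system.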

Hardware Dependencies
Applications are deployed to particular servers that have specific configurations. The
dependencies between applications and the hardware configuration required to support them
should be explicitly modeled. At any time, a systems administrator or IT manager should be able
to report on the details of which applications are running on the various servers in the
organization.
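The application-to-server report just described is essentially an inversion of the configuration data. A minimal sketch (the application and server names are made up; a real report would query the configuration management database):

```python
def apps_by_server(cmdb):
    """Invert a mapping of application -> server so an administrator can
    report which applications run on each server in the organization."""
    report = {}
    for app, server in cmdb.items():
        report.setdefault(server, []).append(app)
    return report

cmdb = {"orders": "srv01", "billing": "srv02", "reports": "srv01"}
print(apps_by_server(cmdb))  # {'srv01': ['orders', 'reports'], 'srv02': ['billing']}
```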


Application Asset Management


The software development life cycle is a major process in the management of applications, but it
is not the only one. Application assets, including the hardware required to use those applications,
also entail an asset management process. There is some overlap between the software
development life cycle and the asset management process, but it is still worth outlining the key
elements.

Acquiring Assets
Acquiring assets and planning for their integration and deployment may depend heavily on the
software development life cycle if the asset is built. Regardless of whether an application is built
or bought, the acquisition process is dominated by:
• Functional requirements
• Compatibility with architecture
• Capacity planning
Functional requirements have been detailed earlier in the chapter. Compatibility with architecture
is another factor that can limit an organization’s options when it comes to acquiring assets.
Although shared standards allow virtually any major platform to inter-operate, the cost of
supporting multiple architectural models and platforms may be prohibitive. An architecture
based on J2EE standards, for example, can function with .Net applications but the additional
effort to deploy and maintain multiple architectures may outweigh the benefits.
Capacity planning must also be considered when acquiring assets. Factors influencing the
capacity of an application include:
• Number of users
• Peak load periods
• Time dependencies on other applications and data sources
• Expected growth rates
Availability requirements should also be considered in capacity planning. A clustered
configuration of servers, for example, could improve both availability and capacity.
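The capacity factors listed above can be combined into a back-of-the-envelope sizing formula. The figures below (user count, peak concurrency factor, growth rate) are purely illustrative assumptions, not recommendations:

```python
def required_capacity(users, peak_factor, growth_rate, years):
    """Back-of-the-envelope sizing: concurrent peak load today, grown at
    a compound annual rate over the planning horizon."""
    return users * peak_factor * (1 + growth_rate) ** years

# 2,000 users, 15% concurrent at peak, 20% annual growth, 3-year horizon.
print(round(required_capacity(2000, 0.15, 0.20, 3)))  # 518 concurrent sessions
```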


Deploying, Managing, and Decommissioning Applications


Deploying and managing applications as assets follow similar patterns to managing other assets.
For example, for complex applications, systems managers must find the appropriate level for
defining configuration items in the configuration management database. Should the application
be defined as a single item? Should each module? What about software libraries that the
application depends on? The answers to these questions are largely influenced by how tightly
coupled particular modules are. For example, if a financial reporting module within an enterprise
resource planning system changes more frequently than other modules and has different
licensing requirements, then it should be tracked independently.
The usefulness of an application, like hardware assets, will come to an end at some point. When
this happens, several events should occur:
• Application should be taken offline
• Enterprise directories with application information should be updated
• Hardware should be decommissioned or reallocated
• Leases for associated hardware and software should be closed
• Configuration management database should be updated
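The decommissioning events above lend themselves to a simple checklist tracker. A sketch (the application name is hypothetical) that reports which steps remain before the application can be considered fully retired:

```python
DECOMMISSION_STEPS = [
    "take application offline",
    "update enterprise directories",
    "decommission or reallocate hardware",
    "close hardware and software leases",
    "update configuration management database",
]

def decommission(app, done_steps):
    """Return the checklist items still outstanding for an application
    being retired; an empty list means decommissioning is complete."""
    return [step for step in DECOMMISSION_STEPS if step not in done_steps]

print(decommission("legacy-crm", {"take application offline"}))
```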
When managing applications, the software development life cycle entails complex processes that
can be especially challenging from a management perspective.

Summary
Managing applications is a process with characteristics not found in other areas of asset
management. The dynamics of the software life cycle introduce additional artifacts that must be
managed, such as requirements documents, code libraries, and test cases. Applications
themselves are more dynamic than many other assets and this, in turn, creates more work to keep
configuration management databases up to date and accurately reflecting the state of deployed
applications.

159
Chapter 8

Chapter 8: Leveraging Systems Management Processes for


IT Governance
Throughout, this guide has examined systems management processes as they apply to controlling
assets, processes, and procedures; providing service support; delivering services; and managing
applications. This chapter turns your attention to a higher level of management and asks: How do
you control and manage the implementation of these systems management processes?

What Is Governance?
Governance is the process of setting long-term objectives, establishing controls that measure the
progress toward those objectives, and monitoring to ensure controls are followed and objectives
are being met. In short, governance is about deciding what an organization should do, how to
ensure it will get done, and then making sure it does get done. As Figure 8.1 shows, the
governance process encompasses all aspects of service-oriented management (SOM).

Figure 8.1: The governance process defines a framework in which SOM operations are controlled.

Let’s begin with an example that gives an overview of types of governance activities, including:
• Planning and organizing IT operations
• Acquiring and implementing IT solutions
• Ensuring proper delivery and support for IT solutions
• Monitoring services to ensure compliance with policies and procedures
When discussing each activity, let’s explore how to establish goals for each activity and how to
measure progress toward those goals.

The practices described here are industry standards that have evolved over the years. The best
formalization of these types of best practices can be found in the Control Objectives for Information
and related Technology (COBIT) framework established by the Information Systems Audit and
Control Association (ISACA). More details about COBIT and ISACA can be found at
http://www.isaca.org/.


Governance: An Example
Governance, as practiced according to COBIT, is a typical reductionist management practice.
You first identify the parts of SOM, dividing them into logical groups, then continue dividing
those groups into smaller and smaller constituent parts until the resulting units are easily
described in terms of:
• What is to be accomplished?
• What factors influence the success of the objective?
• How can progress on the objective be measured?
For example, consider a company that has a strategic objective to reduce telecommunications
costs by deploying voice over IP (VoIP). Doing so will require substantial investment of time
and money, and the board of directors expects executive management to have a plan in place for
overseeing the deployment of the VoIP system as well as ensuring that ongoing operations are
meeting the organization’s needs within budget and on schedule. The process begins with
planning how to acquire and implement the service. After the planning stage is complete, the
process moves on to the acquisition and implementation processes. COBIT then provides a
framework for delivery and support as well as monitoring and evaluation.

Planning a VoIP Implementation


For this example, assume that the IT organization responsible for implementing the VoIP
solution is organized into several groups:
• Business analysis and project management
• Applications development
• Network services
• Server and client management
• Support services
• Training
Each will have a role in the VoIP project, so one of the first steps after defining the strategic plan
that includes the project is to fit the project into the existing IT management structure. This
includes:
• Determining how the project fits in the IT architecture
• Identifying the management processes that will control the execution of the project
• Incorporating the financial planning of the project into the broader IT investment
portfolio
• Conducting a risk analysis of the project
• Planning the staffing and training requirements
• Executing established project management procedures


This process is controlled by involving business owners, project managers, and domain experts
who will follow a formal planning process and document their findings, which are then reviewed
and approved by executive management before proceeding to the next stage.
The business analysis and project management group as well as managers from network services
and server and client management would have to be involved in the first step, determining how
the project fits into the IT architecture. This same group would identify the management
processes that will control the execution of the project. Now, ideally, that should be a relatively
easy task. In a mature governance environment, those processes are well established.

See the section “Governance and Maturity Models” later in this chapter for more information about the
different levels of process maturity.

Incorporating the financial planning of the project and conducting the risk analysis are the
responsibilities of the IT managers with assistance from business analysts, project managers, and
domain experts. This group should also handle planning for staffing requirements and training.
The final stage of planning is to formulate a project plan and engage management oversight of
the project.
Each of the activities must be well documented. Common procedures, such as project planning
and risk analysis, often have formal document deliverables that have a well-defined structure.

Project management professionals have formalized their discipline and have developed a body of
knowledge and a set of documents common to project management across domains, not just IT. For
more information, see the Project Management Institute Web site at
http://www.pmi.org/info/default.asp.

The deliverables from the planning stage should include project plans, risk analysis, and
requirements documents. The governance process measures timeliness and quality to ensure that
the planning process is working as expected. For example, key measures might include whether
the documents were prepared on time, if the requirements document addresses the full scope of
business and technical requirements, and whether the project plan met the standards outlined by
the Project Management Institute.

Implementing a VoIP Solution


During the implementation phase, the emphasis shifts to selecting, acquiring, and installing the
VoIP system. Selection is made based on a combination of functional requirements and
feasibility analysis, which should include both technical and budget constraints.
Once VoIP hardware and software—as well as supplemental acquisitions such as additional
network hardware and dedicated Internet bandwidth—are acquired, the implementation begins.
This process begins with development and testing procedures and then rolls out to production.


The success of the selection phase can be measured by the number of times business owners agree
with feasibility studies and sign off on requirements as sufficiently comprehensive to proceed
with the project. The measures of the deployment phase can include:
• Number and severity of bugs found in testing (reflects on the selection process)
• Number and severity of bugs found after deployment (reflects on the testing process)
• Number of days ahead or behind schedule for key implementation milestones
• Satisfaction of business owners and users with initial deployment
• Number of users trained on the system
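One measure in the list above, the number of days ahead or behind schedule for key milestones, reduces to simple date arithmetic. A minimal sketch in Python; the milestone names and dates are hypothetical:

```python
from datetime import date

def schedule_slip_days(planned: date, actual: date) -> int:
    """Days behind schedule (positive) or ahead of it (negative) for a milestone."""
    return (actual - planned).days

# Hypothetical milestones: (planned date, actual date).
milestones = {
    "pilot rollout": (date(2024, 3, 1), date(2024, 3, 8)),
    "full deployment": (date(2024, 6, 1), date(2024, 5, 28)),
}
for name, (planned, actual) in milestones.items():
    print(f"{name}: {schedule_slip_days(planned, actual):+d} days")
# pilot rollout: +7 days; full deployment: -4 days
```

Tracking this per milestone, rather than only for the project end date, surfaces slippage early enough for management to act on it.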
After implementation of the service, the governance process will continue by controlling the
maintenance and support for the service.

Maintaining and Servicing the VoIP Service


Once the VoIP system is operational, a new set of management objectives emerges:
• Managing service level agreements (SLAs)
• Managing third-party providers
• Providing service desk support
• Ensuring the availability of services
• Maintaining adequate security measures
• Training users
• Managing costs and charge backs
To ensure that these tasks are performed effectively, managers use a series of measures, such as:
• Number of times an SLA is violated
• Number of non-performance incidents with third-party providers
• Length of time the service was not available and number of users affected
• Number of security breaches that resulted in loss of availability, loss of confidential
information, or corruption of data
• Number of users trained
• Total cost of ownership, and per-user cost of the system
The final part of the governance process is to monitor and evaluate IT processes.


Monitoring Operations
The final stage of the governance process is monitoring the delivery of VoIP services to
determine whether objectives are being met. This could include analyzing summaries of the
management reports generated as part of ongoing operations and maintenance. The objective isn’t just to
know how the service is performing but to know what is being done to correct any problems.
In practice, you do not perform governance over a single project or operation but over all IT
projects and operations. This example illustrated the types of controls and measures that need to
be in place to ensure that projects and services meet management expectations and, if they do
not, that mechanisms are in place to make executive management aware of problems and provide
them with enough information to address the problem. Let’s move from the example to the
formal structure of controls.

Governing IT Services
Governing IT services, according to COBIT practices, is divided into four parts:
• Planning and organization
• Acquisition and implementation
• Delivery and support
• Monitoring
Each of these areas is broken down into a set of control objectives, which in turn, have a
definition, a method for achieving the objective, and suggested measures for determining
whether the objective is being met.

Planning and Organization


The planning and organization phase of governance is subdivided into several areas:
• Defining IT strategic plan
• Defining IT architecture
• Defining IT processes and organization
• Managing IT investments
• Managing human resources
• Managing projects
• Managing IT risks

This section is not an attempt to cover all the topics addressed by COBIT. Some planning and
organizational topics, such as controlling quality, are not covered. The purpose of this section is to
describe governance and its relation to SOM. This chapter cannot, and does not attempt to, replace
COBIT documentation.


Defining the IT Strategic Plan


The first step in planning for IT governance is to define an IT strategic plan. The plan is
essentially a mechanism for aligning IT operations with business strategy. It should include:
• Descriptions of key business objectives and IT services required to realize those
objectives
• Priorities assigned to each objective
• Methods for keeping the IT strategic plan aligned with changes in the business plan
The plan should lay out what should be done by IT, not necessarily what is being done by IT. In
addition, it must be understood that the plan is a dynamic tool for directing IT operations. IT
management and executive management work closely to keep IT priorities in sync with changing
business goals.

Defining IT Architecture
The planning and definition of an IT information architecture is one of the first points at which
security emerges as a prominent aspect of planning. The information architecture of an enterprise
includes:
• An organization-wide data model
• A data classification scheme
• Assignment of ownership of elements of the data model

Organization-Wide Data Model


The data model of the architecture should not be confused with the detailed, operational data
models built during application and database development efforts. During the planning stages,
the data model is more like an inventory of data elements than a structural description complete
with dependencies and data integrity constraints. The data model should:
• Minimize redundancy of data
• Accommodate business functions needed to support strategic objectives
• Define common standards for data syntax to promote reuse
• Provide retention periods and destruction requirements
The data model becomes a reference point from which more detailed design and development
projects can begin. By including data standards, the data model helps to promote interoperability
between applications. The data model should also include a description of data retention policies.
These policies may be based on government requirements (for example, in the case of tax
information) or based on business practices (such as retaining non-regulated public information).
The model should also include data classifications.


Data Classification Scheme


At the highest levels, business security classifications typically include:
• Public—The public classification is assigned to data that would not cause any adverse
impact on the organization if it were released to the public. Obvious examples include
information that has already been made publicly available, such as press releases, and
data submitted to government regulators, such as annual and quarterly securities filings,
including 10-Ks and 10-Qs in the United States.
• Sensitive—Sensitive information should not be released publicly to protect the interests
of the organization; however, if it were released, the damage would be minimal. Financial
information, vendor negotiations, project plans, and other information that is often widely
dispersed within an organization fall into this classification. There are, of course, matters
of degree that must be taken into account. Financial information about a pending merger
that has not been disclosed could adversely affect share prices or prompt strategic
reaction from competitors.
• Private—Private information is information about persons involved with the
organization, such as employees, customers, patients, and clients. Private information is
the subject of many regulations, from the European Union’s Privacy Directives to the
United States’ Health Insurance Portability and Accountability Act (HIPAA) regulation
on healthcare information to state level regulations such as California’s SB 1386.
• Confidential—Confidential information requires the greatest protection because its
disclosure could have significant adverse impacts on the organization. Examples include:
• Trade secrets
• Proprietary process plans
• Strategic negotiations
• Strategic plans
Significant controls should be in place to protect both the confidentiality and integrity of
confidential information.
Security measures should be sufficient to protect information appropriate for its classification
level.
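To make a classification scheme enforceable, each level needs defined minimum controls. One way to keep that checkable is to encode the mapping directly; the four levels follow the text, but the specific controls attached to each level are illustrative assumptions only:

```python
from enum import Enum

class Classification(Enum):
    PUBLIC = 1
    SENSITIVE = 2
    PRIVATE = 3
    CONFIDENTIAL = 4

# Illustrative minimum controls per level; assumed for this sketch,
# not taken from any standard.
MINIMUM_CONTROLS = {
    Classification.PUBLIC: set(),
    Classification.SENSITIVE: {"access_logging"},
    Classification.PRIVATE: {"access_logging", "encryption_at_rest"},
    Classification.CONFIDENTIAL: {"access_logging", "encryption_at_rest",
                                  "two_person_approval"},
}

def missing_controls(level: Classification, in_place: set[str]) -> set[str]:
    """Controls required for this classification that are not yet implemented."""
    return MINIMUM_CONTROLS[level] - in_place

# A confidential data store that only has access logging is flagged as
# missing encryption at rest and two-person approval.
print(missing_controls(Classification.CONFIDENTIAL, {"access_logging"}))
```

With the mapping in one place, security managers can audit every data store against its classification rather than arguing controls case by case.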

Data Ownership
A data owner role should be defined for each element of the data model. The data owner
is the person responsible and accountable for the management of that data. The data owner
role is typically filled by an executive or manager; it is not the systems administrator or
database administrator who may be responsible for the day-to-day maintenance of the data and
the infrastructure that supports it. Data owners are responsible for:
• Formulating policies and procedures controlling the use of the data
• Meeting regulatory requirements concerning the data
• Defining security, availability, and business continuity requirements regarding the data
The information architecture is one of the areas of COBIT that has direct impact on systems
management operations; another is defining IT processes and organization.


Defining IT Processes and Organization


The most effective systems management operations are based on well-structured processes and
organizations. The planning of IT operations begins with defining areas of responsibility,
creating roles, and formulating policies and procedures for conducting the organization’s IT
operations.
COBIT identifies several specific functions that should be addressed in a process and
organization plan:
• Control of operations
• Quality assurance
• Risk management
• Security
• Data ownership
• Segregation of duties
Another aspect of the plan is the structure of IT operations. To effectively control IT operations,
the structure should be based on business needs and technical requirements. For example, from a
business user perspective, a mission-critical application needs to be available 99.999 percent of
the time. From a technical perspective, this requirement maps to several technical requirements,
ranging from storage capacity and network bandwidth to application design and access controls.
It is highly unlikely that one would organize IT operations around an application. The types of
services utilized by the applications, such as storage, networking, and servers, are more likely to
align with staff skills and a manageable division of labor.
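The 99.999 percent example above translates into a concrete operational constraint: the availability target fixes the total downtime budget. The arithmetic, as a quick sketch:

```python
def allowed_downtime_minutes(availability_pct: float,
                             period_hours: float = 365 * 24) -> float:
    """Downtime budget, in minutes, implied by an availability target
    over a given period (default: one non-leap year)."""
    return (1 - availability_pct / 100) * period_hours * 60

# "Five nines" leaves roughly 5.26 minutes of downtime per year,
# while 99.9% leaves roughly 8.8 hours.
print(f"{allowed_downtime_minutes(99.999):.2f} minutes/year")
print(f"{allowed_downtime_minutes(99.9):.1f} minutes/year")
```

A budget of about five minutes per year is what rules out, for example, maintenance windows that take the application offline, and drives the redundancy requirements mentioned above.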
Most of the elements of the process and organization plan are addressed elsewhere in this guide,
but segregation of duties has not been and deserves attention here. It is one of the most important
aspects of security management but is a far less popular topic than other, more technically
interesting issues.
The principle of separation of duties is that more than one person is required to complete a
critical task. For example, a developer may program a change for an application that is then
tested by another person who then passes on the tested code to a third person for release.
Developers would not have access to the test environment, and only release managers can move
code into production. In this way, at least two roles would have to collude in order for a piece of
malicious code to work its way into the production environment. The other areas of planning and
organization center on the management of the IT organization rather than on the services it
provides.
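The collusion requirement described above can also be enforced mechanically, for example at release time. A minimal sketch using the three roles from the example; the record layout is an assumption:

```python
def violates_separation(change: dict[str, str]) -> bool:
    """True if any one person fills more than one of the
    develop/test/release roles for a change."""
    people = [change["developer"], change["tester"], change["releaser"]]
    return len(set(people)) < len(people)

ok = {"developer": "alice", "tester": "bob", "releaser": "carol"}
bad = {"developer": "alice", "tester": "alice", "releaser": "carol"}
print(violates_separation(ok), violates_separation(bad))  # False True
```

A release tool that runs this check before moving code to production turns the policy into a control that cannot be skipped under deadline pressure.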


Managing IT Investments
Managing IT investments can be boiled down to one word: budgeting. Given a set of strategic
directives, IT executives are expected to deliver the services needed with the financial resources
allocated. This process is more than just balancing funding and expenditures; it includes:
• Allocating funds to specific operations and projects
• Creating financial forecasts and optional scenarios
• Establishing criteria for measuring the value of proposed projects
• Monitoring the value of ongoing projects
Managing investments is highly dependent on proper management of human resources and
projects.

Managing Human Resources and Projects


Managing human resources includes the typical operations one would expect, such as:
• Hiring and terminating employees
• Training
• Conducting personnel reviews and assisting with career planning
• Defining job roles and responsibilities
• Supporting the use of contractors and consultants to augment permanent staff
Having the right combination of skills within an organization is critical to maintaining ongoing
operations and properly staffing projects.
Project management includes elements of human resource management as well:
• Defining project management frameworks that explain stages of project management and
documentation required for each stage
• Creating project management guidelines
• Providing oversight of projects
Oversight is essential to detecting problems with project deliverables. Identifying and correcting
problems in projects early can limit the costs and risks to the project. Risk management, though,
extends well beyond tracking the timeliness of project schedules.


Managing IT Risks
Risk management, like project management, should be done within a formal framework. IT, by
nature, has risks not present in other business areas. System incompatibilities, security threats,
and disruptions of operations can occur on a substantial scale from relatively small causes. For
example, a single attacker could breach a database application and steal tens of thousands of
customer records, or a single failure in a critical network device could disrupt multiple
operations.
Managing IT risks includes:
• Defining a risk management framework for determining risks and identifying the
organization’s level of risk tolerance
• Conducting risk assessments
• Formulating risk mitigation plans
By creating a formal management structure that includes all the essential elements outlined, an
IT organization will have a strong foundation for moving to the other areas of IT operations,
such as the acquisition and implementation of IT services.

Acquisition and Implementation


Acquiring and implementing IT solutions is an ongoing process. From a SOM perspective, these
activities constitute a major part of the systems management process. The major tasks in this
stage include:
• Evaluating and selecting solutions
• Acquiring and maintaining both hardware and software
• Enabling operation and use
• Managing change
These tasks constitute the major parts of a system life cycle and so, not surprisingly, the controls
governing these tasks are essentially the same as those found in development methodologies.

Evaluating and Selecting Solutions


When a perceived business need for an IT solution is recognized, a formal evaluation and
selection process should begin. The process should include:
• Soliciting functional business requirements
• Defining technical requirements, such as capacity, security, and other non-functional
requirements
• Conducting risk analysis and feasibility studies for the project
• Receiving executive approval for the proposed solution and prioritizing the
implementation along with other IT initiatives


The evaluation and selection tasks entail what is often called the “build vs. buy” decision. This is
something of a misnomer because complex systems can rarely be reduced to such a simple
dichotomy. In practice, the decision is more akin to selecting a point on a continuum ranging
from buying a turnkey solution to building a custom solution for every aspect of a system.
In the VoIP example from earlier in the chapter, the systems designers
and project sponsors may conclude that no commercially available system meets all needs. The
same group is likely to conclude that “building” a VoIP solution is not feasible. The solution in
such cases is to start with a commercial application as a base, customize it as needed, and
integrate it with the existing infrastructure to get the required functionality. This is done
during the acquisition and maintenance phases.

Acquiring and Maintaining Systems


The acquisition phase of IT systems management is a relatively high-risk area. It is at this point
that solutions are implemented, software is developed, hardware is acquired, and the impact of poor
planning and incomplete requirements starts to come to light. Thorough management is required
to keep projects on track, plan for contingencies, and be able to respond to roadblocks that can
derail a project.

The term “death march” has come into software development parlance to describe a project that will
inevitably fail. The failure is often due to a combination of poor planning, poor project management,
insufficient resources, changing requirements, and unrealistic schedules. All these factors can be
avoided, or at least mitigated, by proper governance procedures.

Managing this phase of IT operations entails:


• Mapping functional and non-functional requirements into sufficiently detailed designs so
that implementations can be carried out without the risk of unanticipated conflicts,
missed dependencies, or other factors that can compromise the progress of the project.
• Following software development practices suitable for the type of project underway.
Some methodologies, such as extreme programming, may be appropriate for small
projects or parts of projects, while a spiral methodology may be required for larger multi-
faceted projects.
• Using appropriate division of development, testing, and production environments.
• Ensuring change management and release management practices are followed.
• Planning for capacity and availability requirements so that suitable resources are in place
when applications are deployed.
As with other stages, the acquisition and implementation phases have overlapping characteristics
with the next phases—enabling operation and use.


Enabling Operation and Use


The process of enabling operation and use focuses on the release management process that
moves a system from development and testing into production. The steps of this process include:
• Developing administration and end-user documentation for the system
• Developing training material for administrators and end users
• Planning the roll out of client applications
• Planning the roll out of server-side applications
• Coordinating any operational transitions for existing processes and systems to the new
application and related processes
This is also a relatively high-risk part of IT management processes, but the level of risk is
inversely proportional to the amount of planning and the quality of development efforts that
precede it. When developers follow established software engineering practices, comprehensive
testing and defect management is in place, and the rollout of applications is coordinated with
business partners, then the risks in the transition to production are reduced. Poor programming,
inadequate testing, lax management of the systems life cycle, and ad hoc procedures for releasing
an application are a recipe for disaster. Once systems are in place, the management focus turns to
maintenance operations; one operation that requires formalized procedures is change management.

Managing Change
Changes made on an ad hoc basis are more likely to succumb to a common scenario. It begins
with an urgent requirement coming to light or the discovery of a flaw in a program. Due to a
sense of urgency, rather than follow formal analysis, design, development, and testing, it is
decided that a developer can start with a minimal summary of requirements (which are rarely
documented). The developer makes a change that addresses the immediate problem, or at least
corrects the symptom of the problem, with the good intention of going back into the code and
fixing it the right way when he or she has more time. Formal testing procedures are bypassed and
after a few unit tests followed by a minimal integration test, the code is moved to production.
What follows from that point can vary, but some of the outcomes are:
• The patch itself has a bug that was not detected during the minimal testing that was done
• A new bug indicates an unanticipated dependency in another part of the code, which was
thought to be unrelated to the section that was patched
• The patch, while programmed according to the system documentation, fails to work
correctly because of a previous ad hoc patch that changed a function but was unknown
because the follow-on step of updating the documentation wasn’t performed


This type of disruption to operational systems can be avoided by:


• Using established testing methodologies
• Reviewing changes with a broad set of developers, administrators, and key users before
implementing the change
• Documenting the change process
• Conducting post-implementation reviews to determine ways to improve the process
A well-established change management procedure is essential for managing the next stage:
delivery and support.

Delivery and Support


Delivery and support is, in many ways, the heart of systems management. Much of the effort of
systems managers is directed to several key processes:
• Managing service levels
• Maintaining performance and capacity levels
• Ensuring security of systems
• Managing budgets and resources
• Providing training
• Providing service support
• Managing data
• Managing the physical infrastructure
Together these activities provide the systems management foundations for the day-to-day
operations of business systems.

Managing Service Levels


Business owners of a system and systems administrators should have a common understanding
of the expectations for the service levels of the system. Ideally, this is worked out during the
requirements and design stages, but realistically, it typically needs adjustment as an organization
gains experience with a system and adapts to changing trends in system demands. These
agreements are formalized in SLAs and operational level agreements (OLAs). Some of the
factors included in SLAs are:
• Service availability
• Capacity of services
• Performance levels
• Service support response times
• Continuity plans
• Security requirements
The SLAs address what is to be provided, and the OLAs focus on how service levels will be met
with particular hardware, software, network, and staffing resources.


Maintaining Performance and Capacity Levels


SLAs provide the metrics that systems administrators use to allocate resources. To maintain
performance levels, systems administrators must:
• Monitor response times
• Collect related performance statistics (for example, CPU usage, bandwidth usage, and so
on) when performance levels are not reached
• Report on performance levels so that management can adjust resources as needed
• Develop forecasts of future demands on the application
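The monitoring tasks above feed directly into SLA reporting, for example as the percentage of measured response times that met the agreed target. A sketch with made-up sample values:

```python
def sla_compliance(response_times_ms: list[float], target_ms: float) -> float:
    """Percentage of measured responses that met the SLA response-time target."""
    if not response_times_ms:
        return 100.0  # no traffic, nothing violated
    met = sum(1 for t in response_times_ms if t <= target_ms)
    return 100.0 * met / len(response_times_ms)

# Hypothetical response-time samples in milliseconds against a 200 ms target.
samples = [120, 250, 90, 310, 180, 95]
print(f"{sla_compliance(samples, target_ms=200):.1f}%")  # 66.7%
```

Reported per service and per period, this single figure lets management see at a glance which SLAs need added resources before users escalate.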
Closely related to performance management is capacity management. The difference is that
instead of focusing on the response times, capacity management focuses on underlying resources
required to maintain service levels. The tasks in this area include:
• Monitoring CPU, disk, and network use
• Assessing the impact of a change in requirements; for example, the time and space
required to perform additional backups
• Forecasting trends in resource use
Another task essential to the viability of service levels is ensuring the integrity of applications
and data.

Ensuring Security of Systems


Security is a multifaceted challenge, and governing the security management process is equally
complex. The fundamental goals of information security are to maintain the confidentiality,
integrity, and availability of systems and data. To meet this objective, well-managed IT
organizations will:
• Document security requirements relative to business requirements
• Establish identity management and access controls
• Review government regulations and establish procedures to ensure compliance
• Establish monitoring and auditing procedures
• Establish incident response policies and procedures
• Establish change control procedures
• Establish security configuration standards


Security entails a balance between the need to protect information and assets and the need to
keep resources accessible to users without unnecessary burden. To strike a proper balance, the
business requirements, relative to security requirements, should be well documented. This begins
with data classification. Identity management and access controls will build upon the information
classification scheme described earlier. By first partitioning information into different categories,
it is easier for security managers and systems administrators to properly apply access controls.
Other requirements, such as the need to share information with business partners, can extend
beyond the boundaries of the organization.
Government regulations drive a wide array of security requirements and have helped to promote
the practice of IT governance. At the very least, organizations should understand which
regulations apply to their operations and then review policies and procedures in light of those
regulations. In some cases, such as the Sarbanes-Oxley Act, auditors can help formulate
appropriate controls to meet regulatory requirements.
Monitoring and auditing policies are required as well. Governance depends on measures to
assess the effectiveness of controls, so one would expect security management to require
monitoring for that reason alone. More importantly, monitoring is an active part of information
security practice; it serves multiple purposes, ranging from helping detect anomalous events to
providing traces of events that occur during a security breach.
Finally, the governance of security operations should include the establishment of incident
response policies and procedures. Executives, managers, systems administrators, network
managers, and others should all know their roles and responsibilities in the case of a security
breach. Well-defined reporting procedures should be established. Key measures of system
security management include the number and severity of security breaches and the number of
times security requirements are not met.

Managing Budgets and Resources


During the planning process, budgets are established and priorities set. During the delivery
phases, the focus shifts from the high-level allocation of resources to tracking expenditures and
charge backs. The goal is to ensure that charges are properly allocated and costs recovered and
that projects and services stay within budget.

The Problem with Charge Backs


One of the trickiest areas of financial IT management is allocating costs. Ideally, the business
units that use services pay IT costs, and they pay according to the level of service they receive.
In practice, distortions in cost can occur.
Consider the example of a storage area network (SAN) used by several departments. The IT
department purchases or leases the disk array for a period of 3 years and charges each
department according to the amount of storage used. The costs of the hardware, software,
service, and support staff are known for the 3-year period, so the IT department calculates the
lifetime cost of the service. Each department is charged for the percentage of the storage they use
on a per-gigabyte-of-storage-per-month basis.

174
Chapter 8

Thus, if five departments use equal amounts of storage, they each pay the same amount. Suppose
that one of the departments decides to use another storage service or no longer needs as much
storage. The IT department now bills less because less storage is used and no longer recovers its
costs. Does the IT department increase charges to compensate for the lost
charge backs? If it does, the other departments will bear the increased costs leading them to
either reduce their storage use or look elsewhere for storage services. If the remaining
departments reduce their storage use, the cycle continues, and the IT department would have to
increase per-unit charges again to recover costs.
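The arithmetic behind this spiral can be sketched in a few lines of Python (the lifetime cost, storage amounts, and lease term are hypothetical figures, not taken from the text):

```python
# Hypothetical SAN charge-back model: a fixed lifetime cost is recovered
# by charging per gigabyte-month across the departments that remain.

TOTAL_LIFETIME_COST = 360_000  # assumed 3-year cost of hardware, software, support
MONTHS = 36                    # 3-year lease period

def per_gb_month_rate(total_gb_in_use: float) -> float:
    """Per-unit rate needed to recover the full cost over the lifetime."""
    return TOTAL_LIFETIME_COST / (total_gb_in_use * MONTHS)

# Five departments each using 1,000 GB share the cost equally...
rate_before = per_gb_month_rate(5 * 1000)   # $2.00 per GB-month
# ...but if one department leaves, the same fixed cost must be spread
# over less storage, so the per-unit rate rises for everyone who stays.
rate_after = per_gb_month_rate(4 * 1000)    # $2.50 per GB-month
```

Each departure raises the rate for the remaining departments, which is exactly the incentive distortion a charge-back model must be designed to avoid.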
Internal charge-back models must be carefully formulated to avoid distorting reasonable
economic incentives. Some balance must be found to meet the objectives of individual
departments while realizing the benefits of economies of scale. Key measures of the success of a
charge-back system are the number of times charge-back costs are disputed and the number of
times service agreements are either terminated or not renewed because of cost disputes.

Measuring Variance
Budgets will vary from actual expenditures; how often this happens and to what degree is
another measure of the budgeting and allocation management process. When measuring
variance, management should determine an appropriate aggregate level.
For example, within a department, a line item for one activity, such as payroll and benefits, may
be over budget but another comparable line item, such as consulting fees, may be sufficiently
under budget to compensate for the difference. This may or may not be a cause for concern.
Consulting fees within the budget may be highly variable, while payroll costs tend to be less so.
If consulting fees are reduced in the next budget cycle, what will offset the ongoing increased
payroll charges?
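The offsetting line items described above reduce to a small variance calculation (the figures and account names below are illustrative only):

```python
# Budget variance at line-item and aggregate levels. A positive variance
# means the item is over budget; a negative variance means under budget.

budget = {"payroll": 500_000, "consulting": 200_000}
actual = {"payroll": 530_000, "consulting": 165_000}

line_variances = {item: actual[item] - budget[item] for item in budget}
aggregate = sum(line_variances.values())

# Payroll is $30,000 over budget, but consulting is $35,000 under, so
# the aggregate looks healthy even though the recurring payroll overrun
# may persist into the next budget cycle.
```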
Another type of variance that should be monitored by the governance process is the reallocation
of funds to different types of expenditures. For example, a decrease in spending on service
contracts to compensate for overruns in other line items could leave some services vulnerable to
disruption or subject to lower performance levels than defined in SLAs.

Providing Training
Training is a fundamental IT service. For overall governance of training, the following are key
measures:
• Number of users trained
• Rate of service desk calls related to functions addressed in training
• Subjective quality ratings provided by trainees
Effective training is correlated with reduced demand for service support.


Providing Service Support


Service support provides users with someone to turn to when problems or questions arise with IT
applications. Successful service support requires
• Adequate capacity of first-line support personnel to field calls
• Appropriately trained support personnel who can handle the majority of calls within the
first and second levels of support
• Escalation procedures for determining when to seek more specialized assistance with a
particular problem
• Management procedures for collecting data about service calls to identify trends and spot
potential weaknesses both in applications and support services
The quality of service support can be measured with quantitative measures, such as the number
of calls per service desk employee, the average time to resolve an incident, and the number of
incidents escalated to higher levels. Qualitative measures, such as users’ satisfaction, can also be
used.
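The quantitative measures named here can be computed directly from incident records; a minimal sketch (with made-up records and field names) might look like this:

```python
# Service-desk metrics computed from a list of incident records
# (hypothetical data; field names are assumptions for illustration).

incidents = [
    {"agent": "a1", "minutes_to_resolve": 12, "escalated": False},
    {"agent": "a1", "minutes_to_resolve": 45, "escalated": True},
    {"agent": "a2", "minutes_to_resolve": 9,  "escalated": False},
    {"agent": "a2", "minutes_to_resolve": 30, "escalated": False},
]

agents = {i["agent"] for i in incidents}
calls_per_agent = len(incidents) / len(agents)          # load per service desk employee
avg_resolution = sum(i["minutes_to_resolve"] for i in incidents) / len(incidents)
escalation_rate = sum(i["escalated"] for i in incidents) / len(incidents)
```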

Managing Data
Backup and recovery operations are required to preserve the availability and integrity of data.
Although the topic sounds mundane and rather simple at first, the complexities of backup
become clear quickly. Some of the topics that must be addressed in backup policies include:
• Determining what to back up—Data is frequently duplicated for performance purposes or
for ease of integration. What data source is considered the system of record (that is, the
definitive record)?
• Adequately protecting backup data—For example, what data classifications should be
encrypted when backed up?
• Determining how long data should be retained—When data is removed from operational systems
according to records retention policies, how will copies of the data be deleted?
• Testing backup media—Backup media is subject to failure like any other device; how much
testing of backup and archive material is required?
This service may be measured by the number of times backups are successfully performed and
the percentage of backups completed within the time allotted to the backup process.


Managing the Physical Infrastructure


Many of the governance topics focus on managing the technical aspects of IT services and
controlling organizational factors, such as budgets and staff. Another important topic is
managing the physical infrastructure of an IT operation. This includes
• Providing adequate facilities—including space, power, and environmental controls—to
accommodate staff and hardware
• Supporting contingency planning by having offsite or backup facilities in the case of
service disruption at a primary location
• Deploying physical security controls at facilities
Measures for this area include tracking the number of security breaches at a facility and the
number of lost hours due to power failure or loss of other required utility.
Delivering and supporting IT operations is a multi-faceted challenge. By dividing the tasks into
the logical divisions outlined and tracking progress with some of the measures provided, an IT
organization can build on past experience to improve service delivery.

Monitoring and Evaluating IT Management


Monitoring and evaluation processes are not limited to the technical or human resource aspects
of IT. The IT management process itself should be monitored and measured. This process
includes steps such as
• Analyzing performance reports to understand the overall state of IT operation
• Assessing how well SLAs are being met overall
• Reviewing exceptions to management frameworks
• Monitoring the state of regulatory compliance
• Conducting self evaluations, including quality surveys of customers
Again, the objective is to establish a set of policies and procedures and then measure the level of
adherence to those policies and procedures. The monitoring process for IT can provide
indications of management processes that need correction.

177
Chapter 8

Governance and Maturity Models


This discussion of governance has outlined the major areas of IT management and divided those
into logical, manageable units with key measures for determining how well each is managed. The
measures also provide indications of trends, weak areas, and other factors requiring management
attention. It must be recognized, though, that not all organizations are at the same level of
capability to monitor their management processes.

Examples of Varying Levels of Capability Maturity


Consider some scenarios that impact the ability of an organization to provide effective
governance of IT operations:
• An organization without a well-defined strategic plan cannot align IT objectives to
business objectives.
• Without standardized procedures for managing projects, it is impossible to compare the
performance of multiple projects to determine which factors promote and which hinder
project deliverables.
• A development team does not have a distinct test environment or team of quality control
testers, so they perform their own testing in a development environment before deploying
code.
• Training of systems administrators on new applications is not a formalized process;
developers typically spend a brief amount of time with systems administrators just prior
to deployment to explain how the system works and what is required to manage it.
Formal documentation is not produced.
These examples all depict organizations with different levels of capability for governing IT operations. In
the first scenario, the organization is incapable of governance because there is no foundation
upon which to build an IT strategy. IT managers are left to respond to ad hoc requests for
services and are likely juggling multiple projects with no priorities to order those initiatives. The
result of this type of management scenario is a group of frustrated users and business managers
and an IT staff in “fire drill” mode.
In the second scenario, the organization recognizes the need for management frameworks but
does not have a consistently applied methodology for project management. Although some
project management is better than none, every project may be managed differently, varying
according to the management style of the person in charge.

178
Chapter 8

Inadequate resources hamper the development team in the third example. The team is forced to
conduct two distinct activities, testing and development, in the same environment. This can lead
to conflicts in the use of resources, introduce dependencies that would not exist if separate
environments were used, and can delay deliverables as tasks are scheduled around the limited
resources of the development environment. In addition, testing sizable software development
efforts requires a formal methodology, and many developers are not trained in those methodologies.
The result is, good intentions aside, inadequate testing that leads to higher risk of failure during
deployment.
The final scenario depicts a lack of emphasis on operational support. Without proper training and
support, systems administrators will not be able to effectively manage and tune an application.
Users may not receive or may be delayed in receiving the services they need. These kinds of
scenarios show that adopting sound management frameworks and development methodologies is
not black and white (as in, either you do it all or you do not do it at all); rather, there is a
continuum with many processes within many organizations somewhere between the best and
worst extremes.

Capability Maturity Models


The Carnegie Mellon Software Engineering Institute (SEI—http://www.sei.cmu.edu) has
developed a formal model for measuring the level of maturity of an organization with respect to
processes such as software development. These are known as capability maturity models and
define levels of control and optimization that an organization is capable of exercising. The
maturity models are divided into six stages of development:
• Level 0 (Non-existent)—At this level, there are no discernable processes of systematic
management.
• Level 1 (Initial)—At this level, projects, issues, software development, or other key
processes are managed on an ad hoc basis. There is little or no connection between how a
process is carried out in one instance and another. There is no coherent management
process.
• Level 2 (Repeatable)—At this level, there is some common understanding of how to
perform tasks, but there is no formal documentation or training on the processes. At this
stage, success or failure is highly dependent on the capabilities of the people immediately
involved in the process.
• Level 3 (Defined)—At this stage, procedures have been formalized and documented.
Participants receive training in the process. There is little oversight, though, to ensure that
the process model is followed.
• Level 4 (Managed)—This stage builds on level 3 by adding more management oversight.
• Level 5 (Optimized)—Processes are well managed, compliance with standards is
measured, and results are measured and used to tune processes.

179
Chapter 8

This framework uses a set of key goal indicators and key performance indicators to guide the
implementation of the COBIT objectives. Key goal indicators measure how effectively an
organization achieves its goals, assessing overall performance with respect to a goal after the
fact. Key performance indicators, in contrast, measure specific operations and processes; they
are leading indicators of the organization’s trend toward reaching its goals. Because they are
gathered during the observation period, they allow time for management to adjust practices and
make corrections as needed.

For more information about maturity models and management, see the SEI documentation at
http://www.sei.cmu.edu/managing/managing.html.

Organizations that start to implement formal governance procedures will do so at some point in
the maturity model. If, for example, executive management has decided to improve the software
development process, which is currently at some point between Level 1 and Level 2, then one of
the first objectives will be to formalize documentation and training. Governance measures should
then track management’s progress toward those goals. Implementing
governance procedures must be done with recognition for the relative capability maturity of the
IT organization.

Summary
Governance is the process of directing and controlling operations to ensure that long-term
objectives are met. COBIT is a deep and broad framework for implementing governance best
practices. The IT field is mature enough that management and governance practices need not be
an exercise in reinventing the wheel; rather, the goal of executives and IT management should be
to find frameworks that serve the needs of the organization and work well together. SOM, for
example, is highly amenable to governance because of the logical organization of operations and
resources and the focus on measuring performance. It should be understood that governance is an
ongoing process that will change as the maturity level of the IT organization changes. As
systems management, software development, and training procedures improve, it is likely that
the ability to keep those aligned with strategic objectives will improve as well.


Chapter 9: Supporting Security with Systems Management


The security and systems management functions of an organization go hand in hand. Security
professionals depend on the services and infrastructure maintained by application, server, and
network administrators. Countermeasures such as firewalls, content filters, and anti-malware
must be deployed, maintained, monitored, and integrated, and these tasks fall into the domain of
network and systems management. At the same time, systems managers have a wide array of
responsibilities and they require a secure foundation upon which to do their work. We cannot
expect application administrators to maintain a mission-critical application while the server is
subject to Denial of Service (DoS) attacks or client devices are riddled with spyware and
malware. There is much overlap between security and systems management, and this chapter
will focus on how systems managers can support and help to improve the overall security of the
IT infrastructure.
Information security is a broad and challenging field. Several frameworks and organizing structures have
been proposed. The ISO-17799 standard is popular among security professionals because it
addresses the field from their perspective. Another approach, taken by the SANS Institute, is to
think in terms of layered walls and defense in depth. This model is probably more similar to
architecture models and infrastructure designs used by systems management. Although the topics
addressed in this chapter span both the ISO-17799 standard and the SANS model, the SANS
model will serve as an organizing principle.
The key areas of information security as it relates to system management are:
• Network security
• Host security
• Vulnerability management
• Authorized user support
• Security management
Some areas, such as network security, have dedicated administrators and engineers who
specialize in both managing and securing network assets. The other areas are more likely to
require the support of systems and application administrators and warrant the most attention in
this chapter. However, for completeness, we will examine all the areas.


Network Security
Network security requires many security measures around the network perimeter. For example,
common network security devices include:
• Firewalls
• Intrusion detection/prevention systems
• Content filters
• Network access controls
• Messaging boundary gateways
All these are primarily security devices, but they are still information assets that require
management. Furthermore, these devices are becoming more complex and that implies more
demanding management. Take, for example, the most basic of network security devices, the
firewall.
Firewalls segment networks and control the type of traffic that can pass between segments. For
example, HTTP traffic may be allowed from devices outside of the organization’s network but
FTP traffic is not. Firewalls are a first line of defense but are limited by the amount of
information they analyze. For example, packet filtering firewalls examine only packet header
information, whereas proxy firewalls examine information within the packet. Application
firewalls are increasing in popularity because they can filter traffic based on the needs and
vulnerabilities of a particular application.
For network administrators, the increasing complexity of firewalls and other network security
devices will bring with it greater demand for mature systems management practices. Whereas in
the past one or two packet-filtering firewalls might have been used in a network, today there may
be several more complex firewalls within the segments as well. These must be administered, patched,
and maintained, and that means they must fall under systems management operations.
The following activities can generally be expected of network administration and systems
administration groups when supporting network security:
• Assisting with the procurement of network security devices
• Installing and configuring those devices
• Monitoring basic functionality
• Generating alerts and logging events, such as a device going offline
• Maintaining asset information in a configuration management database
• Applying patches
• Assisting with vulnerability assessments
• Participating in risk analysis operations related to network security
Of course, network administrators and systems administrators each have distinct areas of focus;
to maintain and improve security, each group should also understand the architectures and
processes that constitute their colleagues’ domains.
As you move further from the perimeter and away from specialized security devices, the role of
systems administrators as security professionals increases. This is certainly true of host security
operations.


Host Security
Host security measures maintain the integrity, confidentiality, and availability of information and
services provided by servers and client devices. System attacks are those targeted to particular
applications or hosts. The purpose of such attacks may be to disrupt services or steal information.
As economic motives have grown to dominate serious attacks, we are likely to see more attacks
targeted at specific applications and hosts.
Attacks may include:
• DoS attacks attempting to disrupt the operations of an organization
• Database breaches attempting to steal private but profitable customer information
• Application-specific attacks, such as attacks on enterprise management systems that
contain sensitive and confidential information about an organization’s operations
System attacks are often not ends in themselves but rather a means to an end—information and
resource theft. In spite of the range of functions hosts serve, a number of security measures are
common to all, including:
• Personal firewalls
• Anti-malware
These are some of the most common elements of security support within the practice of systems
management.

Personal Firewalls
Personal firewalls serve the same purpose as network firewalls but the function is localized to a
single host. The personal firewall controls traffic into and out of a device. Controlling inbound
traffic on a host is a basic perimeter type defense with obvious benefits.
Outbound traffic can also be blocked. This can reduce the impact of a compromised device,
which might, for example, be part of a spam-generating botnet. The malware infecting the device
may generate spam but the personal firewall can block the transmission of the unwanted email.
The challenge for managing personal firewalls is the number of devices that must be deployed
and the varying requirements. Consider some examples:
• A Web server will require traffic on HTTP, HTTPS, and related ports
• A database server will require inbound and outbound traffic on ports dedicated to the
database listener
• A salesperson’s notebook should have outbound email (SMTP) traffic blocked
• All hosts may, by default, have FTP ports blocked
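These role-specific requirements can be captured as a simple policy table; the sketch below is an illustration only (the roles and port numbers are assumptions, not a recommended configuration):

```python
# Role-based personal-firewall policy sketch. Each role lists inbound
# ports to allow and outbound ports to block; FTP is blocked by default.

DEFAULT_BLOCKED = {21}  # FTP blocked on all hosts by default

POLICIES = {
    "web_server":   {"allow_in": {80, 443}, "block_out": set()},
    "db_server":    {"allow_in": {1521},    "block_out": set()},  # assumed DB listener port
    "sales_laptop": {"allow_in": set(),     "block_out": {25}},   # block outbound SMTP
}

def inbound_allowed(role: str, port: int) -> bool:
    """A port is allowed inbound only if listed for the role and not blocked by default."""
    return port not in DEFAULT_BLOCKED and port in POLICIES[role]["allow_in"]

def outbound_blocked(role: str, port: int) -> bool:
    """Outbound traffic is blocked if the role policy or the default policy says so."""
    return port in DEFAULT_BLOCKED or port in POLICIES[role]["block_out"]
```

Defining such a table is the security staff's job; ensuring each host actually runs with its assigned policy is the systems administrator's.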
Determining the proper configuration of a personal firewall is the responsibility of security staff.
Once the configuration is defined, though, ensuring that the proper configuration is in place and
the software is up to date and activated on the device is the responsibility of systems
administrators. One of the areas in which the systems management and security staff will have
shared responsibilities is in managing anti-malware systems.


Anti-Malware
The malware threat has evolved from disruptive and annoying viruses written to demonstrate a
hacker’s ability to circumvent normal operating system (OS) operations to financially motivated,
sophisticated blended threats designed to steal information and compromise hosts. There are
several distinct types of malware:
• Viruses and worms
• Keyloggers and video frame grabbers
• Trojan horses
• Botnets
• Rootkits
These different types of malware are used for carrying out different aspects of an attack and may
be blended together to create a more serious threat than posed by any single type of malware on
its own. Understanding the difference in malware is important to understanding how they can
impact IT operations.

Viruses and Worms


Viruses and worms are the most well-known forms of malware. Viruses consist of a payload, the
part of the virus that carries out its malicious activity, and propagation code, which allows the
program to spread by attaching to other programs. More sophisticated forms of viruses include
encryption modules used to mask the viruses from antivirus detection. In practice, encryption is
not enough of a defense because signature-based detection methods can still be used to identify
encryption modules even when the payload is encrypted.
Polymorphic viruses change the structure of the program without changing its functions. These
kinds of viruses include a module known as the polymorphic engine that introduces operations
that have no effect on the functioning of the program, such as an instruction to add 0 to a number
or to concatenate strings into a variable that is never used in the control or output of the program.
Worms are similar to viruses but propagate on their own by exploiting vulnerabilities in
applications and network systems. The SQL Slammer worm, for example, spread by using a
vulnerability in SQL Server communications that allowed it to find other SQL Server instances
by searching random IP addresses. The worm spread rapidly and within minutes had slowed
traffic on large segments of the Internet when it struck in 2003.


Keyloggers and Video Frame Grabbers


Another type of malware that is a growing concern is malware designed to electronically
eavesdrop and steal information. Keyloggers are programs or hardware devices that intercept
keystrokes from a keyboard and log them to a file. The file is then sent to an attacker, in the case
of software-based keyloggers, or retrieved by an attacker, in the case of a hardware keylogger. It
is easy to imagine a scenario in which a keylogger could be used to collect useful information for
a thief. Consider the following sequence of events in which a user
• Opens a browser and enters a URL for a popular online auction
• Searches for an electronic device and makes a purchase
• Opens her payment service account by entering a username and password
• Navigates to her bank’s Web site
• Logs into her accounts using her bank username and password
• Transfers funds from her savings account to checking account
• Navigates to several news sites
Logging every keystroke can lead to a great deal of useless information from the attacker’s point
of view; however, by scanning for text patterns found in known sites, such as the names of
online auctions, banks, and retailers, attackers can quickly identify parts of the log files that will
most likely have usernames, passwords, and account numbers.
For example, an attacker may scan the file looking for text such as “www.mybankwebsite.com”
or “www.someonlineauction.com” and then search for a single term 4 to 12 characters long, such
as “JaneDoeNYC” followed by another word 6 to 15 characters long, such as “P2sSw5rd!” to
retrieve usernames and passwords. Similar scanning techniques can be used to find Social
Security numbers, driver’s license numbers, bank account numbers, and so on. Of course, there is
more useful information than just the text that passes through the keyboard.
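The scanning technique just described can be illustrated with a short regular-expression sketch; it reuses the example site and credential strings from the text above and is shown only to make the defender's threat model concrete:

```python
# Illustration of keystroke-log scanning: find a known site name, then
# capture a short token (likely a username) followed by a longer token
# (likely a password), using the character lengths described in the text.
import re

log = "www.mybankwebsite.com JaneDoeNYC P2sSw5rd! unrelated keystrokes"

pattern = re.compile(
    r"www\.mybankwebsite\.com\s+"
    r"(\S{4,12})\s+"   # candidate username, 4 to 12 characters
    r"(\S{6,15})"      # candidate password, 6 to 15 characters
)
match = pattern.search(log)
if match:
    username, password = match.groups()  # ("JaneDoeNYC", "P2sSw5rd!")
```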

A Picture, a Thousand Words, and Video Frame Grabbers


One way to avoid having passwords captured by keyloggers is to display a virtual keyboard on
the screen and have users mouse over and click each character in a password. This can
circumvent a keylogger but, as we should expect, attackers have devised ways to continue to
steal information in spite of the countermeasure.
A video frame grabber makes copies of the contents of video memory and so can capture a wide
array of information, such as:
• Virtual keyboards used to enter passwords
• Email messages displayed on the screen
• Spreadsheets and documents displayed
• Instant message discussions
• Account information displayed by database applications
Both keyloggers and video frame grabbers are especially threatening when unmanaged devices
are used to access information. Unmanaged devices include home computers used by customers
to access their account information as well as public access computers, such as in hotels, which
may be infected with malware, including keyloggers and video frame grabbers.


Trojan Horses
Trojan horses are programs that appear to serve one purpose but actually contain malware.
Trojan horses may be found in:
• Browser add-ons
• Utility programs, such as clock synchronizers
• File-sharing utilities
• Programs and files sent through email and instant messaging
Trojan horses are a mechanism for distributing malicious code. They are often used with
multiple forms of malware, known as blended threats, which can include keyloggers,
communications programs, file transfer programs, and command and control programs that allow
remote control or remote execution of code. The ability to execute programs on compromised
hosts gives attackers the means to create networks of compromised computers, sometimes called
zombies but more commonly known as bots.

Remote Control and Botnets


A bot is a program that may be controlled by an attacker. Bots have been used to distribute spam
and phishing attacks, conduct click fraud, and launch distributed DoS attacks. A compromised
host typically listens for commands on an Internet Relay Chat (IRC) channel or instant
messaging service. Botnet controllers can send commands to execute scripts, send spam, or
download updates to the botnet software.
Identifying and eradicating botnets, Trojan horses, keyloggers, video frame grabbers, viruses,
worms, and other malware is more difficult when a device has also been compromised by a rootkit.

Hiding Malware with Rootkits


A rootkit is a program that masks the presence of other programs and files and makes the
activities of those programs more difficult to detect. Rootkits may modify OS or application code
to
• Intercept low-level system calls for file information
• Prevent the display of information about processes executing
• Load rootkit code instead of OS code
• Substitute legitimate application code with compromised versions of code


Rootkits compromise the OS, so there is not necessarily a trusted computing base. Any
information returned by the OS kernel (for example, processes that are executing or the size of a
particular binary file) may not be true because the code that executes the requested service may
be compromised.
Some tools have been developed to detect patterns indicative of the presence of a rootkit. For
example, a rootkit detector might compare file system information returned by the OS with
information returned by low-level analysis of the disk system; any discrepancies could indicate
the presence of a rootkit. Another technique is to boot a device from a trusted source, such as an
OS CD, and then scan for rootkits.
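The cross-view comparison described here reduces to a set difference: anything visible to a low-level disk scan but hidden from the OS file listing is suspect. A toy sketch (the file names are hypothetical):

```python
# Cross-view rootkit detection sketch: compare the file list reported by
# the (possibly compromised) OS with the list recovered by a low-level
# disk scan; discrepancies can indicate a rootkit hiding files.

os_reported = {"/bin/ls", "/bin/ps", "/etc/passwd"}
disk_scan   = {"/bin/ls", "/bin/ps", "/etc/passwd", "/tmp/.hidden_rootkit"}

hidden_files = disk_scan - os_reported  # files the OS is not showing
suspicious = bool(hidden_files)
```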

Rootkits may become even more difficult to detect, especially if vulnerabilities in BIOS are exploited.
See Robert Lemos’ “Researchers: Rootkits Headed for BIOS” at
http://www.securityfocus.com/news/11372.

The best response to the threat of malware attacks is to use a defense-in-depth strategy. This
approach recognizes that no one countermeasure or policy will fully mitigate the risks of an
attack. It also recognizes that anti-malware programs and related systems are themselves
complex programs with their own limits and vulnerabilities. A defense-in-depth approach to
malware protection will include:
• Antivirus and personal firewalls on client devices
• Network-based content filtering to block malicious content before it reaches the client
• Intrusion prevention monitoring to detect unusual network activity, such as large volumes
of network traffic outside of normal patterns
• Host-based intrusion prevention that detects changes to OS files
• Regular monitoring of logs and audits of security measures
• End-user training, especially on the threat of social engineering techniques
• Comprehensive set of policies that define an organization’s strategy for managing the
risks of malware attacks
An emerging technique for addressing the threat of unwanted applications—such as malware,
bots, and other unintentionally downloaded software—is application control. Application control
mechanisms allow administrators to define policies about the programs that may run in an
environment. For example, a policy may categorize applications based on an administered
security rating, digital signing, date of discovery, or other attribute. Measures such as application
controls are an increasingly important addition to defense-in-depth strategies.
Another area of security that is dependent on systems management services is vulnerability
management.


Managing Security Vulnerabilities


Complex systems are vulnerable to compromise. Sometimes attackers can gain access because
they have detailed knowledge of an application and can exploit subtle and little-known
vulnerabilities. In other cases, attackers may exploit an old version of code with a known
vulnerability that has not been corrected on the victim’s device. Understanding vulnerabilities
and correcting or compensating for them is essential to managing security vulnerabilities.
Detecting security vulnerabilities through penetration testing is another area in which security
professionals depend upon systems managers. Penetration testing is the process of examining
networks and systems for vulnerabilities and attempting to exploit them. Many kinds of
vulnerabilities can be tested, including:
• Network security
• Server security
• Application security
• Social engineering and user awareness
• Physical security
• Wireless network security
Systems and application managers can be especially helpful with network, server, application,
and wireless network security penetration testing.
There are two approaches to penetration testing: starting with detailed knowledge of the
network and server infrastructure, or starting without such knowledge. The former, known as
white box testing, gives the tester the most information with which to conduct the test and
therefore increases the chances of finding vulnerabilities. The latter, black box testing, may
give a better indication of how an attacker with no knowledge of the systems would proceed.
Because the goal of penetration testing is to find as many vulnerabilities as possible, white
box testing is the favored method; relying on black box testing alone entails an implicit
dependence on “security by obscurity,” a discredited countermeasure.
Assuming white box testing is used, systems management personnel will be needed to provide
details about:
• The types of devices deployed
• Network topology
• Applications and version information
• Authentication and authorization mechanisms
• Countermeasures in place
• Operational procedures
• Wireless network configurations
An attacker may not have access to all this information, but a penetration test conducted by
security professionals with these details is more likely to provide the kind of vulnerability
information sought. Once vulnerabilities are found, the next step is to correct them.


Configuration and Patch Management


In many cases, vulnerabilities can be corrected through either changes to configuration or
through the application of patches. Each will be considered in turn.

Configuration Management
Configuration management entails tracking the software and configuration of devices. This is of
importance to security management in a number of scenarios. First, if a vulnerability is
discovered in a particular version of an application, a configuration management reporting
system can identify which hosts are running that version. This is especially important when
client devices may have one of several different versions. Although an organization may make it
a policy to standardize on one or two versions of office suites, it may have half a dozen or more
versions of client-side database drivers.
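This kind of reporting reduces to a simple query over inventory records. The records and version strings below are hypothetical:

```python
# Hypothetical configuration management inventory records
inventory = [
    {"host": "ws-101", "app": "db-driver", "version": "2.1"},
    {"host": "ws-102", "app": "db-driver", "version": "3.0"},
    {"host": "ws-103", "app": "office-suite", "version": "11"},
    {"host": "ws-104", "app": "db-driver", "version": "2.1"},
]

def hosts_running(app, version, records):
    """Return hosts still running the given (vulnerable) version."""
    return [r["host"] for r in records
            if r["app"] == app and r["version"] == version]

# Which clients still run the version with the published vulnerability?
print(hosts_running("db-driver", "2.1", inventory))
```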
Another case in which configuration management supports security operations is risk analysis
and incident response. The existence of a vulnerability is one factor that determines how
to respond; another factor is the importance of the device with the vulnerability. High-priority
devices, such as customer-facing servers, should be patched immediately if the vulnerability
could seriously disrupt operations. However, a lower-priority device, such as a server running a
database tracking a training schedule, can be queued for patching at a later time after critical
systems have been addressed.
Another area in which configuration management supports security is the planning process. For
example, if antivirus software will be upgraded, the configuration management database can help
to determine the number of licenses required. Much of host-based security, though, is based on
countermeasures, such as personal firewalls and anti-malware systems.

Patch Management
Patch management is the process of updating software and configuration (“patching”) to improve
security, functionality, or performance. There are several components in patch management:
• Being aware of patch releases
• Testing patches
• Deploying patches
• Maintaining configuration information
All of these require systems management support. Patches are released on regular schedules by
many vendors, so those can be incorporated into maintenance schedules. These patches often
address minor or moderate impact bugs or provide performance improvements. Unscheduled
patches, such as fixes for security vulnerabilities, may come at any time. Assessing the impact of
vulnerabilities and the benefit of patching is a process that should be done by both security and
systems administration teams.


Prior to deploying a patch, it needs to be tested. If a patch breaks a functioning system, a
decision must be made: does the benefit of the patch outweigh the loss of
functionality? Most likely, a patch that causes a problem will do so with a limited number of
configurations. In such cases, a configuration management database can aid in determining
where the patch should be deployed and where it should not.
Deploying patches is strictly a systems management operation. Ideally, the process is automated
so that code is distributed to the appropriate clients, the installation is verified, and details are
logged for analysis when installation fails.
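The distribute-verify-log loop just described might be sketched as follows. The patch identifier, host names, and the install stub are all hypothetical stand-ins for a real deployment tool:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("patch-deploy")

def install_patch(host):
    # Stub: a real deployment tool would push and execute the package here.
    return host != "ws-104"   # simulate one failed installation

def deploy(patch_id, hosts):
    """Distribute a patch, verify each install, and log failures for analysis."""
    failed = []
    for host in hosts:
        if install_patch(host):
            log.info("%s installed on %s", patch_id, host)
        else:
            log.error("%s FAILED on %s; queued for analysis", patch_id, host)
            failed.append(host)
    return failed

print(deploy("KB-2007-113", ["ws-101", "ws-102", "ws-104"]))
```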
The final step in the patch management process is updating configuration information. A
configuration management database and related applications may pull this data as part of
routine operations, or configuration information may be pushed to the database during the
deployment process. Yet another area that requires security and systems management
coordination is access control.

The SQL Slammer incident is one of those cases that did not have to happen. Microsoft had patched
the vulnerability exploited by SQL Slammer months before the worm struck. Part of the problem was
that database administrators had not patched SQL Server instances, and part of the problem was due
to users not knowing they were running a desktop version of SQL Server that had been embedded in
some applications. This is one of the reasons asset management is so important to information
security—you must know what software you are running and how it is patched.

Controlling Access
One of the most challenging aspects of both security and systems management is access controls.
The number of users, roles, and privileges is growing so rapidly that for many organizations, the
only way to keep up is to leverage automation. This, in turn, requires a policy framework for
driving the automated tools.
Controlling access is not one-dimensional; it must be viewed from both a user and a resource
perspective. With regard to users, the key issues are identity management, authentication,
and authorization. From the resource management perspective, it is important to address topics
such as file and disk encryption as well as secure remote access.

Identity Management, Authentication, and Authorization


Identities, at least in the realm of information security, are a representation of a user for the
purpose of providing access to resources and services. Identity management is an operational
practice that includes:
• Provisioning user identity records
• Automating supporting workflows
• Administering identity services
• Providing self-service mechanisms, such as password resets
• Decommissioning identities
There are both security and systems management benefits of identity management. From the
security side, a single representation of a person is much easier to manage than multiple
instances.


For example, if a new employee joins the finance department, a policy can define the
authorizations to use all the financial systems common to members of the department as well as
common to all employees. More specific details, such as the person’s role in the department, can
provide further authorizations. This model provides for a centralized method for access control in
contrast to a commonly found alternative.
Without identity management, a user’s authentications are distributed across systems and
applications. If a person needs access to the financial planning system, an account is created on
that application for them. If they need access to a network file server, authorizations are
established on the network server for them. When the person changes positions or leaves the
company, systems and application administrators around the company have to update access
controls. Deploying identity management is clearly beneficial for both operational and security
reasons and it is another area that crosses boundaries between systems management and security
management.
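The policy-driven provisioning described above amounts to deriving a user's authorizations from attributes such as department and role rather than from per-system account requests. A minimal sketch, with hypothetical entitlement names:

```python
# Hypothetical entitlements granted by baseline, department, and role policies
BASELINE = {"email", "intranet"}
DEPARTMENT = {"finance": {"gl-system", "budget-app"}}
ROLE = {"analyst": {"planning-system"}, "manager": {"approval-queue"}}

def authorizations(department, role):
    """Derive a user's authorizations from policy rather than ad hoc requests."""
    return BASELINE | DEPARTMENT.get(department, set()) | ROLE.get(role, set())

# A new finance analyst is provisioned in one step
new_hire = authorizations("finance", "analyst")
print(sorted(new_hire))
```

When the person changes positions or leaves, updating the single identity record (or its attributes) updates access everywhere, which is the operational benefit described above.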

File and Disk Encryption


Mobile computing has introduced a new set of threats and management concerns. News stories
about stolen notebook computers with tens of thousands, or more, records with personally
identifying information are too common.

For more information about the problem of lost and stolen notebooks, see the Realtime IT
Compliance Community at http://www.realtime-itcompliance.com/lost_stolen_laptops/.

A logical solution for security professionals is to use file, or ideally full disk, encryption. If a
mobile device is stolen or lost, no one else will be able to access the information on the device
assuming sufficiently strong encryption is used.
Systems administrators may not see it as such a black-and-white situation. Yes, encryption will
protect data when the device is stolen, but what about day-to-day operations?
Consider:
• What happens if there is a problem with the disk drive, the encryption key cannot be
recovered, and the disk must be reformatted?
• What happens if the encryption key is lost?
• What is the performance penalty for encryption?
• How will devices be administered once they are encrypted?
• How will full disk encryption configurations vary by hardware model and feature?
Full disk encryption is growing in popularity and more organizations are likely to adopt it.
Administrators will have to understand how this functionality will affect end-user support,
recovery efforts, and device management tools and procedures.
File encryption also helps to address the growing problem of data in motion. Files are easily
transferred to removable media, such as USB memory devices, iPods, and removable disk drives.
Encryption can help to protect data copied to such devices; a better solution is controlling access
to such media based on policy, or in some cases, blocking access to them completely.


VPNs and Secure Remote Access


VPNs and secure remote access services resemble full disk encryption in that they bring clear
security benefits at the cost of added management complexity. Some of the common
issues with VPNs are:
• Ensuring proper client configuration
• Establishing policies
• Managing VPN certificates
• Maintaining sometimes fine-grained access control rules
As with access controls, as the number of subjects involved with VPNs increases so do the
management issues.
One must also keep in mind that remote workers use a variety of methods to communicate:
managed Internet access, unmanaged Internet access, wireless access, and direct connections.
Enforcement of VPN usage is especially important when connecting wirelessly. Make sure the
security tools being used to enforce remote worker policies actually enforce VPN usage
automatically and transparently, according to policies based on location and concomitant risk.
Centralized release and patch management along with a configuration management database can
support VPN administration. Although many may think of managing a VPN as a security or
network administration function, there are still fundamental tasks in the systems management
area as well.

Security Information Management


Security management is the practice of establishing, coordinating, and evaluating the range of
security measures put in place within an organization:
• Security policies
• Compliance
• Auditing
• Incident response and forensics
• Business continuity


Security Policies
Security policies are the foundation of an information security program. Policies are high-level
descriptions of what is permitted and what is expected with regard to security. Organizations will
typically have several security polices, covering:
• Acceptable use of IT infrastructure
• Access control
• Anti-malware policy
• Content-filtering policy
• Encryption policy
• Document and email retention
• Notebook and mobile device security
• Server and workstation security policy
• Wireless network access policy
Policies are generally written to clearly define their scope, rationale, and details, as well
as to provide definitions of technical terms where needed. An
encryption policy, for example, might contain:
• A scope statement that defines the business units, employees, contractors, and business
partners that need to adhere to the policy.
• An explanation for the need for the policy, such as protecting the confidentiality of
customer information and proprietary company information.
• Policy details, such as a list of the categories of information that must be encrypted (for
example, confidential, private, and sensitive information), the algorithms that may be
used, and minimum key lengths.
• Definitions for terms such as digital signatures and public key cryptography.
Policies, such as encryption, can apply to multiple services or they may be specific to a particular
service, such as email policies. In either case, policies should be aligned with the service-
oriented model.
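A policy written along these lines can also be expressed as data, so that a proposed configuration can be checked against it automatically. The classifications, algorithm names, and minimum key lengths here are illustrative policy choices, not recommendations:

```python
# An encryption policy expressed as data (illustrative values)
ENCRYPTION_POLICY = {
    "must_encrypt": {"confidential", "private", "sensitive"},
    "allowed_algorithms": {"AES": 128, "3DES": 168},   # minimum key bits
}

def config_complies(classification, algorithm, key_bits, policy=ENCRYPTION_POLICY):
    """Check a proposed encryption configuration against the policy."""
    if classification not in policy["must_encrypt"]:
        return True                       # encryption not required
    minimum = policy["allowed_algorithms"].get(algorithm)
    return minimum is not None and key_bits >= minimum

print(config_complies("confidential", "AES", 256))
print(config_complies("confidential", "DES", 56))
```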


Compliance
Adequate protection of private and confidential information plays a role in many government
regulations. Some of the most well known include:
• Sarbanes-Oxley Act—publicly traded companies
• Gramm-Leach-Bliley Act—financial service firms
• Health Insurance Portability and Accountability Act (HIPAA)—health care firms
• BASEL II—financial services
• 21 CFR Part 11—pharmaceutical companies
• Federal Information Security Management Act (FISMA)—federal government
• California State Bill (SB) 1386—business with customers in California
• EU Directives on Privacy—companies doing business in the EU
• Personal Information Protection and Electronic Documents Act (PIPEDA)—companies
doing business in Canada
Responsibility for complying with the array of regulations in existence is likely spread across a
number of departments. Fortunately for IT practitioners, sound security management practices
often contribute significantly to meeting compliance requirements. With proper controls, such as
information classification, access controls, network and host defenses, and proper monitoring
and auditing, IT departments can meet the requirements of many regulations by continuing their
security best practices. The organization of information security addresses the need for
governance and management of security services and functions.
With regard to governance, executive management should have well-defined controls and
measures in place to allow them to monitor and, if necessary, correct security operations. The
governance model detailed in the Control Objectives for Information and Related Technologies
(COBIT) framework provides a sound foundation for governance practices in general. The
controls and measures described in COBIT are useful across the spectrum of service-oriented
management, not just security management.

For more information about COBIT, see the Information Systems Audit and Control Association’s
Web site at https://www.isaca.org/.

Security Management and Asset Management


Like policy formation, asset management is one of the fundamental activities in security
management. Asset management consists of two components: tracking hardware and software
assets and classifying information.


Hardware and Software Asset Management


Assets cannot be protected if they are not managed, and they cannot be managed if they are not
identified. This idea seems so obvious that it should not warrant mentioning, but tracking IT
inventory is not a trivial task. Consider some of the factors that have to be accounted for when
tracking inventory:
• Hardware has to be identified and inventoried
• Software running on a device must be tracked
• Components within a device may be replaced or removed
• Hardware may be transferred between departments or individuals
• Some devices that access IT resources are not owned or controlled by the IT department
or the organization
Hardware is one of the easiest aspects of physical inventory to manage. The location and the
person or department responsible must be tracked. Movement within the organization needs to be
monitored, and when the device is retired that must be noted as well. When devices are
transferred or retired, operations may need to be performed to erase private or confidential data.
This should be governed by information classification policy.
Software can be a challenge to track without tools. Applications are often installed, patched, and
removed from users’ devices as their needs change. Very few organizations can maintain a
consistent set of software components on all devices across the organization even when they
standardize as much as possible.
In both hardware and software management, a subunit of a device or application may be moved
among devices. For example, disk drives may be moved between workstations and application
server modules may be uninstalled from one server and moved to another.
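A minimal asset record covering these factors (identification, custody, movement, and retirement) might look like the following sketch; the field names and dates are hypothetical:

```python
from datetime import date

class Asset:
    """Track a hardware asset's owner, movements, and retirement."""
    def __init__(self, tag, owner, acquired):
        self.tag, self.owner, self.retired = tag, owner, False
        self.history = [(acquired, owner)]

    def transfer(self, new_owner, when):
        self.owner = new_owner
        self.history.append((when, new_owner))

    def retire(self, when):
        # Policy hook: classified data must be erased before disposal.
        self.retired = True
        self.history.append((when, "RETIRED"))

laptop = Asset("HW-0042", "finance", date(2007, 1, 2))
laptop.transfer("marketing", date(2007, 3, 10))
laptop.retire(date(2007, 9, 1))
print(laptop.owner, laptop.retired, len(laptop.history))
```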
To compound the challenges facing IT managers responsible for asset management, many
managers now have to deal with semi-managed devices. These are often mobile devices that are
owned by employees, contractors, and consultants but have some access to IT infrastructures.
The most common are:
• Mobile email devices, such as BlackBerry devices
• PDAs
• Smartphones
• Data exchange devices, such as flash drives
The problem with these devices is that they can introduce malware or other threats to a network.
Even if a complete inventory is maintained of all the software and hardware owned by an
organization, security staff may still not have an accurate picture of the potential threats facing
their infrastructure. Properly managing and controlling the use of semi-managed devices has
emerged as a key challenge in security management.


Information Classification
Information classification is the process of labeling different types of information and
establishing appropriate controls for each type. Commercial and military institutions use
different classification schemes; the most common categories in commercial classifications are:
• Public
• Sensitive
• Private
• Confidential
By categorizing information, appropriate controls can be placed on information without having
to apply a most-restrictive policy that protects all information as if it were equally important.
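A classification scheme like this is usually backed by a simple mapping from label to handling controls; the control values shown are illustrative:

```python
# Illustrative mapping from classification label to handling controls
CONTROLS = {
    "public":       {"access": "anyone",       "encrypt": False},
    "sensitive":    {"access": "employees",    "encrypt": False},
    "private":      {"access": "need-to-know", "encrypt": True},
    "confidential": {"access": "named-roles",  "encrypt": True},
}

def controls_for(label):
    """Look up handling controls; default unlabeled data to the strictest set."""
    try:
        return CONTROLS[label]
    except KeyError:
        # Unlabeled data is treated as confidential until classified.
        return CONTROLS["confidential"]

print(controls_for("sensitive")["encrypt"])
print(controls_for("unlabeled")["access"])
```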

Public Information
The public classification is reserved for information that, if disclosed publicly, would not have an
adverse effect on the organization. For example, information provided in press releases requires
no unusual level of protection.

Sensitive Information
Sensitive information should not be publicly disclosed, but if it were, the disclosure would not
have serious adverse effects on the organization. Information about project plans, work
schedules, orders, inventory levels, and other operational data by itself could not be used against
an organization, although it is conceivable that a competitor could piece together competitive
intelligence about a firm by examining large amounts of such operational information.

Private Information
Private information is about customers, clients, patients, employees, and other persons who have
dealings with an organization. The disclosure of private information could adversely affect those
individuals; organizations may be subject to fines or other legal proceedings for violating
regulations regarding the protection of private information. Examples of private information
include:
• Employee records
• Protected healthcare information
• Financial records
• Social Security numbers, driver’s license numbers, and other identifying information
Depending on the industry, organizations could be subject to a range of regulations governing
the protection of private information. The health care and financial services industries are subject
to comprehensive regulations in the United States; the European Union (EU) has established
broad privacy protections that apply to all businesses.


Confidential Information
Confidential information requires significant controls because the disclosure of this information
could have a significant impact on an organization. Some of the typical types of confidential
information include:
• Trade secrets
• Negotiation details
• Strategic plans
• Intellectual property, such as algorithms and product designs
Like private information, confidential information should be protected with well-defined access
controls and clear lines of responsibility.
Although many of the same measures may be used to protect confidential and private
information, they are fundamentally different and should not be conflated in security
policies and procedures. Private information, for example, may be subject to specific audit
requirements that are not relevant to protecting confidential information. Similarly, some
confidential information may be protected with stronger, and more costly, measures than
required for private information. These two categories should always be managed as separate
entities.

Security Auditing and Monitoring


IT auditing became much more common with the advent of the Sarbanes-Oxley Act. The goal of
this and related regulations is to preserve the integrity of business information. To meet that
objective, you must have procedures and systems in place that protect information and you must
periodically review those systems and procedures to ensure they are functioning adequately.
Thus, regular IT audits are much more thorough than those that may have been conducted in the past.

Audit Controls
Auditing begins with policies. Policies may be defined by an organization on its own or as part of
compliance with regulations. Regardless of the motivation for policies, the role of auditing is to
ensure that they are appropriate for the objective and sufficiently implemented. Some of the most
important areas that should be verified in audits include:
• Information classification
• Access controls appropriate for information classifications
• Adequate perimeter and network defenses
• Adequate host defenses
• Adequate review of content, both entering and leaving the network
• Sufficient training on security measures
• Backup and recovery procedures
• Appropriate security management practices, such as separation of duties and rotation of
duties
Auditing is an in-depth review of security policies and procedures. Auditing may be regular but
is still infrequent; day-to-day monitoring is also required.


Security Monitoring
Monitoring can be time consuming unless tools are used to help sift through the volumes of log
data that can be generated in even a moderate-sized network. The difficulties arise from the
range of events that should be monitored, including system events, application events, and user
events. Some of the most common are:
• System performance metrics, such as number of processes, CPU utilization, storage
utilization
• Login attempts and failures
• Applications executed and functions executed within enterprise applications
• Changes to OS configurations
• Errors generated by applications
• Files read, modified, and deleted
• Attempts to access unauthorized resources
In isolation, any one of these events may not indicate a serious breach; in conjunction with
other events, however, it may warrant closer examination.
One of the greatest challenges in information security today is integrating data from the variety
of security mechanisms already in place. Firewalls, routers, intrusion prevention devices, access
control systems, OSs, anti-malware solutions, and content-filtering applications can all generate
large quantities of data, some of which can be quite useful if it is identified and integrated with
other information in a timely manner.
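The integration problem can be illustrated with a toy correlation pass that pairs events from different sources occurring within a short window. The sources, event kinds, and window are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical events gathered from several security mechanisms
events = [
    ("firewall", "port-scan",     datetime(2007, 6, 1, 2, 14)),
    ("os",       "login-failure", datetime(2007, 6, 1, 2, 16)),
    ("os",       "login-failure", datetime(2007, 6, 1, 2, 17)),
    ("app",      "error",         datetime(2007, 6, 1, 9, 30)),
]

def correlated(events, window=timedelta(minutes=10)):
    """Pair events from different sources that occur close together in time."""
    hits = []
    for i, (src_a, kind_a, t_a) in enumerate(events):
        for src_b, kind_b, t_b in events[i + 1:]:
            if src_a != src_b and abs(t_b - t_a) <= window:
                hits.append((kind_a, kind_b))
    return hits

print(correlated(events))
```

A port scan followed minutes later by login failures is far more interesting than either event alone, which is the point made above.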

Security Management and Risk Assessment


Risk assessment is primarily a function of security management. The goal of risk management is
to identify risks to IT infrastructure, prioritize those risks, and implement mitigation strategies to
bring the risks within acceptable levels. As that list of tasks implies, you cannot eliminate risks,
but you can reduce their likelihood. Prioritizing also implies that you might not be able to
adequately reduce the potential for all risks. Risk management often entails balancing needs
against limited resources.
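A common way to prioritize is to score each risk as likelihood times impact and rank the results. The assets and scores below are hypothetical:

```python
# Hypothetical risk register entries (likelihood and impact scored 1-5)
risks = [
    {"asset": "customer-web-server", "likelihood": 4, "impact": 5},
    {"asset": "training-db-server",  "likelihood": 4, "impact": 1},
    {"asset": "mail-gateway",        "likelihood": 2, "impact": 4},
]

def prioritize(risks):
    """Rank risks by likelihood x impact, highest first."""
    return sorted(risks, key=lambda r: r["likelihood"] * r["impact"],
                  reverse=True)

for r in prioritize(risks):
    print(r["asset"], r["likelihood"] * r["impact"])
```

Note how the customer-facing server outranks the training database even though both are equally likely to be attacked, matching the patching priorities discussed earlier in the chapter.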
Thinking of risk management in terms of SOM allows you to view risks in terms of services
provided and not just in terms of specific pieces of infrastructure. For example,
• Data storage management is more than just providing disk space; it includes backup and
archive services and access control management. Security risks in this service include
breaches of access controls and theft of backup media.
• Communication services such as email, instant messaging, and voice over IP (VoIP)
depend on network infrastructure and so share common risks, such as DoS attacks.
• Application services, such as Web servers and J2EE and .Net application servers, can
provide a wide range of services but are subject to risks such as host intrusions,
information theft, and application tampering.


Mitigation strategies within a service-oriented model should address the full service, and this
often entails detailed measures based on the particulars of an implementation. For example,
standby servers in a different location can mitigate the risk of a compromised email server
shutting down communications services: if the primary email server were to fail, the email
records within the domain’s DNS entries could be updated and email re-routed to the alternative
server. Controlling risks is closely aligned with another security management function: business
continuity management.
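The DNS-based re-routing mentioned above is often handled with mail exchanger (MX) preferences, where a lower preference value means higher priority; a backup record lets mail fall back to the standby server. The host names in this zone fragment are illustrative:

```text
; example.com zone fragment (illustrative)
example.com.   IN  MX  10  mail1.example.com.    ; primary email server
example.com.   IN  MX  20  standby.example.com.  ; alternate-site standby
```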

Security Management and Business Continuity Management


Information security is often described in terms of three characteristics: confidentiality, integrity,
and availability. It is the last characteristic that is the subject of business continuity. From a SOM
perspective, business continuity is a broad topic that includes information security but is not
limited to it.
Security professionals should contribute to business continuity plans for a number of reasons:
• Systems availability is subject to threats such as DoS attacks; business continuity
planning should take into account countermeasures to mitigate the impact of such an
attack.
• Business continuity often includes plans for redeploying operations to an alternative site;
electronic and physical security measures must be in place at these sites as well as at the
primary site.
• During a business disruption, data may be moved between servers or entire facilities. The
data must be protected in transit.
Another area in which security professionals are required is compliance with government
regulations.
The defense-in-depth approach is considered a security best practice but it does come at a price:
multiple security solutions must be managed. In addition, to gain the most from these point
solutions, the information from them should be coordinated. The key activities within this area
include:
• Ensuring both network and host-based defenses are kept up to date with signature files
and patches
• Coordinating information from multiple sources, such as perimeter defenses and host-
based defenses
• Ensuring procedures dictated by security policies are in place and enforced across
countermeasures
• Properly configuring new devices and software during the implementation phase of a
project
• Ensuring mechanisms are in place to support incident response
The last example is one in which systems managers may be asked to play a major role because
incident response can require a rapidly executed and well-coordinated plan to contain the impact
of a security breach.


Incident Response
An incident response plan is like an insurance policy: no one wants to have to use it, but
everyone is glad to have one when it is needed. A security incident can take on many forms,
including:
• A virus infection of multiple devices or critical servers
• The discovery of a significant number of Trojan horse programs
• Infections with keyloggers
• A DoS attack on a network device
• An attempt to break into a server
• An attempt to steal information from a database
• The discovery of a botnet within an organization’s network
• Loss of a notebook or other mobile device containing sensitive, private, or confidential
information
Incident response planning has two dimensions—one addresses procedures and the other
addresses the human resources element of the problem.

Incident Response Procedures


When an event occurs, the immediate challenge is to determine how to respond. The response should
be governed by an incident response policy that includes:
• Guidelines on containing the potential damage of the incident
• Persons to notify, including both IT and business executives and managers
• Procedures for contacting information security personnel with knowledge of forensic
procedures who can help gather evidence
• Procedures for securing compromised devices and preserving evidence
There are a few issues related to human resources that should be kept in mind when designing an
incident response plan:
• The need for incident response training
• The need for separation of duties
• The benefits of post-incident analysis

Training and Incident Response


Users, technical staff, and management should all be trained in incident response procedures. For
many, it may be as simple as directing them to call the service desk when something suspicious
appears. Suspicious activity could range from something as clear as a warning from a local
antivirus program indicating malware has been detected to something less obvious, such as
sluggish performance from a device for no apparent reason (which could indicate a spyware or
Trojan horse infection using the device for other purposes).


Technical staff, especially front-line service desk support and systems administrators, should be
trained in how to respond according to the severity of an incident. For example, minor incidents,
such as a virus infection on a single device, might call for a basic response using a procedure
defined for relatively predictable incidents. For major incidents, such as a DoS attack that is
blocking access to critical servers, front-line technical staff should know how to enlist additional
help to deal with the problem.
Executives and managers should understand the implications of various types of attacks for
business operations, as well as their legal responsibilities for reporting incidents and
complying with government regulations.

Separation of Duties
There is something strange about the fact that it is more prudent to trust two or more individuals
than it is to trust one, but that is the idea behind separation of duties. This is especially important
when responding to security incidents. One of the activities of incident response is to collect and
preserve evidence. It is not unheard of for someone working for an organization to be involved
with crimes against that organization. If an employee or contractor perpetrated an incident, that
person may be involved with the incident response.
For example, a database administrator is someone with the keys to the proverbial kingdom when
it comes to large volumes of business information. If someone were stealing customer credit card
data from a database and a security monitor on the network detected unusual activity on a
database server, the first person to call would be the database administrator. The potential
problem is clear; the solution is to have at least two knowledgeable individuals respond to an
incident.

Response Evaluation
Security breaches are disruptive and potentially costly, but they are also opportunities to improve
security measures. A post-incident evaluation can provide valuable information about:
• How attackers breached security mechanisms
• Which security mechanisms worked and which did not
• Whether attack techniques were anticipated
• Whether monitoring and logging were adequate to diagnose the incident
• Vulnerabilities in applications, OSs, or network devices
• Vulnerabilities in policies and procedures
The goal of the post-incident evaluation is to improve the quality of security, not simply to place
blame. Managing information security is difficult and a breach does not necessarily imply
negligence or disregard for policies and procedures.

Summary
Security management is one of the most multi-faceted areas of systems management. It ranges
from the broad issues of managing security information down to the detailed practice of threat
and vulnerability assessment. In addition to day-to-day activities such as monitoring systems,
applications, and users, systems administrators and security professionals must manage an array
of security mechanisms deployed in such a way as to provide multiple layers of defense.


Chapter 10: Managing Risk in Information Systems


The focus of this guide has been on the practice of systems management with an emphasis on
best practices for creating and maintaining IT infrastructure. As useful and effective as these
practices are, they cannot guarantee that operations will always go as planned, that projects will
stay on schedule, or that adverse events will not occur. Part of effective systems management is
managing the risks inherent in IT operations. This chapter will examine the following topics
within the broader area of IT risk management:
• The practice of risk analysis
• The impact of risks and their implications for risk management
The goal of risk management is to understand the breadth of risks facing an organization and to
formulate strategies for mitigating those risks to the greatest extent possible.

The Practice of Risk Analysis


Risk analysis is a methodical process for identifying risks and assigning a cost to those risks. The
four basic parts of risk analysis are:
• Identify information assets and threats to those assets
• Determine the impact of threats to an organization
• Determine the likelihood for each threat
• Assess the risk versus the cost of countermeasures
Together, these steps provide the basic information that is needed to align risk management
strategies with overall business strategies.

Identify Information Assets and Threats


The ancient Greek maxim “Know thyself” aptly summarizes the purpose of the first stage of risk
analysis. It may sound almost trivial, but knowing what information assets are in place can be a
challenge. Just consider examples of what falls under the category “information asset,” at least
from a risk analysis perspective:
• Servers and client devices
• Network devices
• Databases
• Application code
• Systems documentation
• Intellectual property
The spectrum of information assets ranges from tangible hardware to intangible intellectual
property. Each is subject to particular types of risks.


Figure 10.1: Threats are associated with virtually every part of an IT infrastructure.

Servers and Client Devices


In many ways, the risks to hardware are the easiest to identify, if only because these devices
are tangible. Servers and client devices are subject to physical and logical
risks. Physical risks are those that threaten the actual device, as opposed to software that runs on
the device; physical risks include:
• Fire, water damage, and natural disaster
• Theft
• Electrical surges
• Component failure
• Unapproved and approved hardware changes
Logical risks can be just as disruptive; these include:
• Software bugs
• Viruses, worms, and other forms of malware
• Misconfigured applications
• Loss of backups or failure of backup media
• Unapproved and approved system changes
Network devices are subject to these threats as well as others.


Network Devices
Network devices such as routers, firewalls, intrusion prevention systems (IPSs), content filters,
and other security appliances are subject to the risks of network attacks. The Denial of Service
(DoS) attack is relatively simple but highly effective, especially when launched from multiple,
distributed devices.

For an example of just how disruptive a distributed DoS (DDoS) attack can be, see Scott Berinato’s
“Attack of the Bots” in Wired Magazine at http://www.wired.com/wired/archive/14.11/botnet.html.

Other types of network attacks include:


• DNS poisoning
• IP spoofing
• Man-in-the-middle attacks
• Eavesdropping
DNS poisoning corrupts DNS servers so that URLs are redirected away from legitimate sites to a
third-party site. For example, if www.myrealbanksite.com should map to 192.160.100.10 but the
DNS server is modified to change the mapping to some other address (such as 180.101.10.12), a
person trying to browse to the bank site may be drawn into a phishing scam without even
knowing it.
IP spoofing is the practice of changing the source IP address of a packet to make it appear as if it
came from another source. IP spoofing is possible because the IP protocol does not authenticate
source addresses; countermeasures such as ingress filtering and IPsec address this threat.
In a man-in-the-middle attack, a third party intercepts a communication session between two
other parties and can monitor, change, and interject communications between the two. Secure
communications protocols are designed to prevent this type of attack, or at least raise the cost so
high that it is not an efficient strategy for the attacker.


Figure 10.2: A man-in-the-middle attack entails intercepting and tampering with communications between
two parties that are presumed to be secure.

Eavesdropping, of course, is monitoring the communications of other parties. Strong encryption
can minimize the risk of eavesdropping.

Databases
Databases are an obvious target of attack. These are the repositories of a wide range of
information, including:
• Personal information about customers
• Employee information
• Financial records
• Operational information
Databases accessed through applications that accept user input are subject to SQL injection
attacks, which can result in data theft. In this type of attack, an attacker crafts a query that
exploits vulnerabilities in the interface's query processing code. There are multiple techniques
for preventing SQL injection attacks, most of which are based on sound coding practices.
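As an illustration of one such coding practice, the following sketch contrasts a query built by string concatenation with a parameterized query; it uses Python's sqlite3 module, and the table, columns, and data are hypothetical:

```python
import sqlite3

# A throwaway in-memory database with a hypothetical customers table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, card_number TEXT)")
conn.execute("INSERT INTO customers VALUES ('Alice', '4111-0000-0000-0000')")

user_input = "x' OR '1'='1"  # a classic injection payload

# Vulnerable: the payload becomes part of the SQL text and matches every row
vulnerable = conn.execute(
    "SELECT name FROM customers WHERE name = '" + user_input + "'"
).fetchall()

# Safer: the driver binds the value as data, so the payload matches nothing
parameterized = conn.execute(
    "SELECT name FROM customers WHERE name = ?", (user_input,)
).fetchall()

print(len(vulnerable))     # 1 -- the injection returned all rows
print(len(parameterized))  # 0 -- the payload was treated as a literal string
```

Parameterized queries, along with input validation and least-privilege database accounts, are among the sound coding practices most often recommended against this class of attack.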


Figure 10.3: Databases are vulnerable to attacks at multiple points.

In addition to SQL injection attacks, vulnerabilities in database components, such as the listener
(the application that listens on a specified port for requests to the database), can be used to
compromise a database. Once an attacker has gained access to a database, the attacker can also
tamper with the data as well as steal it. Relatively small changes to data can be difficult to detect
unless auditing and monitoring policies are well established. Database servers are also subject to
disruption from DoS attacks.

Application Code
The importance of application code can range from the mundane, such as scripts for cleaning up
temporary directories, to mission-critical systems, such as enterprise resource planning (ERP)
systems. These assets are subject to several risks, including:
• Flaws in logic
• Insufficient error-handling code
• Dependency on flawed library or other shared code
• Insufficient CPU, storage, or network resources
Flaws in application logic will be found in any sufficiently complex system. Vendors and
developers routinely patch applications to correct known problems.
Insufficient error-handling code is a problem because applications will encounter conditions that
will disrupt normal operations. When a storage device is full, the application will not be able to
save data. How does the application respond? Graceful degradation of services requires that the
application provide alternative means for systems administrators or users to respond to the
problem.
Complex applications are modular and layered. Lower levels provide services for upper levels.
For example, a customer management system uses databases for persistent storage; database
systems depend on file systems or, in some cases, low-level I/O routines provided by the
operating system (OS). Client applications, such as office productivity programs, depend on
graphical interface components provided by the OS. Vulnerabilities in any of these lower-level
systems can create risks for any application that uses them.


In addition to the long-term risks associated with flawed code, there are transient risks such as
insufficient resources. An error in an application or a mistake by an operator can consume large
portions of available bandwidth on the network, for example, by unnecessarily transferring a set
of large files. Similarly, a poorly formed database query can easily consume available CPU
cycles and I/O operations.

Figure 10.4: Layered applications introduce dependencies that pose potential risks.

Systems Documentation
Documentation rarely makes it on any top-ten list of information assets, but it should.
Organizations that depend on the knowledge of their staff, contractors, consultants, and business
partners without formally documenting processes and procedures are at risk. Employees leave,
contractors move on to the next assignment, and partners go out of business. The need to
formalize and capture information about IT systems is obvious.
Capability maturity models, such as the Carnegie Mellon Software Engineering Institute's model,
define levels of capability that range from ad hoc management to optimized management.
Moving from ad hoc through more capable stages requires, among other things, formalized and
documented processes. Without this, organizations are subject to a number of risks, including:
• Disruption of services
• Additional expenses associated with reverse engineering
• Delayed deployment of applications and services
• Increased need for training
• Poor quality control


Systems documentation is just one type of institutional knowledge that constitutes an
organizational asset.

For more information about capability maturity models, see the Software Engineering Institute's (SEI)
Web site at http://www.sei.cmu.edu/cmm/.

Intellectual Property
Intellectual property comprises intangible assets that are based on the creativity of an
organization or individual and that provide a competitive advantage or can be sold.
Intellectual property includes:
• Patents
• Trade secrets
• Designs and artwork
• Processes
• Copyrighted material
The more knowledge-based a business, the more important the intellectual property. This type of
asset is subject to a number of risks, the most serious of which is theft. Headlines
from the U.S. Department of Justice (DoJ) press releases depict the range of crimes related to
intellectual property theft:
• “Former Vancouver Area Man Sentenced to Five Years in Prison for Conspiracy
Involving Counterfeit Software and Money Laundering: Web of Companies Sold up to
$20 million of Microsoft Software with Altered Licenses”
• “Pharmaceutical Distributor Pleads Guilty to Selling Counterfeit Drugs”
• “Local Business Owner Sentenced to Year In Jail for Copyright Infringement Conspiracy
Related to the Sales of Counterfeit Goods”
• “California Man Sentenced for Electronically Stealing Trade Secrets from his Former
Employer, a Construction Contractor”

For more examples, see the Computer Crime & Intellectual Property Section of the U.S. Department
of Justice Web site at http://www.usdoj.gov/criminal/cybercrime/ip.html.

Intellectual property theft can occur in many ways, including:


• Attackers can breach electronic security measures and steal information from servers.
• Thieves can steal notebook computers, PDAs, and smartphones with documents,
diagrams, and other confidential and private information.
• Employees and other insiders can steal or leak information for their own use or for sale to
others.
• Contractors and consultants can breach non-disclosure agreements and use knowledge
acquired at one client while working for one of that client’s competitors.


Mitigating the risks to intellectual property is challenging. Unlike tangible assets that can be
locked down and monitored, intellectual property pervades an organization, is embedded in
software that is distributed to customers, and may be remembered by employees and other
insiders long after they leave an organization.
The types of information assets and risks to those assets are wide ranging. From the most
mundane PC to the valuable intellectual property, the relative impact of risks must be assessed in
order to properly manage risk.

Determine Impact of Risks


Once information assets and the risks associated with them have been identified, the next step of
risk analysis is to determine the impact each of those risks can have on an organization.

Types of Costs
The impact of a threat is a function of the value of the asset damaged, the cost of
restoring the asset, and the cost of operating without the asset. For example, if an application
server is destroyed in a fire, the cost to the organization includes:
• Replacing the server
• Restoring data to the replacement server
• Configuring the replacement server
• Testing the replacement server
• Lost revenue or productivity during downtime
• Cost of switching to and from failover servers, if used
These costs apply to other types of assets as well. In addition to these, consider the following
when determining the impact:
• The value of intellectual property to competitors
• The potential for penalties for violating regulations, such as failure to comply with
privacy regulations if a customer database is compromised
• The costs to brand value due to the public disclosure of a security breach
• Contractual penalties for not meeting service level agreements (SLAs)
Identifying the types of costs is followed by steps to quantify those costs.


Determine Costs
Quantifying costs related to risks is far from straightforward. To begin, let’s examine the
simplest of cases and then move on to the more challenging areas.

Quantitative Measures of Costs


It is relatively easy to calculate the costs of tangible assets with known replacement costs,
depreciation schedules, and such. A comprehensive inventory of assets along with information
tracked in financial management systems can provide most if not all information needed to
quantify the cost of replacing those assets.
Valuing intellectual property is more difficult, but there are formal methods for doing so,
including:
• Basing the value of intellectual property on the amount of royalties an organization
would have to pay if it licensed the intellectual property from another party
• The premium price charged for a product when compared with similar products that lack
the benefit of the intellectual property
• The cost to develop the intellectual property, perhaps adjusted for inflation
• The cost of redeveloping the intellectual property
• Valuing all intellectual property as the value of the business less the value of all tangible
assets
The value of other assets, particularly brand and reputation, does not lend itself to
quantitative measures.

Qualitative Evaluations
Qualitative techniques are used when quantitative measures cannot be used. These techniques
typically depend on the reasoned opinions of experts or others knowledgeable about a particular
area. For example, if a bank is trying to assess the cost of a security breach in which 10,000
customer records are compromised, it might consult with:
• Attorneys regarding disclosure regulations
• Marketing executives for an assessment of the negative publicity
• Industry consultants who have worked with competitors in similar situations
• Customer focus groups
The outcome of the evaluations may be ordered sets of risks with relative measures—such as
high, moderate, and low—assigned to each. Although these assignments are not as precise as
quantitative measures, they can provide enough guidance to allocate resources to protect these
assets. The value of assets is one component of risk analysis calculations; another is the
likelihood of threats.


Determine the Likelihood of Threats


Although there are many threats to information assets, some occur rarely enough that only
occasional review is required; others are constant and require continual
monitoring. Understanding the likelihood of each threat is essential to properly allocating
resources.
The likelihood of fires, floods, storms, and other natural disasters can be determined using
historical data. Insurance companies may be able to provide relevant statistics.
Disruptions due to equipment failures can be calculated using metrics such as the mean time
between failures (MTBF). Manufacturers should be able to provide MTBF measures.
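As a rough sketch of how an MTBF figure feeds the likelihood estimates used later in this chapter, the following converts a manufacturer's MTBF into an expected number of failures per year; the 300,000-hour figure is a hypothetical example:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def annual_failure_rate(mtbf_hours: float) -> float:
    """Expected failures per year for a device with the given MTBF.

    Assumes a constant failure rate, which is the usual simplifying
    assumption behind a single MTBF figure.
    """
    return HOURS_PER_YEAR / mtbf_hours

# A hypothetical disk drive with a 300,000-hour MTBF
print(round(annual_failure_rate(300_000), 4))  # 0.0292
```

An estimate of this kind can serve as the annualized rate of occurrence for hardware failure in the risk calculations described later in this chapter.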
Statistics about security threats are more difficult to come by. The Computer Security Institute
(CSI) and U.S. Federal Bureau of Investigation (FBI) conduct an annual computer security
survey that is often cited for security trends. Although useful, the CSI/FBI study is limited to
surveying cooperative organizations. Not all companies or government agencies are inclined to
discuss security practices and events. Furthermore, even when organizations are willing to share
information, they can describe only what they are aware of. There may be botnets, rootkits, and
Trojan horses installed on the corporate network without the knowledge of systems
administrators.

The CSI/FBI report is available from the CSI Web site at http://gocsi.com/. For more information
about the limited usefulness of self-assessments, see Jeffrey Gangemi's article "Cybercriminals
Target Small Biz” in BusinessWeek online. According to the article "approximately 70% of small
businesses consider information security a high priority, and more than 80% have confidence in their
existing protective measures" yet almost 20% do not use antivirus scanning and 60% do not use
encryption on their wireless networks.

With asset values and the likelihood of experiencing particular threats calculated, you can move
on to the next stage of risk assessment, calculating risk measures.

Calculating Risk Measures


Ultimately, you need to put a monetary value on risks so that you can determine the appropriate
level of resources dedicated to protecting assets. The building blocks of this process are:
• Asset value
• Exposure factor (EF)
• Single loss expectancy (SLE)
• Annualized rate of occurrence (ARO)
• Annualized loss expectancy (ALE)
Asset value is calculated as described earlier.


Exposure Factors
EF is the percentage of the value of an asset lost in one occurrence of a threat. For example, if a
server is completely destroyed by a flood, the EF is 100 percent; if one-fifth of the data in a
database is stolen or otherwise compromised, the EF is 20 percent. Note that each threat will
have a distinct EF.
SLE is calculated with the formula:
SLE = asset value × exposure factor
For example, if the value of a database is $500,000 and the EF is 20 percent, the SLE is
$100,000 ($500,000 × 0.2).
An asset may be subject to multiple threats, so there can be multiple SLEs for a single asset. A
notebook, for example, is exposed to theft, malware attack, hardware failure, and, in some cases,
fire due to battery overheating. SLE would need to be calculated for each of these possible
events.

Annualized Rate of Occurrence


ARO is the expected frequency of a threat occurring within one year. As with the SLE, there is a
separate ARO for each threat. If threats are expected to materialize less than once per year, the
ARO will be less than one and greater than zero. If the rate of occurrence is greater than once per
year, the ARO will be greater than one.
ARO may or may not take into account existing countermeasures. The threat of a virus, worm, or
other malware attack is relatively high; however, the likelihood of a successful malware attack
on devices running antivirus scanners and being fully patched is much less. If a risk analysis is
being done from scratch and countermeasures are not taken into account, use an estimated rate of
occurrence based on the frequency with which malware is found in the wild. However, if
countermeasures are already in place, and the goal of risk analysis is to analyze the need for
additional countermeasures, an ARO based on the likelihood of malware avoiding detection by
existing defenses is more appropriate.


Annualized Loss Expectancy


ALE is calculated according to the following formula:
ALE = single loss expectancy × annualized rate of occurrence

ALE is calculated for each threat to each asset to determine the overall loss expectancy. Table
10.1 shows an example of calculating the total loss expectancy for a single asset.
Laptop

Value     Threat             EF     SLE      ARO    ALE
$5,000    Theft              100%   $5,000   0.1    $500
          Fire               100%   $5,000   0.01   $50
          Malware            20%    $1,000   0.2    $200
          Hardware failure   10%    $500     0.05   $25

Total Loss Expectancy                               $775

Table 10.1: Total loss expectancy for a notebook in a one year period.

From the calculations in Table 10.1, you can see that it would be reasonable to spend as much as
$500 per year on anti-theft devices but no more than $50 on fire protection measures. Ideally, the
outcome of risk analysis is a plan to minimize the cost of countermeasures while maximizing the
reduction in the overall level of exposure to assets.
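The arithmetic behind Table 10.1 can be sketched as follows; the asset value, exposure factors, and rates of occurrence are taken directly from the table:

```python
ASSET_VALUE = 5_000  # notebook value in dollars

# (threat, exposure factor, annualized rate of occurrence) from Table 10.1
threats = [
    ("Theft",            1.00, 0.10),
    ("Fire",             1.00, 0.01),
    ("Malware",          0.20, 0.20),
    ("Hardware failure", 0.10, 0.05),
]

total = 0.0
for name, ef, aro in threats:
    sle = ASSET_VALUE * ef  # single loss expectancy
    ale = sle * aro         # annualized loss expectancy
    total += ale
    print(f"{name}: SLE=${sle:,.0f}, ALE=${ale:,.0f}")

print(f"Total loss expectancy: ${total:,.0f}")  # $775
```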

For an in-depth look at risk assessment, see the Risk Management Guide for Information Technology
Systems published by the U.S. National Institute of Standards and Technology, available at
http://csrc.nist.gov/publications/nistpubs/800-30/sp800-30.pdf.

Of course, if you do not have the detailed cost information for this type of calculation, such a
formal method is not useful. An alternative method, which is especially useful when qualitative
risk assessments are used, is the risk-level matrix.


Qualitative Risk Assessment


It is not always possible to assign accurate quantitative measures to threats and their potential
impacts. Consider examples such as:
• The impact on brand from a security breach and the disclosure of a large number of
customer records
• The likelihood of an employee planting a logic bomb in a script set to execute after the
employee leaves the organization
• The likelihood of a high-priority vulnerability in a new network appliance
• The chance that a business partner’s Web services will be unavailable due to a DoS
attack, hardware failure, or other cause
In these examples, you could probably develop a consensus around imprecise likelihoods and
impact measures. Often a simple breakdown into high, medium, and low likelihoods and impacts
is sufficient to order risks so that a rational remediation plan can be formulated. Table 10.2
shows the basic form of a risk matrix, which can be used to combine the measures of likelihood
and impact to determine the overall importance of a threat.
                        Impact
             High      Medium    Low
Likelihood
  High       High      Medium    Low
  Medium     Medium    Medium    Low
  Low        Medium    Low       Low

Table 10.2: A risk matrix that combines likelihood and impact to assess the overall importance of risks.

Some of the combinations of likelihood and impact yield obvious overall risk levels. For
example, a high likelihood risk—such as malware-infected emails—combined with high
impact—such as infecting a large number of clients or consuming network resources, a la SQL
Slammer—yields a high risk threat. Low likelihood threats with low impacts present little risk to
an organization and should not be the focus of attention during risk analysis.
Asymmetric combinations of likelihood and impact are more difficult to judge. For example,
how much effort should be made to mitigate a low likelihood but high impact threat? For an
organization with a conservative perspective, such a risk should be categorized as medium;
however, a more risk-tolerant company may categorize it as a low risk.
The risk levels in Table 10.2 are suggestive but by no means definitive. The risk tolerance of an
organization should dictate the risk levels when different combinations of impact and likelihood
are in question.
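A matrix like Table 10.2 is straightforward to encode so that a list of risks can be ranked consistently; this sketch mirrors the table's levels, and an organization would adjust the mapping to reflect its own risk tolerance:

```python
# Overall risk level keyed by (likelihood, impact), mirroring Table 10.2
RISK_MATRIX = {
    ("high",   "high"):   "high",
    ("high",   "medium"): "medium",
    ("high",   "low"):    "low",
    ("medium", "high"):   "medium",
    ("medium", "medium"): "medium",
    ("medium", "low"):    "low",
    ("low",    "high"):   "medium",  # a conservative call; see the text
    ("low",    "medium"): "low",
    ("low",    "low"):    "low",
}

def risk_level(likelihood: str, impact: str) -> str:
    """Look up the overall risk level for a likelihood/impact pair."""
    return RISK_MATRIX[(likelihood.lower(), impact.lower())]

print(risk_level("High", "High"))  # high
print(risk_level("Low", "High"))   # medium
```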


Risk Analysis Steps


Risk analysis is a methodical process for understanding both the types of threats and their
likelihood. The goal of risk analysis is to identify the highest impact threats, order the priority of
risks based on their impact on the organization, and understand the optimal level of investment in
mitigating those risks. To summarize, the practice of risk management entails:
• Identifying information assets and threats to those assets
• Determining the impact of threats to an organization
• Determining the likelihood for each threat
• Assessing the risk versus the cost of countermeasures
In addition to these steps is the need for monitoring and analysis that provide fundamental
information about likelihoods and impacts for future iterations of the risk analysis life cycle, as
depicted in Figure 10.5.

Figure 10.5: The risk analysis life cycle.


Understanding Business Impact of Risks


Throughout this chapter, the discussion of risk analysis has described a number of threats and how
the likelihood of those threats is used to assess the level of risk they pose. Of
course, the likelihood alone is not enough to understand the risk posed by a threat; you need to
take into account the impact.
Impact is the effect a threat has on an organization when the threat is realized; examples include:
• The loss of revenue that occurs because an online sales application fails, perhaps due to a
software failure, a network services disruption, or a malware attack
• The fines and penalties incurred due to violations of regulations
• Penalties specified in contractual agreements that require a specified level of quality of
service
• Loss of brand value due to negative publicity associated with a high-profile failure or
security breach
These are representative examples of the broad categories of business objectives that should be
considered when assessing the impact of threats to operations and security. The categories
addressed in this chapter include:
• Operational impact
• Compliance impact
• Business relationship impact
• Customer relations impact
As detailed in this section, these topic areas cover a wide range of business objectives. As with
threats, the level of impact can sometimes be quantified with a fair degree of accuracy and
confidence; in other cases, more qualitative assessments are required.

It should be noted that these categories are not mutually exclusive. In fact, impacts of threats can be
measured in more than one of these categories at a time. A high-profile security breach can have an
impact on customer relations as well as lead to fines and penalties for violations of privacy
regulations. A failed server at a retailer can impact both operations and customer relations, especially
if the failure occurs during the high-volume holiday season.

As Figure 10.6 shows, impacts can be thought of as affecting multiple categories at the same
time.


Figure 10.6: Threats can have impacts along multiple business dimensions simultaneously.

Operational Impacts
Operational impacts are those that challenge the ability of an organization to carry out its
workflows. Some common operational workflows are:
• Receiving and processing customer orders
• Fulfilling orders
• Conducting marketing and advertising
• Managing customer relations
• Providing service desk support to internal IT users
• Performing maintenance
• Executing projects
• Managing operations
Within the set of operational impacts, the timeframe in which the impact of a threat is realized
can vary significantly. For example, consider the difference in impact between a point-of-sale
system failure and a data warehouse database server failure.


When a point-of-sale system fails, revenues from sales stop. Merchandise is not sold, customers
may turn to other providers for the products they need, and financial reporting and reconciliation
operations are blocked. In short, critical, time-sensitive operations are disrupted and losses may
be permanent. This is not the case when a decision support system is offline.
Consider how a data warehouse or other business intelligence application is used. Managers
receive reports about sales, revenues, expenses, and other measures of the financial state of their
department and lines of business. Often, one of the main purposes of a business intelligence
application is to provide a comprehensive view of the state of operations that is not available
from traditional transaction reporting systems, such as accounts receivable and accounts payable
systems. These transaction-oriented systems are designed to keep financial records and
ensure accurate and comprehensive accounting. Business intelligence systems supplement those
with reports designed for analyzing longer-term trends and patterns of activity that require
consolidating data from multiple systems.
Now imagine a data warehouse database server is down for a day. What is the impact on
business? In the short term, the impact of such a disruption is minimal. Managers and executives
can presumably continue to manage day-to-day operations and can perform planning and
strategic analysis later when the data warehouse is back online. There is not a threat of lost
revenues, the disruption is not obvious to customers or business partners, and presumably this
type of management system is not directly subject to compliance regulations, at least in terms of
availability.

Figure 10.7: Threats to time-critical operations, such as sales, have more significant impacts than threats to
less time-sensitive operations, such as business intelligence reporting.


Compliance Impact
Operational and security risks can have an impact on regulatory compliance. The past several
years have witnessed heightened awareness about regulations as well as the advent of new,
high-profile regulations. The list of regulations that affect business and government agencies and
departments is long and spans multiple jurisdictions. Some of the most well-known and broadly
applicable are:
• The Sarbanes-Oxley Act (SOX), which governs financial reporting and other aspects of
management in publicly traded companies in the United States
• The Gramm-Leach-Bliley Act, another U.S. regulation, provides for the protection of
personal financial information
• The Health Insurance Portability and Accountability Act (HIPAA), which regulates the
use and disclosure of protected health care information in the United States
• The Australian Federal Privacy Act and the Canadian Personal Information Protection
and Electronics Documents Act (PIPEDA), which establish privacy protections in their
respective countries
• The European Union Data Privacy Directive and Directive on Privacy and Electronic
Communications provide protections for those living in EU member countries
• California Senate Bill (SB) 1386 requires businesses and government agencies to notify
victims living in California when personal private information is disclosed
• The Bank for International Settlements' BASEL II requirements cover reporting and
disclosures by financial institutions
• The U.S. Food and Drug Administration (FDA) 21 CFR Part 11 regulations govern
operations of pharmaceutical companies
• Federal Financial Institutions Examination Council (FFIEC) guidelines on business
continuity planning in financial institutions
• Federal Information Security Management Act (FISMA), established by the U.S. federal
government, to establish standards for information security within departments and
agencies of the U.S. federal government.
A number of conclusions can be drawn from examining this list:
• Regulations are defined by a range of governing bodies, from state-level governments,
such as California, to trans-national institutions, such as the EU
• Regulations target both the integrity of business operations, as seen in SOX and
Basel II, and the protection of individuals’ privacy, as seen in California SB 1386 and the
EU’s privacy directives
• Regulations apply to a broad range of industries and governments; in some cases,
regulations are directed at specific industries (the FDA’s 21 CFR Part 11 regulations of
pharmaceuticals); in other cases, regulations are broadly applicable (such as SOX, which
applies to all public companies in the U.S., and FISMA, which applies broadly
across the U.S. federal government)
Clearly, compliance is a significant category when assessing the impact of risks; however, it is
not just the government that you must be concerned with when considering the impact of risks on
operations.


Business Relationship Impact


The impacts of risks are not limited to an organization’s boundaries. The ripple effects of a
disruption in services can directly and indirectly affect business partners as well. Supply chains
now span multiple businesses, including manufacturers and service providers, and a loss of
continuity can affect all participants.
Consider, for example, a simple supply chain:
• An electronics manufacturer produces graphics co-processors for high-end scientific and
engineering applications. The market for this is limited, so production is done in
relatively small quantities.
• The electronics manufacturer uses a single parcel carrier to transport the chips to a
graphics card manufacturer.
• The graphics card manufacturer produces the high-end graphics cards, along with other
products, for use by several PC manufacturers.
• The graphics manufacturer ships components using two different parcel carriers.
• PC manufacturers order graphics cards from the graphics card manufacturer in such a way
as to maintain a just-in-time inventory.
• The PC manufacturer ships complete systems to customers using multiple parcel carriers.
Now imagine that the parcel carrier used by the graphics co-processor manufacturer is delayed
in shipping components to the graphics card manufacturer. The graphics card manufacturer cannot
ship needed graphics cards, so PC manufacturers are delayed in delivering completed systems,
which, in turn, causes customers to cancel orders.

Figure 10.8: Supply chains now expose organizations to the impact of operation disruptions from other
businesses.

Negative feedback can spread across the supply chain as the impact of unfilled orders, missed
opportunities, and long-term customer dissatisfaction becomes known.


Customer Relationship Impact


In the past, the inner workings of a business were relatively opaque to customers. As long as
products arrived as ordered, bills were accurate, payments were processed correctly, and quality
service was maintained, customers would not necessarily care how vendors conducted their
operations. Although this is still true to some extent, the publicity around financial
mismanagement and improper reporting, such as at Enron and WorldCom, and the widespread
concern over security breaches and the disclosure of personal information, has changed the
customer relationship landscape.
Privacy regulations, such as California SB 1386 and the EU privacy directives, are requiring that
businesses and government agencies notify customers and citizens when private information is
improperly disclosed. There is also greater publicity about privacy breaches; well-known
incidents include:
• The ChoicePoint breach in 2005 in which personal information about 163,000 victims
was stolen
• A breach at the University of California, Los Angeles resulted in the theft of information
about 800,000 individuals, including current and former students, current and former
faculty members, as well as parents and applicants
• The theft of a notebook computer from the home of a U.S. Veterans Administration
employee; the notebook contained personal information about approximately 28.5 million
veterans and spouses. The notebook was later recovered, and no data appears to have been
compromised or exploited

For a list of privacy breaches, see the Privacy Rights Clearinghouse Chronology of Data Breaches at
http://www.privacyrights.org/ar/chrondatabreaches.htm.

Quantitative measurements of the impact of such breaches and the negative publicity are difficult
if not impossible; qualitative measures are the best that can be expected in such cases.
The impact of risks should be understood along several dimensions, including operational
impact, compliance impact, business relationship impact, and customer relations impact. This is
a fundamental aspect of risk analysis and without a thorough understanding of the range of
effects of different threats, you cannot accurately gauge and mitigate threats to the organization.

Summary
Risks are a constant in the realm of IT infrastructure management. Security risks are well
publicized and a wide range of countermeasures have been deployed to mitigate security risks.
Other types of threats to business continuity and integrity, such as natural disasters and
disruptions to supply chains, can also present risks to both short-term operations and long-term
strategic goals. The practice of risk management has evolved, providing the tools and techniques
to effectively and efficiently understand these threats. Of course, the ultimate goal is to mitigate
these threats, and risk analysis enables this goal even with limited resources.


Chapter 11: Benefits of Mature Systems Management Processes

The SOM model discussed throughout this guide touches on many aspects of IT infrastructure
management, from risk analysis and asset management to patch management and service
delivery. It has to; IT is a broad and varied discipline. Despite the variety of topics, a single
theme links them all—process management. The information systems that run businesses,
governments, and organizations long ago reached levels of complexity that could not be
managed with ad hoc approaches. Formalized processes and procedures, aligned with
organizational objectives, are the foundation upon which successful IT operations are built.
This chapter examines the benefits of mature systems management processes by examining two
related questions:
• How can a mature systems management model help control IT costs?
• What are the costs of not controlling IT operations?
Not surprisingly, the answers to these questions are as diverse and varied as the field of IT itself.
There is no simple answer to either of these questions, but the following pages will provide a
high-level overview that spans the breadth of the costs and benefits of mature systems
management processes.

Controlling IT Costs
“Do more with less” is something of a popular mantra in management circles, and less popularly,
with IT operations staff. As unpopular as it is with some, that four-word sentence captures the
driving business factors that are shaping how we implement and manage information services.
Consider how it translates into day-to-day operations:
• As employees and contractors leave, the remaining staff is expected to assume their
responsibilities
• Strategic plans—driven by market conditions, perceived opportunities, government
regulation, and other factors—create new requirements for IT services but not additional
funding for meeting those needs
• Internal customers’ expectations are increasing because they are exposed to rich
applications in other external environments, such as the Web


The outcome of these pressures includes the need for IT managers to deftly reallocate resources,
leverage technologies in innovative ways, and constantly plan for change. To succeed, managers
need to focus on business fundamentals while adapting to the dynamics of information
technologies.
The fundamentals of controlling costs are the same in IT as any other part of an organization;
economics textbooks will tell you that there are labor costs and there are capital costs. What
those textbooks do not always tell you is what to do with those costs. To fill this knowledge gap,
let’s first divide the world of IT costs slightly differently than the most basic branch and consider
three types of costs:
• Labor
• Capital expenditure
• Operating costs
Let’s examine how mature systems management processes benefit each of these.

Labor Costs
Labor costs can make up a significant portion of an organization’s IT budget, and controlling
those costs while maintaining quality service levels can be a challenge. Of course, any manager
can cut staff and reduce bottom-line costs, but organizations need to maintain services, adapt to
new opportunities, and expand the range of services offered. Blindly cutting staff is a short-term
solution to a long-term problem. IT managers succeed when they consider the full range of issues
in staffing their operations:
• Cutting costs can mean reducing quality if reductions are not based on reorganization that
includes quality measures in decisions
• Restructuring often requires improved communications and reporting to support a
geographically dispersed workforce
• Automation can reduce labor costs and maintain quality of service (QoS) if workflows
are well understood and systems are implemented to accommodate those workflows


The SOM model described throughout this guide can help reduce labor costs by making systems
management more efficient while maintaining and improving QoS. In particular, the SOM model
can support
• Automation of manual processes
• Cross-functional skills and reallocation of resources
• Improved support services

Automation of Manual Processes


Given the level of sophistication of businesses’ IT infrastructure, it is surprising how many
manual processes are still required in some organizations. Some common labor-intensive
operations that can be automated include:
• Provisioning user access
• Patching and upgrading devices
• Inventorying devices
• Troubleshooting

Provisioning User Access


These tasks vary in the level of automation possible. Provisioning user accounts, for example,
can be highly automated. New users can create requests for access, managers can approve
requests, and then an automated process could create accounts, set authentication parameters,
establish authorizations based on roles, and notify the new user when the process has completed.
Figure 11.1 shows a typical workflow for this process.

With the exception of the initial request and the manager approval, the rest of the process is driven by
an established and automated workflow that is controlled by policies for authentication and
authorization.


Figure 11.1: An example of user access provisioning workflow.


Automation in this process provides several advantages. First, although the time required to
create user accounts may be relatively small, a large volume of transactions can result in
significant costs over time. With support for password resets, a provisioning system can further
reduce the cost of user access management.
A second benefit is improved quality control. The business rules governing user authorizations
can be lengthy and in some cases complex. For example, authorizations may be granted based on
employees’ department, roles in the organization, and the projects they work on. It is more
efficient to have an automated process querying a directory for user attributes and applying a
policy using those attributes than having a system administrator manually checking detailed
request tickets for specific details about system access. Consider: If a user had to specify which
systems he/she needed access to, the list might include:
• Local PC
• Shared network drives
• An email account
• Group calendar
• Employee self-service portal
• Department-specific applications
• Project-specific applications
• Position-specific applications
Each of these may have different authorizations. For example, employees in IT may be given
administrator or power user privileges for their workstations but others are not. Managers may
have access to a project management application but other staff does not. By defining policies
that specify authorization rules and applying them consistently with a workflow process, you
reduce the likelihood of errors.
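To make this concrete, the following Python sketch shows how a policy engine might derive entitlements from directory attributes; the attribute names, department and role tables, and entitlement labels are all hypothetical stand-ins for a real directory and policy store.

```python
# A minimal sketch of policy-driven provisioning: role and department
# attributes from a directory entry map to entitlements, so no administrator
# has to interpret a free-text request ticket. All names are illustrative.

BASELINE = ["local_pc", "network_shares", "email", "calendar", "self_service_portal"]

# Illustrative policy rules; a real system would read these from a policy store.
DEPARTMENT_APPS = {"finance": ["general_ledger"], "it": ["admin_console"]}
ROLE_APPS = {"manager": ["project_mgmt"]}

def provision(user):
    """Derive a new user's entitlements from directory attributes."""
    entitlements = list(BASELINE)
    entitlements += DEPARTMENT_APPS.get(user["department"], [])
    entitlements += ROLE_APPS.get(user["role"], [])
    return entitlements

print(provision({"name": "jdoe", "department": "finance", "role": "manager"}))
```

Because the rules live in one place, they are applied identically for every request, which is exactly the consistency argument made above.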


Patching and Upgrading Devices


In spite of efforts to standardize platforms, there will always be variations that must be
supported. An organization may standardize on Windows and run Windows XP on most
desktops. There are always exceptions, though. Some are outside the scope of the normal device
management process. For example, an IT lab may have devices running beta versions of
Windows Vista. The IT staff working with these versions assumes responsibility for their
management. But there are cases in which operational devices have to run a non-standard
configuration:
• A legacy client-server application that will not run on an operating system (OS) later than
Windows 2000 (Win2K)
• A rich Internet application that requires a different version of the Java runtime
environment than the supported version
• A department application that is supported on Linux but not the distribution supported by
IT
There are also variations among categories of devices. For example, mobile devices may require
a virtual private network (VPN) client that is not needed on desktop devices.
Automation can reduce labor costs related to patching and upgrading by determining the current
patch level of devices, allowing administrators to install software to specific devices based on
current configurations, and, in general, reducing the average staff time required to perform
patching and upgrade operations.
Again, as with user provisioning, quality can be improved. An automated process can detect
failures in upgrades, roll back to previous configurations, and report the problem to a systems
administrator. This process allows administrators to quickly detect common patterns in failures
and revise the upgrade or patch scripts as necessary.
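The detect-failure, roll-back, and report pattern can be sketched as follows; the install, rollback, and report callables are hypothetical hooks into whatever deployment tooling is actually in place.

```python
# A sketch of automated patching with rollback on failure. The callables are
# hypothetical integration points, not a real deployment tool's API.

def apply_patch(device, install, rollback, report):
    """Install a patch; on failure, restore the prior configuration and
    report the problem so administrators can spot common failure patterns."""
    try:
        install(device)
        return (device, "patched")
    except RuntimeError as exc:
        rollback(device)
        report(device, exc)
        return (device, "rolled_back")

failures = []

def flaky_install(device):
    if device == "laptop-07":          # simulate one failed upgrade
        raise RuntimeError("disk full")

results = [apply_patch(d, flaky_install,
                       rollback=lambda d: None,
                       report=lambda d, e: failures.append((d, str(e))))
           for d in ["desktop-01", "laptop-07", "desktop-02"]]
print(results)
print(failures)
```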


Figure 11.2: Patching and upgrading workflow.


Inventorying Assets
Tracking which devices are online is a fundamental operation; without accurate inventory data,
other operations, such as patch management, lease management, license allocation, and security
management, will produce suboptimal results, at best, and fail, at worst. In fact, inventory
management is the foundation of asset management and begins with discovery of both software
and hardware assets to populate the inventory.
As the number of devices in an inventory grows, the problem of tracking them obviously
becomes more difficult. But quantity is not the only problem.
Configurations can change quickly. New software may be installed on client devices, OS
configurations may change, and peripheral devices may be added to PCs and workstations. In
addition, reorganizations, mergers, and divestitures can create an inventory management
challenge because of the short time and large number of changes that can occur. Updating
inventory with a large number of changes in a short period of time while maintaining sufficient
quality controls is a task that can place a significant burden on IT staff. Again, automation can
result in significant cost savings by reducing the number of staff required to manage inventory.
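A discovery-driven reconciliation step might look like the following sketch, which compares the recorded inventory against a fresh network scan so that staff review only the differences; the device identifiers are illustrative.

```python
# A sketch of automated inventory reconciliation: set arithmetic over device
# identifiers turns a full manual audit into a short exception list.

def reconcile(recorded, discovered):
    recorded_ids, discovered_ids = set(recorded), set(discovered)
    return {
        "missing": sorted(recorded_ids - discovered_ids),   # recorded but not seen
        "unknown": sorted(discovered_ids - recorded_ids),   # seen but not recorded
        "confirmed": sorted(recorded_ids & discovered_ids),
    }

print(reconcile(recorded=["pc-001", "pc-002", "srv-01"],
                discovered=["pc-002", "srv-01", "printer-9"]))
```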

Troubleshooting
Troubleshooting is more difficult to automate than the other operations, but its supporting services can be
automated, resulting in reduced labor costs. Some troubleshooting problems are isolated to a
single device. For example, a user may notice an increase in the time required to open local files,
start desktop applications, and perform routine tasks. A review of the current configuration may
determine that recently added applications are taxing the device’s resources and that additional
memory is required. A Service desk technician may also notice differences in the configuration
from the standard configuration, which leads the technician to investigate the possibility of a
spyware or botnet infection. Having hardware and software configuration information from a
configuration management database (CMDB) can reduce troubleshooting times in such cases.
Other situations are more difficult to diagnose. For example, users of an enterprise application
may report slow performance. The application is a multi-layered system that includes:
• A Web client application
• A Web server
• A J2EE application server
• A messaging service
• A relational database
The slowdown could be caused by a problem in one of these components or in a combination of them.
Troubleshooting multi-layered applications requires coordination of developers, database
administrators, application administrators, and network support staff. This coordination is
facilitated if a configuration database is available that tracks information across platforms.


Consider a potential problem with a critical system, such as a financial services application.
If a single configuration item fails, what is the impact on system availability? Will an
essential business operation complete on time? Since the CMDB tracks configuration item
relationships that define the service, technicians can quickly evaluate the impact of a potential
failure and assess alternative solutions to work around the failure.
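A minimal sketch of the impact query a CMDB enables appears below; the service-to-configuration-item dependency map is hypothetical sample data.

```python
# A sketch of CMDB-style impact analysis: given a failed configuration item,
# list the services whose dependency records include it. The dependency map
# is illustrative sample data, not a real CMDB schema.

DEPENDS_ON = {
    "payments": ["web_server", "app_server", "db_primary"],
    "reporting": ["app_server", "db_replica"],
}

def impacted_services(failed_ci, depends_on=DEPENDS_ON):
    """Which services are affected if this configuration item fails?"""
    return sorted(name for name, cis in depends_on.items() if failed_ci in cis)

print(impacted_services("app_server"))   # both services depend on it
```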
The potentially labor-intensive tasks—provisioning user access, patching and upgrading devices,
inventorying devices and troubleshooting—are examples of common IT operations that can
realize reduced costs if automated processes are in place. Another way the SOM model, coupled
with automation, can reduce labor costs is through the facilitation of the development of cross-
functional skills.

Cross-Functional Skills
IT professionals have come to expect frequent reallocation of staff as strategic initiatives
change. Along with reallocations comes the understanding that more and more tasks are being
aggregated into fewer staff positions. This is part of the logic of improved productivity that is so
important to remaining competitive. An important corollary to the idea of consolidating
responsibilities is the need for cross-functional training.
Consider a systems administrator who had been responsible for managing a number of Linux
servers that supported Web servers and application servers. The administrator is then assigned
responsibility for a set of Windows XP servers used for network file shares. If this person is out
sick, on vacation, or quits, who will run these servers? It is not practical to have another person
on staff as backup. It is practical to cross-train others for the job.

Figure 11.3: Without cross-functional skills, dependencies develop on single individuals or small groups.


The idea behind cross-training is that there is no dependency on a single individual to provide an
essential service. If one systems administrator is away, another should be able to fill in. The
problem is that the complexity of systems management makes it difficult to understand the depth
and breadth of a wide array of systems. A Linux administrator may be able to pick up UNIX
administrator’s duties fairly quickly, but the same might not be said for a Windows
administrator. Similarly, a Windows administrator familiar with supporting desktop devices may
not be familiar with the intricacies of managing Windows servers running SQL Server or
Microsoft Exchange.
The problem of maintaining adequate skill levels across multiple employees is reduced if low-
level, tedious, platform-specific tasks are automated, leaving the higher-level analysis and
management tasks to staff. For example, monitoring disk usage requires different commands
under Windows than under UNIX and, depending on the reporting requirements, can require
knowledge about specific parameters to command-line utilities. Rather than spending time
scanning UNIX manual pages for the right parameter, a systems administrator’s time is better
spent addressing the core tasks and ensuring adequate disk space. Both Windows and UNIX
administrators could perform basic monitoring tasks using a centralized management console
with information about the status of various servers.
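As an illustration, Python’s standard library already hides the platform differences for this particular task: shutil.disk_usage issues the appropriate low-level call on Windows, Linux, or UNIX, so a simple threshold check needs no platform-specific commands. The 90 percent threshold here is an illustrative policy choice.

```python
import shutil

# A sketch of platform-neutral disk monitoring: the same call works on
# Windows, Linux, and UNIX, so cross-trained administrators never need the
# platform-specific command-line utilities mentioned above.

def disk_alerts(paths, threshold=0.90):
    """Return (path, used_fraction) for volumes above the usage threshold."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)
        fraction = usage.used / usage.total
        if fraction >= threshold:
            alerts.append((path, round(fraction, 3)))
    return alerts

print(disk_alerts(["."], threshold=0.90))
```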

Figure 11.4: Using asset and configuration management tools can facilitate cross-training by alleviating the
need to learn low-level, platform-specific details.

Using tools to perform low-level information gathering tasks is just one example of how systems
management support tools can facilitate cross-training, which, in turn, can improve the overall
quality of systems management and allow for consolidation of tasks across a smaller workforce.


Improved Support Services


The responsibility of IT staff does not end in the server room or at the workstation. Even if
problems are addressed and systems are operating, if support customers are not satisfied, IT
management will hear about it.
Support services are a critical component of IT operations and timeliness and quality are
important elements of those services. This reality is evident in a number of ways:
• Response time to problem tickets
• Wait times when calling the Service desk
• The time required to troubleshoot and resolve problems
• System downtime
• Service provisioning
Often, users will not care how a problem is resolved; what is important is that the problem is
solved. If a user’s workstation is unavailable because it has been infected with bot malware and
a rootkit, how long will it take to service it? The user probably does not care about the details
and challenges of detecting and removing rootkits; the user just wants the workstation back up
and running with all of the previously available applications and data.
One way to clean the computer is to run anti-malware software to detect and remove as many
rogue pieces of software as possible. Next, a systems administrator would have to review registry
settings and manually check for hidden rootkit components—a time-consuming process that
requires specialized skills. Alternatively, the systems administrator could format the OS drive,
re-install the OS, and restore data files. This is a preferable option, assuming an automated
backup process is in place to make copies of data files and a ghost image of the clean version of
the OS and device-specific software is available.
The ability to roll back to a known-good configuration and restore users’ data is just one example
of how automated systems management processes can improve service support. Automating
manual processes, supporting cross-functional training, and improving support services can
reduce labor costs within IT departments. Improved support services can extend the reach of
labor cost savings to user departments as well by reducing downtime and ensuring that
operational systems are available when needed. Another area of potential savings is capital
expenditures.


Capital Expenditures
Capital expenditures are expenses to acquire or improve long-term assets. These can include:
• Disk arrays
• High-end servers
• Enterprise applications
• Network devices
In budgeting, capital expenditures are often treated separately from operational expenses. Capital
expenditures warrant detailed analysis because they are costly and commit the organization to a
long-term investment. A formal, mature systems management model can help with capital
expenditures in two ways:
• Improved asset management
• Decision support reporting

Improved Asset Management


After a capital asset has been acquired, management will expect that the asset is utilized as much
as possible to realize the maximum return on the investment (ROI). IT capital assets are complex
devices and how to get the most from them is not always obvious. Some capital assets, such as
buildings and manufacturing equipment, are easy to assess. Is the building full? Is the
manufacturing equipment producing its intended products? Measuring the utilization of IT
equipment is not always so apparent:
• Is a firewall fully utilized if some features are not configured properly?
• Is a content filtering appliance fully utilized if it has only basic policies?
• Are software licenses fully utilized if some users run the applications infrequently?
These questions demonstrate the problem of measuring the utilization of IT capital assets—the
systems are so complex that there is not a single measure that can distill the relative value of an
asset at some point in time. Rather, a combination of measures is needed. An asset management
system that includes information about both hardware and software utilization can help measure
the utility of capital assets.


Hardware Asset Management


Capital expenditure management is not just about purchasing new equipment; it is also about
managing existing assets. With the dynamics of today’s organizations, it is likely that an asset
will be reassigned to uses other than originally planned. Asset management systems can help
with this process.
For example, a high-end server may have been allocated to a department with a large number of
employees who used a custom application for their work. The department has since been
restructured and the custom application replaced with an online service. Is the high-end server
still required by that department? Could it be reallocated for another use, perhaps eliminating the
need to purchase another server? Answering these questions requires information about the
utilization of the server, the number of registered users, and planned projects in the department’s
pipeline that might make use of that server. Some of these questions can be answered with the
right asset management and performance management data.
Sound hardware asset management practices are essential for organizations that lease hardware
as well as those that purchase it. Leasing is a common method for reducing costs, but the practice
introduces additional tasks, such as managing lease returns, which can be well served by
centralized asset management procedures. Of course, not all capital assets are hardware.

Software Asset Management


Better software management is also an important aspect of capital expenditure management.
Effective software management practices are important because software licenses are
abstractions that do not take up room in a server rack or on a user’s desktop. A single copy of an
application may be installed multiple times, sometimes in compliance with license agreements
and sometimes not. Tracking both licenses and software installations is an important part of asset
management.
Automated tracking of software assets has several advantages:
• Better allocation of software licenses
• Providing information that can potentially lead to volume licensing discounts
• Decreased support costs due to better information for Service desk staff
• Improved security with the ability to rapidly identify instances of vulnerable applications
• Better license compliance
• Development of hardware refresh cycles that help predict purchasing needs
• Insight into application usage—are there enterprise applications that require only read-
only licenses?
Of course, a software asset management system can also provide information needed to justify
additional software purchases when needed. This falls under the other broad benefit of capital
expenditure management, improved decision support reporting.
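A license-position report of the kind described above can be sketched in a few lines; the product names and counts are hypothetical, and in practice the install list would come from automated discovery.

```python
from collections import Counter

# A sketch of license-position reporting from inventory data: compare
# discovered installations against purchased license counts. All names and
# figures are illustrative.

def license_position(installs, owned):
    """Positive numbers mean more installs than licenses (non-compliant);
    negative numbers mean shelfware that could be reallocated."""
    counts = Counter(installs)
    return {product: counts.get(product, 0) - licenses
            for product, licenses in owned.items()}

print(license_position(
    installs=["office", "office", "office", "cad"],
    owned={"office": 2, "cad": 3}))
```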


Decision Support Reporting


When planning and budgeting for capital expenditures, you need accurate information about
existing assets, how they are utilized, and expected changes in demand for services provided by
those assets. A framework for systems management, such as SOM, can address these needs by
answering questions such as:
• What hardware in the inventory meets the requirements of a proposed project?
• How is that hardware allocated?
• Is any of the hardware underutilized?
• If so, can it be replaced by lower-end hardware, freeing the higher-end hardware for the
proposed project?
• If not and new hardware must be acquired, are ancillary resources, such as disk arrays, in
place and do they have sufficient capacity to support the new project?
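The first three questions can be answered with a simple query over inventory data, sketched below; the inventory fields and thresholds are assumptions about what a CMDB would hold, not a prescribed schema.

```python
# A sketch of decision-support reporting: before buying new hardware, find
# existing machines that meet a proposed project's requirements and are
# underutilized. Field names and the utilization cutoff are assumptions.

def candidate_hardware(inventory, min_cpus, min_ram_gb, max_utilization=0.5):
    return [m["id"] for m in inventory
            if m["cpus"] >= min_cpus
            and m["ram_gb"] >= min_ram_gb
            and m["utilization"] <= max_utilization]

inventory = [
    {"id": "srv-01", "cpus": 8, "ram_gb": 32, "utilization": 0.2},
    {"id": "srv-02", "cpus": 8, "ram_gb": 32, "utilization": 0.9},
    {"id": "srv-03", "cpus": 2, "ram_gb": 8,  "utilization": 0.1},
]
print(candidate_hardware(inventory, min_cpus=4, min_ram_gb=16))
```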
Many of the same details that can be used in operations management are also useful for long-
term asset management; furthermore, many of the advantages found in capital asset management
have parallels in operational aspects.

Operating Costs
The last type of cost in IT is operating costs, or the cost of running the IT department on a day-
to-day basis. Labor costs are typically considered part of operating costs, but in this discussion,
labor costs have been treated separately. This section deals primarily with the remaining types of
operating costs. Specifically, this section examines how a mature systems management
framework can improve cost controls by improving several areas:
• Management reporting
• Allocation of resources
• Predictability of operations
• License management
• Security posture


Improved Management Reporting


IT management reporting for systems management can be boiled down to four simple questions:
• What do we own?
• Where is it?
• How is it being used?
• How much is it costing?
Everyone running an IT department needs to answer these questions, but how they are answered
is, in part, a function of how IT operations are managed.
A common problem with management reporting is that silos of management form within
organizations. Responsibility and control of operations need to be divided among managers, and
how it is divided is somewhat arbitrary. For example, one business might divide along OSs with
one group responsible for Windows systems, another for UNIX/Linux systems, and yet another
for mainframe devices. In other cases, the division may be along functional lines with client
devices managed by one group, Web servers and application servers by another,
and database servers under the control of a third group. There are good arguments for all of
these arrangements, and one is not necessarily better than the others; however, they all suffer
from the same potential pitfall: silos of management.

Figure 11.5: Silos of management have advantages but can make management reporting more difficult than it
needs to be.


A restructuring is likely to lead to different silos without actually solving the management
reporting problem (something of a “rearranging the deck chairs on the Titanic” solution). A
better option is to use a centralized configuration database that can collect and manage
information about assets across organizational boundaries. This option has several advantages:
• It is independent of organizational and management structure
• It allows for consolidated reporting
• Reports are consistent across management domains
• More in-depth analysis, such as dependencies between systems and resources, is possible
Figure 11.6 shows an example of the types of information that can be collected and managed
within a consolidated centralized management database.

Figure 11.6: A centralized configuration management database can collect and maintain information about
assets across organizational boundaries and support improved management reporting.

Improved Allocation of Resources


Allocating resources is fundamentally a problem of getting the right device to the right place at
the right time. This, in turn, becomes a problem of understanding the
• Inventory of assets
• Needs of particular users, groups, and applications
• Total life cycle cost of assets
• Relative value of services rendered by allocating particular resources to particular needs

Figure 11.7: Knowledge of the types of assets in the inventory is the first step to optimizing the allocation of
resources.

Understanding the value of particular operations, such as improving services in customer
relations versus deploying additional servers to a database cluster in finance, is a business
decision that expands beyond systems management. It does, however, require the kind of data
that systems managers can provide:
• Cost of procuring the assets
• Cost of maintaining the assets
• Cost of end-user support for assets deployed to particular uses
• Cost of depreciation
• Cost of disposal
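As a rough illustration, the total life cycle cost of an asset is simply the sum of these categories; the figures below are invented for the example.

```python
def life_cycle_cost(procurement, maintenance, support, depreciation, disposal):
    """Total life cycle cost as the sum of the cost categories above."""
    return procurement + maintenance + support + depreciation + disposal

# Invented figures for a hypothetical server.
server_tco = life_cycle_cost(procurement=8000, maintenance=2400,
                             support=1200, depreciation=1600, disposal=300)
print(server_tco)  # 13500
```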

A centralized management model that includes inventory and cost information can greatly
facilitate the financial analysis that must be done to optimally allocate resources. Much of the
same data that is used for optimizing the allocation of resources is also useful for predicting time
requirements and levels of effort required for systems management operations.

Improved Predictability of Operations


The saying “Those who do not remember the past are condemned to repeat it” has a corollary in
the IT realm: “Those who do not measure the past are condemned to repeat it without guidance.”
Consider some of the common and repetitive tasks that IT operations have to contend with:
• Patching applications
• Upgrading OSs
• Installing applications
• Resetting passwords
• Installing and configuring client devices
• Hardware refresh cycles
Managers are constantly planning for these kinds of operations. To do so successfully, they
require raw data that can answer questions such as:
• What is the average time to push a Microsoft Office patch to remote users?
• What percentage of OS upgrades failed on notebooks?
• What is the average time required by service desk staff to reset passwords?
• How many mobile devices still have to be patched with a security update?
Again, a centralized repository of asset and patch management information can provide the raw
data needed to answer these questions. Although benchmarks and industry standards are good
guides for planning and budgeting purposes, when it comes to day-to-day operations in which
the margin between staying on budget and having an overrun is thin, having detailed information
about past performance is essential. Another area in which detailed management can directly
impact the bottom line is software license management.
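The questions above reduce to simple queries over raw operations data; a small sketch, using hypothetical job records, follows.

```python
# Hypothetical patch-job records from a management repository.
patch_jobs = [
    {"device": "nb01", "type": "notebook", "minutes": 12, "succeeded": True},
    {"device": "nb02", "type": "notebook", "minutes": 15, "succeeded": False},
    {"device": "pc01", "type": "desktop",  "minutes": 9,  "succeeded": True},
]

# Average time to push the patch, across all devices.
avg_minutes = sum(j["minutes"] for j in patch_jobs) / len(patch_jobs)

# Percentage of jobs that failed on notebooks.
notebooks = [j for j in patch_jobs if j["type"] == "notebook"]
notebook_fail_pct = 100 * sum(1 for j in notebooks if not j["succeeded"]) / len(notebooks)

print(avg_minutes, notebook_fail_pct)  # 12.0 50.0
```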

Improved License Management


Managing licenses is a high-value proposition. It allows IT departments to show quick return on
investment (ROI) and adeptly reallocate licenses as business needs change. At the same time,
neglecting license management leaves an organization liable for violation of contracts if more
copies of software are in use than are licensed.

Figure 11.8: Software license management can track usage against licenses and help administrators remain
in compliance with contractual agreements.
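A minimal sketch of such a compliance check, comparing installed copies against entitlements; product names and counts are invented.

```python
# Hypothetical entitlement and installation counts.
licenses_owned = {"OfficeSuite": 100, "CADPro": 10}
installs       = {"OfficeSuite": 97,  "CADPro": 12}

def compliance_report(owned, installed):
    """Compare installed copies against licensed copies per product."""
    report = {}
    for product, count in installed.items():
        entitled = owned.get(product, 0)
        report[product] = {"installed": count, "licensed": entitled,
                           "unlicensed": max(0, count - entitled)}
    return report

report = compliance_report(licenses_owned, installs)
print(report["CADPro"])  # 2 unlicensed copies: a contractual violation risk
```

The same report also shows spare licenses that could be reallocated as business needs change.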

Improved Security Posture


Proper systems management practices can help to improve and then maintain a sufficiently
secure environment. Systems management supports network and device security in several ways:
• Patching applications and OSs
• Maintaining proper configurations
• Monitoring activities and resource utilization
• Establishing and enforcing policies for network and system access
• Ensuring business continuity through backup and recovery procedures

These management areas are important to security because, as is well known, information
security requires multiple layers of defense to mitigate the potential for a “weakest link”
problem. For example, if a network worm is exploiting a known OS vulnerability and the only
defense is the antivirus software running on desktops, any problems with that antivirus program
could result in infection. It is not difficult to imagine, for example, a notebook whose outdated
antivirus signatures miss the worm, leaving it vulnerable. A
comprehensive systems management program can mitigate this type of potential problem by
• Ensuring OS patches are up to date
• Allowing systems administrators to quickly identify devices that do not have up-to-date
antivirus programs
• Providing better reporting on system accounts, increasing the chances of detecting
unauthorized accounts
• Supporting the enforcement of least privileges, so if a system is compromised, the
processes running on that device cannot do widespread damage
Let’s examine the problem of systemic vulnerabilities. Vulnerabilities are weaknesses in systems
that can be exploited to compromise the integrity, confidentiality, or availability of a system.
Vulnerabilities are created by
• Errors in applications
• Incorrect configurations
• Deficiencies in procedures
All these potential sources of vulnerabilities can be compensated for by proper systems
management (at least to some degree). Patch management is especially important with the first
problem, errors in applications. Applications today are increasingly complex, they are deployed
on a variety of platforms, and they are often designed and developed under tight deadlines that
leave too little time for comprehensive testing. The result is that serious flaws creep into the
software and are eventually deployed across enterprise IT systems.
Vendors regularly patch software. Microsoft, for example, has regular monthly updates. Other
large vendors, such as Oracle, use a quarterly schedule. Of course, high-risk vulnerabilities may
be corrected outside of this schedule. These types of regular updates allow systems
administrators to plan for updates so that patching does not have to be an ad hoc, disruptive
process.

Zero-day vulnerabilities are particularly problematic because they are unknown to vendors and
customers until attackers or malware developers exploit them. By definition, there are no patches for
zero-day vulnerabilities when they are exploited. This is one of the reasons that defense-in-depth
security strategies are so important. No one security method, such as patching, is effective all the
time against all threats. Only by combining multiple countermeasures can an organization achieve
reasonable levels of security.

Security professionals often advocate defense-in-depth strategies. This advocacy should not be
misconstrued as a call to simply implement more security applications, such as firewalls,
antivirus solutions, content filters, intrusion prevention systems (IPSs), and a host of other tools.
You certainly need those, no question—but you also need sound systems management.
A network fully loaded with the latest security countermeasures will not be secure if the network
devices and servers are misconfigured, if client devices are not patched, if applications are not
using authentication and authorization mechanisms, or if tested backup and recovery procedures
are not in place.
The benefits of a methodical systems management approach touch numerous parts of IT
management, from the allocation of resources and the predictability of operations to improved
software license management and systems security. Just as any coin has two sides, so does the
story of IT costs and systems management. The other side is the cost of not properly managing
systems operations.

Cost of Not Controlling IT


The majority of this chapter has been dedicated to describing the benefits of a comprehensive
approach to systems management, but a brief examination of the consequences of not following
such a regimen can also be enlightening. There are at least four areas in which poor systems
management can have a direct impact on the bottom line of an organization:
• Compliance
• Loss of system integrity and availability
• Loss of confidential and private information
• Loss of service and business disruption
The relative importance of each of these will vary by industry and market, but they can all have
a substantial impact on an organization's IT operations.

Compliance
Regulatory compliance is something we have all come to expect and live with. Some regulations
are broadly applicable to a large number of organizations. The Sarbanes-Oxley Act, for example,
requires adequate controls on IT operations to ensure the integrity of financial reporting of all
companies publicly traded in the United States. Businesses are not the only ones subject to
regulation: governments establish regulations for themselves as well. The Federal Information
Security Management Act (FISMA) defines security requirements for U.S. federal agencies and departments.
Some regulations targeting particular industries worth noting are:
• Health Insurance Portability and Accountability Act (HIPAA)—health care
• 21 CFR Part 11—pharmaceuticals
• FISMA—U.S. federal government
• Gramm-Leach-Bliley Act—financial services

When considering the impact of compliance, consider (at least) two parts: the initial fines and
other costs of a violation and the cost of cascading violations. For example, a violation of
HIPAA can result in stiff fines when protected health care information is disclosed. However, a
violation of a state’s privacy statute can result in fines and may trigger the violation of another
federal regulation, such as the Gramm-Leach-Bliley Act, which results in additional fines.
Effective systems management practices will not guarantee that an organization is in compliance
but can provide the tools and management reporting necessary to get into compliance and
demonstrate that compliance.

Loss of System Integrity and Availability


Another general downside of poor systems management is that information could be
compromised or systems may be unavailable. System integrity is compromised in several ways,
including when
• Data is tampered with and changed in unauthorized ways
• Applications and OSs do not function properly because dependencies are not understood,
managed, and maintained
• Applications do not function properly because updates and patches are not applied
properly
Closely related to system integrity is system availability. In this case, potential problems with
availability stem from:
• System downtime due to virus, Denial of Service (DoS), or other attacks that may have
been mitigated by proper systems management and security procedures
• Unstable applications that crash because of misconfiguration
• Insufficient performance because usage growth trends have not been monitored and
insufficient resources are in place to accommodate demand
A well-managed systems administration program will not guarantee that systems never crash or
that performance never degrades, but it does mitigate the likelihood of those problems.

Loss of Confidential and Private Information


Losing confidential and proprietary information can be costly. In the case of confidential
information, especially personally identifying data, the loss can lead to violations of regulations.
Governments from the state to the national and transnational levels are establishing privacy
protections for their citizens. A combination of security measures and supporting systems
management services can again, as with the loss of integrity and availability, mitigate the worst
impacts of such a potential threat.

Business Disruption
Yet another factor to consider when determining the cost of systems management is the potential
for business disruption. When information systems are down, the impact can be widespread,
shutting down day-to-day operations as well as adversely impacting management operations.
Often businesses will invest in backup solutions, offsite facilities, and other measures in case of
disaster. The transition from primary to backup systems can be difficult in the best situations, but
without proper planning and management, it may be impossible to accomplish without adverse
consequences.
Clearly, there are costs associated with implementing comprehensive systems management
models such as the SOM model. There are, however, even greater potential costs for not
implementing such models.

Summary
The benefits of mature systems management practices are well known. Labor, capital
expenditure, and operational costs all benefit from such practices. In the case of labor, the
automation of manual tasks, improved cross-functional training, and improved service support
follow. Capital expenditures benefit from better reporting and decision support. Day-to-day
operations benefit in several ways ranging from better allocation of resources and license
management to improved security and operational predictability. Finally, consider the cost of not
leveraging systems management best practices, which, in addition to the lost opportunity for
improvement, brings costs all its own.

Chapter 12: Roadmap to Implementing Service-Oriented Systems Management Services
Service-oriented management is the platform for managing systems management functions
across the diverse and wide-ranging needs of today’s enterprises. The platform takes a function-
rather than device-specific focus for several reasons:
• The need to stay aligned with business objectives requires an agile management structure
• Demands on IT management, such as compliance, apply to IT services, not to specific
devices
• Devices accessing enterprise resources may be managed (owned by the enterprise), semi-
managed (owned by employees but subject to some IT policies, such as smartphones), or
unmanaged, such as public kiosks and customer PCs that access Internet-accessible
services
The evolution of service-oriented management is driven by the demands placed on IT managers
and systems administrators; some of the most prevalent are:
• Responding to new market opportunities
• Reducing IT operations costs
• Sharing information and assets with business partners
• Making network resources accessible from remote locations
• Providing more services to customers online
• Meeting emerging requirements of auditors and regulators
Each of these drivers is a bridge point between business and IT operations. None of these are
exclusively business or technical. The divisions between the technical and non-technical (or
business) sides are fast becoming a legacy of earlier times. This chapter provides a roadmap for
implementing service-oriented management by examining four topics:
• Limits of traditional management models in light of emerging challenges in systems
management
• Current status of IT operations
• Transition to mature service model
• Implementation of a service model for systems management
As noted earlier, evolving demands on IT are bringing IT and business operations closer than
they have been in the past.

Limits of Traditional Management Models and Emerging Challenges


Some may wonder why systems management practices need to change, but the fact is that the
demands on IT are evolving just as much as the underlying technology.

Demise of Device-Centric Systems Management


Traditional management models that focused on fixed functions or OSs for devices may not fit
neatly with the way devices are deployed and reused in today’s environments. For example, there
was a time when it might have made sense to divide management duties for specific devices based on
the types of OS deployed. This might include:
• Windows desktops, running Windows 98 or another desktop-only OS
• Windows servers running email and network file servers
• Linux servers that ran Web servers, FTP servers, and other low-demand systems
• UNIX servers for high-end operations, such as database servers
Today, similar Windows systems run on everything from notebooks to servers, and dual-core
processors are just as likely to appear in desktops as in servers. Linux servers are now just as
likely to host full relational database systems as UNIX platforms are. Devices are not even
limited to a single OS: virtualization allows multiple OSs to run concurrently on a single
machine.
The management tasks associated with this range of devices are also consolidating into several
well-understood areas:
• Procuring
• Deploying
• Patching
• Monitoring
• Securing
• Retiring
The kinds of information that must be tracked, and the operations that must be performed, do not
vary by operating system or hardware manufacturer.

Example Benefits of Service-Oriented Management


With this evolution of systems management practices comes some beneficial new practices. For
example, you can now effectively track virtually all IT hardware assets in a configuration
management database (CMDB). A single logical repository can provide more efficient access to
information about the state of an organization’s infrastructure than if the information were
divided along the lines of platform type or assigned managers.

Figure 12.1: CMDBs provide the means to track virtually all IT assets.

A CMDB captures four types of information:
• Standards and baselines
• Technical
• Ownership
• Relationship
Technical details vary by the type of configuration item. For example, hardware technical data
can include model numbers, firmware versions, storage capacity, and other physical attributes.
Software applications may include version numbers, patch levels, and administration
documentation.

Ownership information tracks the organizational dimension of a configuration item. This can
include the business unit responsible for the services provided by a configuration item. For
example, the finance department may be the owner of a server and enterprise resource planning
(ERP) application. Ownership may be distinct from responsibility, which should also be tracked
in a CMDB. A group within the IT department may have responsibility for the application server
owned by the finance department in the previous example.
Relationship data describes how configuration items depend on one another or are used together.
For example, a UNIX server may depend on a particular router in the network.
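A single configuration item carrying all four kinds of information might be sketched as follows; every field name and value here is a hypothetical illustration, not a specific CMDB schema.

```python
# A hypothetical configuration item with the four kinds of CMDB information.
erp_server = {
    "baseline":      {"standard_build": "server-std-2.1"},
    "technical":     {"model": "X4500", "firmware": "2.3", "storage_tb": 4},
    "ownership":     {"owner": "Finance", "responsible": "IT Server Group"},
    "relationships": {"depends_on": ["router-07"], "hosts": ["ERP application"]},
}

# Ownership (Finance) is tracked separately from responsibility (IT),
# matching the ERP example above.
print(erp_server["ownership"])
print(erp_server["relationships"]["depends_on"])  # ['router-07']
```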
Take the case of a newly discovered piece of malware that exploits a vulnerability in a
commonly used code library. Which devices in the organization are using that library? Of all the
vulnerable devices, which are running mission-critical operations? Which are on mobile devices
that may not be connected to the network and may not receive the patch when pushed to devices
by the patch management system?
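With technical and relationship data in one repository, these triage questions become simple queries. A sketch against a small, invented CMDB extract:

```python
# Hypothetical CMDB extract; the library name is invented.
devices = [
    {"name": "web01", "libraries": ["libexample-1.2"], "mission_critical": True,  "mobile": False},
    {"name": "nb03",  "libraries": ["libexample-1.2"], "mission_critical": False, "mobile": True},
    {"name": "db01",  "libraries": ["libother-3.0"],   "mission_critical": True,  "mobile": False},
]

vulnerable    = [d for d in devices if "libexample-1.2" in d["libraries"]]
critical      = [d["name"] for d in vulnerable if d["mission_critical"]]
maybe_offline = [d["name"] for d in vulnerable if d["mobile"]]

print(critical)       # ['web01']  -- patch these first
print(maybe_offline)  # ['nb03']   -- may miss the pushed patch
```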
Compliance is forcing a new regimen on IT operations. More controls are now required to ensure
that devices are configured properly and patched appropriately. It is not uncommon to establish
minimum security requirements for any device connecting to the network. Notebooks, for
example, may be required to run anti-malware and personal firewalls. If these services are not
available, the device is not granted access to the network. How is this enforced?
Policy management solutions and access control devices have to be coordinated to ensure any
device accessing the network is in compliance. A single policy can apply to multiple devices and
devices may be subject to multiple policies. The responsibilities of systems managers are
growing rapidly and automation is essential to keeping up with these changes. If automated
systems management solutions are not in place, or are only partially in place, the first step is to assess the
state of IT management practices.
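The coordination of policies and access control described above can be sketched as a minimal admission check; the policy names and device attributes are hypothetical.

```python
# Hypothetical policies: one applies to every device, another only to notebooks.
policies = {
    "any":      ["os_patched"],
    "notebook": ["anti_malware", "personal_firewall"],
}

def admit(device):
    """Grant network access only if every applicable policy check passes."""
    required = policies["any"] + policies.get(device["type"], [])
    return all(device.get(check, False) for check in required)

notebook = {"type": "notebook", "os_patched": True,
            "anti_malware": True, "personal_firewall": False}
print(admit(notebook))  # False -- no personal firewall, so no network access
```

Note how one policy ("any") applies to multiple device types while a notebook is subject to multiple policies at once.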

Roadmap Step 1: Assessing the Current Status of IT Practices


IT practices are so wedded to the particulars of the technology deployed in an organization that it is
easy to underestimate the importance of the business drivers behind IT operations. For this
reason, the first step in preparing for the move to a service-oriented model of systems
management is to understand the following:
• Overall business strategy and goals
• The state of IT alignment with business strategy
• The risk tolerance of the organization
The purpose of this step is to ensure that the technical decision making in later stages is done in a
way that supports the broad objectives of the organization.

Overall Business Strategy and Goals


This may sound obvious, but then again, starting at the beginning always does. The first step in
the move to service-oriented strategies is understanding the strategy and goals of the
organization. Of course, if the organization is a commercial entity, the goal is to make a profit
and increase shareholder value; if the organization is a government agency, the goal is to serve
the agency’s constituency. That is the obvious part; the less obvious part is how to do it.
Executive management sets goals and strategies that answer the “how” question. For example, a
company may have the goal of increasing market share in a particular region and will do it by
improving customer service. An agency may decide that it will improve service to its
constituency by reducing the cost of delivering three high-volume service transactions. These are
the kinds of goals that IT managers can take as guides to formulating IT plans; they are in essence
the functional goals that IT, in conjunction with other departments, must deliver. An important
aspect of this process is prioritizing support offerings, controlling support costs, and in some
cases, outsourcing low-end priorities. Making sure IT managers understand functional goals and
keeping IT operations in sync with those goals is an ongoing process.

IT Alignment and Business Strategy


Clear definition of business goals coupled with communication about those goals is the basis of
IT alignment with business strategy. Some challenges with keeping IT and executive direction in
sync include:
• Bridging the business and technical parts of an organization due to difficulties translating
from business goals to technical implementations
• Including IT in planning changes in business direction; for example, a shifting emphasis
to new markets, product lines, or business models
• Keeping business managers aware of changes in technical implementations, such as
resource limitations, development cycles, and changes to delivery schedules of new
systems or modifications to existing systems
Ironically, it is often not technical problems but communication problems that can cause the
greatest difficulties at this level. However, one of the advantages of a service-oriented
management model is that operations are managed at a higher level of technical abstraction that
more easily aligns to business operations. This can contribute to more effective communications
between business managers who might describe goals and problems in terms of services and
technical managers who can now measure and control operations in terms of those same
services.
Bridging the gap between business and technology concerns and objectives requires a sound
understanding of both realms. Meeting business requirements often requires staffing IT operations
with business analysts along with technical professionals.
The final piece of the first step of the roadmap is assessing the risk tolerance of the organization.
As Figure 12.2 shows, goals and executive strategies may drive IT operations, but all these
decisions are made within a particular and organization-specific environment for risk and risk
tolerance.

Figure 12.2: Risk tolerance is part of the background in which all business and technical decisions are made.

Risk Tolerance of the Organization


Well-formulated goals and strategies for achieving those goals are necessary but not sufficient to
guide IT decision making; another required piece is an understanding of the risk tolerance of the
organization. No strategy can guarantee success—there will always be unknowns and
uncontrollable events that thwart best efforts. Take, for example, a large-scale migration to
Windows Vista. Although best practices, such as the Business Desktop Deployment guidelines,
can help with the planning and assessment, the potential for problems still exists. Risks include
operational downtime due to software incompatibilities, insufficient hardware, errors in
establishing access controls, and insufficient service desk resources.

For more on Vista migration, see the Microsoft Desktop Deployment Center and the Altiris Vista
Resource Center.

A formal risk analysis can identify the theoretical costs and benefits of risks and risk mitigation
strategies. Of course, executives and managers will try to mitigate these risks but there are limits
to these efforts:
• Financial constraints
• Time constraints
• Unknown factors
• Unknown frequencies of risks
• Technical limitations
• Resource constraints
Understanding these limitations and working around them must be guided by the organization’s
tolerance for risk.

Financial Constraints
Managers have limited resources for dealing with risks and choices will often have to be made
between mitigation strategies. For example, should funds be invested in a new higher-capacity
backup system or should those same funds be used to upgrade network security? Both are
arguably essential to maintaining business operations, but there may be funds for only one.

Investment options can be measured in several ways:
• Return on investment (ROI)
• Internal rate of return (IRR)
• Net present value (NPV)
• Payback period
Alternatively, several methods may be used in combination, much as with the balanced scorecard
model.
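Two of these measures, NPV and payback period, can be sketched for a hypothetical cash-flow stream (a year-0 outlay followed by annual returns); the figures and discount rate are invented.

```python
def npv(rate, cash_flows):
    """Net present value of a cash-flow stream starting at year 0."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

def payback_years(cash_flows):
    """First year in which cumulative cash flow turns non-negative."""
    total = 0
    for year, cf in enumerate(cash_flows):
        total += cf
        if total >= 0:
            return year
    return None  # the investment never pays back

# Invented stream: $100,000 outlay, then $40,000 per year for three years.
flows = [-100_000, 40_000, 40_000, 40_000]
print(round(npv(0.08, flows)))  # 3084
print(payback_years(flows))     # 3
```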

Time Constraints
Time constraints are also a factor. One may have the funds and staff with the technical skills to
address a problem but not the time. If a company acquires another firm with poorly designed
network architecture, should the new resources be redeployed following the company’s
architecture? Ideally yes, but it may require pulling senior systems administrators and network
managers away from other high-priority projects.

Unknown Factors
There is little one can do about unknown factors except to plan in terms of broad generalities.
Natural disasters, security breaches, and systems failures are broad risks but you will never be
able to plan in detail for all types or understand the impact of all possible instances of these risks.
Another class of unknowns is the impact of risks. A fire that destroys a computer center is easily
quantified. The cost of a data loss incident is not so clear, but key factors include:
• Diminished brand value
• Loss of customer loyalty
• Fines and other compliance costs
In addition to these kinds of unknowns, another group of unknowns adds to risk assessment
difficulties.

Unknown Frequencies of Risks


Weighing the impact of risks requires understanding the expected frequency of those risks
occurring. For example, if an IT center is built on a 50-year flood plain, one can estimate the
expected cost of the risk associated with floods.
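This is the standard annualized loss expectancy calculation: expected annual cost equals frequency times impact. A sketch with an invented impact figure:

```python
def annualized_loss(annual_frequency, impact):
    """Expected annual cost of a risk: frequency times impact."""
    return annual_frequency * impact

# A 50-year flood plain implies roughly a 1-in-50 chance in any given year;
# the $2,000,000 impact figure is invented for illustration.
flood = annualized_loss(annual_frequency=1 / 50, impact=2_000_000)
print(flood)  # 40000.0
```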
Unfortunately, not all risks are so well understood. For example, there is little historical evidence
to estimate the probability of a successful breach of database security and the theft of customer
financial information. There is anecdotal evidence from isolated incidents but without measures
of the full breadth of breaches and details of each breach, it is difficult to assess the frequency.
In such situations, the risk tolerance of an organization is the best guide. Are executives willing
to invest $100,000 in new security measures to prevent a database breach? How about
$1,000,000? Defensive investments can be difficult to justify because there is always the chance
that they will never be required.

Technical Limitations
For some risks, you simply do not have adequate mitigating solutions. Information security has
always been a matter of responding to attackers motivated to circumvent existing
countermeasures. Some of the best methods for dealing with known risks impose unacceptable
limitations. For example, Windows Vista has been designed with improved security measures
but some existing software will not function under these new security measures. Users have a
choice: do not run these applications, or run them with elevated privileges that provide access
similar to that of earlier versions of the Windows OS. As Figure 12.3 shows, what is desired and what is
achievable can be vastly different because of the constraints facing the organization.

Figure 12.3: A variety of constraints limit an organization’s ability to reach the ideal level of risk mitigation.

Resource Constraints
Another constraint is the availability of resources, especially staff with sufficient skill sets.
Again, planning can mitigate some of these risks but there is always the potential for a key
person to leave a project at a critical time.

Responding to Risks
Once risks have been identified, an organization can respond to those risks in one of three ways:
• Accept the risk
• Mitigate the risk
• Transfer the risk
Accepting the risk means the organization understands the risk, has evaluated the potential costs
of the risk as well as the costs and benefits of deploying countermeasures to the risk but has
decided not to take any steps to reduce the risk. At first glance, this may sound somewhat
irresponsible, but this is often a reasonable strategy. For example, if a data center is in a 100-year
flood plain, a company may decide that the cost of moving the operation or deploying flood
controls outweighs the benefits; accepting the risk is then a reasonable strategy.

Mitigating the risk means that countermeasures are taken to reduce the risk. You use risk
mitigation strategies constantly, although you may not think of them as such. Consider the
following:
• Deploying anti-malware on PCs
• Implementing content filtering on network traffic
• Establishing acceptable use policies for IT equipment
• Using clusters of computers instead of a single server for mission-critical applications
• Conducting code reviews on custom-developed applications
• Using project management best practices
These are all examples of risk mitigation measures. Some of these, such as deploying anti-
malware programs, are obviously done to reduce a well-known risk. Others, such as project
management best practices, are not solely risk mitigation measures, although they are key
proactive risk mitigation techniques. In the case of project management, the best practices reduce the risk of
cost overruns and delay of deliverables. Risk mitigation does not eliminate risks; that is not
possible. Instead, the goal is to reduce the risks as much as possible using reasonable resources.
The final option for dealing with risk is to transfer it. This means an organization purchases
insurance so that in the event the risk is realized, the insurance company bears the cost of the
risk. Like risk mitigation, risk transfer is appropriate in a variety of circumstances and its use will
depend on the balance of cost and benefits.
At the conclusion of step 1, an organization should have an understanding of business
objectives and how IT can serve those objectives. At the same time, these steps provide some
perspective on risks and the ability to mitigate those risks. The next step is specific planning for a
move to a service-oriented model.

Roadmap Step 2: Planning Transition to Mature Service Model


The first step in the roadmap to implementing a service-oriented management model is largely
preparatory. It is akin to deciding where you want to go before you get in the car and start
driving. In step 2, you are still not driving but are planning how to get where you have decided to
go. The planning process consists of four core operations:
• Prioritizing needs
• Building a central management foundation
• Optimizing policies and procedures
• Creating an alternative, backup plan
Let’s begin with a review of the landscape of services that must be provided.

Prioritizing Needs
Service-oriented management and systems management in general encompass a wide range of
operations and services. The first part of the planning process is to understand which of these
operations and services are the most important; common among top priorities are:
• Acquiring devices and applications
• Deploying devices and applications
• Providing service desk support
• Ensuring asset management
• Maintaining systems availability
• Monitoring systems
• Auditing and compliance reporting
• Developing applications
• Securing databases and hosts
• Enforcing policies
• Improving quality controls on IT procedures
• Ensuring application compatibility
Each of these could justifiably be considered top priorities depending on the circumstances.
There is no single right answer to the question, “Where should we start?” Rather than try to force
a one-size-fits-all answer to that question, it may be more useful to examine a few scenarios to
see how varying circumstances shift priorities. These will include:
• A new business without an existing systems management structure
• A company that has recently acquired another firm
• A company in a highly regulated market
Again, the goal is not to provide a black-and-white decision-making procedure for how to
proceed with prioritizing needs but to show some examples of the kinds of questions and issues
that may influence the prioritization process.


New Business
Consider a new business that is started to provide online services to manufacturers. The services
are delivered through a combination of onsite consulting and online support through a customer
portal. (The details of the service are not important at this point.) The characteristics of the
market are:
• Relatively few compliance requirements because the customers are not in financial
services, healthcare, or another highly regulated area
• The company is privately held so the Sarbanes-Oxley Act (SOX) does not apply
• The market is competitive and customers can easily switch providers, so developing
customer loyalty is important
• Consultants and sales staff will need full access to IT resources from remote locations
• Customers will need access to the customer portal application but customer data should
be segregated so that customers can access only their own data
• Customers expect high availability of the customer portal
Given this set of requirements, high-priority operations include:
• Acquiring devices and applications
• Providing service desk support
• Maintaining systems availability
• Developing applications
• Securing databases and hosts
A new business will of course need to acquire devices and applications, so managing that process
well from the beginning is important. Also, as customer loyalty is so important in this market,
service desk support will be a top priority as well. Supporting application development, system
availability, and database and host security are the kinds of operations customers will not see
directly but are fundamental to delivering services that are at the front lines of the business.


Post-Merger Organization
When two organizations merge, there are often plenty of technical issues to resolve. Integrating
network architectures, databases, and applications requires knowledge of low-level details and
careful planning. Once the systems are integrated, systems managers will have to apply
management procedures consistently across all devices regardless of how they were managed in
the past. In this scenario, some of the most important operations are:
• Asset management
• Service desk support
• Database and host security
• Policy enforcement
• Improved quality controls on IT procedures
• Configuration management
One of the first challenges to address in a post-merger situation is compiling an accurate
inventory. You cannot manage a device if you do not know you have it or if you do not know
where it is or what kinds of applications are running on it. Asset management is one of the top
few priorities in a post-merger environment.
Mergers can be disruptive to existing operations, so service desk support can be critical to
maintaining operations and efficiency. Disruptions and changes in network architecture can
introduce new and unforeseen security vulnerabilities. There is also the chance that existing
patch management operations are disrupted during a merger, which in turn perpetuates existing
vulnerabilities. Another key area is to ensure policies are enforced and procedures continue to be
carried out across newly acquired assets. The priorities in organizations going through less-
disruptive changes are somewhat different.

Highly Regulated Organization


Highly regulated organizations, by definition, are subject to an array of compliance
requirements. Typically, regulations require that organizations establish and follow certain
policies and procedures as well as be able to demonstrate that these policies and procedures are
in force. Simply put, they not only have to comply but must be able to prove they comply. For
these reasons, some of the top priority areas for such an organization are:
• Asset management
• Systems monitoring
• Auditing and compliance reporting
• Database and host security
• Policy enforcement
• Improved quality controls on IT procedures


Asset management is a fundamental service that enables several others. Having detailed
information about the location, configuration, and status of all devices in the organization is the
basis for reporting on them and demonstrating that they are in compliance. Asset management
covers the full life cycle of hardware management, from procurement to disposal. Monitoring
systems is another part of maintaining compliance because it is an early warning procedure that
can help detect and control security breaches as well as other problems that can disrupt
operations.
Auditing and compliance reporting are obviously necessary in this situation. Auditing involves
more than the annual review by external auditors. Key events, such as failed access attempts,
changes to deployed code, and configuration modifications, should be continuously logged and
reviewed regularly.
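As a concrete sketch of this kind of continuous review, the following Python fragment filters an audit log down to the key event categories named above. The record layout and category names are illustrative assumptions, not any particular product's schema.

```python
from dataclasses import dataclass

# Event categories worth regular review, per the text above. The names and
# the record layout are assumptions for illustration, not a product schema.
KEY_EVENTS = {"failed_access", "code_change", "config_change"}

@dataclass
class AuditEvent:
    timestamp: str
    category: str
    source: str
    detail: str

def events_for_review(log):
    """Filter an audit log down to the key event categories."""
    return [e for e in log if e.category in KEY_EVENTS]

log = [
    AuditEvent("2024-03-01T09:12", "login", "web01", "user jdoe logged in"),
    AuditEvent("2024-03-01T09:14", "failed_access", "db01", "3 failed attempts"),
    AuditEvent("2024-03-01T10:02", "config_change", "fw01", "rule 44 modified"),
]
flagged = events_for_review(log)
```

In practice, a job like this would run on a schedule against the organization's real log store and route the flagged events to reviewers.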
Some of the most high-profile security breaches have involved the theft of information from
databases. Part of database security is maintained with the database system itself, but much of
that depends upon a secure host. Systems managers play a key role in securing databases by
hardening host OSs and regularly monitoring the device for signs of security problems.
Establishing policies that meet auditor expectations can be challenging enough but ensuring those
policies are enforced at all times in all applicable cases brings its own host of difficulties.
Policies, for example, are platform neutral and must be enforced regardless of the device
performing an operation. Consider the process of accessing a customer financial record; this
could occur from:
• A desktop PC used by a customer support representative
• A batch process that runs from a central server updating account information on a regular
basis
• A notebook used by an analyst investigating a problem with the account
• A smartphone, which combines cell phone and PDA functionality, used by the customer
to transfer funds between accounts while traveling
Effective policy enforcement requires a combination of thorough planning and automation to
ensure that all use cases are accommodated. This relates to the final high-priority need,
improving quality controls on IT procedures. Monitoring operations is necessary but it may
disclose weaknesses in some areas. For this reason, it is important to be able to measure the
performance of IT operations, especially as it relates to compliance-oriented policies and
procedures. Management reporting on operational procedures can help isolate problem areas and
measure the effectiveness of various remediation plans so that procedures eventually meet
expectations.
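As a minimal sketch of the platform-neutral enforcement described above, the following Python fragment applies one access rule to requests from any device type. The field names and the sample policy are assumptions for illustration.

```python
# Platform-neutral policy evaluation: the same rule governs every request
# for a customer financial record, regardless of the device making it.
# The fields and sample policy below are assumptions, not a real schema.

def policy_allows(request, policy):
    """Return True only if the request satisfies every condition in the policy."""
    return all(request.get(field) in allowed
               for field, allowed in policy.items())

# One rule that covers the desktop, batch, notebook, and smartphone cases alike.
financial_record_policy = {
    "role": {"support_rep", "batch_service", "analyst", "account_owner"},
    "channel": {"internal_lan", "vpn", "customer_portal"},
}

desktop = {"role": "support_rep", "channel": "internal_lan"}
smartphone = {"role": "account_owner", "channel": "customer_portal"}
contractor = {"role": "contractor", "channel": "vpn"}
```

Because the rule is expressed over request attributes rather than device types, adding a new kind of device does not require a new policy, only correct attribute reporting.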
There is no absolute ordering of priorities that applies equally well to all organizations. Priorities
will largely be driven by the business strategies of the organization (which are assessed in the
first step of the roadmap process) and the current state of the organization. Although the
priorities will vary, two themes are common across organizations making the move to service-
oriented management: the need for a centralized repository of information and reporting and the
benefits of optimizing policies and procedures.


Building a Central Management Foundation


Most, if not all, tasks associated with service-oriented management require, or are made more
efficient by, the use of a CMDB, and some in particular are virtually impossible without it:
• Change management
• Incident management
• Patch management
All these operations depend upon knowing what assets are in the organization, how they are
configured, and how they relate to one another. A centralized repository provides information
about individual assets as well as the relationships that link assets to delivered, agreed-upon
services. Gathering, verifying, and maintaining this information can be a significant undertaking.

Creating and maintaining a CMDB is a well-established systems management practice. It is a central
element of the ITIL framework; for more information about ITIL, see http://www.itil.co.uk/.

Types of Information in a Centralized Repository


Building a centralized repository of information in a CMDB is best done in stages. First, the
sections of the IT infrastructure should be prioritized and the most important configuration items
addressed. This may include:
• Mission-critical servers
• PCs used in front-line support
• Key network devices
• In some cases, lines of business and other organizational structures
For each of these, the following information should be collected:
• OS running on the device
• OS version and patch status
• Applications on the device
• Location within the logical network
• Status of required software, such as anti-malware and host-based firewalls
• Network information, such as IP address and DNS server
In addition, relationships between devices should also be collected. This can include
dependencies, such as:
• Devices dependent on particular routers and switches
• IP devices dependent on particular DNS servers
• Content filters and intrusion prevention systems (IPSs) assigned to particular subnets
• Outputs from one organizational unit as inputs to another unit
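The configuration item fields and dependency relationships above can be sketched as a simple data model. The field names here are assumptions for illustration; a real CMDB schema would be considerably richer.

```python
from dataclasses import dataclass, field

# A sketch of the configuration-item record described above. Field names
# are assumptions; a production CMDB schema is far more detailed.
@dataclass
class ConfigItem:
    name: str
    os_name: str
    os_version: str
    patch_level: str
    applications: list = field(default_factory=list)
    ip_address: str = ""
    dns_server: str = ""

# Relationships are kept separately, as (dependent, depends_on) pairs.
dependencies = set()

def add_dependency(dependent, depends_on):
    dependencies.add((dependent, depends_on))

def directly_impacted_by(ci_name):
    """Configuration items that directly depend on the given item."""
    return {a for (a, b) in dependencies if b == ci_name}

web01 = ConfigItem("web01", "Linux", "5.15", "2024-02",
                   ["nginx", "anti-malware"], "10.0.1.5", "10.0.0.2")
add_dependency("web01", "sw-core-1")  # depends on a core switch
add_dependency("web01", "dns01")      # depends on a DNS server
add_dependency("app01", "sw-core-1")
```

Storing relationships explicitly is what makes impact questions answerable, for example, "which items are affected if this switch goes down?"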
With a sense of the types of information to collect, the next challenge is how to collect it.


Methods for Collecting and Verifying Configuration Item Data


The level of information that should be tracked with configuration items is detailed enough, and
some of it changes frequently enough, that automated methods are required to collect and
maintain it. Two basic approaches are available: agent-based and agentless collection.
Agent-based collection depends upon a typically small program resident on each device that
collects local information and updates the configuration database. Although agents can be carefully
designed to accommodate the specifics of each OS, this does introduce some management
overhead. For example, the agents must be distributed, installed, and updated just like other
applications on the devices. Furthermore, some systems managers might not want another piece
of code running on devices that could in any way interfere with existing applications.
An alternative method is agentless data collection. With agentless information gathering, devices
on the network are queried from a central server and no additional software is required on the
device. This approach has several advantages, including:
• No software is required to remain resident on the client
• New devices can be detected without installing agents on the device
• Collection procedures are centrally managed
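A minimal sketch of agentless discovery follows: a central server probes each address for a listening TCP port, standing in for the SNMP, WMI, or SSH queries a real tool would use. The port and timeout values are assumptions for illustration.

```python
import socket

# Agentless discovery sketch: the central server probes addresses for a
# listening management port rather than relying on installed agents.
# Real tools would follow up with SNMP, WMI, or SSH queries for detail.
def probe(host, port, timeout=0.5):
    """Return True if the host answers on the given TCP port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def discover(addresses, port):
    """Centrally query each address; no software is installed on the targets."""
    return [addr for addr in addresses if probe(addr, port)]
```

Because everything runs from the central server, new devices are found simply by widening the address range being scanned.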
With a CMDB in place, an organization is just about ready to begin implementing service-
oriented management practices. There is, however, one other area that must be attended to—
optimizing policies and procedures.

Optimizing Policies and Procedures


Automation and centralized management of information can help streamline operations and
make them more efficient; on their own, however, they cannot optimize operations to provide the
greatest benefit to the organization. Consider an example. A hypothetical company has an ad-hoc
approach to management and addresses basic tasks in the following ways:
• When new employees join the company, a notebook is ordered for the new worker unless
one is readily available, perhaps from someone who just left the company. There is no
central tracking of assets, so finding an existing asset depends on the memory of
individual managers.
• When a vendor announces a patch, the systems managers decide at that time to install it
or not, sometimes testing on non-production systems first and sometimes not. Without a
prioritized ranking of assets, there is no way to quickly determine all the devices that
should get the patch and in what order.
• Frontline IT support occasionally makes the rounds to check the status of anti-malware
on notebooks. Most of the sales staff spend most of their time out of the office, so finding
time to check their devices is a challenge. There is no single system for tracking which
devices have been checked and which have not.


In an extreme case such as the one just described, one could install a CMDB, collect information
about configuration items, and even keep it up to date with regular refreshes. The problem is that
doing so would do some good, but not as much as possible. Potential benefits include:
• A single reporting system for which applications and OSs are running on each device
• A single reporting system for determining the patch level of each device
• A rudimentary asset-tracking system that could at least catalog basic information about
devices on the network
What this approach would miss are the benefits that come from a combination of well-
formulated management policies and automated services:
• Prioritizing devices in terms of mission-critical functions
• Linking documentation to configuration items
• Integrating asset management information with other management tools, such as patch
management and deployment systems
• Enforcing policies based on attributes of devices and their users
Optimizing policies and procedures requires:
• An understanding of business goals and strategies
• Knowledge of the organization's overall risk tolerance
• Awareness of regulations and other constraints on the organization
• An understanding of the existing IT infrastructure and plans for future changes
• A commitment to follow established procedures when carrying out IT management tasks
The last bullet point is one of the most important. The specific details of how one manages
patches, deployments, or testing are often less important than the fact that one is following an
established set of procedures. This is the topic addressed in the final step of the roadmap.


Roadmap Step 3: Implementing a Service Model for Systems Management
In many ways, the most difficult work is done by the time you reach the final step of the
roadmap. Prior to this step, many of the issues have dealt with organizational readiness and the
ability to adopt formal management procedures. The foundation has been set with the
introduction of a CMDB and the tuning of policies and procedures. The next step is to put the
product of these efforts into day-to-day management. For that, there are three factors to keep in
mind:
• Adapting best practices
• Measuring operations
• Adapting to changing business requirements
Each of these high-level directives can help ensure the greatest benefit is derived from adopting a
service-oriented management approach.

Adapting Best Practices


Best practices, like ITIL and COBIT, are complementary frameworks for understanding the
kinds of tasks that must be performed to effectively manage and govern an IT operation. As
anyone who has been in IT for a decade or more knows, best practices are like fashions:
sometimes they are “in” and sometimes they are “out” but if you wait long enough, they will be
back.
There is some logic to this cycle. Any best practice will address some of the needs of
management. There are limits, though. Any best practice, such as ITIL, will not cover every
conceivable issue facing an IT manager. For example, ITIL does not adequately address
measuring management functions. Even if you completely implement ITIL practices, there will
be other tasks that need attention. Some will see this as a flaw in ITIL and advocate some other
set of best practices that are stronger on measurement. It is highly likely, though, that the new
framework will be weak in some other area. The point is that no best practice framework will
address all of a manager’s needs.
One way to benefit from best practice frameworks is to use what they offer, applying the ideas
incrementally and in combination with other frameworks to find the most appropriate solution
for your organization. Best practices do not manage for you; they do not alleviate the need to
experiment and formulate your own solutions. They are excellent starting points, not final
destinations.
Throughout this guide, the discussion of service-oriented management has built on several best
practices and frameworks, including:
• ITIL for infrastructure management
• COBIT for governance
• ISO-17799 for security
These will surely evolve and improve but are also sufficiently useful for immediate adoption.


COBIT and ITIL: Complementary, not Competitive


COBIT and ITIL are both popular frameworks for managing IT, but they address different levels of
management. They are best seen as complementing each other, not competing with each other.
COBIT is a governance framework. The goals of COBIT are to align IT operations with business
objectives and to ensure successful implementation of those objectives. COBIT is divided into four main
areas:
● Planning and organizing
● Acquiring and deploying
● Delivering and supporting
● Monitoring and evaluating
These cover the full breadth of IT operations. ITIL, however, is more focused on delivery of services and
support, assuming proper alignment and governance are already in place. The core areas of ITIL are:
● Service support
● Service delivery
● Planning service management
● Security management
● Infrastructure management
● Application and software asset management
Both COBIT and ITIL can be implemented independently but the high-level executive perspective of
COBIT also works well alongside the procedural management perspective of ITIL.
Another framework that is commonly used is the COSO framework (from Committee of Sponsoring
Organizations of the Treadway Commission), which primarily addresses financial governance and
management. This best practice addresses several areas, including:
● Internal control environment
● Objective setting
● Event identification
● Risk assessment
● Risk response
● Control activities
● Information and communication
● Monitoring
The objective of this framework is to identify the processes and executive management tasks associated
with the rational pursuit of organizational objectives, the efficient use of resources, and proper and
responsible reporting to stakeholders. COSO addresses broad organizational functions, not just IT
operations, but it is often discussed, along with COBIT, as a means of achieving compliance with
government regulations, particularly SOX.


Measuring Operations
There is an old saying that you cannot manage what you do not measure. This is certainly true in
IT. One does not need to measure every aspect of every procedure and operation. Rather, it is
better to find representative measures for key services, such as:
• In service support, the number of service desk calls, the duration of calls, and the number
of calls escalated
• In patch management, the number of patches applied, the number of failed patch
operations, and the time required to apply patches
• In deployment management, the number of devices updated, the number of failed
deployment attempts, and the staff hours required
• In change management, the number of changes, the time to approve changes, and the
number of emergency changes
Like best-practice frameworks, these examples are starting points for formulating sets of
measures that reflect the state of IT infrastructure and operations.
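As an illustration, the service desk measures listed above can be derived from raw call records along these lines. The record layout is an assumption for illustration.

```python
# A sketch of turning raw service desk records into the representative
# measures listed above. The record fields are assumptions, not a real schema.
calls = [
    {"duration_min": 4,  "escalated": False},
    {"duration_min": 12, "escalated": True},
    {"duration_min": 7,  "escalated": False},
    {"duration_min": 30, "escalated": True},
]

def service_desk_measures(records):
    """Compute call volume, average duration, and escalation rate."""
    total = len(records)
    escalated = sum(1 for r in records if r["escalated"])
    return {
        "calls": total,
        "avg_duration_min": sum(r["duration_min"] for r in records) / total,
        "escalation_rate": escalated / total,
    }
```

Tracking the same small set of measures month over month is what turns them into management information rather than isolated numbers.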

Adapting to Changing Business Requirements


Business requirements are dynamic. Sound IT management practices are relatively well
structured but still accommodate change. The goal for IT management is to provide a stable
infrastructure that can be applied in different ways and can be changed relatively easily. Several
factors contribute to this:
• Use of management best practices—There is no re-inventing of the wheel
• Use of standardized architecture—Variations and exceptions increase management
challenges
• Commitment to policies and procedures—Every exception to these creates potentially
more work for systems administrators at later times
• Centralized repository of configuration data so that information is available when needed
Service-oriented management as described throughout this guide contributes to the adaptability
of an organization and, when used in conjunction with best practices such as ITIL and COBIT,
provides the foundation for an adaptable IT organization.


Summary
Management models that have worked in the past in more slowly changing business
environments are no longer sufficient for the dynamics of today’s IT operations. To enable an
adaptable operation, you must assess the current status of IT operations; plan, if necessary, a
transition to a mature service model based on frameworks such as service-oriented management,
ITIL, and COBIT; and finally implement and maintain the practices outlined there. IT
management is demanding, but the tools and practices are established to help you bring direct
value to the organization.
Throughout this guide, service-oriented management has been presented as a means to address
the key challenges facing IT operations, including:
• Business objectives and IT alignment
• Planning and risk management
• Business continuity and operational integrity
• Security and compliance
• Capacity planning
• Asset management
• Service delivery
The key features of a service-oriented management strategy that serve this goal include:
• Modularity of services
• Comprehensive management of configuration items in a centralized repository—the
CMDB
• Ability to report on assets and dependencies between assets
• Support for maintaining adequate security in the information infrastructure
• Support for asset management
• Support for the delivery of new IT services and applications
There is no single process or methodology that will guarantee the success of an IT operation.
There are, however, well-developed best practices that provide ideal starting points and detailed
guidance on managing a significant part of any information management operation. That in
conjunction with the ability to adapt to the particular needs of your own organization is the best
approach to meeting your organization’s long-term goals and objectives.


Download Additional eBooks from Realtime Nexus!


Realtime Nexus—The Digital Library provides world-class expert resources that IT
professionals depend on to learn about the newest technologies. If you found this eBook to be
informative, we encourage you to download more of our industry-leading technology eBooks
and video guides at Realtime Nexus. Please visit http://nexus.realtimepublishers.com.

