
Are Your Capacity Management Processes Fit For The Cloud Era?
An Intelligent Roadmap for Capacity Planning
BACKGROUND
In any utility environment (electricity, gas, water, mobile telecoms), ensuring there is enough
resource or capacity available to meet demand is critical to satisfying the needs of the
consumer.

At the same time, it is important to ensure that there is not an oversupply of capacity in a
market, as this will ultimately impact the profitability and viability of the supplier's business model.

These rules not only apply to traditional utilities, but also to the provisioning of IT infrastructure
used to host applications that support the business.

Virtualization accelerated this concept through its ability to share compute resources amongst
multiple application workloads, and the agility it provides to rapidly provision and reconfigure
compute resources through software.

Industrialization of capacity management processes in virtualized data centers presents a
significant opportunity to optimize ongoing CAPEX and OPEX, whilst assuring consistent delivery
of application performance. It also ensures greater business agility by reducing the lead-time
to stand up new application services.

Whilst virtualization is a great enabler for getting a better return on investment from IT
infrastructure, many enterprise IT departments and service providers have relatively immature
capacity management processes and are not exploiting the latest innovations that would
enable them to transform their situation.


TODAY'S CAPACITY MANAGEMENT PARADIGM AND ITS PITFALLS


Today, many organizations apply very simple principles to determine their requirements for
compute capacity in their virtualized data centers.

This is typically based on a resource allocation model: the total amount of memory and CPU
capacity allocated to all virtual machines in a compute cluster is taken, and a level of
overprovisioning (e.g. 2:1, 4:1, 8:1, 12:1) is assumed in order to calculate the requirement for
physical resources.
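
To make the arithmetic concrete, the sketch below shows how an allocation-based model of this kind derives a physical capacity target. It is a minimal illustration only; the VM counts, sizes, and the 4:1 ratio are invented for the example rather than taken from any specific environment.

    # A minimal sketch of the allocation-based sizing arithmetic described above.
    # The cluster figures and the 4:1 ratio are illustrative, not from the paper.

    def physical_requirement(allocated_vcpu_ghz, allocated_vmem_gb, overprovision_ratio):
        """Translate total allocated VM resources into a physical capacity target."""
        return {
            "cpu_ghz": allocated_vcpu_ghz / overprovision_ratio,
            "memory_gb": allocated_vmem_gb / overprovision_ratio,
        }

    # Example: 400 VMs, each allocated 2 vCPUs (~2.0 GHz each) and 8 GB of vMem, planned at 4:1.
    required = physical_requirement(allocated_vcpu_ghz=400 * 2 * 2.0,
                                    allocated_vmem_gb=400 * 8,
                                    overprovision_ratio=4)
    print(required)  # {'cpu_ghz': 400.0, 'memory_gb': 800.0}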

The level of overprovisioning is often directly related to different tiers of infrastructure and the
service levels offered by these. For example, a Platinum service level may be offered on a
compute cluster with a conservative overprovisioning ratio of 2:1, while a Bronze service might
be offered on a compute cluster with a more aggressive overprovisioning ratio of 12:1.

The application owner will then decide which tier of infrastructure their application should run on,
based on a trade-off between cost and the level of risk they are willing to assume.

This process of capacity management is typically managed in spreadsheets or in simple
databases that do not take into account the actual resource consumption driven by each
application workload running in the operational environment.

Spreadsheet planning is often augmented with simple monitoring tools that send alerts to
Operations when resources within the virtual infrastructure cross predetermined thresholds,
indicating the risk of performance issues. In such circumstances, Operations will be notified of
performance risks and mobilize their technical resources to investigate potential issues and
devise remediation plans.

A capacity management strategy based on resource allocation and overprovisioning ratios is
further flawed because application owners typically over-specify the amount of CPU and memory
resources their application is going to need when they request virtual machines. This invariably
results in larger virtual machine configurations than are actually required to run the applications
reliably, but because this capacity management strategy is based on allocation and not on
actual utilization, it inherently erodes the level of efficiency that can be driven from the
underlying infrastructure.

Figure 1: The average utilization in the cluster is less than 30%.

Figure 1 shows a compute cluster implemented with a 4:1 overprovisioning ratio. Utilization
levels are far below the point at which there would be any risk of resource contention impacting
application performance. The result is a highly inefficient use of resources.

In compute clusters with higher overprovisioning ratios, there is greater risk of performance
issues, because hypervisor schedulers focus on balancing resources based on the deviation of
memory or CPU utilization across hosts within a cluster. They do not look at the problem from the
perspective of how best to meet the current and historical resource requirements of the actual
application workloads. In addition, they do not consider key resources such as CPU scheduling,
memory ballooning and swapping, I/O, and network resources when deciding where to place
workloads. In brief, they are simplistic, much to the detriment of efficiency.
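
The sketch below is a deliberately simplified caricature of this deviation-based balancing, not a description of any vendor's actual scheduler; it illustrates how a decision driven only by CPU and memory utilization ignores the other resource dimensions discussed here. All names and thresholds are hypothetical.

    # Simplified caricature of deviation-based balancing: move a VM from the busiest
    # to the least busy host using only CPU/memory utilization, ignoring I/O, network,
    # CPU ready time, ballooning, and so on.

    def rebalance_step(hosts):
        """hosts: {name: {'cpu': pct, 'mem': pct, 'vms': [...]}}. Returns one move, or None."""
        def load(name):
            return (hosts[name]["cpu"] + hosts[name]["mem"]) / 2.0
        busiest = max(hosts, key=load)
        idlest = min(hosts, key=load)
        if load(busiest) - load(idlest) < 10:      # within tolerance: do nothing
            return None
        vm = hosts[busiest]["vms"][0]              # naive pick; ignores the VM's actual profile
        return ("move", vm, busiest, idlest)

    cluster = {
        "host-a": {"cpu": 85, "mem": 78, "vms": ["vm-12", "vm-31"]},
        "host-b": {"cpu": 30, "mem": 25, "vms": ["vm-07"]},
    }
    print(rebalance_step(cluster))                 # ('move', 'vm-12', 'host-a', 'host-b')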

The results of hypervisor scheduling can be seen in Figure 2, on a compute cluster with an
overprovisioning ratio of 10:1. These images were recorded after the failure of one of the hosts
in a cluster running DRS, and demonstrate how a simple hardware failure can introduce business
risk through the hypervisor scheduler's inability to accommodate application workloads. The
result is far from perfect, as certain hosts are running at very high levels of utilization, with no
contingency to support dynamic changes in application workload demand, whilst others are
underutilized.

Figure 2: Hypervisor scheduling has created unnecessary risk and inefficiencies across the cluster.

VMTurbo has benchmarked hundreds of virtualized data centers and commissioned independent
TechValidate surveys of 150 enterprise IT organizations. In sum, these studies reveal that in more
than 80% of situations there are significant opportunities to drive much greater levels of
infrastructure efficiency, drive down the unit cost of compute services, and improve the level of
service delivered through the industrialization of capacity management.


COMPLEXITIES OF ACHIEVING A MORE DESIRABLE OUTCOME


In order to achieve the desired outcome of an effective capacity management strategy, one
where infrastructure efficiency is fully optimized whilst the risk of application performance issues
is traded off against the level of investment in infrastructure, there is a complex problem to be
solved.

At VMTurbo we refer to this problem as the Intelligent Workload Management Problem. It is a
complex problem to solve because it involves holistically controlling resource allocation decisions
by considering many different dimensions of resources, configuration and business constraints
across the virtualized IT stack, on an ongoing basis. These dimensions include the following
(a simplified placement check covering several of them is sketched after the list):

Multiple Infrastructure Resources - Virtualized application workloads depend on multiple
resources in the infrastructure, and they must be able to get the resources they need to operate
reliably. This includes allocated resources such as vMem, vCPU, and disk space, as well as
physical compute and storage resources like memory, ballooning, swapping, CPU capacity, CPU
scheduling, storage I/O, disk space, and network resources. All of these and more must be
considered holistically when making resource allocation decisions in order to avoid the pitfalls of
overprovisioning-ratio strategies, which optimize for one type of resource only to have a
detrimental impact on other resources.

Infrastructure Constraints - The way that networks and storage are provisioned constrains where
specific workloads can be placed and moved within the environment. These constraints should
be taken into account when making resource allocation decisions to reflect what is really
possible in the real world.

Business Constraints - Organizations typically have business policies that also restrict where
workloads can run within the environment. These constraints may assist business continuity by
ensuring that redundant application instances do not co-exist on the same physical resource.
They may also support software-licensing requirements or adhere to security policies. These rules,
while essential, constrain how the available compute resources can be exploited.

Time - When making resource allocation decisions, you need to consider the time dimension of
the past, because application workloads fluctuate and peak at different times, driven by
fluctuations in the business processes they support. They may also grow or contract over time in
their consumption of resources. These characteristics should be factored into any workload
placement and resource allocation decisions.

The Future - Most organizations are constantly changing their virtualized data centers, driven by
the need to deliver new projects, execute ongoing technology refreshes, and support organic
growth in demand for existing applications. The ability to look at the past and accurately
predict the impact of future changes is critical in planning what resource allocation decisions
should be taken to accommodate planned changes.
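
As a rough illustration of how these dimensions interact, the sketch below combines a multi-resource headroom check (sized on historical peaks), infrastructure reachability, and anti-affinity rules into a single placement test. All field names and structures are hypothetical, not a description of any product's data model.

    # Illustrative placement check spanning several of the dimensions above:
    # peak (not just average) demand, multiple resource types, reachable
    # datastores/networks, and business anti-affinity rules.

    def can_place(vm, host, anti_affinity_groups):
        """Return True if 'vm' can be placed on 'host' given resources and constraints."""
        # Resource headroom across several dimensions, sized against historical peak demand
        for res in ("cpu_ghz", "mem_gb", "storage_iops", "net_mbps"):
            if host["used"][res] + vm["peak"][res] > host["capacity"][res]:
                return False
        # Infrastructure constraints: the host must reach the VM's datastore and network
        if vm["datastore"] not in host["datastores"] or vm["network"] not in host["networks"]:
            return False
        # Business constraints: no two members of a redundancy group on the same host
        for group in anti_affinity_groups:
            if vm["name"] in group and any(other in group for other in host["vms"]):
                return False
        return True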

Complex Puzzle - What resource allocation decisions should be taken on an ongoing basis to
accommodate application workload demand?

The net reality is that in an environment with just a few hundred virtualized application workloads,
thousands of variables need to be continually analyzed to derive the decisions that must be
taken on an ongoing basis to maintain the environment in the Desired State.

This puzzle is not one that can be solved by human brainpower, at least not within the necessary
time constraints, and therefore most organizations accept the costly trade-off of over-investing in
infrastructure capacity to try to mitigate risk. Even then, they may still experience quality-of-service
issues due to the complexity of the Intelligent Workload Management Problem.

The alternative to overinvestment is to employ state-of-the-art capacity management software
that takes a mathematical approach to solving the Intelligent Workload Management Problem.

DEFINING FUNCTIONAL REQUIREMENTS FOR YOUR PERFORMANCE & CAPACITY MANAGEMENT STRATEGY

When architecting your performance and capacity management strategy, it is important to
differentiate between the monitoring/reporting functions and the resource management
functions that solve the Intelligent Workload Management Problem. Both have valid use cases,
but they deliver different outcomes.

MONITORING AND REPORTING

There are literally hundreds of toolsets in the marketplace today that provide monitoring and
reporting capabilities, and they are a necessity in all mission-critical IT environments.

Many of the solutions available today have embedded analytical capabilities to assist users in
turning large volumes of monitoring data into information that can be consumed by IT Operations
and Engineering.

Examples of data reduction techniques include threshold alerting based on real-time exceptions,
or trending in resource consumption to predict when resources will run out in the future. More
recently, solutions in this space have embedded analytics that focus on anomaly detection.
These solutions seek abnormal patterns in IT monitoring data, and infer pending issues that could
impact service delivery. These types of analytics can be classified as Data Analysis.
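
As an illustration of the trending technique mentioned above, the sketch below fits a straight line to a short history of utilization samples and estimates when an exhaustion threshold would be crossed. The sample data and threshold are invented for the example.

    # Illustrative trend extrapolation: fit a straight line to recent utilization
    # samples and estimate how many days remain until a threshold is crossed.

    def days_until_exhaustion(daily_utilization_pct, threshold_pct=90.0):
        """Least-squares linear fit over daily samples; returns days until threshold, or None."""
        n = len(daily_utilization_pct)
        xs = range(n)
        mean_x = sum(xs) / n
        mean_y = sum(daily_utilization_pct) / n
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_utilization_pct)) \
                / sum((x - mean_x) ** 2 for x in xs)
        if slope <= 0:
            return None                      # flat or shrinking: no projected exhaustion
        intercept = mean_y - slope * mean_x
        return (threshold_pct - intercept) / slope - (n - 1)   # days from the last sample

    print(days_until_exhaustion([60, 61, 63, 64, 66, 67, 69]))  # ~14 days at this growth rate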


Whilst these solutions may focus Operations on potential issues that could impact service delivery,
they do not provide the Decision Analytics required to determine what actions should be taken
to prevent or resolve resource contention and to drive greater efficiency from the IT
infrastructure. In aggregate, these decisions could number in the hundreds or thousands at any
single point in time. No human can both interpret Data Analysis output and act on it at that
scale, which highlights an important gap that can be addressed by a new category of solution.

RESOURCE ALLOCATION CONTROLLER

Virtualization has enabled us to change the paradigm of how we manage IT service delivery,
through the ability to dynamically allocate resources via software controls. Whilst hypervisor
schedulers are marketed as a mechanism to control resource allocation in virtualized data
centers, the scope of decision-making and control they provide is very limited.

What is required to solve the Intelligent Workload Management Problem as part of a
performance and capacity management architecture is a discrete mechanism to perform
Decision Analysis and Control, capable of accounting for all of the resources and constraints
described earlier, and driving a broad set of actions across the IT stack to continually maintain
the environment in the Desired State. These actions should include decisions on where to place
application workloads and how to allocate, provision, or decommission resources, all based on
the changing picture of application workload demand and priorities.
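
To illustrate the difference between alerts and actions, the skeleton below shows the kind of concrete recommendations a Decision Analysis and Control loop could emit. It is a hypothetical sketch under invented thresholds and field names, not a description of VMTurbo's actual decision engine.

    # Skeletal control loop: map an observed cluster state to concrete actions
    # (move workloads, reclaim idle hosts, resize oversized VMs) instead of alerts.

    def recommend_actions(cluster_state):
        """Turn an observed cluster state into a list of recommended actions."""
        actions = []
        for host, data in cluster_state.items():
            if data["cpu"] > 80 or data["mem"] > 80:
                actions.append(("move_workload", data["vms"][0], host))       # relieve contention
            elif data["cpu"] < 20 and data["mem"] < 20:
                actions.append(("evacuate_and_suspend_host", host))           # reclaim idle capacity
            for vm, allocated_gb, used_gb in data.get("oversized", []):
                actions.append(("resize_down", vm, allocated_gb, used_gb))    # reclaim unused vMem
        return actions

    state = {
        "host-a": {"cpu": 88, "mem": 74, "vms": ["vm-12"], "oversized": [("vm-12", 16, 4)]},
        "host-b": {"cpu": 15, "mem": 12, "vms": ["vm-07"]},
    }
    print(recommend_actions(state))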

By maintaining the environment in the Desired State, these functions prevent resource contention,
sustain the service levels of virtualized applications, and cut down the number of problems and
incidents that Operations must handle.

In addition to these benefits, Desired State functionality also drives greater adoption of existing
service management and monitoring systems within the Operations community, because it
eliminates a significant level of noise within the user communities of these systems.

Maintaining the environment in the Desired State also dramatically reduces the cost of delivering
compute services, by significantly increasing the efficiency of the provisioned infrastructure.
TechValidate's survey of 150 VMTurbo customers revealed that more than 80% experienced
improvements of 20-40% in infrastructure efficiency.

Figure 3: Reduced risk and greater headroom to provision additional workloads, driving higher levels of efficiency.

When contextualized by the growth in the number of virtualized application workloads
experienced in most organizations, and the incremental infrastructure required to support this,
efficiency improvements of 20-40% can have a very significant impact on future CAPEX
requirements, making the investment required to deliver Decision Analysis and Control functions
completely self-funding in just months.

Figure 3 shows the improvement that intelligent workload placement decisions had on a
compute cluster with a 12:1 overprovisioning strategy. The risks of application performance
issues were significantly reduced, and a more efficient use of resources was achieved.

In a cluster with a 4:1 overprovisioning strategy, intelligent workload placement decisions
enabled the application workloads to run on a significantly smaller hardware footprint whilst
mitigating the risk of performance issues (Figure 4).

Figure 4: Existing workloads can be accommodated on a much smaller hardware footprint without impacting service.

Figures 5 and 6 illustrate examples of the types of resource allocation decisions that can be
driven through Decision Analysis & Control functions to maintain the environment in the
"Desired State", assuring applications get the resources they need to operate reliably whilst
maximizing infrastructure efficiency.

Figure 5: The resource management decisions that need to be taken to address existing and future performance risks
and inefficiencies.


Figure 6: Consolidate workloads on fewer physical resources and reclaim unused virtual resources.

SUMMARY
1. Review your organization's current capacity management strategy, and identify
opportunities to transform the approach by better aligning resource supply and demand.
2. When assessing your performance and capacity management capabilities as part of your
systems management strategy, make sure you differentiate the monitoring and reporting
functions from the Decision Analysis and Control functions required to proactively maintain
your environment in the Desired State on an ongoing basis.
3. Deploy a Proof of Concept in an operational environment to benchmark the business
benefit that can realistically be derived from maintaining your environment in the Desired
State, reducing risk and driving greater operational and infrastructure efficiency.

ABOUT VMTURBO
Founded in 2009, VMTurbo was built on the belief that IT operations management needs to be
fundamentally changed to allow your organization to unlock the full value of today's virtualized
infrastructure and cloud services. Our charter is to transform IT operations in cloud and
virtualized environments from a complex, labor-intensive, and volatile process to one that is
simple, automated, and predictable, delivering greater control in maintaining a healthy state
and consistent service delivery.

VMTurbo offers an innovative control system for virtualized data centers. By leveraging the
dynamic resource allocation abilities of virtualization and automating decisions for resource
allocation and workload placement in software, our solution ensures applications get the
resources required while maximizing utilization of IT assets. Over 9,000 enterprises worldwide
have selected VMTurbo, including British Telecom, Colgate, CSC and the London School of
Economics.

VMTurbo is headquartered in Massachusetts, with offices in New York, California, the United
Kingdom, and Israel.

www.vmturbo.com

