
Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 2 – Simplicity


Posted on November 13, 2015 by Josh Odgers
Let me start by saying I believe complexity is one of the biggest, and potentially the most overlooked, issues in modern datacenters.

Virtualization has enabled increased flexibility and solved countless problems within the datacenter. But over time I have observed an increase in complexity, especially around the management components, which for many customers is a major pain point.

Complexity leads to increased cost (both CAPEX and OPEX) and risk, which commonly results in reduced availability and performance.

In Part 10, I will cover Cost in more depth, so let's park it for the time being.
When architecting solutions for customers, my number one goal is to meet or exceed all of my customers' requirements with the simplest solution possible.

Acropolis is where web-scale technology delivers enterprise-grade functionality with consumer-grade simplicity, and with AHV the story gets even better.

Removing Dependencies
A great example of the simplicity of the Nutanix Xtreme Computing Platform (XCP) is its lack of external dependencies. There is no requirement for any external databases when running the Acropolis Hypervisor (AHV), which removes the complexity of designing, implementing and maintaining enterprise-grade database solutions such as Microsoft SQL or Oracle.

This is even more of an advantage when you take into account the complexity of deploying these platforms in highly available configurations such as AlwaysOn Availability Groups (SQL) or Real Application Clusters (Oracle RAC), where SMEs need to be engaged for design, implementation and maintenance. As a result of not being dependent on 3rd party database products, AHV reduces or removes complexity around product interoperability and the need to call multiple vendors if something goes wrong. This also means no more investigating Hardware Compatibility Lists (HCLs) and Interoperability Matrices when performing upgrades.
Management VMs
Only a single management virtual machine (Prism Central) needs to be deployed, even for multi-cluster, globally distributed AHV environments. Prism Central is an easy to deploy appliance and, since it is stateless, it does not require backing up. In the event the appliance is lost, an administrator simply deploys a new Prism Central appliance and connects it to the clusters, which can be done in a matter of seconds per cluster. No historical data is lost as the data is maintained on the clusters being managed.

Because Acropolis requires no additional components, it all but eliminates the design/implementation and operational complexity for management compared to other virtualization/HCI offerings.

Other supported hypervisors commonly require multiple management VMs and backend
databases even for relatively small scale/simple deployments just to provide basic
administration, patching and operations management capabilities.

Acropolis has zero dependencies during the installation phase; customers can implement a fully featured AHV environment without any existing hardware/software in the datacenter. Not only does this make initial deployment easy, but it also removes the complexity around interoperability when patching or upgrading in the future.

Ease of Management
Nutanix XCP clusters running any hypervisor can be managed individually using Prism Element
or centrally via Prism Central.

Prism Element requires no installation; it is available and performs optimally out-of-the-box.


Administrators can access Prism Element via the XCP Cluster IP address or via any Controller
VM IP address.

Administrators of legacy virtualization products often need to use hypervisor-specific tools to complete various tasks, requiring design, deployment and management of these components and their dependencies. With AHV, all hypervisor-level functionality is completed via Prism, providing a true single pane of glass for everything from Storage, Compute, Backup, Data Replication, Hardware monitoring and more.

The image below shows the Prism Central Home Screen, which provides a high-level summary of all clusters in the environment. From this screen, you can drill down to individual clusters to get more granular information where required.
Administrators perform all upgrades from Prism without the requirement for external update management applications/appliances/VMs or supporting back-end databases.

Prism performs one-click fully automated rolling upgrades of all components, including the Hypervisor, Acropolis Base Platform (formerly known as NOS), Firmware and Nutanix Cluster Check (NCC).

Further Reduced Storage Complexity


Storage has long been, and continues for many customers to be, a major hurdle to
successful virtual environments. Nutanix has essentially made storage invisible over the
past few years by removing the requirement for dedicated Storage Area Networks, Zoning,
Masking, RAID and LUNs. When combined with AHV, XCP has taken this innovation yet
another big step forward by removing the concepts of datastores/mounts and virtual SCSI
controllers.

For each Virtual Machine disk, AHV presents the vDisk directly to the VM, and the VM
simply sees the vDisk as if it were a physically attached drive. There is no in-guest
configuration. It just works.

This means there is no complexity around how many virtual SCSI controllers to use, or where to place a VM or vDisk, and as such, Acropolis has eliminated the requirement for advanced features to manage virtual machine placement and capacity, such as vSphere's Storage DRS.

Don't get me wrong, Storage DRS is a great feature which helps solve serious problems with traditional storage. With XCP these problems just don't exist.

The following screenshot shows just how simply vDisks appear under the VM configuration menu in Prism Element. There is no need to assign vDisks to virtual SCSI controllers, which ensures vDisks can be added quickly and perform optimally.
Node Configuration
Configuring an AHV environment via Prism automatically applies all changes to each node
within the cluster. Critically, no Host Profiles-style functionality needs to be enabled or configured, nor do administrators have to check for compliance or create/apply profiles to nodes.

In AHV all networking is fully distributed similar to the vSphere Distributed Switch (VDS)
from VMware. AHV network configuration is automatically applied to all nodes within the
cluster without requiring the administrator to attach nodes/hosts to the virtual networking.
This helps ensure a consistent configuration throughout the cluster.

The reason the above points are so important is each dramatically simplifies the
environment by removing (not just abstracting) many complicated design/configuration
items such as:

Multipathing
Deciding how many datastores are required & what size each should be
Considering how many VMs should reside per datastore/LUN.
Configuration maximums for Datastores / Paths
Managing consistent configuration across nodes/hosts
Managing Network Configuration
Administrators can optionally join Acropolis' built-in authentication to an Active Directory domain, removing the requirement for additional Single Sign-On components. All Acropolis components include high availability out-of-the-box, removing the requirement to design (and license) HA solutions for individual management components.
Data Protection / Replication
The Nutanix CVM includes built-in data protection and replication components, removing
the requirement to design/deploy/manage one or more Virtual Appliances. This also avoids
the need to design, implement and scale these components as the environment grows.

All of the data protection and replication features are also available via Prism and,
importantly, are configured on a per VM basis making configuration easier and reducing
overheads.

Summary
In summary, the simplicity of AHV eliminates:

1. Single points of failure for all management components out of the box
2. The requirement for dedicated management clusters for Acropolis components
3. Dependency on 3rd Party Operating Systems & Database platforms
4. The requirement for design, implementation and ongoing maintenance for Virtualization
management components
5. The need to design, install, configure & maintain a web or desktop type management client
6. Complexity such as:
A. The requirement to install software or appliances to allow patching / upgrading
B. The requirement for an SME to design a solution to make management
components highly available
C. The requirement to follow complex Hardening Guides to achieve security
compliance.
D. The requirement for additional Appliances/interfaces and external dependencies
(i.e.: Database Platforms)
7. The requirement to license features to allow Centralised configuration management of
nodes.

Storage Distributed Resource Scheduler (DRS)
VMware DRS (Distributed Resource Scheduler) is a load balancing utility that assigns and
moves computing workloads to available hardware resources in a virtualized environment.

DRS can be configured to recommend workload balancing or to automatically move workloads.


VMware DRS users can refine resource allocation with affinity rules and anti-affinity rules. The
utility allows VMware administrators to prioritize resources according to application importance.
The Distributed Power Management (DPM) feature of DRS can consolidate workloads in off-
peak hours to minimize energy consumption in the data center. VMware DRS also aids in
scheduled server maintenance by balancing VMs' workloads across other hosts.

VMware DRS is part of VMware Inc.'s vSphere virtualization suite, Enterprise and Enterprise Plus editions. With vSphere 5.0 and newer releases, VMware introduced Storage DRS, which balances VM storage consumption across datastore clusters, based on the same principles as VMware DRS.

Storage Distributed Resource Scheduler

vSphere 5 introduced Storage Distributed Resource Scheduler (Storage DRS), a feature that enables load and capacity balancing of virtual machines across data stores. Using vCenter Server, multiple data stores can now be placed into an administrative cluster among which virtual machines can be moved, depending on I/O load and capacity.

The feature operates in two modes. Initial Placement determines the best place to deploy a virtual machine based on the current capacity and load on each data store within a cluster. From then on, Storage DRS can provide recommendations on where to move a virtual machine to improve the I/O response time, capacity, or both.
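As a rough illustration of the Initial Placement idea only (a sketch, not VMware's actual algorithm; the datastore names, sizes and latency threshold below are invented), the following Python snippet picks the datastore with the most free space among those under an assumed latency threshold:

```python
# Conceptual sketch only - not VMware's actual Storage DRS algorithm.
# Datastore names, capacities and latency figures are made up for illustration.
from dataclasses import dataclass

@dataclass
class Datastore:
    name: str
    capacity_gb: int
    used_gb: int
    avg_latency_ms: float  # observed I/O response time

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb

def initial_placement(datastores, vm_size_gb, latency_threshold_ms=15.0):
    """Pick a datastore for a new VM: enough space, acceptable latency,
    then prefer the one with the most free capacity."""
    candidates = [ds for ds in datastores
                  if ds.free_gb >= vm_size_gb
                  and ds.avg_latency_ms <= latency_threshold_ms]
    if not candidates:
        raise RuntimeError("No datastore satisfies capacity and latency constraints")
    return max(candidates, key=lambda ds: ds.free_gb)

cluster = [
    Datastore("DS01", 4096, 3500, 12.0),
    Datastore("DS02", 4096, 1200, 9.5),
    Datastore("DS03", 2048, 300, 22.0),   # over the latency threshold, skipped
]
print(initial_placement(cluster, vm_size_gb=200).name)  # -> DS02
```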

Storage DRS can report on recommended virtual machine moves or have vSphere move virtual machine files automatically once certain thresholds have been reached. Storage DRS operates at the vCenter Server level and requires a Storage vMotion licence.

VMware recommends that all data stores within a cluster have similar performance characteristics to effectively load balance across all resources.

VMware's creeping influence over storage

With the release of vSphere 5, VMware has further blurred the boundaries between
storage and the hypervisor. vSphere can now relocate virtual machines based on the
performance and capacity of the underlying storage. While this is a good feature for
basic storage arrays, there is the risk of causing performance issues with more
complicated storage environments. VMware is looking to push further into the storage
market with its vSphere Storage Appliance, which removes the need for many
organisations to implement a dedicated storage array. VMware's parent company,
EMC, is developing the ability to run virtual machines on the storage array, so we can
see a future where hypervisor and storage move closer together. The job of the
traditional storage administrator is changing to one of a more generalist nature, with the
need to have a range of other skills, including networking and virtualisation. It is unlikely
that the role of storage administrator will continue to be a dedicated practice for much
longer.

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 3 – Scalability
Posted on November 13, 2015 by Josh Odgers

Scalability is not just about the number of nodes that can form a cluster or the maximum storage capacity. The more important aspect of scalability is how an environment expands from many perspectives, including Management, Performance, Capacity and Resiliency, and how scaling affects operational aspects.

Let's start with scalability of the components required to manage/administer AHV:

Management Scalability
AHV automatically sizes all Management components during deployment of the initial cluster,
or when adding node/s to the cluster. This means there is no need to do initial sizing or manual
scaling of XCP management components regardless of the initial and final size of the cluster/s.

Where Resiliency Factor 3 (N+2) is configured, the Acropolis management components will be automatically scaled to meet the N+2 requirement. Let's face it, there is no point having N+2 at one layer and not at another, because availability, like a chain, is only as good as its weakest link.

Storage Capacity Scaling


The Nutanix Distributed Storage Fabric (DSF) has no maximum storage capacity. Additionally, storage capacity can even be scaled separately from compute with storage-only nodes such as the NX-6035C. Nutanix storage-only nodes help eliminate the problems encountered when scaling capacity on traditional storage.

Storage-only nodes run AHV (which is interoperable with other supported hypervisors), allowing customers to scale capacity regardless of hypervisor. Storage-only nodes do not require hypervisor licensing or separate management. Storage-only nodes also fully support all one-click upgrades for the Acropolis Base Software and AHV, just like compute+storage nodes. As a result, storage-only nodes are essentially invisible, apart from the increased capacity and performance which the nodes deliver.

Nutanix storage-only nodes help eliminate the problems of scaling capacity with traditional storage; for more information see: Scaling problems with traditional shared storage. Some of the scaling problems with traditional storage are adding shelves of drives without scaling data services/management. This leads to problems such as lower IOPS/GB and a higher impact to workloads in the event of component failures such as storage controllers.

Scaling storage-only nodes is remarkably simple. For example, a customer added 8 x NX-6035C nodes to his vSphere cluster via his laptop on the showroom floor of vForum Australia in October of this year.

https://twitter.com/josh_odgers/status/656999546673741824
As each storage-only node is added to the cluster, a light-weight Nutanix CVM joins the cluster
to provide data services to ensure linear scale out management and performance capabilities,
thus avoiding the scaling problems which plague traditional storage.

For more information on Storage only nodes, see: http://t.co/LCrheT1YB1


Compute Scalability
Enabling HA within a cluster requires reserving one or more nodes for HA. This can create unnecessary inefficiencies when the hypervisor limits the maximum cluster size. AHV has no limit on the number of nodes within a cluster. As a result, AHV can help avoid unnecessary silos that lead to inefficient use of infrastructure due to requiring one or more nodes per cluster to be reserved for HA. AHV nodes are also automatically configured with all required settings when joining an existing cluster. All the administrator needs to provide is basic IP address information, press Expand Cluster, and Acropolis takes care of the rest.

Analytics Scalability
AHV includes built-in Analytics and, as with the other Acropolis management components, the Analysis components are sized automatically during initial deployment and scale automatically as nodes are added.

This means there is never a tipping point where an administrator is required to scale or deploy new Analysis instances or components. The analysis functionality and its performance remain linear regardless of scale.

This means AHV eliminates the requirement for separate software instances and database/s to provide analytics.

Resiliency Scalability
As Acropolis uses the Nutanix Distributed Storage Fabric, in the event drive/s or node/s fail, all nodes within the cluster participate in restoring the configured Resiliency Factor (RF) for the impacted data. This occurs regardless of hypervisor; however, AHV includes fully distributed management components, so the larger the cluster, the more resilient the management layer also becomes.

For example, the loss of a single node in a 4-node cluster would have potentially a 25%
impact on the performance of the management components. In a 32-node cluster, a single
node failure would have a much lower potential impact of only 3.125%. As an AHV
environment scales, the impact of a failure decreases and the ability to self-heal increases
in both speed to recover and number of subsequent failures which can be supported.
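To make the arithmetic behind those percentages explicit, here is a trivial illustration (my own, not Nutanix tooling) of how the share of cluster resources lost to a single node failure shrinks as the cluster grows:

```python
# Illustration of the 1/N impact figures quoted above; not Nutanix software.
def single_node_failure_impact(cluster_size: int) -> float:
    """Fraction of total cluster resources lost when one node fails."""
    return 1.0 / cluster_size

for nodes in (4, 8, 16, 32):
    print(f"{nodes:>2} nodes: {single_node_failure_impact(nodes):.3%} impact")
# 4 nodes: 25.000% impact ... 32 nodes: 3.125% impact
```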

For information about increasing resiliency of large clusters, see: Improving Resiliency of
Large Clusters with EC-X
Performance Scalability
Regardless of hypervisor, as XCP clusters grow, the performance improves. The moment
new node(s) are added to the cluster, the additional CVM/s start participating in
Management and Data Resiliency tasks even when no VMs are running on the nodes.
Adding new nodes allows the Storage Fabric to distribute RF traffic among more Controllers
which enhances Write I/O & resiliency while helping decrease latency.

The advantage that AHV has over the other supported hypervisors is that the performance of the management components (many of which have been previously discussed) dynamically scales with the cluster. Similar to Analytics, AHV management components scale out. There is never a tipping point requiring manual scale-out of management or the deployment of new instances of management components or their dependencies.

Importantly, for all components, XCP distributes data and management functions across all nodes within the cluster. Acropolis does not use mirrored components/hardware or objects, which ensures no two nodes or components become a bottleneck or point of failure.

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 4 – Security
Posted on November 13, 2015 by Josh Odgers

Security is a major pillar of the XCP design. The use of innovative automation results in perhaps
the most hardened, simple and comprehensive virtualization infrastructure in the industry.

AHV is not designed to work with a comprehensive HCL of hardware vendors, nor does it have countless bolt-on style products which need to be catered for. Instead, the Acropolis hypervisor has been optimized to work with the Nutanix Distributed Storage Fabric and approved appliances from Nutanix and OEM partners to provide all services/functionality in a truly web-scale manner.

This allows for much tighter and more targeted quality assurance and dramatically reduces the attack surface compared to other hypervisors.

The Security Development Lifecycle (SecDL) is leveraged across the entire Acropolis platform, ensuring every line of code is production ready. The design follows a defense-in-depth model that removes all unnecessary services from libvirt/QEMU (SPICE, unused drivers), leverages libvirt non-root group sockets for the principle of least privilege, uses SELinux-confined guests for VM escape protection, and includes an embedded intrusion detection system.

The Acropolis hypervisor has a documented and supported security baseline (XCCDF STIG), and introduces the self-remediating hypervisor. On a customer-defined interval, the hypervisor is scanned for any changes from the supported security baseline, and if any anomaly is detected it is reset back to the secure state in the background with no user intervention.

The Acropolis platform also boasts a comprehensive list of security certifications/validations:

Summary
Acropolis provides numerous security advantages including:
1. In-Built and self auditing Security Technical Implementation Guides (STIGs)
2. Hardened hypervisor out of the box without the requirement for administrators to apply
hardening recommendations
3. Reduced attack surface compared to other supported hypervisors

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 5 – Resiliency
Posted on November 13, 2015 by Josh Odgers

When discussing resiliency, it is common to make the mistake of only looking at data resiliency
and not considering resiliency of the storage controllers and the management components
required to service the business applications.

Legacy technologies such as RAID and hot spare drives may in some cases provide high resiliency for data; however, if they are backed by a dual-controller type setup which cannot scale out and self-heal, the data may be unavailable, or performance/functionality severely degraded, following even a single component failure. Infrastructure that is dependent on hardware replacement to restore resiliency following a failure is fundamentally flawed, as I have discussed in: Hardware support contracts & why 24/7 4-hour onsite should no longer be required.
In addition, if the management application layer is not resilient, then data-layer high availability/resiliency may be irrelevant, as the business applications may not be functioning properly (i.e. at normal speeds) or at all.

The Acropolis platform provides high resiliency for both the data and management layers at a configurable N+1 or N+2 level (Resiliency Factor 2 or 3), which can tolerate up to two concurrent node failures without losing access to management or data. Beyond that, with Block Awareness, an entire block (up to four nodes) can fail and the cluster still maintains full functionality. This puts the resiliency of data and management components on XCP up to N+4.

In addition, the larger the XCP cluster, the lower the impact of a node/controller/component failure. For a four-node environment, N-1 is a 25% impact, whereas for an eight-node cluster N-1 is just a 12.5% impact. The larger the cluster, the lower the impact of a controller/node failure. In contrast, when a dual-controller SAN suffers a single controller failure, in many cases the impact is a 50% degradation, and a subsequent failure would result in an outage. Nutanix XCP environments self-heal, so that even in an environment only configured for N-1, it is possible following a self-heal that subsequent failures can be tolerated without causing high impact or outages.

In the event the Acropolis Master instance fails, full functionality will return to the environment
after an election which completes within <30 seconds. This equates to management availability
greater than six nines (99.9999%). Importantly, AHV has this management resiliency built-in;
it requires zero configuration!
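As a back-of-the-envelope sanity check of the six-nines figure (my own assumptions: a single Master failure per year and the worst-case 30-second election quoted above):

```python
# Rough availability estimate for the management layer; the failure rate is my assumption.
seconds_per_year = 365 * 24 * 3600          # 31,536,000
outage_per_failover_s = 30                  # worst-case election time quoted above
failovers_per_year = 1                      # assumption: one Master failure a year

downtime_s = failovers_per_year * outage_per_failover_s
availability = 1 - downtime_s / seconds_per_year
print(f"{availability:.6%}")                # ~99.999905% - just above six nines
```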
As for data availability, regardless of hypervisor the Nutanix Distributed Storage Fabric
(DSF) maintains two or three copies of data/parity and in the event of a SSD/HDD or node
failure, the configured RF is restored by all nodes within the cluster.

Data Resiliency
While we have just covered why resiliency of data is not the only important factor, it is still key. After all, if a solution which provides shared storage loses data, it's not fit for purpose in any datacenter.

As data resiliency is such a foundational part of the Nutanix Distributed Storage Fabric, the data resiliency status is displayed on the Prism Home Screen. In the screenshot below we can see that the ability to provide resiliency both in steady state and in the event of a failure (Rebuild Capacity) is tracked.

In this example, all data in the cluster is compliant with the configured Resiliency Factor
(RF2 or 3) and the cluster has at least N+1 available capacity to rebuild after the loss of a
node.

To dive deeper into the resiliency status, simply click on the above box and it will expand to show more granular detail of the failures which can be tolerated.

The screenshot below shows that things like Metadata, the OpLog (persistent write cache) and back-end functions such as Zookeeper are also monitored and alerted on when required.

In the event either of these is not in a normal or green state, Prism will alert the administrator. In the event the alert is caused by a node failure, Prism automatically notifies Nutanix support (via Pulse) and the required part/s are dispatched, although typically an XCP cluster will self-heal long before delivery of the hardware, even in the case of an aggressive hardware maintenance SLA such as 4-hour onsite.

This is yet another example of Nutanix not being dependent on Hardware (replacement) for
resiliency.

Data Integrity
Acknowledging a write I/O to a guest operating system should only occur once the data is written to persistent media, because until this point it is possible for data loss to occur even when storage is protected by battery-backed cache and uninterruptible power supplies (UPS).

The only advantage to acknowledging writes before this has occurred is performance, but
what good is performance when your data lacks integrity or is lost?

Another commonly overlooked requirement of any enterprise-grade storage solution is the ability to detect and recover from silent data corruption. Acropolis performs checksums in software for every write AND on every read. Importantly, Nutanix is in no way dependent on the underlying hardware or any 3rd party software to maintain data integrity; all checksumming and remediation (where required) is handled natively.
Pro tip: If a storage solution does not perform checksums on write AND read, DO NOT use it for production data.

In the event of silent data corruption (which can impact any storage device from any vendor), the checksum will fail and the I/O will be serviced from another replica which is stored on a different node (and therefore a different physical SSD/HDD). If a checksum fails in an environment with Erasure Coding, EC-X recalculates the data the same way as if an HDD/SSD had failed and services the I/O.

In the background, the Nutanix Distributed Storage Fabric will discard the corrupted data
and restore the configured Resiliency Factor from the good replica or stripe where EC-X is
used.
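The behaviour described above can be pictured with a small conceptual sketch of checksum-on-read with replica fallback (illustrative only; the function names and structures are mine, not Nutanix internals):

```python
# Conceptual sketch of checksum-verified reads with replica fallback.
# Function names and structures are illustrative, not Nutanix internals.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def read_extent(replicas):
    """replicas: list of (data, stored_checksum) tuples held on different nodes.
    Return the first replica whose contents still match its checksum and note
    any corrupted copies so the fabric can re-replicate from a good one."""
    corrupted = []
    for node, (data, stored) in enumerate(replicas):
        if checksum(data) == stored:
            if corrupted:
                # In the real platform the DSF would discard the bad copies and
                # restore the configured Resiliency Factor in the background.
                print(f"silent corruption detected on replica(s) {corrupted}; repairing")
            return data
        corrupted.append(node)
    raise IOError("all replicas failed checksum verification")

good = b"customer data"
bad = b"customer dat\x00"          # bit rot on one copy
replicas = [(bad, checksum(good)), (good, checksum(good))]
assert read_extent(replicas) == good
```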

This process is completely transparent to the virtual machine and end user, but is a critical component of XCP's resiliency. The underlying Distributed Storage Fabric (DSF) also automatically protects all Acropolis management components. This is an example of one of the many advantages of the Acropolis architecture, where all components are built together, not bolted on afterwards.

An Acropolis environment with a container configured with RF3 (Replication Factor 3) provides N+2 management availability. As a result, it would take an extraordinarily unlikely three concurrent node failures before a management outage could potentially occur. Luckily XCP has an answer for this, albeit unlikely, scenario as well: Block Awareness is a capability where, with 3 or more blocks, the cluster can tolerate the failure of an entire block (up to 4 nodes) without causing data or management to go offline.

Part of the Acropolis story around resiliency goes back to the lack of complexity. Acropolis enables rolling one-click upgrades and includes all functionality. There is no single point of failure; in the worst-case scenario, if the node running the Acropolis Master fails, within 30 seconds the Master role will restart on a surviving node and initiate the power-on of VMs. Again, this is in-built functionality, not additional or 3rd party solutions which need to be designed, installed and maintained.
The above points are largely functions of XCP rather than AHV itself, so I thought I would highlight AHV's load balancing and failover capabilities.

Unlike traditional 3-tier infrastructure (i.e. SAN/NAS), Nutanix solutions do not require multi-pathing, as all I/O is serviced by the local controller. As a result, there is no multi-pathing policy to choose, which removes another layer of complexity and a potential point of failure.

However, in the event the local CVM is unavailable for any reason, I/O for all the VMs on the node needs to be serviced in the most efficient manner possible. AHV does this by redirecting I/O at a per-vDisk level to a random remote Stargate instance, as shown below.
AHV can do this because every vDisk is presented via iSCSI and is its own target/LUN, which means it has its own TCP connection. What this means is that a business-critical application such as MS SQL, Exchange or Oracle with multiple vDisks will be serviced by multiple controllers concurrently.
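A minimal way to picture this per-vDisk redirection (purely illustrative code, not Nutanix's; only the CVM/Stargate terminology comes from the text above):

```python
# Illustration of per-vDisk I/O path selection when the local CVM is down.
# Not Nutanix code; it just models "local if healthy, else a random remote Stargate".
import random

def pick_stargate(vdisk_id: str, local_cvm: str, healthy_cvms: set[str]) -> str:
    """Each vDisk is its own iSCSI target, so each can fail over independently."""
    if local_cvm in healthy_cvms:
        return local_cvm                      # normal case: data locality
    remotes = sorted(healthy_cvms - {local_cvm})
    return random.choice(remotes)             # failover: spread vDisks across CVMs

healthy = {"cvm-b", "cvm-c", "cvm-d"}         # cvm-a (local) is in maintenance
for vdisk in ("sql-data", "sql-log", "sql-tempdb"):
    print(vdisk, "->", pick_stargate(vdisk, "cvm-a", healthy))
```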

As a result all VM I/O is load balanced across the entire Acropolis cluster which ensures no
single CVM becomes a bottleneck and VMs enjoy excellent performance even in a failure or
maintenance scenario.

For more information see: Acropolis Hypervisor (AHV) I/O Failover & Load Balancing
Summary:

1. Out of the box self healing capabilities for:


A. SSD/HDD/Node failure/s
B. Acropolis and PRISM (Management layer)
2. In-Built Data Integrity with software based checksums
3. Ability to tolerate up to 4 concurrent node failures
4. Management availability of >99.9999% (six nines)
5. No dependency on Hardware for data or management resiliency

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 6 – Performance
Posted on November 13, 2015 by Josh Odgers
When talking about performance, it's easy to get caught up in comparing unrealistic speeds and feeds such as 4K I/O benchmarks. But, as any real datacenter technology expert knows, IOPS are just a small piece of the puzzle and, in my opinion, get far too much attention, as I discussed in my article Peak Performance vs Real World Performance.
When I talk about performance, I am referring to all the components within the datacenter
including the Management components, Applications/VMs, Analytics, Data Resiliency and
everything in between.

Let's look at a few examples of how Nutanix XCP running the Acropolis Hypervisor (AHV) ensures consistent high performance for all components:

Management Performance:
The Acropolis management layer includes the Acropolis Operating System (formerly NOS), Prism (HTML5 GUI) and the Acropolis Hypervisor (AHV) management stack, made up of Master and Slave instances.

This architecture ensures all CVMs actively and equally contribute to keeping all areas of the platform running smoothly. This means there is no central application, database or component which can cause a bottleneck; being fully distributed is key to delivering a web-scale platform.

Each Controller VM (CVM) runs the components required to manage the local node and
contribute to the distributed storage fabric and management tasks.

For example, while there is a single Acropolis Master, it is not a single point of failure, nor is it a performance bottleneck.

The Acropolis Master is responsible for the following tasks:


1. Scheduler for HA
2. Network Controller
3. Task Executors
4. Collector/Publisher of local stats from Hypervisor
5. VNC Proxy for VM Console connections
6. IP address management
Each Acropolis Slave is responsible for the following tasks:
1. Collector/Publisher of local stats from Hypervisor
2. VNC Proxy for VM Console connections
Regardless of being a Master or Slave, each CVM performs the two heaviest tasks: the collection and publishing of hypervisor stats and, when in use, the VM console connections.

The distributed nature of the XCP platform allows it to achieve consistently high performance. Sending stats to a central location such as a central management VM and associated database server can not only become a bottleneck but, without introducing some form of application-level HA (e.g. a SQL AlwaysOn Availability Group), could also be a single point of failure, which for most customers is unacceptable.

The roles which are performed by the Acropolis Master are all lightweight tasks such as the HA
scheduler, Network Controller, IP address management and Task Executor.

The HA scheduler task is only active in the event of a node failure which makes it a very low
overhead for the Master. The Network Controller task is only active when tasks such as new
VLANs are being configured and Task Execution is simply keeping track of all tasks and
distributing them for execution across all CVMs. IP address management is essentially a DHCP
service, which is also an extremely low overhead.
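As a loose mental model (not Nutanix code) of why the Master stays lightweight, the sketch below has the Master only queueing and assigning tasks while every CVM, Master included, executes them:

```python
# Toy model of Master-as-coordinator: the Master only tracks and assigns tasks;
# execution is spread across every CVM in the cluster. Names are illustrative.
from itertools import cycle

class Cluster:
    def __init__(self, cvms):
        self.cvms = cvms
        self._next = cycle(cvms)          # simple round-robin distribution
        self.completed = []

    def submit(self, task: str):
        """Master duty: record the task and pick a CVM to run it."""
        worker = next(self._next)
        self.run(worker, task)

    def run(self, worker: str, task: str):
        """Every CVM (Master or Slave) executes tasks and reports back."""
        self.completed.append((worker, task))

cluster = Cluster(["cvm-1", "cvm-2", "cvm-3", "cvm-4"])
for t in ["clone-vm", "snapshot", "power-on", "collect-stats", "migrate"]:
    cluster.submit(t)
print(cluster.completed)   # tasks spread across all four CVMs
```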

In part 8, we will discuss more about Acropolis Analytics.


Data Locality
Data locality is a unique feature of XCP where new write I/O is written to the local node where the VM is running, as well as replicated to other node/s within the cluster. Data locality eliminates the requirement to service subsequent read I/O by traversing the network and utilizing a remote controller.

As VMs migrate around a cluster, Write I/O is always written locally and remote reads will only
occur if remote data is accessed. If data is remote and never accessed, no remote I/O will occur.
As a result, it is typical for >90% of I/O to be serviced locally.
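Conceptually, the write and read paths described above reduce to something like the following sketch (my own simplification, not the DSF implementation):

```python
# Simplified model of DSF data locality; node/extent structures are illustrative.
class Node:
    def __init__(self, name):
        self.name = name
        self.extents = {}          # extent_id -> data held locally

def write(extent_id, data, local: Node, remote_replicas: list[Node]):
    """Writes always land on the local node plus RF-1 remote copies."""
    local.extents[extent_id] = data
    for peer in remote_replicas:
        peer.extents[extent_id] = data

def read(extent_id, local: Node, cluster: list[Node]):
    """Reads are served locally when a local replica exists; only otherwise
    does the request traverse the network to a remote node."""
    if extent_id in local.extents:
        return local.extents[extent_id], "local"
    for peer in cluster:
        if extent_id in peer.extents:
            return peer.extents[extent_id], f"remote:{peer.name}"
    raise KeyError(extent_id)

a, b, c = Node("A"), Node("B"), Node("C")
write("e1", b"blocks", local=a, remote_replicas=[b])   # RF2
print(read("e1", local=a, cluster=[a, b, c])[1])       # -> local
print(read("e1", local=c, cluster=[a, b, c])[1])       # -> remote:A (VM migrated)
```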

Currently, bandwidth and latency across a well-designed 10Gb network may not be an issue for some customers; however, as flash performance increases exponentially, the network could quite easily become a major bottleneck without moving to expensive 40Gb (or higher) networking. Data locality helps minimize the dependency on the network by servicing the majority of read I/O locally, and by writing one copy locally it reduces the overheads on the network for write I/O. Therefore, data locality allows customers to run lower cost networking without compromising performance.
While data locality works across all supported hypervisors, AHV is unique in that it supports data-aware virtual machine placement: virtual machines are powered on onto the node with the highest percentage of local data for that VM, which minimizes the chance of remote I/O and reduces the overheads involved in servicing I/O for each VM following failures or maintenance.
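And the data-aware placement decision reduces to choosing the node that already holds the largest share of the VM's data (again an illustrative sketch, not the actual scheduler):

```python
# Illustrative placement choice: power the VM on where most of its data already lives.
def place_vm(local_bytes_per_node: dict[str, int]) -> str:
    """local_bytes_per_node: node name -> bytes of this VM's data stored locally."""
    return max(local_bytes_per_node, key=local_bytes_per_node.get)

print(place_vm({"node-1": 120_000, "node-2": 840_000, "node-3": 40_000}))  # node-2
```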

In addition, data locality also applies to the collection of back-end data for analysis, such as hypervisor and virtual machine statistics. As a result, statistics are written locally, with a second (or, for environments configured with RF3, a third) copy written remotely. This means stats data, which can be a significant amount of data, has the lowest possible impact on the Distributed Storage Fabric and the cluster as a whole.

Summary:

1. Management components scale with the cluster to ensure consistent performance


2. Data locality ensures data is as close to the Compute (VM) as possible
3. Intelligent VM placement based on Data location
4. All Nutanix Controller VMs work as a team (not in pairs) to ensure optimal performance
of all components and workloads (VMs) in the cluster

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 7 – Agility (Time to value)
Posted on November 13, 2015 by Josh Odgers
Deploying other hypervisors and management solutions typically requires considerable design effort and expertise in order to ensure consistent performance and to help minimize the risk of downtime while enabling as much agility as possible. Acropolis management requires almost no design at all, as the built-in management is optimized and highly available out-of-the-box. This enables much faster deployment of AHV than any other hypervisor and its associated management components.

Regardless of the starting size of an AHV-based environment, all management, analysis, data protection and BC/DR components are automatically deployed and suitably sized. Regardless of the AHV cluster size, no management design effort is required. This results in a very fast (typically <1hr for a single-block deployment) time to value.

AHV also provides numerous features which ensure customers can deploy solutions in a timely
manner:

In-Built Management & Analytics


The fact that all tools required for cluster management are deployed automatically with the cluster means time to value is not dependent on the design, deployment and validation of these tools. There isn't even a need to install a client to manage AHV; it is simply accessed via a web browser.

Out of the box hardened configuration with In-Built Security/Compliance Auditing


Being hardened by default removes the risk of security flaws being introduced during the implementation phase, while the automated auditing ensures that, in the event security settings are modified during business-as-usual operations, the setting/s are returned to the required security profile.

Intelligent cloning
The Distributed Storage Fabric combines with AHV to allow near-instant clones of a virtual machine. This feature works regardless of the power state of the VM, so it's not restricted to VMs which are powered off, as with other hypervisors.

For a demo of this capability see: Nutanix Acropolis hypervisor acli cloning operations
Note: Cloning can be performed via Prism or acli (Acropolis CLI)

Summary:

1. Minimal design/implementation effort for AHV management is required


2. Where Multi-cluster central management is required, only a single VM is required (Prism
Central) which is deployed as a virtual appliance
3. No additional appliances/components to install for Analytics, Data Protection,
Replication or Management High Availability
4. No Subject Matter Experts required for an optimal Acropolis platform deployment

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 8 – Analytics (Performance / Capacity Management)
Posted on November 13, 2015 by Josh Odgers

Acropolis provides a powerful yet simple-to-use Analysis solution which covers the Acropolis
Platform, Compute (Acropolis Hypervisor / Virtual Machines) and Storage (Distributed Storage
Fabric).

Unlike other analysis solutions, Acropolis requires no additional software licensing, management infrastructure or virtual machines/applications to design, deploy or configure. The Nutanix Controller VM includes built-in analysis which has no external dependencies. There is no need to extract/import data into another product or virtual appliance, meaning lower overheads, e.g. less data needs to be stored and there is less impact on storage.
Not only is this capability built in from day one, but as the environment grows over time, Acropolis automatically scales the analytics capability; there is never a tipping point where you need to deploy additional instances, increase the compute/storage resources assigned to analytics virtual appliances, or deploy additional back-end databases.

Summary:

1. In-Built analysis solution


2. No additional licensing required
3. No design/implementation or deployment of VMs/appliances required
4. Automatically scales as the XCP cluster/s grow
5. Lower overheads due to being built into Acropolis and utilizing the Distributed Storage Fabric

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 10 – Cost
Posted on November 13, 2015 by Josh Odgers
You may be surprised that cost is so far down the list, but as you have probably realized by reading the previous 9 parts, AHV is in many ways a superior virtualization platform to other products on the market. In my opinion, it would be a mistake to think AHV is a low-cost option or a commodity hypervisor with limited capabilities just because it happens to be included with Starter Edition (making it effectively free for all Nutanix customers).

Apart from the obvious removal of hypervisor and associated management component licensing/ELA costs, the real cost advantage of using AHV is the dramatic reduction in the effort required in the design, implementation and operational verification phases, as well as in ongoing management.

This is due to many factors:

Simplified Design Phase

As all AHV Management components are in-built, highly available and auto scaling, there is no
need to engage a Subject Matter Expert (SME) to design the management solution. As a person
who has designed countless highly available virtualization solutions over the years, I can tell
you AHV out of the box is what I have all but dreamed of creating with other products for
customers in the past.

Simplified Implementation Phase


All management components (with the exception of Prism Central) are deployed automatically
removing the requirement for an engineer to install/patch/harden these components.

Building Acropolis and all management components into the CVM means there are fewer
moving parts that can go wrong and therefore that need to be verified.

In my experience, Operational Verification is one of the areas regularly overlooked, and infrastructure is put into production without having proven it meets the design requirements and outcomes. With AHV management components deployed automatically, the risk of components not delivering is all but eliminated, and where Operational Verification is performed, it can be completed much faster than with traditional products due to having far fewer moving parts.

Simplified ongoing operations

Acropolis provides one-click fully automated rolling upgrades for the Acropolis Base Software (formerly known as NOS), the Acropolis Hypervisor, firmware and Nutanix Cluster Check (NCC). In addition, upgrades can be automatically downloaded, removing the risk of installing incompatible versions and the requirement to check things such as Hardware Compatibility Lists (HCLs) and interoperability matrices before upgrades.

AHV dramatically simplifies capacity management by only requiring capacity management to be done at the Storage Pool layer; there is no requirement for administrators to manage capacity between LUNs/NFS mounts or containers. This capability also eliminates the requirement for well-known hypervisor features such as vSphere's Storage DRS.

Reduced 3rd party licensing costs

AHV includes all management components which, in the case of Prism Central, come as a prepackaged appliance. There is no need to license any operating systems. The highly resilient management components on every Nutanix node eliminate the requirement for 3rd party database products such as Microsoft SQL or Oracle or, in the best-case scenario, the deployment of virtual appliances which may not be highly available and which need to be backed up and maintained.

Reduced Management infrastructure costs

It is not uncommon for virtualization solutions to require a dozen or more management components (each potentially on a dedicated VM), even for small deployments, just to provide functionality such as centralized management, patching and performance/capacity management. As deployments grow or have higher availability requirements, the number of management VMs and their compute requirements tend to increase.

As all management components run within the Nutanix Controller VM (CVM), which resides on each Nutanix node, there is no need to have a dedicated management cluster. The amount of compute/storage resources required is also reduced.

The indirect cost savings for the reduced management infrastructure include:

1. Less rack space (RU)


2. Less power/cooling
3. Fewer network ports
4. Fewer compute nodes
5. Lower storage capacity & performance requirements
Last but not least, what about the costs associated with maintenance windows or outages?

Because Acropolis provides fully non-disruptive one-click upgrades and removes numerous
points of failure (e.g.: 3rd Party Databases) while providing an extremely resilient platform,
AHV also reduces the cost to the customer of maintenance and outages.

Summary:

1. No design required for Acropolis management components


2. No ongoing maintenance required for management components
3. Reduced complexity reduces the chance of downtime as a result of human error
