Virtualization has enabled increased flexibility and solved countless problems within the
datacenter. But over time I have observed an increase in complexity, especially around the
management components, which for many customers is a major pain point.
Complexity leads to things like increased cost (both CAPEX & OPEX) and risk, which commonly
leads to reduced availability/performance.
In Part 10, I will cover Cost in more depth, so let's park it for the time being.
When architecting solutions for customers, my number one goal is to meet or exceed all of my
customers' requirements with the simplest solution possible.
Removing Dependencies
A great example of the simplicity of the Nutanix Xtreme Computing Platform (XCP) is its lack of
external dependencies. There is no requirement for any external databases when running
Acropolis Hypervisor (AHV) which removes the complexity of designing, implementing and
maintaining enterprise grade database solutions such as Microsoft SQL or Oracle.
This is even more of an advantage when you take into account the complexity of deploying
these platforms in highly available configurations such as AlwaysOn Availability Groups (SQL)
or Real Application Clusters (Oracle RAC) where SMEs need to be engaged for design,
implementation and maintenance. As a result of not being dependent on 3rd party database
products, AHV reduces or removes complexity around product interoperability and the need to
call multiple vendors if something goes wrong. This also means no more investigating Hardware
Compatibility Lists (HCLs) and Interoperability Matrices when performing upgrades.
Management VMs
Only a single management virtual machine (Prism Central) needs to be deployed even for
multi-cluster, globally distributed AHV environments. Prism Central is an easy-to-deploy
appliance and, since it is stateless, it does not require backing up. In the event the appliance is
lost, an administrator simply deploys a new Prism Central appliance and connects it to the
clusters which can be done in a matter of seconds per cluster. No historical data is lost as the
data is maintained on the clusters being managed.
Other supported hypervisors commonly require multiple management VMs and backend
databases even for relatively small scale/simple deployments just to provide basic
administration, patching and operations management capabilities.
Acropolis has zero dependencies during the installation phase, so customers can implement a
fully featured AHV environment without any existing hardware/software in the datacenter. Not
only does this make initial deployment easy, but it also removes the complexity around
interoperability when patching or upgrading in the future.
Ease of Management
Nutanix XCP clusters running any hypervisor can be managed individually using Prism Element
or centrally via Prism Central.
The image below shows the Prism Central Home Screen that provides a high-level summary of
all clusters in the environment. From this screen, you can drill down to individual clusters to
get more granular information where required.
Administrators perform all upgrades from Prism without the requirement for external
update management applications/appliances/VMs or supporting back end databases.
Prism performs one-click, fully automated rolling upgrades of all components including the
Hypervisor, the Acropolis Base Platform (formerly known as NOS), Firmware and Nutanix
Cluster Check (NCC).
For each Virtual Machine disk, AHV presents the vDisk directly to the VM, and the VM
simply sees the vDisk as if it were a physically attached drive. There is no in-guest
configuration. It just works.
This means there is no complexity around how many virtual SCSI controllers to use, or
where to place a VM or vDisk, and as such Acropolis has eliminated the requirement for
advanced features to manage virtual machine placement and capacity management such
as vSphere's Storage DRS.
Don't get me wrong, Storage DRS is a great feature which helps solve serious problems
with traditional storage. With XCP these problems simply don't exist.
The following screen shot shows just how simple vDisks appear under the VM configuration
menu in Prism Element. There is no need to assign vDisks to Virtual SCSI controllers which
ensures vDisks can be added quickly and perform optimally.
Node Configuration
Configuring an AHV environment via Prism automatically applies all changes to each node
within the cluster. Critically, no Host Profiles-style functionality needs to be
enabled or configured, nor do administrators have to check for compliance or create/apply
profiles to nodes.
In AHV all networking is fully distributed similar to the vSphere Distributed Switch (VDS)
from VMware. AHV network configuration is automatically applied to all nodes within the
cluster without requiring the administrator to attach nodes/hosts to the virtual networking.
This helps ensure a consistent configuration throughout the cluster.
The reason the above points are so important is each dramatically simplifies the
environment by removing (not just abstracting) many complicated design/configuration
items such as:
Multipathing
Deciding how many datastores are required and what size each should be
Considering how many VMs should reside per datastore/LUN
Configuration maximums for datastores/paths
Managing consistent configuration across nodes/hosts
Managing network configuration
Administrators can optionally join Acropolis built-in authentication to an Active Directory
domain, removing the requirement for additional Single Sign-On components. All Acropolis
components include High Availability out-of-the-box, removing the requirement to design
(and license) HA solutions for individual management components.
Data Protection / Replication
The Nutanix CVM includes built-in data protection and replication components, removing
the requirement to design/deploy/manage one or more Virtual Appliances. This also avoids
the need to design, implement and scale these components as the environment grows.
All of the data protection and replication features are also available via Prism and,
importantly, are configured on a per VM basis making configuration easier and reducing
overheads.
Summary
In summary, the simplicity of AHV eliminates:
1. Single points of failure for all management components out of the box
2. The requirement for dedicated management clusters for Acropolis components
3. Dependency on 3rd Party Operating Systems & Database platforms
4. The requirement for design, implementation and ongoing maintenance for Virtualization
management components
5. The need to design, install, configure & maintain a Web or Desktop type management interface
6. Complexity such as
A. The requirement to install software or appliances to allow patching / upgrading
B. The requirement for an SME to design a solution to make management
components highly available
C. The requirement to follow complex Hardening Guides to achieve security
compliance.
D. The requirement for additional Appliances/interfaces and external dependencies
(i.e.: Database Platforms)
7. The requirement to license features to allow Centralised configuration management of
nodes.
VMware DRS is part of VMware Inc.'s vSphere virtualization suite, in the Enterprise and Enterprise
Plus editions. With vSphere 5.0 and newer releases, VMware introduced Storage DRS, which
balances VM storage consumption across datastore clusters, based on the same principles as
VMware DRS.
The feature operates in two modes. Initial Placement determines the best
place to deploy a virtual machine based on the current capacity and load on
each data store within a cluster. From then on, Storage DRS can provide
recommendations on where to move a virtual machine to improve either the
I/O response time, capacity or both.
VMware recommends that all data stores within a cluster have similar
performance characteristics to effectively load balance across all resources.
VMware's creeping influence over storage
With the release of vSphere 5, VMware has further blurred the boundaries between
storage and the hypervisor. vSphere can now relocate virtual machines based on the
performance and capacity of the underlying storage. While this is a good feature for
basic storage arrays, there is the risk of causing performance issues with more
complicated storage environments. VMware is looking to push further into the storage
market with its vSphere Storage Appliance, which removes the need for many
organisations to implement a dedicated storage array. VMware's parent company,
EMC, is developing the ability to run virtual machines on the storage array, so we can
see a future where hypervisor and storage move closer together. The job of the
traditional storage administrator is changing to one of a more generalist nature, with the
need to have a range of other skills, including networking and virtualisation. It is unlikely
that the role of storage administrator will continue to be a dedicated practice for much
longer.
Scalability is not just about the number of nodes that can form a cluster or the maximum
storage capacity. The more important aspect of scalability is how an environment expands
from many perspectives, including Management, Performance, Capacity and Resiliency, and how
scaling affects operational aspects.
Management Scalability
AHV automatically sizes all Management components during deployment of the initial cluster,
or when adding node/s to the cluster. This means there is no need to do initial sizing or manual
scaling of XCP management components regardless of the initial and final size of the cluster/s.
Where a Resiliency Factor of 3 (N+2) is configured, the Acropolis management components will
be automatically scaled to meet the N+2 requirement. Let's face it, there is no point having N+2
at one layer and not at another because availability, like a chain, is only as good as its weakest
link.
Storage Scalability
Storage-only nodes run AHV (and are interoperable with the other supported
hypervisors), allowing customers to scale capacity regardless of hypervisor. Storage-only nodes
do not require hypervisor licensing or separate management. Storage-only nodes also fully
support all one-click upgrades for the Acropolis Base Software and AHV, just like
compute+storage nodes. As a result, storage-only nodes are effectively invisible, apart from the
increased capacity and performance which the nodes deliver.
Nutanix Storage only nodes help eliminate the problems when scaling capacity compared to
traditional storage, for more information see: Scaling problems with traditional shared storage.
One of the scaling problems with traditional storage is adding shelves of drives without
scaling the data services/management. This leads to problems such as lower IOPS/GB and higher
impact to workloads in the event of component failures such as storage controllers.
Scaling storage-only nodes is remarkably simple. For example, a customer added 8 x NX6035C
nodes to his vSphere cluster via his laptop on the showroom floor of vForum Australia in
October of this year.
https://twitter.com/josh_odgers/status/656999546673741824
As each storage-only node is added to the cluster, a light-weight Nutanix CVM joins the cluster
to provide data services to ensure linear scale out management and performance capabilities,
thus avoiding the scaling problems which plague traditional storage.
Analytics Scalability
AHV includes built-in Analytics and, as with the other Acropolis management components, the
Analysis components are sized automatically during initial deployment and scale
automatically as nodes are added.
This means there is never a tipping point where there is a requirement for an administrator to
scale or deploy new Analysis instances or components. The analysis functionality and its
performance remain linear regardless of scale.
This means AHV eliminates the requirement for separate software instances and
database/s to provide analytics.
Resiliency Scalability
As Acropolis uses the Nutanix Distributed Storage Fabric, in the event drive/s or node/s fail,
all nodes within the cluster participate in restoring the configured resiliency factor (RF) for
the impacted data. This occurs regardless of Hypervisor, however, AHV includes fully
distributed Management components; the larger the cluster, the more resilient the
management layer also becomes.
For example, the loss of a single node in a 4-node cluster would potentially have a 25%
impact on the performance of the management components. In a 32-node cluster, a single
node failure would have a much lower potential impact of only 3.125%. As an AHV
environment scales, the impact of a failure decreases and the ability to self-heal increases,
in both speed to recover and the number of subsequent failures which can be tolerated.
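The arithmetic above generalises to any cluster size. A minimal sketch of this idealised model (it assumes every node contributes equally to management and data services; it is not a Nutanix API):

```python
def failure_impact(nodes: int, failures: int = 1) -> float:
    """Fraction of aggregate cluster resources lost when `failures` nodes fail,
    assuming every node contributes equally to management and data services."""
    if failures >= nodes:
        raise ValueError("cluster would be fully offline")
    return failures / nodes

# 4-node cluster: one failure removes 25% of resources.
# 32-node cluster: one failure removes only 3.125%.
print(f"{failure_impact(4):.3%}, {failure_impact(32):.3%}")
```

The same function also shows why self-healing matters: after a rebuild, a 32-node cluster can absorb several subsequent failures before reaching the impact a 4-node cluster suffers from one.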
For information about increasing resiliency of large clusters, see: Improving Resiliency of
Large Clusters with EC-X
Performance Scalability
Regardless of hypervisor, as XCP clusters grow, the performance improves. The moment
new node(s) are added to the cluster, the additional CVM/s start participating in
Management and Data Resiliency tasks even when no VMs are running on the nodes.
Adding new nodes allows the Storage Fabric to distribute RF traffic among more Controllers
which enhances Write I/O & resiliency while helping decrease latency.
The advantage that AHV has over the other supported hypervisors is that the performance
of the Management components (many of which have been previously discussed)
dynamically scale with the cluster. Similar to Analytics, AHV management components
scale out. There is never a tipping point requiring manual scale-out of management or the
deployment of new instances of management components or their dependencies.
Importantly, for all components, XCP distributes data and management functions across
all nodes within the cluster. Acropolis does not use mirrored components/hardware or
objects, which ensures no two nodes or components become a bottleneck or point
of failure.
Security is a major pillar of the XCP design. The use of innovative automation results in perhaps
the most hardened, simple and comprehensive virtualization infrastructure in the industry.
AHV is not designed to work with a comprehensive HCL of hardware vendors, nor does it have
countless bolt-on style products which need to be catered for. Instead, the Acropolis hypervisor
has been optimized to work with the Nutanix Distributed Storage Fabric and approved appliances
from Nutanix and OEM partners to provide all services/functionality in a truly Web scale
manner.
This allows for much tighter and more targeted quality assurance and dramatically reduces the
attack surface compared to other hypervisors.
The Security Development Lifecycle (SecDL) is leveraged across the entire Acropolis platform,
ensuring every line of code is production ready. The design follows a defense-in-depth model
that removes all unnecessary services from libvirt/QEMU (SPICE, unused drivers), leverages
libvirt non-root group sockets for the principle of least privilege, uses SELinux-confined guests
for VM escape protection, and includes an embedded intrusion detection system.
The Acropolis hypervisor has a documented and supported security baseline (XCCDF STIG),
and introduces the self-remediating hypervisor. On a customer-defined interval, the
hypervisor is scanned for any changes to the supported security baseline and, if any anomaly
is detected, the baseline is reset back to the secure state in the background with no user
intervention.
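The scan-and-reset cycle can be sketched in a few lines. This is a hypothetical illustration of the idea only: the setting names, values and the `check_and_remediate` helper are my inventions, not the actual Nutanix implementation or API.

```python
# Illustrative secure baseline; real STIG baselines contain hundreds of checks.
BASELINE = {"ssh_root_login": "no", "selinux": "enforcing", "spice": "disabled"}

def check_and_remediate(settings: dict) -> dict:
    """One scan pass: report any settings that drifted from the baseline
    and silently reset each one back to its secure value."""
    drift = {k: settings[k] for k in BASELINE if settings.get(k) != BASELINE[k]}
    for key in drift:
        settings[key] = BASELINE[key]  # reset to the secure state
    return drift

# A host where SELinux was switched to permissive gets pulled back:
host = {"ssh_root_login": "no", "selinux": "permissive", "spice": "disabled"}
print(check_and_remediate(host))  # {'selinux': 'permissive'}
print(host["selinux"])            # enforcing
```

In the real system this pass runs on the customer-defined interval; here a scheduler loop around `check_and_remediate` is left out for brevity.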
Summary
Acropolis provides numerous security advantages including:
1. In-built and self-auditing Security Technical Implementation Guides (STIGs)
2. Hardened hypervisor out of the box without the requirement for administrators to apply
hardening recommendations
3. Reduced attack surface compared to other supported hypervisors
When discussing resiliency, it is common to make the mistake of only looking at data resiliency
and not considering resiliency of the storage controllers and the management components
required to service the business applications.
Legacy technologies such as RAID and hot spare drives may in some cases provide high
resiliency for data; however, if they are backed by a dual controller type setup which
cannot scale out and self-heal, the data may be unavailable or performance/functionality
severely degraded following even a single component failure. Infrastructure that is dependent
on hardware replacement to restore resiliency following a failure is fundamentally flawed, as I
have discussed in: Hardware support contracts & why 24/7 4 hour onsite should no longer be
required.
In addition, if the management application layer is not resilient, then data layer high
availability/resiliency may be irrelevant, as the business applications may not be functioning
properly (i.e.: at normal speeds) or at all.
The Acropolis platform provides high resiliency for both the data and management layers at a
configurable N+1 or N+2 level (Resiliency Factor 2 or 3) which can tolerate up to two
concurrent node failures without losing access to Management or data. Beyond that, with
Block Awareness, an entire block (up to four nodes) can fail and the cluster still maintains full
functionality. This puts the resiliency of data and management components on XCP at up to N+4.
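The intuition behind Block Awareness is simply that no two copies of a piece of data may share a block. A rough sketch of that placement constraint (illustrative only; the block/node names are made up and this is not Nutanix's actual placement algorithm):

```python
import random

def place_replicas(nodes_by_block: dict, rf: int) -> list:
    """Choose `rf` replica nodes, each in a different block, so the loss of
    one whole block destroys at most one copy of the data."""
    if len(nodes_by_block) < rf:
        raise ValueError("block awareness needs at least RF blocks")
    chosen_blocks = random.sample(sorted(nodes_by_block), rf)
    return [random.choice(nodes_by_block[b]) for b in chosen_blocks]

# Three 4-node blocks; RF2 picks two nodes guaranteed to sit in different blocks.
cluster = {"A": ["A1", "A2", "A3", "A4"],
           "B": ["B1", "B2", "B3", "B4"],
           "C": ["C1", "C2", "C3", "C4"]}
replicas = place_replicas(cluster, rf=2)
```

Because every replica lives in a distinct block, even four simultaneous node failures (one full block) leave at least one intact copy, which is what allows the cluster to keep full functionality.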
In addition, the larger the XCP cluster, the lower the impact of a node/controller/component
failure. For a four node environment, N-1 means a 25% impact, whereas for an 8 node cluster
N-1 is just a 12.5% impact. The larger the cluster, the lower the impact of a controller/node
failure. In contrast, when a dual controller SAN suffers a single controller failure, in many
cases the impact is a 50% degradation and a subsequent failure would result in an outage.
Nutanix XCP environments self-heal, so that even in an environment only configured for N-1,
it is possible following a self-heal that subsequent failures can be tolerated without causing
high impact or outages.
In the event the Acropolis Master instance fails, full functionality returns to the environment
after an election which completes in under 30 seconds. This equates to management availability
greater than six nines (99.9999%). Importantly, AHV has this management resiliency built in;
it requires zero configuration!
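As a sanity check on the six-nines claim, here is a simple steady-state availability model. The failure rate is my assumption for illustration (one Master failure per year), not a Nutanix figure:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def availability(failures_per_year: float, recovery_seconds: float) -> float:
    """Steady-state availability: uptime divided by total time, given an
    assumed failure rate and a fixed recovery (re-election) window."""
    downtime = failures_per_year * recovery_seconds
    return 1.0 - downtime / SECONDS_PER_YEAR

# One Master failure per year with a 30-second election still leaves
# availability above 99.9999% (six nines).
print(f"{availability(1, 30):.7f}")
```

Even several failures per year would only cost a few minutes of management-plane downtime annually, since each outage is bounded by the election time.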
As for data availability, regardless of hypervisor the Nutanix Distributed Storage Fabric
(DSF) maintains two or three copies of data/parity and in the event of a SSD/HDD or node
failure, the configured RF is restored by all nodes within the cluster.
Data Resiliency
While we have just covered why resiliency of data is not the only important factor, it is still
key. After all, if a solution which provides shared storage loses data, it's not fit for purpose
in any datacenter.
As data resiliency is foundational to the Nutanix Distributed Storage Fabric, the data
resiliency status is displayed on the Prism Home Screen. In the below screenshot we can
see that the ability to provide resiliency in both steady state and in the event of a failure
(Rebuild Capacity) is tracked.
In this example, all data in the cluster is compliant with the configured Resiliency Factor
(RF2 or 3) and the cluster has at least N+1 available capacity to rebuild after the loss of a
node.
To dive deeper into the resiliency status, simply click on the above box and it will expand to
show more granular detail of the failures which can be tolerated.
The below screenshot shows that things like Metadata, the OpLog (Persistent Write Cache) and
back end functions such as Zookeeper are also monitored and alerted on when required.
In the event any of these is not in a normal or Green state, Prism will alert the
administrator. If the alert is caused by a node failure, Prism automatically
notifies Nutanix support (via Pulse) and dispatches the required part/s, although typically an
XCP cluster will self-heal long before delivery of hardware even in the case of an
aggressive Hardware Maintenance SLA such as 4hr Onsite.
This is yet another example of Nutanix not being dependent on Hardware (replacement) for
resiliency.
Data Integrity
Acknowledging a Write I/O to a guest operating system should only occur once the data is
written to persistent media, because until that point it is possible for data loss to occur even
when storage is protected by battery-backed cache and uninterruptible power supplies
(UPS).
The only advantage to acknowledging writes before this has occurred is performance, but
what good is performance when your data lacks integrity or is lost?
Another commonly overlooked requirement of any enterprise grade storage solution is the
ability to detect and recover from silent data corruption. Acropolis performs checksums in
software for every write AND every read. Importantly, Nutanix is in no way dependent on
the underlying hardware or any 3rd party software to maintain data integrity; all
checksumming and remediation (where required) is handled natively.
Pro tip: If a storage solution does not perform checksums on Write AND Read, DO NOT use
it for production data.
In the event of Silent Data Corruption (which can impact any storage device from any
vendor), the checksum will fail and the I/O will be serviced from another replica which is
stored on a different node (and therefore physical SSD/HDD). If a checksum fails in an
environment with Erasure Coding, EC-X recalculates the data the same way as if a
HDD/SSD failed and services the I/O.
In the background, the Nutanix Distributed Storage Fabric will discard the corrupted data
and restore the configured Resiliency Factor from the good replica or stripe where EC-X is
used.
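The detect-and-repair flow described above can be sketched as follows. This is a minimal model using SHA-256 and in-memory dicts as stand-in replicas; the real fabric uses its own checksums and on-disk structures:

```python
import hashlib

def write_block(replicas, key, data: bytes):
    """Store the data and its checksum on every replica (RF copies)."""
    digest = hashlib.sha256(data).hexdigest()
    for replica in replicas:
        replica[key] = (data, digest)

def read_block(replicas, key) -> bytes:
    """Verify the checksum on every read. On a mismatch (silent corruption),
    service the I/O from a good replica and repair the corrupted copy."""
    good = None
    for replica in replicas:
        data, digest = replica[key]
        if hashlib.sha256(data).hexdigest() == digest:
            good = (data, digest)
            break
    if good is None:
        raise IOError("all replicas corrupted")
    for replica in replicas:  # restore the configured Resiliency Factor
        if replica[key] != good:
            replica[key] = good
    return good[0]
```

Corrupting one copy and reading it back returns the intact data from the other replica and silently rewrites the bad copy, which mirrors the transparent repair the text describes.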
This process is completely transparent to the virtual machine and end user, but is a critical
component of XCP's resiliency. The underlying Distributed Storage Fabric (DSF) also
automatically protects all Acropolis management components. This is an example of one of
the many advantages of the Acropolis architecture, where all components are built together,
not bolted on afterwards.
Part of the Acropolis story around resiliency goes back to the lack of complexity. Acropolis
enables rolling 1-click upgrades and includes all functionality. There is no single point of
failure; in the worst-case scenario, if the node running the Acropolis Master fails, within 30
seconds the Master role will restart on a surviving node and initiate power-on of the affected
VMs. Again, this is in-built functionality, not an additional or 3rd party solution which needs
to be designed, installed & maintained.
The above points are largely functions of XCP rather than AHV itself, so I thought I
would highlight AHV's load balancing and failover capabilities.
Unlike traditional 3-tier infrastructure (i.e.: SAN/NAS), Nutanix solutions do not require
multipathing, as all I/O is serviced by the local controller. As a result, there is no multipathing
policy to choose, which removes another layer of complexity and a potential point of failure.
However, in the event of the local CVM being unavailable for any reason, we need to service
I/O for all the VMs on the node in the most efficient manner. AHV does this by redirecting
I/O at a per-vDisk level to a random remote Stargate instance, as shown below.
AHV can do this because every vDisk is presented via iSCSI and is its own target/LUN,
which means it has its own TCP connection. As a result, a business critical
application such as MS SQL, Exchange or Oracle with multiple vDisks will be serviced by
multiple controllers concurrently.
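The per-vDisk redirection can be sketched like this. The function name and CVM labels are mine for illustration; this is not the actual Stargate failover logic:

```python
import random

def assign_failover_targets(vdisks, remote_cvms):
    """On local CVM failure, point each vDisk's iSCSI connection at a randomly
    chosen remote CVM, so one VM's I/O is spread over several controllers."""
    return {vdisk: random.choice(remote_cvms) for vdisk in vdisks}

# A SQL VM with four vDisks fails over across the three surviving CVMs:
targets = assign_failover_targets(["os", "db1", "db2", "logs"],
                                  ["cvm2", "cvm3", "cvm4"])
```

Because each vDisk is its own iSCSI target with its own TCP connection, the random per-vDisk choice naturally spreads a multi-vDisk VM's I/O across the cluster rather than piling it onto one surviving controller.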
As a result all VM I/O is load balanced across the entire Acropolis cluster which ensures no
single CVM becomes a bottleneck and VMs enjoy excellent performance even in a failure or
maintenance scenario.
For more information see: Acropolis Hypervisor (AHV) I/O Failover & Load Balancing
Summary:
Let's look at a few examples of how Nutanix XCP running the Acropolis Hypervisor (AHV) ensures
consistent high performance for all components:
Management Performance:
The Acropolis management layer includes the Acropolis Operating System (formerly NOS),
Prism (the HTML 5 GUI) and the Acropolis Hypervisor (AHV) management stack, made up of
Master and Slave instances.
This architecture ensures all CVMs actively and equally contribute to keeping all areas of the
platform running smoothly. There is no central application, database or component which can
cause a bottleneck; being fully distributed is key to delivering a web-scale platform.
Each Controller VM (CVM) runs the components required to manage the local node and
contribute to the distributed storage fabric and management tasks.
For example, while there is a single Acropolis Master, it is not a single point of failure, nor is it
a performance bottleneck.
The distributed nature of the XCP platform allows it to achieve consistently high performance.
Sending stats to a central location, such as a central management VM and its associated
database server, can not only become a bottleneck but, without introducing some form of
application-level HA (e.g.: a SQL AlwaysOn Availability Group), could also be a single point of
failure, which for most customers is unacceptable.
The roles which are performed by the Acropolis Master are all lightweight tasks such as the HA
scheduler, Network Controller, IP address management and Task Executor.
The HA scheduler task is only active in the event of a node failure which makes it a very low
overhead for the Master. The Network Controller task is only active when tasks such as new
VLANs are being configured and Task Execution is simply keeping track of all tasks and
distributing them for execution across all CVMs. IP address management is essentially a DHCP
service, which is also an extremely low overhead.
As VMs migrate around a cluster, Write I/O is always written locally and remote reads will only
occur if remote data is accessed. If data is remote and never accessed, no remote I/O will occur.
As a result, it is typical for >90% of I/O to be serviced locally.
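The read path just described can be modelled in a few lines. This is a simplified sketch of the data-locality idea; the extent and node names are hypothetical:

```python
def service_read(extent: str, local_node: str, replica_map: dict):
    """Return (serving_node, is_local). Reads are serviced locally whenever a
    local replica exists; remote reads happen only when the data is remote
    and actually accessed."""
    replicas = replica_map[extent]
    if local_node in replicas:
        return local_node, True
    return sorted(replicas)[0], False

replica_map = {"extent-1": {"node-1", "node-2"},   # local copy on node-1
               "extent-2": {"node-2", "node-3"}}   # remote only
print(service_read("extent-1", "node-1", replica_map))  # ('node-1', True)
print(service_read("extent-2", "node-1", replica_map))  # ('node-2', False)
```

Note that `extent-2` generates network traffic only when it is actually read; data that is remote but never accessed costs nothing, which is why the local-service percentage stays so high in practice.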
Today, bandwidth and latency across a well designed 10Gb network may not be an issue for
some customers; however, as flash performance increases exponentially, the network could
quite easily become a major bottleneck without a move to expensive 40Gb (or higher)
networking. Data locality helps minimize the dependency on the network by servicing the
majority of Read I/O locally, and by writing one copy locally it reduces the overheads on the
network for Write I/O. Therefore, Data Locality allows customers to run lower cost networking
without compromising performance.
While data locality works across all supported hypervisors, AHV is unique in that it supports
data-aware virtual machine placement: Virtual Machines are powered on at the node with the
highest percentage of local data for that VM, which minimizes the chance of remote I/O and
reduces the overheads involved in servicing I/O for each VM following failures or maintenance.
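The placement decision reduces to picking the node holding the most of the VM's data. A toy sketch of that rule (the node names and byte counts are made up; real placement would also weigh CPU and memory headroom):

```python
def place_vm(local_bytes_by_node: dict) -> str:
    """Power the VM onto the node holding the largest share of its data,
    minimising remote I/O after failures or maintenance."""
    return max(local_bytes_by_node, key=local_bytes_by_node.get)

# Node B holds 60 GB of the VM's data versus 25 GB and 15 GB elsewhere:
assert place_vm({"A": 25, "B": 60, "C": 15}) == "B"
```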
In addition, Data Locality also applies to the collection of back end data for Analysis, such as
hypervisor and virtual machine statistics. As a result, statistics are written locally, with a second
copy (or a third, for environments configured with RF3) written remotely. This means stats data,
which can be a significant amount of data, has the lowest possible impact on the Distributed
Storage Fabric and the cluster as a whole.
Summary:
Regardless of the starting size of an AHV-based environment, all management, Analysis, Data
Protection and BC/DR components are automatically deployed and suitably sized. Regardless
of the AHV cluster size, no management design effort is required. This results in a very fast
(typically <1hr for a single block deployment) time to value.
AHV also provides numerous features which ensure customers can deploy solutions in a timely
manner:
Intelligent cloning
The Distributed Storage Fabric combines with AHV to allow near-instant clones of a Virtual
Machine. This feature works regardless of the power state of the VM, so it's not restricted to
VMs which are powered off, as with other hypervisors.
For a demo of this capability see: Nutanix Acropolis hypervisor acli cloning operations
Note: Cloning can be performed via Prism or acli (Acropolis CLI)
Summary:
Acropolis provides a powerful yet simple-to-use Analysis solution which covers the Acropolis
Platform, Compute (Acropolis Hypervisor / Virtual Machines) and Storage (Distributed Storage
Fabric).
Summary:
Apart from the obvious removal of hypervisor and associated management component
licensing/ELA costs, the real cost advantage of using AHV is the dramatic reduction in the effort
required in the design, implementation and operational verification phases, as well as in
ongoing management.
As all AHV Management components are in-built, highly available and auto scaling, there is no
need to engage a Subject Matter Expert (SME) to design the management solution. As a person
who has designed countless highly available virtualization solutions over the years, I can tell
you AHV out of the box is what I have all but dreamed of creating with other products for
customers in the past.
Building Acropolis and all management components into the CVM means there are fewer
moving parts that can go wrong and therefore that need to be verified.
Acropolis provides one-click, fully automated rolling upgrades for the Acropolis Base Software
(formerly known as NOS), the Acropolis Hypervisor, Firmware and Nutanix Cluster Check (NCC).
In addition, upgrades can be automatically downloaded, removing the risk of installing
incompatible versions and the requirement to check things such as Hardware Compatibility
Lists (HCLs) and interoperability matrices before upgrades.
AHV includes all management components or, in the case of Prism Central, provides them as a
prepackaged appliance. There is no need to license any operating systems. The highly resilient
management components on every Nutanix node eliminate the requirement for 3rd party
database products such as Microsoft SQL or Oracle or, in the best case scenario, the deployment
of Virtual Appliances which may not be highly available and which need to be backed up and
maintained.
The reduced management infrastructure also delivers indirect cost savings.
Because Acropolis provides fully non-disruptive one-click upgrades and removes numerous
points of failure (e.g.: 3rd Party Databases) while providing an extremely resilient platform,
AHV also reduces the cost to the customer of maintenance and outages.
Summary: