First Published On: 05-08-2017
Last Updated On: 11-29-2018
Table of Contents
1. Introduction
1.1. Executive Summary
1.2. Introduction to Intelligent Rebuilds
2. vSAN 6.6 Intelligent Rebuilds
2.1. Purpose of Intelligent Rebuilds
2.2. Intelligent Rebuilds using Enhanced Rebalancing
2.3. Intelligent Rebuilds using Smart, Efficient Repairs
2.4. Intelligent Rebuilds using Partial Repairs
2.5. Intelligent Rebuilds using Resumable Resync
2.6. Intelligent Rebuilds with maintenance mode decommissioning
3. Usage scenarios
3.1. Activities that benefit from improved Intelligent Rebuilds
4. Conclusion
4.1. Conclusion
1. Introduction
Executive Summary and Introduction
Executive Summary
A common requirement of enterprise storage systems is the ability to maintain resiliency and expected
levels of performance in the face of hardware faults and other unforeseen conditions. VMware vSAN
is the industry-leading distributed storage system built right into VMware vSphere, designed to
offer the highest levels of resiliency and performance, with maximum agility should hardware
faults occur or the demands of the environment change. vSAN's awareness of data placement
is tightly integrated with routine activities such as host decommissioning and VMware High
Availability (HA). vSAN can also place data intelligently based on site topology designs such as
stretched clusters and user-defined fault domains. These conditions, and many more, are accounted
for in vSAN's ability to intelligently manage the performance, efficiency, and availability of data stored on
a cluster powered by vSAN. These mechanisms fall under the category of "intelligent rebuilds" and are
discussed in more detail in this document.
VMware vSAN is a distributed storage system that uses physical storage devices on each
ESXi host in a cluster to contribute to the vSAN storage system. vSAN removes the concept of, and
limitations around, defining LUNs, and presents storage as a single, distributed datastore visible to all
hosts in the vSphere cluster. VMware vSAN is an object-based storage system integrated into VMware
vSphere. Virtual machines that live on vSAN storage are composed of a number of storage objects.
VMDKs, the VM home namespace, VM swap areas, snapshot delta disks, and snapshot memory maps are
all examples of storage objects in vSAN.
In vSAN, RAID protection levels are defined and controlled in VMware vCenter, using storage policy-based
management (SPBM). vSAN protects data at an object level, giving you the ability to apply
various levels of protection (RAID-1, RAID-5, RAID-6) and performance on a per-VM, or
even per-VMDK basis. vSAN uses the concept of a RAID tree to ensure protection of objects. A
"component" is a "leaf" of an object's RAID tree, and is how redundancy is provided to a given
object. When using a RAID-1 protection scheme, a "replica" represents a complete copy of all of the
components that make up an object. Figure 1 illustrates the relationship between objects,
components, and replicas.
Components may be split into smaller components depending on environment conditions, policy
settings, and the size of the objects. These disaggregated pieces of an object can be stored separately
from the other components that make up the object. vSAN automatically manages the distribution of
components across the hosts that constitute a vSAN cluster, and will actively rebuild or resynchronize
components when VM objects are not adhering to their defined protection policies, are severely
imbalanced, or when some operational change occurs in the environment.
vSAN 6.6 has a number of improvements designed to offer more intelligent rebuilds, returning the
cluster to normal operations and policy compliance quickly and automatically.
vSAN 6.6 includes a number of changes related to component rebuilds.
Each enhancement will be described in more detail throughout this document, along with
recommendations to ensure the best use of these technologies in your vSAN environment.
Rebalancing efforts fall into two general categories: proactive rebalancing and reactive rebalancing.
• Proactive rebalancing occurs when disks are less than 80% full. The opportunity to run a
proactive rebalance will only occur when any single disk has consumed 30% more capacity than
the disk with the lowest used capacity. vSAN automatically checks for these conditions, and
will provide a health check alert that will allow the user to invoke a rebalance using the
"Proactive Rebalance Disks" button in the Health section of the vSAN UI.
• Reactive rebalancing occurs when any disk is more than 80% full. This is an automatic process
taken by vSAN to ensure the best distribution of components. An imbalance will automatically
be detected, and vSAN will automatically invoke efforts to achieve better balance.
• Better decision making in rebalancing efforts. When determining the best strategy for moving
components, vSAN 6.6 factors in additional information about the components on disks
exceeding 80% capacity. vSAN evaluates how much data must be moved out of the identified
disk in order to meet the desired balancing objective, and factors this into the component
selection process. A more sophisticated selection process of components has been introduced
in vSAN 6.6 that prioritizes components so that the components selected for moving will make
a more meaningful difference in the rebalancing effort. Previous editions of vSAN relied on a
rebalancing master to publish placement decisions for all components moving out of the
source disk and host. This information could grow stale by the time the hosts act on the
information. In vSAN 6.6, data placement decisions are now based on a more recent cluster
state, courtesy of more up-to-date information from other hosts. This improved decision
making process for component placement can reduce the number of components being
moved, and the number of unsuccessful component moves, thereby reducing the overall
amount of CPU and network resources used to maintain proper balance.
• Splitting large components during redistribution. vSAN 6.6 can now take an individual large
component, and break it into smaller pieces for more optimal distribution. The breaking of
components into smaller chunks will only occur during reactive balancing, when a disk has
consumed more than 80% of its capacity. During the rebalancing process, if vSAN finds a new
disk with sufficient capacity for the component without the need for breaking into multiple
components, it will place the component there in its entirety, and not split the component. As
seen in Figure 2, if vSAN can't find a new disk without breaking up the component, it will break
the component in two, and retry the rebalancing effort. If a disk still cannot be found after the
initial split, vSAN will continue to divide the component further, and retry the previous step. This
logic will be followed for up to 8 splits. The splitting of a component will only occur on the component being
redistributed. Previous editions of vSAN did not have this ability, and rebalancing efforts could
either lead to less than ideal placement, or simply the inability to rebalance due to large
component sizes coupled with minimal free space.
• Improved visibility of rebalancing. Two improvements have been made to provide better
visibility when rebalancing. The rebalance status has been updated to provide more frequent
updates with more accurate reports of progress. The improvements make it much easier to
report the number of bytes being moved out for rebalancing, and new resync graphs have been
introduced in the performance service so that resync traffic can be monitored in greater detail.
As shown in Figure 3, this can be found at the disk group level in the host-related vSAN
metrics.
The rebalance status can report one of the following states:
• Proactive rebalance is needed. This state is triggered after vSAN determines proactive
rebalancing is needed following a 30 minute monitoring period.
• Proactive rebalance is in progress. This state is shown when a proactive rebalance is in
progress.
• Proactive rebalance failed. This is when the object manager couldn’t find any components
that could be moved due to policy restrictions, lack of resources, etc.
• Reactive rebalance task is in progress. This state simply notifies the user that a reactive
rebalance is in progress.
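The component-splitting behavior described above can be modeled as a simple retry loop. The sketch below is an illustrative simplification (one piece per disk, a greedy fit check), not vSAN's actual placement algorithm; the function and parameter names are hypothetical.

```python
def place_with_splits(component_gb, free_gb_per_disk, max_splits=8):
    """Illustrative model of reactive-rebalance splitting: try to place the
    component whole; if no placement exists, halve the largest piece and
    retry, for up to 8 splits. Assumes one piece per disk for simplicity."""
    pieces = [float(component_gb)]
    splits = 0
    while True:
        # Greedy check: match the largest pieces to the largest free disks.
        ordered_pieces = sorted(pieces, reverse=True)
        ordered_disks = sorted(free_gb_per_disk, reverse=True)
        if len(ordered_pieces) <= len(ordered_disks) and all(
            p <= d for p, d in zip(ordered_pieces, ordered_disks)
        ):
            return ordered_pieces          # placement found; stop splitting
        if splits >= max_splits:
            return None                    # give up after the maximum splits
        # No placement found: halve the largest piece and try again.
        largest = ordered_pieces[0]
        pieces = ordered_pieces[1:] + [largest / 2, largest / 2]
        splits += 1
```

For example, a 100 GB component that fits on no single disk, but where two disks each have 60 GB free, would be placed as two 50 GB pieces after a single split.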
Note that rebalancing of components aims to achieve a relatively even distribution of components
across the storage devices contributing to storage in a vSAN data store. This is designed both to
optimize performance, by distributing across the most devices, and to minimize the size of any
given fault domain. It will not rebalance data in a way that would compromise a given
protection policy. For instance, vSAN will never place two leaf components of a RAID-1
mirrored object on the same drive for the sake of balance.
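The proactive and reactive triggers described earlier can be summarized as a small decision function. This sketch models the 30% variance as percentage points of used capacity, which is an assumption; the thresholds themselves (80% and 30%) come from the text above.

```python
def rebalance_state(used_pct_per_disk):
    """Illustrative model of vSAN's rebalance triggers: reactive when any
    disk is more than 80% full; a proactive-rebalance health alert when the
    most-used disk has consumed 30 (percentage points, an assumption) more
    capacity than the least-used disk."""
    if max(used_pct_per_disk) > 80:
        return "reactive"    # vSAN rebalances automatically
    if max(used_pct_per_disk) - min(used_pct_per_disk) > 30:
        return "proactive"   # user may invoke "Proactive Rebalance Disks"
    return "balanced"
```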
The throttle represents the total bandwidth used by an individual disk group on a host, not the
entire host. The vSAN UI will show the resync bandwidth for the disk group with the highest
usage on each host. Once resync throttling is applied, the limit applies to all disk groups that
attempt to exceed it. Throttling can be restricted down to 1 MB per second.
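A minimal sketch of this per-disk-group behavior, with hypothetical names; only disk groups that would exceed the configured limit are clamped, and the limit has a floor of 1 MB/s.

```python
def throttle_resync(per_disk_group_mbps, limit_mbps):
    """Illustrative model of resync throttling: the limit is applied per
    disk group, only to those exceeding it, with a 1 MB/s floor."""
    limit = max(1, limit_mbps)                     # cannot go below 1 MB/s
    return {dg: min(bw, limit) for dg, bw in per_disk_group_mbps.items()}
```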
In vSAN 6.6, key enhancements have been made to restore compliance of protection
policies to objects, while minimizing the resources used for a rebuild.
• Additional repair method. Prior to version 6.6, vSAN offered a strict component rebuild process
invoked after components were marked as absent for 60 minutes or longer. Once this rebuild
process started, it would continue until completion, even if the affected host came back
online shortly after the 60 minute timeout window. This process would also resync the slightly
outdated components on the recently restored host while rebuilding the entire component
elsewhere, but could only use the rebuilt component upon completion. vSAN 6.6 introduces a
new repair method that also allows vSAN to take advantage of the resynchronized component
on the host, should vSAN deem this the more efficient method to use.
• New logic to determine best method to use. When a host or device comes back online after a
60 minute window, vSAN will look at the amount of data remaining for a component rebuild to
complete, versus how long it would take to repair or resync the outdated component, and
choose the method that will complete with the least amount of effort, cancelling the other
rebuild operation.
These improvements significantly reduce the amount of data that needs to be rebuilt, and the time
that it takes for objects to regain compliance with their protection policies. Reducing the need for full
component rebuilds can also free up space that is not needed immediately for the rebuild process.
The added repair method, and the new logic for determining the best method of repair, are especially
beneficial for hosts that were in their maintenance window just slightly longer than the 1 hour timeout
period, as well as for hosts that hold a large amount of consumed capacity for objects.
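The decision between the two repair methods can be sketched as a simple comparison of remaining effort. The names below are hypothetical, and the byte counts stand in for whatever internal cost metric vSAN actually uses.

```python
def choose_repair_method(rebuild_bytes_remaining, stale_bytes_to_resync):
    """Illustrative model of the smart-repair decision: when a host returns
    after the 60-minute window, compare the data still left for the full
    rebuild against the data needed to bring the restored, slightly stale
    component up to date, and keep whichever operation finishes with the
    least effort; the other operation is cancelled."""
    if stale_bytes_to_resync < rebuild_bytes_remaining:
        return "resync restored component"   # cancel the in-flight rebuild
    return "complete full rebuild"           # cancel the catch-up resync
```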
In vSAN 6.6, this rebuild and repair process has been enhanced through the concept of "partial
repairs." Previous editions of vSAN could only successfully execute a repair effort if there
were enough resources to repair all of the degraded or absent components in their entirety. The
repair process in vSAN 6.6 takes a more opportunistic approach to healing by repairing as many
degraded or absent components as possible, even if there are insufficient resources to ensure full
compliance. The effective result is that an object might remain non-compliant after a partial repair,
but will still gain increased availability from those components that are able to be repaired.
Any remaining components that are not repaired to meet their full level of compliance according to
the SPBM policy will be repaired as soon as enough capacity resources become available.
An example of a partial repair process can be demonstrated in a scenario of a 6 node vSAN cluster, with a
VM configured for an FTT set to 2. As shown in Figure 6, in the event that two host failures occur, the
VM would still be available, but the effective FTT would be 0, as there would only be one replica
remaining. With the new partial repair process, vSAN will initiate and complete a repair of the objects
if enough resources are available to increase the effective FTT level as a result of the repair. While the
objects might remain non-compliant from the desired SPBM policy of FTT=2, the partial repair
process will have increased the effective availability to FTT=1. vSAN will eventually complete the
repair to make the object fully compliant when resources become available. Increased resource
availability could come from adding hosts, adding capacity, deleting unused VMs, or reducing
protection levels of other VMs.
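The outcome of the partial repair in the FTT=2 example above can be expressed as a small model. This is a hypothetical simplification: vSAN's actual repair operates on components, not whole replicas.

```python
def partial_repair(desired_ftt, effective_ftt, replicas_placeable_now):
    """Illustrative model of a partial repair: repair as many missing
    replicas as current capacity allows, raising the effective FTT even
    when the desired (policy) FTT cannot yet be reached. Returns the new
    effective FTT and whether the object is now policy compliant."""
    repaired = min(replicas_placeable_now, desired_ftt - effective_ftt)
    new_effective_ftt = effective_ftt + repaired
    return new_effective_ftt, new_effective_ftt == desired_ftt
```

In the scenario above, resources exist for only one additional replica, so an FTT=2 object at effective FTT=0 is raised to effective FTT=1 and remains non-compliant until more resources become available.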
Partial repairs work in both standard and stretched cluster environments. An example of a partial
repair process in a stretched cluster environment can be demonstrated in a scenario of an 8 node
stretched cluster, with a VM configured for PFTT=1 (remote protection level), and SFTT=1 (Local
protection level). As shown in Figure 7, in the event that a site failure occurs, and the remaining site
has a host failure, the VM would still be available in the remaining site. The effective PFTT would be
0, and SFTT would be 0, as it would be the last replica remaining. The new partial repair process will
initiate and complete a “best effort” repair in the surviving site. While the object might remain non-
compliant across sites (PFTT), the partial repair process will have increased the effective local
protection level to a locally compliant level of SFTT=1. vSAN will eventually complete the repair to
make the object fully compliant across sites when resources become available.
In both examples, vSAN will give priority to the partial repair of data components over witness
components.
vSAN 6.6 has improved the resync action so that it is more resilient and efficient. In prior editions of
vSAN, an interrupted resync would need to start the resync process from the beginning. An example
of a resync interruption is an absent host coming back online, running the resync process,
followed by a brief network interruption. Changing an SPBM policy on an object while the host
containing the owner object is offline is another scenario in which resyncs would need to start from
the beginning. As shown in Figure 8, vSAN 6.6 is now able to transparently resume a resync operation
where it left off following an interruption, avoiding the need to reprocess already resynchronized data.
Resumable resyncs are achieved by accepting writes, and tracking the changes, on the component that
remains available. vSAN identifies the last write that the component had when the host containing the
other component went offline, while keeping the previous effort of resynchronized data. The updated
writes that are committed to the still-available component are tracked separately from the original
resync tracking. Once the temporarily unavailable component comes back, the tracked changes on
the active components are merged into the components that were temporarily offline, so that the
resynchronization process can resume. As data is resynchronized, vSAN will incrementally update its
understanding of what data has been committed so that, in the event of another interruption, it
does not have to resync data already synchronized.
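The bookkeeping described above can be sketched with sets of block offsets standing in for vSAN's internal change tracking, which is an illustrative assumption, not its actual on-disk format.

```python
def remaining_resync(synced, dirtied_during_outage, all_blocks):
    """Illustrative model of resumable resync: blocks synchronized before
    the interruption stay valid unless a tracked write dirtied them while
    the component was offline; only dirtied and never-synced blocks are
    transferred when the resync resumes."""
    still_valid = set(synced) - set(dirtied_during_outage)
    return set(all_blocks) - still_valid   # what must still be transferred
```

For example, if blocks 0-2 of a 5-block component were synchronized before the interruption, and block 2 was overwritten during the outage, only blocks 2, 3, and 4 are transferred on resume.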
"Ensure data accessibility" simply allows the VMs to remain accessible, yet potentially in a degraded
state of redundancy. This is typical for most quick maintenance mode operations, and no data will be
moved. "No data evacuation" is most often related to a full cluster shutdown, and just as the name
implies, no data will be moved. "Evacuate all data" will rebuild the components onto other hosts to
maintain the desired protection policies assigned to the VMs. This last option is commonly used for
activities such as hardware maintenance, decommissioning, or possibly on-disk format changes
introduced in vSAN. These modes apply not only to a host entering maintenance mode (EMM), but
also to disk and disk group decommissioning activities.
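The three options above map onto maintenance intents roughly as follows; the reason strings in this sketch are hypothetical labels, only the option names come from the vSAN UI.

```python
def pick_emm_option(reason):
    """Illustrative mapping of maintenance intent to the three vSAN
    maintenance mode options; the reason strings are hypothetical."""
    mapping = {
        "quick patch or reboot": "Ensure data accessibility",  # no data moved
        "full cluster shutdown": "No data evacuation",         # no data moved
        "hardware decommission": "Evacuate all data",          # full rebuild
    }
    return mapping[reason]
```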
In vSAN 6.6, evacuating data to other hosts has been optimized to reduce the amount of overhead and
data migration during an EMM operation. This translates to a quicker time to complete the EMM
operation.
The object manager will no longer attempt to fix compliance at an object level across the cluster
during a full evacuation, but rather, only strive to move all components from the node entering
maintenance mode onto other nodes in the cluster. vSAN will preserve an object's current effective FTT
level during this operation. If an object had been assigned FTT=2 in its policy, but had an effective
availability of FTT=1, it will preserve this FTT=1 status for the EMM effort. This reduces the amount of
time required, minimizes data movement across the cluster, and also increases the chance for a
successful maintenance mode operation. Previous editions would require all affected objects to be
fully compliant, including the repair of other unrelated components before completing the EMM
process.
RECOMMENDATION: Choose the correct maintenance mode operation for your intention. Typical
vSphere patches are often applied quickly, and the most significant downtime might just be the reboot
process of the host. If your VMs can run with less resiliency for a brief amount of time, then the
“Ensure Accessibility” maintenance mode option is an extremely efficient way to go. Other
maintenance activities on a server expected to take a longer time period might be more suitable for
the “full evacuate” option.
3. Usage scenarios
Scenarios that demonstrate where particular Intelligent Rebuild functionality comes into play.
RECOMMENDATION: Use a single host maintenance mode for rolling cluster updates. In traditional
three-tier architectures, persistent storage was housed separately, and if a cluster had sufficient
compute resources to tolerate more than one host offline at any given time, this could speed up the
remediation process for host updates. In vSAN, a single host maintenance mode rolling update is a
better strategy, as it will reduce the amount of component resyncing needed to ensure proper compliance.
4. Conclusion
4.1 Conclusion
All of the individual enhancements to the task of intelligent rebuilds deliver specific performance and
efficiency improvements to vSAN. While described individually in this document, these improvements
deliver dramatic gains in efficiency when vSAN uses them collectively against the demands of real
workloads in a production environment.
Pete Koehler is a Sr. Technical Marketing Manager, working in the Storage and Availability Business
Unit at VMware, Inc. He specializes in enterprise architectures, data center analytics, software-defined
storage, and hyperconverged Infrastructures. Pete provides more insight to challenges of the data
center at vmpete.com, and can also be found on Twitter at @vmpete.