The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V
Greg Shields
Introduction to Realtime Publishers
by Don Jones, Series Editor
For several years now, Realtime has produced dozens and dozens of high‐quality books
that just happen to be delivered in electronic format—at no cost to you, the reader. We’ve
made this unique publishing model work through the generous support and cooperation of
our sponsors, who agree to bear each book’s production expenses for the benefit of our
readers.
Although we’ve always offered our publications to you for free, don’t think for a moment
that quality is anything less than our top priority. My job is to make sure that our books are
as good as—and in most cases better than—any printed book that would cost you $40 or
more. Our electronic publishing model offers several advantages over printed books: You
receive chapters literally as fast as our authors produce them (hence the “realtime” aspect
of our model), and we can update chapters to reflect the latest changes in technology.
I want to point out that our books are by no means paid advertisements or white papers.
We’re an independent publishing company, and an important aspect of my job is to make
sure that our authors are free to voice their expertise and opinions without reservation or
restriction. We maintain complete editorial control of our publications, and I’m proud that
we’ve produced so many quality books over the past years.
I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially
if you’ve received this publication from a friend or colleague. We have a wide variety of
additional books on a range of topics, and you’re sure to find something that’s of interest to
you—and it won’t cost you a thing. We hope you’ll continue to come to Realtime for your
educational needs far into the future.
Until then, enjoy.
Don Jones
Introduction to Realtime Publishers ................................................................................................................. i
Chapter 1: The Power of iSCSI in Microsoft Virtualization .................................................................... 1
The Goal for SAN Availability Is “No Nines” ............................................................................................. 2
Hyper‐V Is Exceptionally Dependent on Storage ................................................................................... 3
VHD Attachment to VM ................................................................................................................................ 4
Pass‐Through Disks ....................................................................................................................................... 6
iSCSI Direct Attachment ............................................................................................................................... 7
VM‐to‐VM Clustering ..................................................................................................................................... 9
Host Boot from SAN ....................................................................................................................................... 9
Guest Boot from SAN ..................................................................................................................................... 9
VM Performance Depends on Storage Performance ......................................................................... 10
Network Contention ................................................................................................................................... 11
Connection Redundancy & Aggregation ............................................................................................ 12
Type and Rotation Speed of Drives ...................................................................................................... 12
Spindle Contention ...................................................................................................................................... 13
Connection Medium & Administrative Complexity ...................................................................... 13
iSCSI Makes Sense for Hyper‐V Environments .................................................................................... 15
Chapter 2: Creating Highly‐Available Hyper‐V with iSCSI Storage .................................................. 16
The Windows iSCSI Initiator: A Primer ................................................................................................... 17
NIC Teaming ................................................................................................................................................... 19
MCS .................................................................................................................................................................... 20
MPIO .................................................................................................................................................................. 22
Which Option Should You Choose? ...................................................................................................... 25
Getting to High Availability with Hyper‐V ............................................................................................. 26
Single Server, Redundant Connections .............................................................................................. 27
Single Server, Redundant Path............................................................................................................... 27
Hyper‐V Cluster, Minimal Configuration ........................................................................................... 29
Hyper‐V Cluster, Redundant Connections ........................................................................................ 29
Hyper‐V Cluster, Redundant Path ........................................................................................................ 30
High Availability Scales with Your Pocketbook ................................................................................... 31
Chapter 3: Critical Storage Capabilities for Highly‐Available Hyper‐V .......................................... 32
Virtual Success Is Highly Dependent on Storage ................................................................................ 33
Modular Node Architecture .................................................................................................................... 34
Redundant Storage Processors Per Node ......................................................................................... 36
Redundant Network Connections and Paths ................................................................................... 36
Disk‐to‐Disk RAID ........................................................................................................................................ 37
Node‐to‐Node RAID .................................................................................................................................... 38
Integrated Offsite Replication for Disaster Recovery .................................................................. 40
Non‐Interruptive Capacity for Administrative Actions ................................................................... 41
Volume Activities ......................................................................................................................................... 42
Storage Node Activities ............................................................................................................................. 43
Data Activities ............................................................................................................................................... 43
Firmware Activities .................................................................................................................................... 44
Storage Virtualization ..................................................................................................................................... 44
Snapshotting and Cloning ........................................................................................................................ 44
Backup and Restore with VSS Integration ........................................................................................ 45
Volume Rollback .......................................................................................................................................... 45
Thin Provisioning ........................................................................................................................................ 45
Storage Architecture and Management Is Key to Hyper‐V ............................................................. 46
Chapter 4: The Role of Storage in Hyper‐V Disaster Recovery .......................................................... 47
Defining “Disaster” ........................................................................................................................................... 47
Defining “Recovery”......................................................................................................................................... 49
The Importance of Replication, Synchronous and Asynchronous .............................................. 50
Synchronous Replication .......................................................................................................................... 50
Asynchronous Replication ....................................................................................................................... 51
Which Should You Choose? ..................................................................................................................... 52
Recovery Point Objective .................................................................................................................... 53
Distance Between Sites ........................................................................................................................ 53
Ensuring Data Consistency ........................................................................................................................... 54
Architecting Disaster Recovery for Hyper‐V ........................................................................................ 56
Choosing the Right Quorum ......................................................................................................................... 58
Node and Disk Majority............................................................................................................................. 58
No Majority: Disk Only ................................................................................................................................ 58
Node Majority ................................................................................................................................................ 59
Node and File Share Majority ................................................................................................................. 59
Ensuring Network Connectivity and Resolution ................................................................................ 61
Disaster Recovery Is Finally Possible with Hyper‐V Virtualization ........................................... 61
Copyright Statement
© 2010 Realtime Publishers. All rights reserved. This site contains materials that have
been created, developed, or commissioned by, and published with the permission of,
Realtime Publishers (the “Materials”) and this site and any such Materials are protected
by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtime Publishers or its web site
sponsors. In no event shall Realtime Publishers or its web site sponsors be held liable for
technical or editorial errors or omissions contained in the Materials, including without
limitation, for any direct, indirect, incidental, special, exemplary or consequential
damages whatsoever resulting from the use of any information contained in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal, non-
commercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, service marks and logos that are the property of
third parties. You are not permitted to use these trademarks, service marks or logos
without prior written consent of such third parties.
Realtime Publishers and the Realtime Publishers logo are registered in the US Patent &
Trademark Office. All other product or service names are the property of their respective
owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtime Publishers, please contact us via e-mail at
info@realtimepublishers.com.
[Editor's Note: This eBook was downloaded from Realtime Nexus—The Digital Library for IT
Professionals. All leading technology eBooks and guides from Realtime Publishers can be found at
http://nexus.realtimepublishers.com.]
Chapter 1: The Power of iSCSI in Microsoft
Virtualization
Virtualization is one of the hottest technologies to hit IT in years, with Microsoft’s Hyper‐V
R2 release igniting those flames even further. Hyper‐V arrives as a cost‐effective
virtualization solution that can be easily implemented by even the newest of technology
generalists.
But while implementing Hyper‐V itself is trivial, ensuring its highest levels of
redundancy, availability, and, most importantly, performance is not. Due to virtualization’s
heavy reliance on storage, two of the most critical decisions you will make in implementing
Hyper‐V are where and how you’ll store your virtual machines (VMs).
Virtualization solutions such as Hyper‐V enable many fantastic optimizations for the IT
environment: VMs can be easily backed up and restored in whole, making affordable server
restoration and disaster recovery possible. VM processing can be load balanced across any
number of hosts, ensuring that you’re getting the most value out of your server hardware
dollars. VMs themselves can be rapidly deployed, snapshotted, and reconfigured as needed,
to gain levels of operational agility never before seen in IT.
Yet at the same time virtualization also adds levels of complexity to the IT environment.
Gone are the traditional notions of the physical server “chassis” and its independent
connections to networks and storage. Replacing this old mindset are new approaches that
leverage the network itself as the transmission medium for storage. With the entry of
enterprise‐worthy iSCSI solutions into the market, IT environments of all sizes can leverage
the very same network infrastructure they’ve built over time to host their storage as well.
This already‐present network pervasiveness combined with the dynamic nature of
virtualization makes iSCSI a perfect fit for your storage needs.
Correctly connecting all the pieces, however, is the challenge. To help, this guide digs deep
into the decisions that environments large and small must consider. It looks at best
practices for Hyper‐V storage topologies and technologies, as well as cost and
manageability implications for the solutions available on the market today. Both this and
the following chapter will start by discussing the technical architectures required to create
a highly‐available Hyper‐V infrastructure. In Chapter 2, you’ll be impressed to discover just
how many ways that redundancy can be inexpensively added to a Hyper‐V environment
using native tools alone.
If, like many, your storage experience is thus far limited to the disks you plug directly into
your servers, you’ll be surprised at the capabilities today’s iSCSI solutions offer. Whereas
Chapters 1 and 2 deal with the interconnections between server and storage, Chapter 3
focuses exclusively on capabilities within the storage itself. Supporting features such as
automatic restriping, thin provisioning, and built‐in replication, today’s iSCSI storage
provides enterprise features in a low‐cost form factor.
Finally, no storage discussion is fully complete without a look at the affordable disaster
recovery options made available by virtualizing. Chapter 4 discusses how iSCSI’s backup,
replication, and restore capabilities make disaster recovery solutions (and not just plans) a
real possibility for everyone.
But before we delve into those topics, we first need to start with your SAN architecture
itself. That architecture can arguably be the center of your entire IT infrastructure.
The Goal for SAN Availability Is “No Nines”
It has been stated in the industry that “The goal for SAN availability is ‘no nines’ or 100%
availability.” This is absolutely true in environments where data loss or non‐availability
have a recognizable impact on the bottom line. If your business loses thousands of dollars
for every second its data is not available, you’d better have a storage system that never,
ever goes down.
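The arithmetic behind availability "nines" makes this goal concrete. The short sketch below is illustrative only — the availability percentages are the standard industry tiers, and the downtime figures are simple arithmetic rather than anything specific to a given SAN product:

```python
# Downtime per year implied by each "nines" availability level.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes(availability_pct):
    """Minutes of downtime per year at a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999, 100.0):
    print(f"{pct}% availability allows {downtime_minutes(pct):,.1f} minutes down per year")
```

Even "five nines" still permits roughly five minutes of outage per year; "no nines" means the allowance is zero, which is why it demands the layered hardware redundancy described next.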
While such a goal could be laughable if it were applied to general‐purpose operating
systems (OSs) such as Microsoft Windows, 100% availability is not unheard of in the
specialized hardware solutions that comprise today’s SANs. No matter which company
builds your SAN, nor which medium it uses to transfer data, its single‐purpose mission
means that multiple layers of redundancy can be built into its hardware:
• Multiple power supplies mean that no single cable or power input loss can cause
a failure.
• Multiple and redundant connections between servers and storage ensure that a
connection loss can be survived.
• Redundant pathing through completely‐isolated equipment further protects
connection loss by providing an entirely separate path in the case of a downstream
failure.
• RAID configurations ensure that the loss of a single disk drive will not cause the
loss of an entire volume of data.
• Advanced RAID configurations further protect against drive loss by ensuring that
even multiple, simultaneous drive failures will not impact availability.
• Data striping across storage nodes creates the ultimate protection by preserving
availability even after the complete loss of SAN hardware.
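To see why "advanced" RAID configurations matter, the usable capacity and fault tolerance of a few common levels can be tabulated. This is a sketch using standard industry RAID definitions, not a recommendation drawn from this guide or from any particular vendor:

```python
def raid_summary(level, drives, drive_gb):
    """Usable capacity (GB) and survivable simultaneous drive failures
    for a few common RAID levels (standard definitions, for illustration)."""
    if level == "RAID 5":    # single parity: survives any 1 drive loss
        return (drives - 1) * drive_gb, 1
    if level == "RAID 6":    # dual parity: survives any 2 drive losses
        return (drives - 2) * drive_gb, 2
    if level == "RAID 10":   # mirrored pairs: survives at least 1 loss
        return (drives // 2) * drive_gb, 1
    raise ValueError(f"unknown RAID level: {level}")

for level in ("RAID 5", "RAID 6", "RAID 10"):
    usable, failures = raid_summary(level, drives=8, drive_gb=600)
    print(f"{level}: {usable} GB usable, tolerates {failures} simultaneous failure(s)")
```

The trade is always capacity for protection: dual-parity levels give up more raw disk but survive the multiple, simultaneous failures the bullet above describes.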
All these redundant technologies are laid in place because a business’ data is its most
critical asset. Whether that data is contained within Microsoft Office documents or high‐
performance databases, any loss of data is fundamentally critical to a business’ operations.
Yet a business’ data is only one facet of the IT environment. That data is useless without the
applications that work with it and create meaning out of its bits and bytes. In a traditional
IT environment, those applications run atop individual physical servers, with OSs and
applications often installed to their own local direct‐attached storage. While your
applications’ data might sit within a highly‐available SAN, the thousands of files that
comprise each OS and its applications usually remain local.
With virtualization, everything changes. Moving that same environment to virtualization
encapsulates each server’s OS and its applications into a virtual disk. That virtual disk is
then stored within the very same SAN infrastructure as your business data. As a result,
making the move to virtualization effectively elevates your run‐of‐the‐mill OS and
application files to the same criticality as your business data.
Hyper‐V Is Exceptionally Dependent on Storage
Let’s take a look at the multiple ways in which this new criticality occurs. Figure 1.1 depicts
an extremely simplistic representation of a two‐node Hyper‐V cluster. In this cluster, each
server connects via one or more interfaces to the environment’s network infrastructure.
Through that network, the VMs atop each server communicate with clients to provide their
assigned services.
It is important to recognize here that high availability in Hyper‐V—as in most
virtualization platforms—requires some form of shared storage that is accessible to every
host in the cluster.
Figure 1.1: Highly‐available Hyper‐V at its simplest.
That shared storage is the location where Hyper‐V’s VMs reside. This is the case because
today’s VM high‐availability technologies never actually move the VM’s disk file. Whether
the transfer of ownership between two hosts occurs as a live migration with a running VM
or a re‐hosting after a physical host failure, the high‐availability relocation of a VM only
moves the processing and not the storage of that VM.
It is for this reason that the storage component of any Hyper‐V cluster is its most critical
element. Every VM sits within that storage, every Hyper‐V host connects to it, and all the
processing of your data center’s applications and data is now centralized onto that single
device.
Yet this is only the simplest of ways in which a Hyper‐V cluster interacts with its storage.
Remember that iSCSI is in effect an encapsulation of traditional SCSI commands into
network‐routable packets. This encapsulation means that wherever your network exists, so
can your storage. As a result, there are a number of additional ways in which virtual hosts
and machines can connect to their needed storage. Let’s take a look through a few that
relate specifically to Hyper‐V’s VMs. You’ll find that not all options for connecting VMs to
storage are created alike.
VHD Attachment to VM
Creating a new VM requires assigning its needed resources. Those resources include one or
more virtual processors, a quantity of RAM, any peripheral connections, and the disk files
that contain its data. Any created VM requires at a minimum a single virtual hard disk
(VHD) to become its storage location.
Although a single VHD is the minimum, it is possible to attach additional VHDs to a VM
either during its creation or at any point thereafter. Each newly‐attached VHD becomes yet
another drive on the VM. Figure 1.2 shows how a second VHD, stored at G:\Second Virtual
Hard Disk.vhd, has been connected to the VM named \\vm1.
Figure 1.2: Attaching a second VHD to an existing VM.
Attached VHDs are useful because they retain the encapsulation of system files into their
single .VHD file. This encapsulation makes them portable, enabling them to be disconnected
from one VM and attached to another at any point. As VHDs, they can also be backed up as a
single file using backup software that is installed to the Hyper‐V host, making their single‐
file restore possible.
However, VHDs can be problematic when backup software requires direct access to disks
for proper backups or individual file and folder restores. Also, some applications require an
in‐band and unfiltered SCSI connection to connected disks. These applications, while rare,
will not work with attached VHD files. Lastly, VHDs can only be connected or disconnected
when VMs are powered down, forcing any change to involve a period of downtime to the
server.
VHDs can be created with a pre‐allocated fixed size or can be configured to dynamically
expand as data is added to the VM. All VHDs are limited to 2040GB (or just shy of 2TB) in
size. Using dynamically expanding VHDs obviously reduces the initial amount of disk space
consumed by the freshly‐created VM. However, care must be taken when collocating
multiple dynamically‐expanding VHD files on a single volume, as the combination of each
VHD’s configured maximum size will often be greater than the maximum size of the volume
itself. Proactive monitoring must be put in place to watch for and alert on growth in the
size of storage when dynamically‐expanding VHDs are used.
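That collocation risk reduces to a simple overcommitment check: compare the sum of every VHD's configured maximum against the volume's capacity. The sketch below is illustrative — the volume and VHD sizes are hypothetical values, and a real monitor would query the Hyper‐V host rather than use hard-coded numbers:

```python
def vhd_overcommit_ratio(volume_gb, vhd_max_sizes_gb):
    """Ratio of combined configured VHD maximums to the volume's capacity.
    A ratio above 1.0 means the VHDs can outgrow the volume that holds them."""
    return sum(vhd_max_sizes_gb) / volume_gb

# Hypothetical 500 GB volume hosting three dynamically-expanding VHDs.
ratio = vhd_overcommit_ratio(volume_gb=500, vhd_max_sizes_gb=[200, 200, 250])
if ratio > 1.0:
    print(f"Warning: volume overcommitted {ratio:.2f}x; monitor actual VHD growth")
```

A ratio above 1.0 is not necessarily wrong — thin allocation is often the point — but it is exactly the condition that demands the proactive growth monitoring described above.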
The level of expected performance between fixed and dynamic VHD files is only slightly
different when using Hyper‐V R2, with fixed disks seeing a slightly increased level of
performance over those created as dynamic. Dynamic VHD files incur an overhead during
write operations that expand the VHD’s size, causing a slight reduction in performance over
fixed disks. Microsoft testing suggests that fixed VHDs see performance that is equal to
native disk performance when run atop Hyper‐V R2. Dynamic disks experience between
85% and 94% of native performance, depending on the type of write operations being done
within the VM.
Your decision about whether to use fixed versus dynamic VHDs will depend on your need
for slightly better performance versus your available quantity of storage. Consumed
storage, however, does represent a cost. As you’ll discover in Chapter 3, the capability for
thin‐provisioning VM storage often outweighs any slight improvements in performance.
Pass‐Through Disks
An alternative approach to pulling extra disks into a VM is through the creation of a pass‐
through disk. With this approach, an iSCSI disk is exposed to the Hyper‐V host and then
passed through from the host to a residing VM. By passing through the disk rather than
encapsulating it into a VHD, its contents remain in their native format. This allows certain
types of backup and other software to maintain direct access to the disk using native SCSI
commands. As essentially raw mappings, pass‐through disks also eliminate the 2040GB
size limitation of VHDs, which can be a problem for very large file stores or databases.
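One way to frame the VHD-versus-pass-through decision is as a simple check against the 2040GB VHD ceiling and the application's need for native SCSI access. The helper below is a hypothetical illustration of that decision logic, not part of any Microsoft tooling:

```python
VHD_MAX_GB = 2040  # the VHD format ceiling noted above (just shy of 2 TB)

def suggest_disk_type(required_gb, needs_native_scsi=False):
    """Pick a disk-attachment style from size and application requirements."""
    if needs_native_scsi or required_gb > VHD_MAX_GB:
        return "pass-through disk"   # raw mapping: native SCSI, no size ceiling
    return "VHD"                     # portable, single-file encapsulation

print(suggest_disk_type(500))                          # ordinary data disk
print(suggest_disk_type(4096))                         # database beyond the VHD limit
print(suggest_disk_type(100, needs_native_scsi=True))  # backup agent needs raw access
```

The third option, iSCSI direct attachment, is discussed next and often supersedes both choices when an iSCSI SAN is available.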
Microsoft suggests that pass‐through disks achieve levels of performance that are
equivalent to connected VHD files. Pass‐through disks can also be leveraged in clustered
Hyper‐V scenarios by creating the disk as a clustered resource after assigning it to a VM.
Figure 1.3 shows how a pass‐through disk is created between a host and its residing VM.
Here, as in the previous example, pass‐through disks can only be attached to VMs that have
been powered off. In this image, the host’s Disk Management wizard has been displayed on
the left with a 256MB offline disk attached via iSCSI and initialized by the host. Once
initialized, the disk is taken offline and made available to the VM through its settings
wizard on the right. There, the VM’s second disk drive is attached to the passed‐through
hard drive, which is labeled Disk 4 0.25 GB Bus 0 Lun 0 Target 3.
Figure 1.3: Creating a pass‐through disk.
Because they are not encapsulated into VHDs, pass‐through disks cannot be snapshotted by
Hyper‐V. However, because the files reside on‐disk in a native format, your storage solution
may be able to complete the snapshot from its own perspective. This storage‐level
snapshot can enable advanced storage‐level management functions such as replication,
backup and restore, and volume‐level cloning.
iSCSI Direct Attachment
Pass‐through disks can be an obvious choice when applications require that direct
mapping. Yet creating pass‐through disks adds a layer of complexity that needn’t be
present when there aren’t specific application requirements. A third option that makes
sense for most environments is the direct attachment of iSCSI‐based volumes right into the
VM. This process uses the VM’s iSCSI Initiator to create and manage connections to iSCSI
disks.
Note
Because direct attachment uses the VM’s iSCSI Initiator, this process only
works when used with an iSCSI SAN. Environments that use Fibre Channel
SANs cannot recognize this benefit and must resort to using pass‐through
disks.
Figure 1.4 shows how the iSCSI Initiator for VM \\vm1 is instead configured to connect
directly to the previous example’s 256MB disk. This connection is possible because of
iSCSI’s network pervasiveness. Further, the iSCSI Initiator runs as its own service that is
independent of the virtualization infrastructure, with the VM’s connection to its iSCSI disk
being completely isolated from the host.
Figure 1.4: Connecting directly to an iSCSI LUN from within a VM.
iSCSI direct attachment enables the highest levels of portability for network‐attached disks,
retaining all the desired capabilities of the previous examples but without their limitations.
Disks can be connected and disconnected at will without the need to reboot the VM. As with
VHDs, disks from one VM can be easily attached to an alternate should the need arise; and
similar to pass‐through disks, data that is contained within the disk remains in its native
format.
SAN Backups and VM Resources
When considering the use of a SAN for a virtualized environment, pay special
attention to its backup features. One very valuable feature is the ability to
directly back up disks without the need for backup agents within the VM. VM‐
installed agents tend to consume large levels of resources during the backup
process, which can have a negative impact on the virtual environment’s
overall performance. By backing up SAN data directly from the SAN, VMs
needn’t be impacted by backup operations. This capability represents
another benefit to the use of pass‐through or direct‐attached iSCSI disks.
VM‐to‐VM Clustering
Yet another capability that can be used by environments with iSCSI SANs is the creation of
clusters between VMs. This kind of clustering layers over the top of the clusters used by
Hyper‐V hosts to ensure the high availability of VMs.
Consider the situation where a critical network resource requires the highest levels of
availability in your environment. You may desire that resource to run atop a VM to gain the
intrinsic benefits associated with virtualization, but you also want to ensure that the
resource itself is clustered to maintain its availability during a VM outage. Even VMs must
be rebooted from time to time due to monthly patching operations, so this is not an
uncommon requirement. In this case, creating a VM‐to‐VM cluster for that network
resource will provide the needed resiliency.
VM‐to‐VM clusters require the same kinds of shared storage as do Hyper‐V host‐to‐host
clusters. Due to the limitations of the types of storage that can be attached to a VM, that
storage can only be created using iSCSI direct connection. Neither VHD attachment nor
pass‐through disks can provide the necessary shared storage required by the cluster. In
this architecture, a SAN disk is exposed and connected to both VMs via direct connection.
The result is a network resource that can survive the loss of both a Hyper‐V host as well as
the loss of a VM.
Host Boot from SAN
With SAN storage becoming so resilient that there is no longer any concern of failure, it
becomes possible to move all data off your server’s local disks. Eliminating the local disks
from servers accomplishes two things: It eliminates the distribution of storage throughout
your environment, centralizing everything into a single, manageable SAN solution. Second,
it abstracts the servers themselves, enabling a failed server to be quickly replaced by a
functioning one. Each server’s disk drives are actually part of the SAN, so replacing a server
is an exceptionally trivial process.
Guest Boot from SAN
A final solution that can assist with the rapid provisioning of VMs is booting hosted VMs
themselves from the SAN. Here, SAN disks are exposed directly to VMs via iSCSI, enabling
them to boot directly from the exposed disk. This final configuration is not natively
available in Windows Server 2008 R2, and as such requires a third‐party solution.
9
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
VM Performance Depends on Storage Performance
As you can see, in all of these architectures, the general trend is towards centralizing
storage within the SAN infrastructure. By consolidating your storage into that single
location, it is possible to perform some very useful management actions. Storage can be
backed up with much less impact on server and VM processing. It can be replicated to
alternate or offsite locations for archival or disaster recovery. It can be deduplicated,
compressed, thin provisioned, or otherwise deployed with a higher expectation of
utilization. In essence, while SAN storage for Hyper‐V might be more expensive than local
storage, you should expect to use it more efficiently.
Chapter 3 will focus in greater detail on those specific capabilities to watch for. Yet there is
another key factor associated with the centralization of storage that must be discussed
here. That factor relates to storage performance.
It has already been said that the introduction of virtualization into an IT environment
brings with it added complexities. These complexities arrive due to how virtualization adds
layers of abstraction over traditional physical resources. That layer of abstraction is what
makes VMs so flexible in their operations: They’re portable, they can be rapidly deployed,
they’re easily restorable, and so on.
Yet that layer of abstraction also masks some of those resources’ underlying activities. For
example, a virtual network card problem can occur because there is not enough processing
power. A reduction in disk performance can be related to network congestion. An entire‐
system slowdown can be traced back to spindle contention within the storage array. In any
of these situations, the effective performance of the virtual environment can be impacted
by seemingly unrelated elements.
Figure 1.5 shows how Hyper‐V’s reliance on multiple, interconnected elements creates
multiple points in which bottlenecking can occur. For example, network contention can
reduce the amount of bandwidth that is available for passing storage traffic. The type and
speed of drives in the storage array can impact their availability. Even the connection
medium itself—copper versus fibre, Cat 5 versus Cat 6a—can impact what resources are
available to what servers. Smart Hyper‐V administrators must always be aware of and
compensate for bottlenecks like these in the architecture. Without digging too deep into
their technical details, let’s take a look at a few that can be common in a Hyper‐V
architecture.
10
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Figure 1.5: Virtual environments have multiple areas where performance can
bottleneck.
Network Contention
Every network connection has a hard limit on the quantity of traffic that can pass along it
over a period of time. This maximum throughput is in many environments such a large
quantity that monitoring it by individual server is unnecessary. Yet networks that run
virtual environments operate much differently than all‐physical ones. Consolidating
multiple VMs atop a single host means a higher rate of resource utilization (that same
“greater efficiency” that was spoken of earlier). Although this brings greater efficiency to
those resources, it also brings greater utilization.
Environments that move to virtualization must take into account the potential for network
contention as utilization rates go up. This can be alleviated through the addition of new and
fully‐separated network paths, as well as more powerful networking equipment to handle
the load. These paths can be as simple as aggregating multiple server NICs together for
failover protection, all the way through completely isolated connections through different
network equipment. With Hyper‐V’s VMs having a heavy reliance on their storage,
distributing the load across multiple paths will become absolutely necessary as the
environment scales.
Another resolution involves modifying TCP parameters for specified connections.
Microsoft’s Hyper‐V R2 supports the use of Jumbo Frames, a modification to TCP that
enables larger‐sized Ethernet frames to be passed across a network. With a larger quantity
of payload data being passed between TCP acknowledgements, the protocol overhead can
be reduced by a significant percentage. This results is a performance increase over existing
gigabit Ethernet connections.
11
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Note
Jumbo Frames are not enabled by default on servers, networking equipment,
or most SAN storage devices. Consult your manufacturer’s guide for the
specific details on how to enable this support. Be aware that Jumbo Frames
must be enabled on every interface in each path between servers and
storage.
Connection Redundancy & Aggregation
Connection redundancy in virtual environments is necessary for two reasons: First, the
redundant connection provides an alternate path for data should a failure occur. With
external cables connecting servers to storage in iSCSI architectures, the potential for an
accidental disconnection is high. For this reason, connection redundancy using MultiPath
I/O (MPIO) or Multiple Connected Session (MCS) is strongly suggested. Both protocols are
roughly equivalent in terms of effective performance; however, SAN interfaces often
support only one of the two options. Support for MPIO is generally more common in
today’s SAN hardware.
The second reason redundancy is necessary in virtualized environments is for augmenting
bandwidth. iSCSI connections to Hyper‐V servers can be aggregated using MPIO or MCS for
the express purpose of increasing the available throughput between server and storage. In
fact, Microsoft’s recommendation for iSCSI connections in Hyper‐V environments is to
aggregate multiple gigabit Ethernet connections in all environments. Ensure that your
chosen SAN storage has the capability of handling this kind of link aggregation across
multiple interfaces. Chapter 2 will discuss redundancy options in much greater detail.
Consider 10GbE
The 10GbE standard was first published in 2002, with its adoption ramping
up only today with the increased network needs of virtualized servers.
10GbE interface cards and drivers are available today by most first‐party
server and storage vendors. Be aware that the use of 10GbE between servers
and storage requires a path that is fully 10GbE compliant, including all
connecting network equipment, cabling, and interfaces. Cost here can be a
concern. With 10GbE being a relatively new technology, its cost is
substantially more expensive across the board, with 10GbE bandwidth
potentially costing more than aggregating multiple 1GbE connections
together.
Type and Rotation Speed of Drives
The physical drives in the storage system itself can also be a bottleneck. Multiple drive
types exist today for providing disk storage to servers: SCSI, SAS, and SATA are all types of
server‐quality drives that have at some point been available for SAN storage. Some drives
are intended for low‐utilization archival storage, while others are optimized for high‐speed
read and write rates. Virtualization environments require high‐performance drives with
high I/O rates to ensure good performance of residing VMs.
12
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
One primary element of drive performance relates to each drive’s rotation speed. Today’s
SAN drives tend to support speeds of up to 15,000RPM, with higher‐rotation speeds
resulting in greater performance. Studies have shown that rotation speed of storage drives
has a greater impact on overall VM performance than slow network conditions or
connection maximums. Ensuring that VHD files are stored on high‐performance drives can
have a substantial impact on their overall performance.
Spindle Contention
Another potential bottleneck that can occur within the storage device itself is an
oversubscription of disk resources. Remember that files on a disk are linearly written to
the individual platters as required by the OS. Hyper‐V environments tend to leverage large
LUNs with multiple VMs hosted on a single LUN. Multiple hosts have access to those VMs
via their iSCSI connections, and process their workloads as necessary.
Spindle contention occurs when too much activity is requested for the files on a small area
of disk. For example, if the VHD files for two high‐utilization VMs are located near each
other on the SAN’s disks. When these two VMs have a high rate of change, they require
greater than usual attention by the disk’s spindle as it traverses the platters to read and
write data. When the hardware spindle itself cannot keep up with the load placed upon it,
the result is a reduction in storage (and, therefore, VM) performance.
This problem can be easily resolved by re‐locating some of the data to another position in
the SAN array. Yet, protecting against spindle contention is not an activity that can be easily
accomplished by an administrator. There simply aren’t the tools at an administrator’s
disposal for identifying where it is and isn’t occurring. Thus, protection against spindle
contention is often a task that is automatically handled by the SAN device itself. When
considering the purchase of a SAN storage device for Hyper‐V, look for those that have the
automated capability to monitor and proactively fix spindle contention issues or that use
storage virtualization to abstract the physical location of data. Also, work with any SAN
vendor to obtain guidance on how best to architect your SAN infrastructure for the lowest‐
possible level of spindle contention.
Connection Medium & Administrative Complexity
Lastly is the connection medium itself, with many options available to today’s businesses.
The discussion on storage in this guide relates specifically to iSCSI‐based storage for a
number of different reasons: Administrative complexity, cost, in‐house experience, and
existing infrastructure all factor into the type of SAN that makes most sense for a business:
• Administrative complexity. iSCSI storage arrives as a network‐based wrapper
around traditional SCSI commands. This wrapper means that traditional TCP/IP is
used as its mechanism for routing. By consolidating storage traffic under the
umbrella of traditional networking, only a single layer of protocols need to be
managed by IT operations to support both production and storage networking.
13
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
• Inhouse experience. Fibre Channel‐based SANs tend to require specialized skills
to correctly architect the SAN fabric between servers and storage. These skills are
not often available in environments who do not have dedicated SAN administrators
on‐site. Further, skills in working with traditional copper cabling do not directly
translate to those needed for Fibre Channel connections. Thus, additional costs can
be required for training.
• Cost. Due to iSCSI’s reliance on traditional networking devices for its routing, there
is no need for additional cables and switching infrastructure to pass storage traffic
to configured servers. Existing infrastructure components can be leveraged for all
manner of iSCSI traffic. Further, when iSCSI traffic grows to the point where
expansion is needed, the incremental costs per server are reduced as well. Table 1.1
shows an example breakdown of costs to connect one server to storage via a single
connection. Although cables and Fibre Channel switch ports are both slightly higher
with Fibre Channel, a major area of cost relates to the specialized Host Bus Adapter
(HBA) that is also required. In the case of iSCSI, existing gigabit copper network
cards can be used.
Fibre Channel iSCSI
Table 1.1: Comparing the cost to connect one server to storage via Fibre Channel
versus iSCSI using a single connection.
• Existing infrastructure. Lastly, every IT environment already has a networking
infrastructure in place that runs across traditional copper connections. Along with
that infrastructure are usually the extra resources necessary to add connectivity
such as cables, switch ports, and so forth. These existing resources can be easily
repurposed to pass iSCSI traffic over available connections with a high degree of
success.
14
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
iSCSI Makes Sense for Hyper‐V Environments
Although useful for environments of all sizes, iSCSI‐based storage is particularly suited for
those in small and medium enterprises. These enterprises likely do not have the Fibre
Channel investment already in place, yet do have the necessary networking equipment and
capacity to pass storage traffic with good performance.
Successfully accomplishing that connection with Hyper‐V, however, is still more than just a
“Next, Next, Finish” activity. Connections must be made with the right level of redundancy
as well as the right architecture if your Hyper‐V infrastructure is to survive any of a
number of potential losses. Chapter 2 will continue this discussion with a look at the
various ways to implement iSCSI storage with Hyper‐V. That explanation will show you
how to easily add the right levels of redundancy and aggregation to ensure success with
your Hyper‐V VMs.
15
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Chapter 2: Creating Highly‐Available Hyper‐
V with iSCSI Storage
It’s worth saying again that Hyper‐V alone is exceptionally easy to set up. Getting the basics
of a Hyper‐V server up and operational is a task that can be completed in a few minutes and
with a handful of mouse clicks. But in the same way that building a skyscraper is so much
more than welding together a few I‐beams, creating a production‐worthy Hyper‐V
infrastructure takes effort and planning to be successful.
The primary reason for this dissonance between “installing Hyper‐V” and “making it ready
for operations” has to do with high availability. You can think of a Hyper‐V virtual
infrastructure in many ways like the physical servers that exist in your data center. Those
servers have high‐availability functions built‐in to their hardware: RAID for drive
redundancy, multiple power supplies, redundant network connections, and so on. Although
each of these is a physical construct on a physical server, they represent the same kinds of
things that must be replicated into the virtual environment: redundancy in networking
through multiple connections and/or paths, redundancy in storage through multipathing
technology, redundancy in processing through Live Migration, and so on.
Using iSCSI as the medium of choice for connecting servers to storage is fundamentally
useful because of how it aggregates “storage” beneath an existing “network” framework.
Thus, with iSCSI it is possible to use your existing network infrastructure as the
transmission medium for storage traffic, all without needing a substantially new or
different investment in infrastructure.
To get there, however, requires a few new approaches in how servers connect to that
network. Hyper‐V servers, particularly those in clustered environments, tend to make use
of a far, far greater number of network connections than any other server in your
environment. With interfaces needed for everything from production networking to
storage networking to cluster heartbeat, keeping straight each connection is a big task.
This chapter will discuss some of the best practices in which to connect those servers
properly. It starts with a primer on the use of the iSCSI Initiator that is natively available in
Windows Server 2008 R2. You must develop a comfort level with this management tool to
be successful, as Hyper‐V’s high level of redundancy requirements means that you’ll likely
be using every part of its many wizards and tabs. In this section, you’ll learn about the
multiple ways in which connections are aggregated for redundancy. With this foundation
established, this chapter will continue with a look at how and where connections should be
aggregated in single and clustered Hyper‐V environments.
16
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
The Windows iSCSI Initiator: A Primer
Every iSCSI connection requires two partners. On the server is an iSCSI initiator. This
initiator connects to one or more iSCSI targets that are located on a storage device
elsewhere on the network. In order to create this connection, software is required at both
ends. At the target is software that handles incoming connections, directs incoming
initiator traffic to the right LUN, and ensures that security is upheld between LUNs and the
initiators to which they are exposed.
Each computer connecting to that iSCSI storage device needs its own set of initiator
software to manage its half of the connection. Native to Windows Server 2008 R2 (as well
as previous editions, although we will not discuss them in this guide) is the Windows iSCSI
Initiator. Its functionality is accessed through a link of the same name that is found under
Administrative Tools. Figure 2.1 shows an example of the iSCSI Initiator Control Panel as
seen in Windows Server 2008 R2. Here, three separate LUNs have been created using the
iSCSI target device’s management toolset and exposed to the server.
Figure 2.1: The default iSCSI Initiator.
17
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Note
Storage connections are generally made first using the storage device’s
management toolset. They are exposed via a fully‐qualified domain name or
IP address to one or more servers that may or may not share concurrent
access.
The process to accomplish this first step is different based on the storage
device used. Each storage device vendor develops their own toolset for
accomplishing this task, with some leveraging Web‐based utilities while
others use client‐based utilities. Consult your vendor’s administrative guide
for details on this process.
In the simplest of configurations, connecting an iSCSI target to an initiator is an
exceptionally easy process. First, enter the target DNS name or IP address into the box
labeled Target, and click Quick Connect. If settings have been correctly configured at the
iSCSI SAN, with LUNs properly exposed to the server, a set of targets should appear in the
box below. Selecting each target and clicking Connect will enable the connection.
At this point, the disk associated with that target’s LUN will be made available within the
server’s Disk Management console. There, it will need to be brought online, initialized, and
formatted to make it useable by the system.
From an architectural standpoint, a number of on‐system components must work in
concert for this connection to occur. As Figure 2.2 shows, the operating system (OS) and its
applications leverage the use of a vendor‐supplied disk driver to access SCSI‐based disks.
The SCSI layer in turn wraps its commands within iSCSI network traffic through the iSCSI
Initiator, which itself resides atop the server’s TCP/IP communication. Incoming traffic
arrives via a configured NIC, and is recompiled into SCSI commands, which are then used
by the system for interaction with its disks.
18
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Figure 2.2: Multiple elements work together to enable an OS to interact with iSCSI
disks.
This overt simplicity in configuring LUNs with a single path belies the added complexity
when adding additional connections. You’ll find that once your number of storage
connections grows beyond one per server, your number of high‐availability options and
configurations within each option grows dramatically. To assist, let’s take a look through
the three kinds of high‐availability options that are commonly considered today.
NIC Teaming
One option for high availability that is often a first consideration by many administrators is
the use of NIC teaming at the server. This option is often first considered because of the
familiarity administrators have with NIC teaming over production networks. Using this
method, multiple NICs are bonded together through the use of a proprietary NIC driver. In
Figure 2.2, this architecture would be represented by adding additional arrows and NIC
cards below the element marked TCP/IP.
Although teaming or bonding together NICs for the purpose of creating storage connection
redundancy is indeed an option, be aware that this configuration is neither supported nor
considered a best practice by either Microsoft or most storage vendors. As such, it is not an
option that most environments should consider for production deployments.
19
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Although NIC teaming provides the kind of redundancy that works for traditional network
connections, two alternative protocols have been developed that accomplish the same goal
but with better results. Multipath Input/Output (MPIO) and Multiple Connections per
Session (MPS) are two very different protocols that enable multiple connections between
servers and storage and are designed specifically to deal with the needs of network storage
traffic.
MCS
The first of these protocols is MCS. This protocol operates at the level of the iSCSI initiator
(see Figure 2.3) and is a part of the iSCSI protocol itself, defined within its RFC. Its protocol‐
specific technology enables multiple, parallel connections between a server and an iSCSI
target. As a function of the iSCSI initiator, MCS can be used on any connection once the
iSCSI initiator is enabled for use on the system.
Disk Driver
SCSI
iSCSI Initiator
TCP/IP
NIC NIC
To enable MCS for a storage connection, connect first to a target and then click Properties
within the Targets tab of the iSCSI Initiator Control Panel. In the resulting screen, click the
button marked MCS that is found at the bottom of the wizard. With MCS, multiple initiator
IP addresses are connected to associated target portal IP addresses on the storage device
(see Figure 2.4). These target portal IP addresses must first be configured on both the
server and the storage device prior to connecting an initiator.
20
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Unlike MPIO, which will be discussed next, MCS does not require any special multipathing
technology to be coded by the manufacturer and installed to connecting servers; however,
support must be available on the storage device. Consult your manufacturer’s
documentation to verify whether MCS support is available within your storage hardware.
Figure 2.4: Configuring MCS for a storage connection.
MCS can be configured with one of five different policies, with each policy determining the
behavior of traffic through the connection. Policies with MCS are configured per session
and apply to all LUNs that are exposed to that session. As such, individual sessions between
initiator and target are given their own policies. The five policies function as follows:
• Fail Over Only. With a failover policy, there is no load balancing of traffic across the
session’s multiple connections. One path is used for all communication up until the
failure of that path. When the active path fails, traffic is then routed through the
standby path. When the active path returns, routing of traffic is returned back to the
original path.
• Round Robin. Using Round Robin, all active paths are used for routing traffic. Using
this policy, communication is rotated among available paths using a round robin
approach.
21
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
• Round Robin with a subset of paths. This third policy operates much like the Round
Robin policy, with one important difference. Here, one or more paths are set aside
as standby paths to be used similar to those in the Fail Over Only policy. These paths
remain in standby until a primary path failure occurs. At that point, the standby path
is used in the Round Robin with the surviving paths. When the failed primary path
returns, traffic is routed again through that path, returning the subset path to
standby.
• Least Queue Depth. The Least Queue Depth policy functions similarly to Round
Robin, with the primary difference being in the determination of how traffic is load
balanced across paths. With Least Queue Depth, each request is sent along the path
that has the least number of queued requests.
• Weighted Paths. Weighted Paths provides a way to manually configure the weight of
each path. Using this policy, each path is assigned a relative weight. Traffic will be
balanced across each path based on that assigned weight.
MPIO
Another option for connection redundancy is MPIO. This protocol accomplishes the same
functional result as MCS but uses a different approach. With MPIO (see Figure 2.5), disk
manufacturers must create drivers that are MPIO‐enabled. These disk drivers include a
Device‐Specific Module (DSM) that enables the driver to orchestrate requests across
multiple paths. The benefit with MPIO is in its positioning above the SCSI layer. There, a
single DSM can be used to support multiple network transport protocols such as Fibre
Channel and iSCSI.
22
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
SCSI
iSCSI Initiator
TCP/IP
NIC NIC
Your hardware manufacturer’s DSM must be installed to each server where you intend to
use MPIO. Alternatively, a default DSM is available in Windows Server 2008 R2 that
functions with many storage devices. Consult your manufacturer’s documentation to verify
whether their separate vendor driver installation is required, or if the default Windows
DSM is supported.
To use the default DSM with iSCSI storage, two steps are necessary. First, install the
Multipath I/O Feature from within Server Manager. Installing this feature requires a reboot
and makes available the MPIO Control Panel within Administrative Tools. Step two involves
claiming all attached iSCSI devices for use with the default Microsoft DSM. Do this by
launching the MPIO Control Panel and navigating to the Discover Multi‐Paths tab. There,
select the Add support for iSCSI devices check box and reboot the computer. This process
instructs the server to automatically claim all iSCSI devices for the Microsoft DSM,
regardless of their vendor or product ID settings.
Once enabled, MPIO is configured through the iSCSI Initiator Control Panel. There, select an
existing target and click Connect. In the resulting screen, select the Enable multipath check
box and click Advanced. The Advanced settings screen for the connection provides a place
where additional initiator IP addresses are connected to target portal IP addresses.
Repeating this process for each source and target IP address connection will create the
multiple paths used by MPIO.
23
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Verifying path creation is accomplished by selecting an existing target and clicking Devices
and then MPIO. The resulting screen, seen in Figure 2.6, displays the configured paths from
the server to the target. Also in this location is the selection for configuring the load‐
balance policy for the LUN.
Figure 2.6: Setting an MPIO loadbalance policy.
MPIO in Windows Server 2008 R2 can use any of the same five policies as MCS as well as
one additional policy. Since the DSM operates at the level of the disk driver, it can
additionally load balance traffic across routes based on the number of data blocks being
processed. This sixth policy, named Least Blocks, will route each subsequent request down
the path that has the fewest data blocks being processed.
It is important to note that policies with MPIO are applied to individual devices (LUNs),
enabling each connected LUN to be assigned its own policy based on need. This behavior is
different than with MCS, where each LUN that is exposed into a single session must share
the same policy.
Note
Be aware that MPIO and MCS both achieve link redundancy using a protocol
that exists above TCP/IP in the network protocol stack, while NIC teaming
uses protocols that exist below TCP/IP. For this reason, each individual MPIO
or MCS connection requires its own IP address that is managed within the
iSCSI Initiator Control Panel. This is different than with NIC teaming, where
ports are aggregated via the switching infrastructure and a single IP address
is exposed.
24
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Which Option Should You Choose?
It is commonly considered that MPIO and MCS are relatively similar in their level of
performance and overall manageability. Microsoft’s MPIO tends to use fewer processor
resources than MCS, particularly under heavy loads; however, MCS tends to have slightly
better performance as long as the number of connections per session remains low.
With this in mind, consider the following guidelines when determining which option for
storage connection redundancy you should choose for your Hyper‐V environment:
• Traditional NIC teaming is not considered a best practice for storage connections.
• Some storage devices do not support the use of MCS. In these cases, your only option
is to use MPIO.
• Use MPIO if you need to support different load‐balancing policies on a per‐LUN
basis. This is suggested because MCS can only define policies on a per‐session basis,
while MPIO can define policies on a per‐LUN basis.
• Hardware iSCSI HBAs tend to support MPIO over MCS as well as include other
features such as Boot‐from‐iSCSI. When using hardware HBAs, consider using MPIO.
• MPIO is not available on Windows XP, Windows Vista, or Windows 7. If you need to
create iSCSI direct connections to virtual machines, you must use MCS.
• Although MCS provides a marginally better performance over MPIO, its added
processor utilization can have a negative impact in high‐utilization Hyper‐V
environments. For this reason, MPIO may be a better selection for these types of
environments.
Do I Need Hardware iSCSI HBAs?
This guide has talked extensively about the use of traditional server NICs as
the medium for iSCSI network traffic. However, specialized hardware HBAs
for iSCSI traffic exist as add‐ons. These specialized devices are dedicated for
use by iSCSI connections and potentially provide a measure of added
performance over traditional network cards.
As such, you may be asking “Do I need these special cards in my Hyper‐V
servers?” Today’s conventional wisdom answers this question with, “perhaps
not.”
iSCSI network processing represents a relatively small portion of the overall
processing of SCSI disk commands in Windows, with the majority of
processing occurring in the network stack, kernel, and file system. Windows
Server 2008 in cooperation with server‐class NIC vendors now includes
support for a number of network optimizations (TCP Chimney, Receive Side
Scaling, TCP Checksum Offload, Jumbo Frames) that improve the overall
processing of network traffic, and therefore iSCSI processing as well.
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
One traditional configuration where hardware iSCSI HBAs have been
necessary was when Boot‐from‐iSCSI was desired. These HBAs have typically
included the necessary pre‐boot code needed to boot a server from an iSCSI
SAN. However, today’s production NICs found in your everyday servers are
beginning to natively support Boot‐from‐iSCSI, further driving the answer to
this question towards a resounding “no.”
Getting to High Availability with Hyper‐V
All of this discussion prepares us to answer the primary question: How does one achieve
high availability with Hyper-V and iSCSI? With all the architectural options available,
answering this question best requires a bit of an iterative approach. That iterative
approach recognizes that every implementation of Hyper‐V that desires true high
availability must do so via the Windows Failover Clustering feature.
This general‐purpose clustering solution enables Windows Servers to add high availability
to many different services, Hyper‐V being only one in its long list. Thus, being successful
with highly‐available Hyper‐V also requires skills in Windows Failover Clustering. While
the details of installing and working with Windows Failover Clustering are best left for
other publications, this chapter and guide will assist with correctly creating the needed
storage and networking configurations.
The second point to remember is that Windows Failover Clustering requires the use of
shared storage between all hosts. Using Cluster Shared Volumes (CSV) in Windows Server
2008 R2, this storage is actively shared by all cluster nodes, with all nodes accessing
connected LUNs at once. Microsoft’s CSV transparently handles the necessary arbitration
between cluster nodes to ensure that only one node at a time can interact with a Hyper‐V
virtual machine or its configuration files.
Going a step further, Hyper‐V and its high‐availability clusters can obviously be created in
many ways, with a range of redundancy options available depending on your needs,
available hardware, and level of workload criticality. Obviously, the more redundancy you
add to the environment, the more failures you can protect yourself against, but also the
more you’ll spend to get there.
It is easiest to visualize these redundancy options by iteratively stepping through them,
starting with the simplest options first. The next few sections will start with a very simple
single‐server implementation that enjoys redundant connections. Through the next set of
sections, you’ll see where additional levels of redundancy can be added to protect against
various types of failures.
To keep the figures simple, color-coding has been used for the connections between server
and network infrastructure. That color-coding is explained in Figure 2.7. As you can see,
Production Network (or, "virtual machine") connections are marked in green, with Storage
Network connections marked in red. Because it is a best practice to separate management
traffic from virtual machine traffic as Hyper-V clusters are scaled out, management
connections are labeled in black where appropriate. It is also recommended that a separate
network be reserved for the cluster's heartbeat communication. That connection has been
labeled in blue where appropriate.
Figure 2.7: Color coding for the following set of figures.
Single Server, Redundant Connections
The simplest of configurations involves creating a single‐server Hyper‐V environment (see
Figure 2.8). Here, a single server connects to its network via a single network switch. This
configuration is different from the overly‐simplistic diagram first seen in Figure 1.1 in that
both the Production Network and Storage Network connections have been made
redundant in the setup in Figure 2.8.
Figure 2.8: Single Hyper-V server, redundant connections.
In this architecture, Hyper‐V server traffic is segregated into two different subnets. This is
done to separate storage traffic from production networking traffic, and is an important
configuration because of Hyper‐V’s very high reliance on its connection to virtual machine
storage. Separating traffic in this manner ensures that an overconsumption of traditional
network bandwidth does not impact the performance of running virtual machines.
Both connections in this architecture have also been made highly redundant, though
through different means. Here, Production Network traffic is teamed using a network
switching protocol such as 802.3ad NIC teaming, while Storage Network traffic is
aggregated using MPIO or MCS.
Single Server, Redundant Path
Although this first configuration eliminates some points of failure through its addition of
extra connections, the switch to which those connections occur becomes a very important
single point of failure. Should the switch fail, every Hyper‐V virtual machine on the server
will cease to operate.
To protect against this situation, further expansion of connections can be made to create a
fully‐redundant path between the Hyper‐V server and the production network core as well
as between Storage Network NICs and the storage device. Figure 2.9 shows how this might
look.
Figure 2.9: Single Hyper-V server, fully-redundant path.
In this configuration, Production Network connections for the single Hyper‐V server can
either remain in their existing configuration to the single network switch or they can be
broken apart and directed to different switches. This option is presented here and
elsewhere with an "optional" dashed line because not all networking equipment supports
aggregating links across different switch devices.
This limitation with NIC teaming highlights one of the benefits of MPIO and MCS. Due to
these protocols’ position above TCP/IP, each Storage Network connection leverages its
own IP address. This address can be routed through different paths as necessary with the
protocol reassembling data on either end.
Note
It is also important to recognize in any redundant path configuration that a
true “redundant path” requires separation of traffic at every hop between
server and storage. This requirement can make redundant pathing an
expensive option when supporting networking equipment is not already in
place.
Hyper‐V Cluster, Minimal Configuration
Yet even the most highly‐available network path doesn’t help when a Hyper‐V server’s
motherboard dies in the middle of the night. To protect against the loss of an individual
server, Hyper‐V must run atop a Windows Failover Cluster. This service enables virtual
machines to be owned by more than one server, as well as enables the failover of virtual
machine ownership from one host to another.
As such, creating a Hyper‐V cluster protects against an entirely new set of potential
failures. Such a cluster (see Figure 2.10) requires that all virtual machines be stored
elsewhere on network-attached disks, with all cluster nodes having concurrent access to
their LUNs.
Figure 2.10: Hyper-V cluster, minimal configuration.
Microsoft’s recommended minimum configuration for an iSCSI‐based Hyper‐V cluster (or,
indeed any iSCSI‐based Windows Failover Cluster) requires at least three connections that
exist on three different subnets. Like before, one connection each is required for
Production Network and Storage Network traffic. A third connection is required to handle
inter‐cluster communication, commonly called the “heartbeat.” This connection must be
segregated due to the low tolerance for latency in cluster communication.
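The three-subnet rule above is easy to validate mechanically. The following sketch, using illustrative addresses (not from the source), checks that each required network role lands on a distinct subnet:

```python
# Sanity-check that the three required cluster networks (Production,
# Storage, heartbeat) really live on three distinct subnets.
# Addresses are illustrative assumptions, not from any real deployment.
import ipaddress

def distinct_subnets(interfaces):
    """interfaces maps a role name to an address in CIDR form."""
    networks = {ipaddress.ip_interface(addr).network
                for addr in interfaces.values()}
    # Every role must resolve to its own subnet.
    return len(networks) == len(interfaces)

plan = {
    "production": "10.0.1.10/24",   # virtual machine traffic
    "storage":    "10.0.2.10/24",   # iSCSI traffic
    "heartbeat":  "10.0.3.10/24",   # inter-cluster communication
}
```

With the plan above, `distinct_subnets(plan)` passes; if the heartbeat connection were accidentally placed on the production subnet, the check would fail, flagging a configuration Microsoft's minimum recommendation disallows.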
Hyper‐V Cluster, Redundant Connections
A cluster configuration like the one just explained actually removes points of
redundancy even as it adds others. Although this configuration has the potential to
survive the loss of a cluster host, it is no longer redundant from the
perspective of the network connections coming out of each server.
Needed at this point is the merging of the redundant configuration from the single‐server
configuration with the now clustered configuration. That updated architecture is shown in
Figure 2.11. There, both Production Network and Storage Network connections have been
made redundant to the network switch using the protocols explained earlier.
You’ll also see in Figure 2.11 that an additional black line has been drawn between both
Hyper‐V servers and the network switch. This line represents an additionally‐segregated
network connection that is used for managing Hyper‐V. It is considered a best practice with
mature Hyper‐V implementations to segregate the management of a Hyper‐V server from
the networks used by its virtual machines. This is done for several reasons:
• Segregation of security domains—Virtual machines operate at a security trust level
that is considered higher than that of management traffic. By segregating virtual
machine traffic from management traffic, virtual machines can be better monitored.
Further, management connections cannot be used to intercept virtual machine
communications.
• Segregation of Live Migration traffic—Transferring ownership of a virtual machine
from one host to another can consume a large amount of available bandwidth over a
short period of time. This consumption can have a negative impact on the
operations of other virtual machines. By moving Live Migration traffic onto the
segregated subnet it shares with management traffic, this effect is avoided.
• Protection of management functionality—In the case where a network attack is
occurring on one or more virtual machines, segregating management traffic ensures
that the Hyper‐V host can be managed while troubleshooting and repair functions
are completed. Without this separate connection, it can be possible for a would‐be
attacker to deny access to administrators to resolve the problem.
Hyper‐V Cluster, Redundant Path
Finally, this discussion culminates in the summation of all the earlier architectures,
combining redundant paths with a fully-functioning Hyper-V cluster. This architecture
(see Figure 2.12) enjoys all the benefits of the previous iterations at once, yet it
requires the greatest number of connections and introduces the most complexity.
Figure 2.12: Hyper-V cluster, fully-redundant path.
The level of added cost that an environment such as this brings should be obvious.
Each cluster node requires a minimum of six network connections, spread across two
halves of a switching infrastructure. Because Hyper-V hosts cannot share RAM resources
between running virtual machines, Hyper-V clusters operate with the greatest efficiency
when they are configured with more nodes rather than fewer. Thus, a four-node cluster
will require 24 connections, an eight-node cluster 48 connections, and so on.
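The connection arithmetic above is simple enough to express directly (a trivial illustration; the per-node count of six comes from the fully-redundant design described in this section):

```python
# Connection-count arithmetic for the fully-redundant Hyper-V cluster:
# each node needs six connections (2x Production, 2x Storage,
# 1x Management, 1x Heartbeat), spread across two switch halves.
CONNECTIONS_PER_NODE = 6

def cluster_connections(nodes: int) -> int:
    """Total switch ports consumed by a cluster of the given size."""
    return nodes * CONNECTIONS_PER_NODE
```

A four-node cluster therefore consumes 24 ports and an eight-node cluster 48, which is worth budgeting for before the switching infrastructure is purchased.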
High Availability Scales with Your Pocketbook
With all this added expenditure comes the protection against many common problems.
Individual servers can fail and virtual machines will automatically relocate elsewhere.
Disks can fail and be automatically replaced by storage device RAID. Individual connections
can fail with the assurance that surviving connections will maintain operations. Even an
entire switch can fail and keep the cluster active. It is important to recognize that your level
of need for high availability depends on your tolerance for loss. As with physical servers,
more redundancy costs you more but ensures higher reliability.
But reliability in Hyper‐V’s storage subsystem is fundamentally critical as well. If you create
all these connections but attach them to a less‐than‐exemplary SAN, then you’ve still set
yourself up for failure. Finding the right features and capabilities in such a storage device is
critical to maintaining those virtual machine disk files as they’re run by cluster nodes.
The next chapter of this book takes a step back from the Hyper‐V part of a Hyper‐V
architecture, and delves deep into just those capabilities that you probably will want in
your SAN infrastructure. It will discuss how certain SAN capabilities being made available
only today are specifically designed to provide an assist to virtualization infrastructures.
Chapter 3: Critical Storage Capabilities for
Highly‐Available Hyper‐V
Chapter 2 highlighted the fact that high availability is fundamentally critical to a successful
Hyper-V infrastructure. This is the case because uncompensated hardware failures in any
Hyper‐V infrastructure have the potential to be much more painful than what you’re used
to seeing in traditional physical environments.
A strong statement, but think for a minute about this increased potential for loss: In any
virtual environment, your goal is to optimize the use of physical equipment by running
multiple virtual workloads atop smaller numbers of physical hosts. Doing so gives you
fantastic flexibility in managing your computing environment. But doing so, at the same
time, increases your level of risk and impact to operations. When ten workloads, for
example, are running atop a single piece of hardware, the loss of that hardware can affect
ten times the infrastructure and create ten times the pain for your users.
Due to this increased level of risk and impact, you must plan appropriately to compensate
for the range of failures that can potentially occur. The issue here is that no single
technology solution compensates for every possible failure. What is needed is a set of
solutions that work in concert to protect the virtual environment against the full range of
possibilities.
Depicted in Figure 3.1 is an extended representation of the previous chapter’s fully‐
redundant Hyper‐V environment. There, each Hyper‐V server connects via multiple
connections to a networking infrastructure. That networking infrastructure in turn
connects via multiple paths to the centralized iSCSI storage infrastructure. Consider for a
minute which failures are compensated for through this architecture:
• Storage and Production Network traffic can survive the loss of a single NIC due to
the incorporation of 802.3ad network teaming and/or MPIO/MCS.
• Storage and Production Network traffic can also survive the loss of an entire
network switch due to the incorporation of 802.3ad network teaming and/or
MPIO/MCS that has been spread across multiple switches.
• Virtual machines can survive the planned outage of a Hyper‐V host through Live
Migration as a function of Windows Failover Clustering.
• Virtual machines can also be quickly returned to service after the unplanned outage
of a Hyper‐V host as a function of Windows Failover Clustering.
• Network oversubscription and the potential for virtual machine denial of service are
inhibited through the segregation of network traffic across Storage, Production,
Management, and Heartbeat connections.
Figure 3.1: Hyper-V environments require a set of solutions to protect against all of
the possible failures.
The risk associated with each of these potential failures has been mitigated through the
implementation of multiple layers of redundancy. However, this design hasn’t necessarily
taken into account its largest potential source of risk and impact. Take another look at
Figure 3.1. In that figure, one element remains that in and of itself can become a significant
single point of failure for your Hyper‐V infrastructure. That element is the iSCSI storage
device itself.
Each and every virtual machine in your Hyper‐V environment requires storage for its disk
files. This means that any uncompensated failure in that iSCSI storage has the potential to
take down each and every virtual machine all at once, and with it goes your business’ entire
computing infrastructure. As such, there’s a lot riding on the success of your storage
infrastructure. This critical recognition should drive some important decisions about how
you plan for your Hyper‐V storage needs. It is also the theme behind this guide’s third
chapter.
Virtual Success Is Highly Dependent on Storage
In the end, storage really is little more than just a bunch of disks. You must have enough
disk space to store your virtual machines. You must also have enough disk space for all the
other storage accoutrements that a business computing environment requires: ISO files,
user home folders, space for business databases, and so on. Yet while raw disk space itself
is important, the architecture and management of that disk space is exceptionally critical to
virtualization success in ways that might not be immediately obvious.
Chapter 2 introduced the suggestion that the goal for SAN availability is “no nines,” or what
amounts to 100% availability. Although this requirement might seem an impossibility at
first blush, it is in fact a necessity. The operational risk of a SAN failure is made even more
painful by the level of impact such an event will have on your environment. As a result,
your goal in selecting, architecting, and implementing your SAN is to ensure that its design
contains no single points of failure.
Today’s iSCSI SAN equipment accomplishes this lofty goal through the concurrent
implementation of a set of capabilities that layer on top of each other. This layered
approach to eliminating points of failure ensures that surviving hardware always has the
resources and data copies it needs to continue serving the environment without
interruption.
"Non-Interruptive" Is Important
This concept of non‐interruptive assurance during failure conditions is also
critical to your SAN selection and architecture. Your selected SAN must be
able to maintain its operations without interruption as failures occur.
Although non‐interruptive in this definition might mean an imperceptibly
slight delay as the SAN re‐converges after a failure, that delay must be less
than the tolerance of the servers to which it is connected. As you’ll discover
later in this chapter, non‐interruptive is important not only during failure
operations but also during maintenance and management operations.
The easiest way to understand how this approach brings value is through an iterative look
at each compensating layer. The next few sections will discuss how today's best-in-class
iSCSI SAN hardware has eliminated the SAN as a potential single point of failure.
Modular Node Architecture
Easily the most fundamental new approach in eliminating the single point of failure is in
eliminating the “single point” approach to SAN hardware. Modern iSCSI SAN hardware
accomplishes this by compressing SAN hardware into individual and independent modules
or “nodes.” These nodes can be used independently if needed for light or low‐priority uses.
Or, they can be logically connected through a storage network to create an array of nodes.
Figure 3.2 shows a logical representation of how this architecture might look. Here, four
independent storage nodes have been logically connected using their built‐in management
software and a dedicated storage network. Within each node are 12 disks for data storage
as well as all the other necessary components such as processors, power supplies, NICs,
and so on. The result of connecting these four devices is a single logical iSCSI storage
device. That device has the capacity to present the summation of each device’s available
storage to users and servers.
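A toy model can make the aggregation concrete. The class and the 1 TB disk size below are illustrative assumptions, not properties of any particular vendor's hardware:

```python
# A toy model of the modular approach: independent nodes, each with its
# own disks, aggregate into one logical device whose presentable
# capacity is the sum of its members. Disk sizes are assumptions.

class StorageNode:
    def __init__(self, disks: int, disk_tb: float):
        self.disks = disks        # disks housed in this node
        self.disk_tb = disk_tb    # capacity per disk, in TB

    @property
    def raw_tb(self) -> float:
        """Raw capacity this node contributes to the array."""
        return self.disks * self.disk_tb

def logical_capacity(nodes) -> float:
    """Capacity the single logical iSCSI device can present."""
    return sum(node.raw_tb for node in nodes)

# Four nodes of 12 disks each, as in Figure 3.2 (1 TB disks assumed).
array = [StorageNode(disks=12, disk_tb=1.0) for _ in range(4)]
```

Adding a fifth node to `array` grows the logical device's presentable capacity with no change to the existing members, which is the essence of the modular argument made above.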
Figure 3.2: Multiple storage nodes aggregate to create a single logical device.
Important to recognize here is that each device can be an independent entity or aggregated
with others to modularly increase the capacity of the SAN. This modular approach can be
added to or subtracted from as the data needs of its owner changes over time. This
presents a useful benefit to the ownership of such a SAN over more traditional monolithic
approaches: Its capacity can be expanded or otherwise modified as necessary without the
need for wholesale hardware replacements.
Consider as an alternative the more traditional monolithic SAN. These devices rely on the
population of a storage “frame” with disks, storage processors, and switch fabric devices. In
this type of SAN, there is a physical limit to the amount of storage that can be added into
such a frame. Once that frame is full to capacity, either additional frames must be
purchased or existing disks or frames must be swapped out for others that have greater
capacity. The result can be a massive capital expenditure when specific threshold limits are
exceeded.
Using the modular approach, new modules can be added to existing ones at any point.
Management software within each module is used to complete the logical connection
through the dedicated storage network. That same software can be configured to
automatically accomplish post‐augmentation tasks such as volume restriping and re‐
optimization on behalf of the administrator. This chapter will talk more about these
management functions shortly.
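The restriping idea can be sketched in a few lines. Real SAN management software does this transparently and with far more sophistication; the round-robin policy here is only an assumption chosen to illustrate the effect of adding a node:

```python
# A sketch of post-augmentation restriping: when a node joins, data
# blocks are redistributed so every node carries a roughly equal share.
# Round-robin placement is an illustrative simplification.

def restripe(blocks, node_count: int) -> dict:
    """Assign each block to a node round-robin; returns node -> blocks."""
    layout = {n: [] for n in range(node_count)}
    for i, block in enumerate(blocks):
        layout[i % node_count].append(block)
    return layout

blocks = list(range(12))
before = restripe(blocks, 3)   # three-node layout: 4 blocks per node
after = restripe(blocks, 4)    # after adding a node: 3 blocks per node
```

Note that every block still exists after the move; only its placement changes, which is why a well-behaved array can restripe without interrupting the servers consuming its volumes.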
Redundant Storage Processors Per Node
Modularization alone does nothing to enhance storage availability. It also does nothing to
enhance the resiliency of the individual node and its data. However, it does provide the
framework in which many of the advanced availability features discussed in this chapter reside.
Every storage device requires some sort of processor in order to accomplish its stated
mission. Although some processors leverage entirely proprietary code, many processors
today rest atop highly‐tailored distributions of existing operating systems (OSs) such as
Linux or Windows Storage Server. No matter which OS is at its core, one architectural
element that is critical to ensuring node resiliency is the use of redundant storage
processors within each individual node.
Figure 3.3 shows how this might look in a storage device that is composed of four nodes.
Here, each individual node includes two storage processors that are clustered for the
purposes of redundancy. With this architecture in place, the loss of a storage processor will
not impact the functionality of the individual node.
Figure 3.3: Multiple storage processors per node ensure individual node resiliency.
This architecture comes in particularly handy when nodes are used independently. In this
configuration, a single node can survive the loss of a storage processor without
experiencing an interruption of service.
Redundant Network Connections and Paths
Redundancy in processing is a great feature, but even multiple storage processors cannot
assist when network connections go down. The risk of network failure is in fact such a
common occurrence that the entirety of Chapter 2 was dedicated to highlighting the
necessary server‐to‐SAN connections that are required for Hyper‐V.
Yet that discussion in Chapter 2 did not include one critical redundancy element that is
shown in Figure 3.4. This redundancy becomes relevant when used in the framework of a
modular SAN architecture. There, each individual storage node has also been connected to
the storage network using redundant connections.
Figure 3.4: Redundant connections and paths relate to inter-node communication as
well as server-to-node.
Important to recognize here is that this configuration is necessary not only for resiliency
but also for raw throughput. Because multiple servers are likely to connect to each
individual storage node, the aggregate network traffic in and out of each node can exceed
what is possible through a single connection. Although all iSCSI storage nodes have at least two
network connections per node, those that are used in support of extremely high
throughput may include four or more to support the necessary load.
Note
Measuring that performance is a critical management activity. iSCSI storage
nodes tend to come equipped with the same classes of performance counters
that you’re used to seeing on servers: Processor, network, and memory
utilization are three that are common. Connecting these counters into your
monitoring infrastructure will ensure that your Hyper‐V server needs aren’t
oversubscribing any part of your SAN infrastructure.
Disk‐to‐Disk RAID
RAID has been around for a long time. So long, in fact, that it is one of those few acronyms
that doesn't need to be written out in full when used in guides like this one. Although RAID
has indeed had a long history in IT, it’s important to recognize that it is another high‐
availability feature that you should pay attention to as you consider a SAN storage device
for Hyper‐V.
The reason behind this special consideration has to do with the many types of RAID
protection that SANs can deploy over and above those traditionally available within
individual servers. These added RAID levels are made possible in many ways due to the
sheer number of disks that are available within an individual storage node.
Figure 3.5 shows a graphical representation of how some of these might look. In addition to
the usual RAID 1 (mirroring), RAID 5 (striping with parity), and RAID 1+0 (disks are
striped, then mirrored) options that are common to servers, SANs can often leverage
additional RAID options such as RAID‐with‐hot‐spares, RAID 6 (striping with double
parity), and RAID 10 (disks are mirrored, then striped), among others.
Figure 3.5: Disk-to-disk RAID in iSCSI storage devices is functionally similar to RAID
within individual servers.
These alternative options are often necessary as SANs grow in size due to the potential
for multiple disk failures. Although the traditional RAID levels used in servers are designed
to protect against a single disk failure, they are ineffective against the situation where more
than one disk fails in the same volume. The added level of protection gained through
advanced RAID techniques becomes increasingly necessary when large numbers of
individual disks are present in each storage node.
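The trade-offs among these levels reduce to simple formulas. The sketch below uses the standard textbook capacity and fault-tolerance figures; the 12-disk node and 1 TB disk size are illustrative assumptions:

```python
# Back-of-the-envelope comparison of the RAID levels named above:
# usable capacity (TB) and guaranteed disk failures survived.
# Standard formulas; disk counts and sizes are illustrative.

def raid_summary(level: str, disks: int, disk_tb: float = 1.0):
    """Return (usable_tb, guaranteed_failures_survived)."""
    if level == "RAID5":           # striping with single parity
        return (disks - 1) * disk_tb, 1
    if level == "RAID6":           # striping with double parity
        return (disks - 2) * disk_tb, 2
    if level == "RAID10":          # mirrored pairs, then striped
        # Survives one guaranteed failure; more if failures land
        # in different mirrored pairs.
        return (disks // 2) * disk_tb, 1
    raise ValueError(f"unknown RAID level: {level}")
```

For a 12-disk node, RAID 6 gives up one more disk of capacity than RAID 5 but is the only option here that guarantees surviving two simultaneous disk failures in the same volume, which is exactly the scenario the paragraph above warns about.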
Node‐to‐Node RAID
Another RAID capability that is not common to server disk drives is the capacity to span
volume redundancy across multiple nodes. In fact, this feature alone is one of the greatest
reasons to consider the implementation of a multiple‐node architecture for the storage of
Hyper‐V virtual machines as well as other critical data.
In Figure 3.6, the red boxes that represent intra‐node RAID have been augmented with
another set of purple boxes. This second set of boxes highlights how node‐to‐node RAID
configurations can span individual nodes. In this configuration, volumes have been
configured in such a way that every piece of data on one node (or its parity information) is
always replicated to one or more additional nodes in the logical device.
Figure 3.6: Node-to-node RAID ensures that entire nodes can fail with no impact to
operations.
Note
Although Figure 3.6 shows an example of a RAID set that has been created
across only a few disks in a few nodes, it is more common that RAID sets are
created across every disk in the entire logical storage device. By creating a
hardware RAID set in this manner, the entire device’s storage can then be
made available to exposed volumes.
Depending on the storage device selected, multiple levels of node‐to‐node RAID are
possible with each having its own benefits and costs. For example, each block of data can be
replicated across two nodes. This configuration ensures that a block of data is always in
two places at once. As an alternative that adds redundancy but also adds cost, each block
can be replicated across three nodes, ensuring availability even after a double‐node failure.
This architecture is critically important for two reasons. First, it extends the logical storage
device’s availability to protect against failures of an entire node or even multiple nodes. The
net result is the creation of a storage environment that is functionally free of single points
of failure.
As a second reason, such an architecture also increases the capacity of the logical storage
device’s volumes to greater than the size of a single node. Considering the large size of
Hyper‐V virtual machines, extremely large volume sizes may be necessary, such as those
that are larger than can be supported by a single node alone.
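Replica placement across nodes can be sketched as follows. The round-robin policy and node names are illustrative assumptions; real arrays use their own placement algorithms:

```python
# A sketch of node-to-node replication: each data block is written to
# `copies` distinct nodes, so the logical device survives the failure
# of copies-1 nodes. Round-robin placement is illustrative only.

def place_block(block_id: int, nodes, copies: int):
    """Return the list of nodes holding the given block's replicas."""
    if copies > len(nodes):
        raise ValueError("cannot place more copies than nodes")
    return [nodes[(block_id + i) % len(nodes)] for i in range(copies)]

nodes = ["node1", "node2", "node3", "node4"]
```

With `copies=2`, every block exists on two nodes at once; raising it to `copies=3` adds cost but keeps data available even after a double-node failure, matching the two configurations described above.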
Modularization Plus Disk-to-Disk RAID Equals Swap-Ability
Interesting to note here is how the combination of disk‐to‐disk RAID goes
hand‐in‐hand with modularization. This combination of capabilities enables
SAN hardware to be very easily replaced in the case of an entire‐node failure,
making the individual node itself a hot-swappable item.
Think for a minute about how this might occur: Every block of data on such a
SAN is always replicated to at least one other storage node. Thus, data is
always protected when a node fails. When a failure occurs, an administrator
needs only to remove the failed node and swap it with a functioning
replacement. With minimal configuration, the replacement can automatically
reconnect with the others in the logical storage device and synchronize the
necessary data. As a result, even an entire node failure becomes as trivial as
an individual disk failure.
Integrated Offsite Replication for Disaster Recovery
And yet even these capabilities don’t protect against the ultimate failure: the loss of an
entire operational site. Whether that loss is due to a natural disaster, one that is man‐made,
or a misconfiguration that results in massive data destruction, there sometimes comes the
need to relocate business operations in their entirety to a backup site.
What’s particularly interesting about disaster recovery and its techniques and technologies
is that many are newcomers into the IT ecosystem. Although every business has long
desired a fully‐featured disaster recovery solution, only in the past few years have the
technologies caught up to make this dream affordable.
Needed at its core is a mechanism to replicate business data as well as data processing to
alternate locations with an assurance of success. Further, that replication needs to occur in
such a way that minimizes bandwidth requirements. To be truly useful, it must also be a
solution that can be implemented without the need for highly‐specialized training and
experience. In the case of a disaster, your business shouldn’t need specialists to failover
your operations to a backup site nor fail them back to the primary site when the disaster is
over.
Today’s best‐in‐class iSCSI SANs include the capability to connect a primary‐site SAN to a
backup‐site SAN as Figure 3.7 shows. This being said, such a connection is a bit more than
just plug‐and‐go. There are some careful considerations that are important to being
successful, most especially when SAN data consists of Hyper‐V virtual machines.
Cross-Reference
Chapter 4 will explore the architectures and requirements for disaster
recovery in more detail.
The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V Greg Shields
Figure 3.7: Automated replication of changed data to alternate sites protects against
entire site loss.
Non‐Interruptive Capacity for Administrative Actions
It has already been stated that architecting your storage infrastructure is exceptionally
important to success with Hyper-V. Yet getting that storage up and operational is only
the first step in actually using your Hyper‐V virtual environment. It’s also the shortest step.
Longer in timeframe and arguably more important are the management activities you’ll
undergo after the installation is complete.
The processes involved with managing Hyper‐V storage often get overlooked when the
initial architecture and installation is planned. However, these same administrative tasks,
when not planned for, can cause complications and unnecessary outages down the road. No
matter which action needs to be accomplished, your primary goal should be an ability to
invoke those actions with the assurance that they will not interrupt running virtual
machines.
If these statements sound alarmist, consider the long‐running history of storage
technologies. In the not‐too‐distant past, otherwise simple tasks became operational
impacts due to their need for volume downtime. These tasks included basic administrative
actions such as extending an existing volume to add more disk space, installing a firmware
upgrade, or augmenting the environment with additional nodes or frames. In the most
egregious of examples, simple tasks such as these sometimes required the presence of on‐
site assistance from manufacturer storage technicians.
That historical limitation added substantial complexity and cost to SAN ownership. Today,
such limitations are wholly unacceptable when weighed against the availability
requirements of a virtual infrastructure. Your business simply can't bring down
every virtual machine when you need to make a small administrative change to your storage.
With this in mind, consider the following set of administrative activities that are common
to all storage environments. Your SAN hardware should be able to accomplish each of them
without interruption to virtual machine processing or other concurrent data access.
Further, they also represent actions that a sufficiently‐experienced administrator should be
able to accomplish with the right hardware and minimal tool‐specific instruction.
Note
With these activities, iSCSI isn’t alone. Many of the features explained in the
following sections should be available in other types of SAN equipment such
as those that leverage fibre channel connections. Often, however, these
features are only available at extra cost. This is an important consideration
when purchasing a new storage infrastructure. Look carefully at the
capabilities offered by your SAN vendor to ensure that the right set
of management activities is available for your needs. For some vendors, you
may need to purchase the rights to use certain management functions. As an
alternative, look to an all‐inclusive SAN vendor that does not price out
advanced functionality at extra cost.
Volume Activities
Early monolithic SAN infrastructures required complex configuration file changes when
volumes needed reconfiguration. For some vendors, this configuration file change was an
exceptionally painful operation, often requiring the on‐site presence of trained
professionals to ensure its successful implementation.
Today, volume changes are relatively commonplace activities. Administrators recognize
that provisioning too much storage to a particular volume takes away disk space from
other volumes that might need it down the road. It is for this reason that today’s best
practices in volume size assignment are to maintain a small but constant percentage of free
space. This sliding window of available space can require administrators to constantly
monitor and adjust sizes as needed. Some SANs have the capability to automatically scale
the size of volumes per preconfigured thresholds. No matter which method you use, this
activity on today’s iSCSI SANs should not require downtime to either the volume or
connected users and servers.
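The sliding-window behavior described above can be sketched as a simple control loop. The class, names, and thresholds below are purely illustrative, not any vendor's API:

```python
# Hypothetical sketch of threshold-driven volume auto-grow.
# FREE_TARGET and GROW_STEP are illustrative policy values.

FREE_TARGET = 0.10   # keep at least 10% of the volume free
GROW_STEP = 0.25     # grow by 25% when the threshold is crossed

class Volume:
    def __init__(self, size_gb, used_gb):
        self.size_gb = size_gb
        self.used_gb = used_gb

    def free_fraction(self):
        return (self.size_gb - self.used_gb) / self.size_gb

    def auto_grow(self):
        """Grow the volume online when free space dips below target."""
        if self.free_fraction() < FREE_TARGET:
            self.size_gb = round(self.size_gb * (1 + GROW_STEP))
            return True
        return False

vol = Volume(size_gb=100, used_gb=95)   # only 5% free
grew = vol.auto_grow()
print(grew, vol.size_gb)                # True 125
```

On a real array, the equivalent of `auto_grow` runs inside the storage device and never takes the volume or its connected servers offline.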
Advanced SANs provide the capability to accomplish other volume‐based tasks without
interruption as well. These tasks can relate to changing how a volume is provisioned, such
as thin‐provisioned versus pre‐allocated, or configured RAID settings. For example,
volumes that start their operational life cycle as a low priority resource may later grow in
criticality and require additional RAID protection. That reconfiguration should occur
without interruption to operations.
Storage Node Activities
Activities associated with the storage node itself should also be accomplished without
impact to data access. For example, adding, removing, or replacing storage nodes from a
logical storage device are tasks that can and should be possible without interruption.
Important to recognize here are the non‐interruptive internal activities that must occur in
the background after such a dramatic change to the storage environment:
• Adding a node automatically restripes existing volumes across the new node,
balancing storage across the now‐larger logical storage device.
• Removing a node automatically relocates data off the node prior to the actual
removal activity, ensuring that data remains available even after the node has been
removed from the logical storage device.
• Replacing a node automatically rebuilds volumes from surviving data on the
remaining nodes.
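The balancing that follows a node addition can be sketched roughly as follows. The round-robin striping here is a deliberate simplification of how a real array distributes extents in the background:

```python
# Minimal sketch of restriping extents when a node joins the
# logical storage device. Real SANs do this online, without
# interrupting data access; this only shows the balancing idea.

def stripe(extents, nodes):
    """Round-robin data extents across the device's nodes."""
    layout = {n: [] for n in nodes}
    for i, extent in enumerate(extents):
        layout[nodes[i % len(nodes)]].append(extent)
    return layout

extents = list(range(12))                      # 12 extents in a volume
before = stripe(extents, ["node1", "node2", "node3"])
after = stripe(extents, ["node1", "node2", "node3", "node4"])  # node added

print(len(before["node1"]), len(after["node1"]))  # 4 3
```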
Another useful cross‐node activity is the use of automated volume restriping to reduce
spindle contention. This problem of spindle contention was first introduced in Chapter 1
and can have a larger‐than‐normal impact on storage that is part of a virtualization
infrastructure. In essence, when the disk use of virtual machines becomes greater than
expected, virtual machines whose disk files share the same disk spindles in the SAN
infrastructure will experience a bottleneck. Collocated virtual machines in this situation
experience a collective reduction in performance as each vies for the attention of the
storage device.
To alleviate this situation, some storage devices have the ability to watch for spindle
contention and transparently relocate data files to alternate locations on disk. The result is
a more optimized distribution of storage hardware resources across the entire logical
device as well as better overall performance for virtual machines.
Data Activities
Storage arrays commonly include the ability to snapshot volumes as well as replicate them
to other locations within and outside the logical device. Snapshotting activities are critical
to reducing backup windows. They also provide the ability to quickly create point‐in‐time
copies of virtual machines for testing or other purposes.
Replication is often necessary when virtual machines or other data must be offloaded to
alternate volumes or logical storage devices—this can be due to a forklift upgrade of the
logical storage device or because it is necessary to create copies of volumes for device‐to‐
device replication. As with the other activities, completing these data‐related activities
should be a non‐interruptive process.
Firmware Activities
Last is the not-uncommon activity of updating the firmware on individual
storage nodes. All storage devices require the occasional update of firmware code in order
to add features, eliminate bugs, and update code to prevent known attacks.
This updating of SAN firmware must be an operation that does not require downtime.
Downtime prevention may come from redundant storage processors or from an
OS that can apply updates without requiring a reboot.
Storage Virtualization
The concepts that embody storage virtualization share little with those that are associated
with traditional server virtualization. However, they do share the same high‐level meaning
in that storage virtualization is also about abstraction. In the case of storage virtualization,
the abstraction exists between logical storage (RAID sets, volumes, and so on) and the
actual physical storage where that data resides.
You’ve already been exposed in this chapter to many of the capabilities that fall under the
banner of storage virtualization: The ability to snapshot a drive abstracts the snapshot
from the bits in its initial volume. Restriping a volume across multiple nodes requires a
layer of abstraction as well. Accomplishing this task requires a meta-layer atop the volume
that maps the logical storage to physical locations.
In the context of virtualization atop platforms such as Hyper‐V, storage virtualization
brings some important management flexibility. It accomplishes this through the
introduction of new features that improve the management of Hyper‐V virtualization. Let’s
look at a few of these features in the following sections.
Snapshotting and Cloning
Creating snapshots of volumes enables administrators to work with segregated copies of
data but without the need to create entirely duplicate copies of that data. For example,
consider the situation where you need to test the implementation of an update to a set of
virtual machines on a volume. Using your SAN’s snapshotting technology, it is possible to
create a duplicate copy of that entire volume. Because the volume has been created as a
snapshot rather than a full copy, the time to complete the snapshot is significantly reduced.
The level of consumed space is also only a fraction of the overall volume size.
Once created, actions like the aforementioned update installation can be completed on the
snapshot volume. If the results are a success, the snapshot can be merged into the original
volume or discarded.
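The speed and space savings come from copy-on-write semantics: the snapshot stores only the blocks that change after it is taken. A rough sketch, with hypothetical class names:

```python
# Copy-on-write snapshot sketch. The snapshot preserves a block's
# original contents only the first time that block changes, which is
# why creating it is fast and it consumes only a fraction of the volume.

class Snapshot:
    def __init__(self):
        self.original = {}          # preserved pre-change blocks

class Volume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot()
        self.snapshots.append(snap)
        return snap

    def write(self, block_id, data):
        for snap in self.snapshots:
            # preserve the old block the first time it changes
            snap.original.setdefault(block_id, self.blocks.get(block_id))
        self.blocks[block_id] = data

    def read_from(self, snap, block_id):
        """Read a block as it existed when the snapshot was taken."""
        if block_id in snap.original:
            return snap.original[block_id]
        return self.blocks.get(block_id)

vol = Volume({0: "vm-config", 1: "vm-data"})
snap = vol.snapshot()
vol.write(1, "vm-data-v2")

print(vol.blocks[1], vol.read_from(snap, 1))  # vm-data-v2 vm-data
print(len(snap.original))                     # 1 (only the changed block)
```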
Backup and Restore with VSS Integration
Snapshots are useful for other reasons as well. Backup operations are made much easier
through the use of snapshots. Integrating those snapshots with Microsoft's Volume Shadow
Copy Service (VSS) ensures that backups successfully capture the state of the virtual machine along
with its installed applications. Without VSS integration, installed applications and their
data may not be correctly backed up. When seeking a SAN to be used in a virtualized
environment, it is important to look for those that support VSS integration to ensure
backups of these types of applications.
Volume Rollback
A key advanced feature is the ability for volumes to be rolled backwards in time. This need
can occur after a significant data loss or data corruption event. Combining snapshot
technology with the capacity to store multiple snapshot iterations gives the administrator a
series of time‐based volume snapshots. Rolling a volume to a previous snapshot quickly
returns the volume to a state before the deletion or corruption occurred. Further, volume
rollback can more quickly return a corrupted volume to operations than traditional restore
techniques.
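Keeping a series of labeled snapshots makes rollback a simple lookup. The sketch below stores full copies for clarity, where a real SAN would store only deltas:

```python
# Volume rollback sketch: a series of time-based snapshots lets the
# administrator return the volume to a state before corruption occurred.
# Names are illustrative; full copies stand in for real snapshot deltas.

import copy

class Volume:
    def __init__(self, blocks):
        self.blocks = blocks
        self.history = []           # ordered snapshot iterations

    def snapshot(self, label):
        self.history.append((label, copy.deepcopy(self.blocks)))

    def rollback(self, label):
        """Return the volume to the state captured under `label`."""
        for name, blocks in reversed(self.history):
            if name == label:
                self.blocks = copy.deepcopy(blocks)
                return True
        return False

vol = Volume({"db": "clean"})
vol.snapshot("nightly-mon")
vol.blocks["db"] = "corrupted"
vol.rollback("nightly-mon")
print(vol.blocks["db"])             # clean
```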
Thin Provisioning
Last is the capability for volume thin provisioning. As discussed earlier in this
chapter, today's best practices suggest that volumes should be configured to maintain
only a small level of free space. This ensures that available disk space can
always be assigned to the volumes that need it.
One problem with this approach relates to how an OS will make use of an assigned volume.
Unlike storage devices, OSs tend to create statically‐sized volumes for their configured disk
drives. Thus, every storage device volume extension must be followed by a manual volume
extension within the OS.
A method to get around this limitation is the use of thin provisioning. Here, a volume is
presented to the OS for its anticipated size needs. On the storage device, however, the true
size of the volume is only as large as the actual data being consumed by the OS. The storage
device’s volume automatically grows in the background as necessary to provide free space
for the OS. The result is that the OS’s volume does not need expansion while the storage
device’s volume only uses space as necessary. This process significantly improves the
overall utilization of free space across the storage device.
Caution
Caution must be used when leveraging thin provisioning to ensure that the real
allocation of disk space doesn't exceed the true level of available physical disk space.
Proper monitoring and alerting of storage space is critical to prevent this
catastrophic event from occurring.
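The gap between provisioned and allocated space, along with the over-commit check the caution describes, can be sketched as follows. All names and the 90% alert threshold are illustrative:

```python
# Thin provisioning sketch: the OS sees the full provisioned size,
# but the array only consumes space as data is actually written.
# The over_committed() check mirrors the monitoring the caution urges.

class ThinVolume:
    def __init__(self, provisioned_gb):
        self.provisioned_gb = provisioned_gb   # size reported to the OS
        self.allocated_gb = 0                  # space actually consumed

    def write(self, gb):
        self.allocated_gb += gb                # array grows on demand

class Array:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.volumes = []

    def provision(self, gb):
        vol = ThinVolume(gb)
        self.volumes.append(vol)
        return vol

    def over_committed(self):
        """True when real allocation nears physical capacity."""
        allocated = sum(v.allocated_gb for v in self.volumes)
        return allocated > 0.9 * self.physical_gb

array = Array(physical_gb=1000)
vol = array.provision(2000)     # OS sees 2 TB on a 1 TB array
vol.write(300)

print(vol.provisioned_gb, vol.allocated_gb)  # 2000 300
print(array.over_committed())                # False
```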
Storage Architecture and Management Is Key to Hyper‐V
You’ve seen the comment presented over and over that the task of installing the very basics
of Hyper-V is exceedingly simple; the real skill comes in creating a Hyper-V
infrastructure that can survive the many possible failures that can and will occur in a
production computing environment. Preventing those failures happens with the right
combination of a good architecture and the capability to accomplish needed management
activities without service interruption. You’ve learned about these needs in this chapter.
But this chapter’s discussion on storage capabilities has left one element remaining. You
now understand how your iSCSI storage should be architected to ensure the highest levels
of availability. But you haven’t really come to understand the special needs that arrive
when an entire site goes down. Disaster recovery is the theme in the fourth and final
chapter. Coming up, you’ll learn about the technologies and techniques you’ll need to
consider when you expand your operations to a full disaster recovery site.
Chapter 4: The Role of Storage in Hyper‐V
Disaster Recovery
You’ve learned about the power of iSCSI in Microsoft virtualization. You’ve seen the various
ways in which iSCSI storage is connected into Hyper‐V. You’ve learned the best practices
for architecting your connections along with the smart features that are necessary for
100% storage uptime. You’ve now got the knowledge you need to be successful in
architecting iSCSI storage for Hyper-V.
With the information in this guide’s first three chapters it becomes possible to create a
highly‐available virtual infrastructure atop Microsoft’s virtualization platform. With it, you
can create and manage virtual machines with the assurance that they’ll survive the loss of a
host, a connection, or any of the other outages that happen occasionally within a data
center.
Yet this knowledge remains incomplete without a look at one final scenario: the complete
disaster. That disaster might be something as substantial as a Category 5 hurricane or as
innocuous as a power outage. But in every scenario, the end result is the same: You lose the
computing power of an entire data center.
Important to recognize here is that the techniques and technologies that you use in
preparing for a complete disaster are far, far different than those you implement for high
availability. Disaster recovery elements are added to a virtual environment as an
augmentation that protects against a particular type of outage.
Defining “Disaster”
Before getting into the actual click‐by‐click installation of Hyper‐V disaster recovery, it is
important first to understand what actually makes a disaster. Although the term “disaster”
finds itself greatly overused in today’s sensationalist media (“Disaster in the South: News at
11.”), the actual concept of disaster in IT operations has a very specific meaning.
There are many technical definitions of “disasters” that exist, one of which your
organization’s process framework likely leverages to functionally define when a disaster
has occurred. Rather than relying on any of the technical definitions, however, this chapter
will simply consider a disaster for IT operations to be an event that fully interrupts the
operations of a data center.
Using this definition, you can quickly identify what kinds of events can be considered a
disaster:
• A naturally‐occurring event, such as a tornado, flood, or hurricane, impacts your
data center and causes damage; that damage causes the entire processing of that
data center to cease
• A widespread incident, such as a water leakage or long‐term power outage that
interrupts the functionality of your data center for an extended period of time
• An extended loss of communications to a data center, often caused by external
forces such as utility problems, construction, accidentally severed cabling, and so on
Although disasters are most commonly associated with the types of events that end up on
the news, the actual occurrence of newsworthy disasters is in fact quite rare. In reality, the
events in the second and third groups of the previous list are much more likely to occur. All
of these events interrupt a data center's operations, but those in the first group bring the
kind of large-scale damage that requires greater effort to fix.
It is important to define disasters in this way because those above are handled in much
different ways than simple service outages. Consider the following set of incidents that are
problematic and involve outage but are in no way disasters:
• A problem with a virtual host creates a “blue screen of death,” immediately ceasing
all processing on that server
• An administrator installs a piece of code that causes problems with a service,
shutting down that service and preventing some action from occurring on the server
• An issue with power connections causes a server or an entire rack of servers to
inadvertently and rapidly power down
The primary difference between these types of events and your more classic “disasters”
relates to the actions that must occur to resolve the incident. In none of these three
incidents has the operations of the data center been fully interrupted. Rather, in each, some
problem has occurred that has caused a portion of the data center—a server, a service, or a
rack—to experience a problem.
This differentiation is important because a business’ decision to declare a disaster and
move to “disaster operations” is a major one. And the technologies that are laid into place
to act upon that declaration are substantially different (and more costly) than those used to
create simple high availability. In the case of a service failure, you are likely to leverage
your high‐availability features such as Live Migration or automatic server restart. In a
disaster, you will typically find yourself completely moving your processing to an
alternative site. The failover and failback processes are big decisions with potentially big
repercussions.
Defining “Recovery”
Chapter 3 started this guide’s conversation on disaster recovery through its iterative
discussion on the features that are important to Hyper‐V storage. There, a graphic similar
to Figure 4.1 was shown to explain how two different iSCSI storage devices could be
connected across two different sites to create the framework for a disaster recovery
environment.
Figure 4.1: The setup of two different SANs in two different sites lays the framework
for Hyper-V disaster recovery.
In Figure 4.1, you can see how two different iSCSI storage devices have been
interconnected. The device on the left operates in the primary site and handles the storage
needs for normal operations. On the right is another iSCSI storage device that contains
enough space to hold a copy of the necessary data for disaster operations. Between these
two storage devices is a network connection of high‐enough bandwidth to ensure that the
data in both sites remains synchronized.
This architecture is important because at its very core virtualization makes disaster
recovery far more possible than ever before. Virtualization’s encapsulation of servers into
files on disk makes it both operationally feasible and affordable to replicate those servers
to an alternative location.
At a very high level, disaster recovery for virtual environments is made up of three basic
things:
• A storage mechanism
• A replication mechanism
• A target for receiving virtual machines and their data
The storage mechanism used by a Hyper‐V virtual environment (or, really any virtual
environment) is the location where each virtual machine’s disk files are contained. Because
the state of those virtual machines is fully encapsulated by those disk files, it becomes
trivial to replicate them to an alternative location. By leveraging technology within the
storage device, at the host, or a combination of both, creating a fully-functional secondary
site is at first blush as trivial as a file copy.
Now, obviously there are many factors that go into making this “file copy” actually
functional in a production environment. There are different types of replication approaches
that focus on performance or prevention of data loss. There are clustering mechanisms that
actually enable the failover as well as failback once the primary site is returned to
functionality. There are also protective technologies that ensure data is properly replicated
to the alternative site such that it is crash‐consistent and application‐consistent. All of these
technologies you will need to integrate when creating your own recovery solution.
The Importance of Replication, Synchronous and Asynchronous
Without delving into the finer details of how this architecture is constructed, a primary
question that must first be answered relates to how that synchronization is implemented.
Remember that above all else, an iSCSI storage device is at its core just a bunch of disks.
Those disks have been augmented with useful management functions to make them easier
to work with (such as RAID, storage virtualization, snapshots, and so on), but at its most
basic, a storage device remains little more than disk space and a connection.
This realization highlights the importance of how these two storage devices must remain in
sync with each other. Remember that the sole reason for this second storage device’s
existence is to create a second copy of production data comprised of both virtual machine
disk files and the data those virtual machines work with. Thus, the mechanism by which
data is replicated from primary to backup site (and, eventually, back) is important to how
disaster recovery operations are initiated.
Two types of replication approaches are commonly used in this architecture to get data
migrated between storage devices. Those two types are generically referred to as
synchronous and asynchronous replication. Depending on your needs for data preservation
as well as the resources you have available, you may select one or the other of these two
options. Or, both.
Synchronous Replication
In synchronous replication, changes to data are made on one node at a time. Those changes
can be the writing of raw data to disk by an application or the change to a virtual machine’s
disk file as a result of its operations. When data is written using synchronous replication,
that change is first enacted on the primary node and then subsequently made on the
secondary node. Important to recognize here is that the change is not considered complete
until the change has been made on both nodes. By requiring that data is assuredly written on
both nodes before the change is complete, the environment can also ensure that no data
will be lost when an incident occurs.
Consider the following situation: A virtual machine running Microsoft Exchange is merrily
doing its job, responding to Outlook clients and interacting with its Exchange data stores.
That virtual machine’s disk files and data stores are replicated using synchronous
replication to a second storage device in another location. Every disk transaction that
occurs with the virtual machine requires the data to be changed at both the primary and
secondary site before the next transaction can occur.
Figure 4.2 shows a breakdown of the steps required for this synchronous replication to
fully occur. In this situation, the Exchange server makes a change to its database. That
change is first committed at the primary site. It is then replicated to the secondary site,
where it is committed to the storage device in that location. An acknowledgement of
commitment is finally sent back to the primary site, upon which both storage devices can
then move on to the next change.
Figure 4.2: A breakdown of the steps required for synchronous replication.
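The steps in Figure 4.2 can be condensed into a short sketch; the key point is that the write does not complete until both sites hold the change. Class and function names here are hypothetical:

```python
# Synchronous replication sketch, following the steps in Figure 4.2:
# commit at the primary, replicate, commit at the secondary, acknowledge.
# The write is not considered complete until both sites have the change.

class Site:
    def __init__(self, name):
        self.name = name
        self.log = []

    def commit(self, change):
        self.log.append(change)

def synchronous_write(primary, secondary, change):
    primary.commit(change)          # 1. committed at the primary site
    secondary.commit(change)        # 2. replicated and committed remotely
    ack = change in secondary.log   # 3. acknowledgement back to primary
    return ack                      # 4. only now is the write complete

primary, secondary = Site("primary"), Site("dr")
ok = synchronous_write(primary, secondary, "exchange-db-update-1")
print(ok, primary.log == secondary.log)   # True True
```

Because each call blocks on the remote commit, the round-trip latency between sites is paid on every single write, which is the bottleneck the next paragraph describes.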
This kind of replication very obviously ensures that every piece of data is assuredly written
before the next data change can be enacted. At the same time, you can see how those extra
layers of assurance can create a bottleneck for the secondary site. As each change occurs,
that change must be acknowledged across both storage devices before the next change can
occur.
Synchronous replication works exceptionally well when the connection between storage
devices is of very high bandwidth. Gigabit connections combined with short distances
between devices reduce the intrinsic latency in this architecture. As a result, environments
that require zero amounts of data loss in the case of a disaster will need to leverage
synchronous replication.
Asynchronous Replication
Asynchronous replication, in contrast, does not require data changes to occur in lock‐step
between sites. Using asynchronous replication between sites, changes that occur to the
primary site are configured to eventually be written to the backup site.
Leveraging preconfigured parameters, changes that occur to the primary site are queued
for replication to the backup site as appropriate. This queuing of disk changes between
sites enables the primary site to continue operating without waiting for each change’s
commitment and acknowledgement at the backup site. The result is no loss of storage
performance as a function of waiting for replication to complete.
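The queue-and-flush behavior can be sketched in a few lines. Note how the final write sits in the queue, which is exactly the data at risk if a disaster strikes before the next flush:

```python
# Asynchronous replication sketch: writes complete at the primary
# immediately and are queued; changes ship to the backup site in
# batches, so queued changes are lost if a disaster hits first.

from collections import deque

class AsyncReplica:
    def __init__(self):
        self.primary = []
        self.secondary = []
        self.queue = deque()

    def write(self, change):
        self.primary.append(change)   # completes immediately
        self.queue.append(change)     # replication deferred

    def flush(self):
        """Ship queued changes to the backup site in a batch."""
        while self.queue:
            self.secondary.append(self.queue.popleft())

rep = AsyncReplica()
for change in ["c1", "c2", "c3"]:
    rep.write(change)
rep.flush()
rep.write("c4")                       # not yet flushed

print(rep.secondary)                  # ['c1', 'c2', 'c3']
print(len(rep.queue))                 # 1 -> lost if disaster strikes now
```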
Although asynchronous replication eliminates the performance penalties sometimes seen
with synchronous replication, it does so by also eliminating the assurance of zero or nearly
zero data loss. In Figure 4.3, you can see how changes at the primary site are queued up for
eventual transfer to the backup site. Using this approach, changes can be submitted in
batches as bandwidth allows; however, a disaster that occurs between change replication
intervals will cause some loss of the queued data.
Figure 4.3: Changes committed at the primary site are queued and replicated in
batches to the secondary site.
Although the idea of “eventual replication” might seem scary in terms of data integrity, it is
in fact an excellent solution for many types of disaster recovery scenarios. To give you an
idea, turn back a few pages and take another look at the types of incidents that this chapter
considers to be disasters. In any of these classes of events, the level of impact to the
production data center facility is enormous. At the same time, those same types of disasters
are likely to cause an impact to the people who work for the business as well.
For example, a natural disaster that impacts a data center is also likely to impact the
brick-and-mortar offices of the business. This impact may impede employees' ability to
carry on the work of the business. As a result, a slight loss in data may be insignificant when
compared with the amount of business data that is saved, that will be used in the
immediate term, and that can be reconstructed from other means.
Which Should You Choose?
To summarize the discussed concepts, remember always that synchronous replication has
the following characteristics:
• Assures no loss of data
• Requires a high‐bandwidth and low‐latency connection
• Write and acknowledgement latencies impact performance
• Requires shorter distances between storage devices
In contrast, asynchronous replication solutions have the following characteristics:
• Potential for loss of data during a failure
• Leverages smaller‐bandwidth connections, more tolerant of latency
• No performance impact to source server processing
• Potential to stretch across longer distances
Your decision about which type of replication to implement will be determined primarily
by your Recovery Point Objective (RPO), and secondarily by the amount of distance you
intend to put between your primary and secondary sites.
Recovery Point Objective
RPO is a measurement of your business’ tolerance for acceptable data loss for a particular
service, and is formally defined as “the point in time to which you must recover data as
defined by your organization.” Business services that are exceptionally intolerant of data
loss are typified by production databases, critical email stores, or line of business
applications. These services and applications cannot handle any loss of data for reasons
based on business requirements, compliance regulations, or customer satisfaction. For
these services, even the most destructive of disasters must be mitigated against because
the loss of even a small amount of data will significantly impact business operations.
You’ll notice here that this definition does not talk about the RPO of your business but
rather the RPO of particular business services. This is an important differentiation as well as
one that requires special highlighting. Remember that every business has services that it
considers to be Tier I or “business critical.” That same business has other services it
considers to be Tier II or “moderately important,” as well as others that are Tier III or
“low importance.”
This differentiation is critically important because although virtualization indeed makes
disaster recovery operationally feasible for today’s business, disaster recovery still
represents an added cost. Your business might see the need for getting its production
database back online within seconds, but it likely won’t need the same attention for its low‐
importance WSUS servers or test labs.
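One way to express this per-service decision is a simple policy function keyed on RPO. The tiers, thresholds, and service names below are illustrative examples, not a formal standard:

```python
# Sketch of mapping each service's RPO to a replication approach.
# Thresholds and service names are illustrative policy choices.

def replication_for(rpo_seconds):
    if rpo_seconds == 0:
        return "synchronous"      # zero data loss tolerated
    if rpo_seconds <= 3600:
        return "asynchronous"     # minutes-to-an-hour of loss tolerable
    return "none"                 # rebuild after the disaster ends

services = {
    "production-database": 0,      # Tier I: business critical
    "email-stores": 900,           # Tier II: moderately important
    "wsus-server": 86400,          # Tier III: low importance
}
plan = {name: replication_for(rpo) for name, rpo in services.items()}
print(plan["production-database"], plan["wsus-server"])  # synchronous none
```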
Distance Between Sites
Remember too that synchronous replication solutions require good bandwidth between
sites. At the same time, they are relatively intolerant of latency between those connections.
Thus, the physical distance between sites becomes another factor for determining which
solution you will choose.
Of the different types of disasters, natural disasters tend to have the greatest impact on this
decision. For example, to protect against a natural disaster like a Category 5 hurricane, you
likely want your backup site to sit in a geographic location that is greater than the expected
diameter of said hurricane. At the same time, Category 5 hurricanes are relatively rare
events, while other events like extended power outages are much more likely.
It is for these reasons that combinations of synchronous, asynchronous, and even non‐
replication for your servers can be an acceptable solution. Some of your servers need to
stay up no matter what, while others can wait for the disaster to end and normal
operations to return. Others can be protected against low‐impact disasters through short‐
distance synchronous replication, while a tertiary site located far away protects against the
worst of natural cataclysms. In all of these, cost and benefit will be your guide.
Note
An additional and yet no less important determinant here relates to your
support servers. When considering which virtual servers to enable for
disaster recovery, remember to also make available those that provide
support services. You don’t want to experience a disaster, fully failover, and
find yourself without domain controllers to run the domain or Remote
Desktop Servers to connect users to applications.
Ensuring Data Consistency
No discussion on replication is complete without a look at the perils of data consistency.
Bluntly put, if you expect to simply file‐copy your virtual machines from one storage device
to another, you’ll quickly find that the resulting copies aren’t likely to power on all that
well. Nor will their applications and databases be immediately available for use when a
disaster strikes.
Data Consistency: An Exchange Analogy
The best way to explain this problem is through a story. Have you or
someone in your organization ever accidentally pulled the power cable on
your Exchange Server? Or have you ever seen that Exchange Server crash,
powering down without a proper shut down sequence? What happens when
either of these two situations happens?
In either situation, the Exchange database does not return to operation
immediately when the server powers back on. Instead, Exchange reports that its
database was shut down uncleanly and refuses to mount it. The only solution
when this occurs is a long and painful process of running multiple integrity
checks against the database to return it to functionality. Depending on the size
of the database, those integrity checks can require multiple hours to complete.
While they run, your company must operate with a mail system that is not fully
functional. It is for this
reason that businesses that use Microsoft Exchange add high‐availability
features such as battery backup, redundant power supplies, and even
database replication to alternative systems.
Now, you might be asking yourself, “How does this story relate to data
consistency in replicated virtual environments?” The answer is this: without the
right technology in place, a dirty Exchange database can result from a poorly
replicated virtual machine in exactly the same way it does from a power fault. In
either case, you must implement the right technologies if you are to prevent
that unclean shutdown.
The problem here has to do with the ways in which virtual machine data is replicated from
primary site to backup site. Remember that a running virtual machine is also a virtual
machine that is actively using its disk files. Thus, any traditional file copy that occurs from a
primary site to a backup site will find that the file has changed during the course of the
copy. Even ignoring the obvious locked‐file problems that occur with such open files, it
becomes easy to see how running virtual machine disk files cannot be replicated without
some extra technology in place.
Further complicating this problem are the applications that are running within that virtual
machine itself. Consider Exchange once again as an example, although the issue exists
within any installed transactional database. With a Microsoft Exchange data store, its .EDB
file on disk behaves very much like a virtual machine’s disk file. In essence, although it may
be possible to copy that .EDB file from one location to another, you can only be guaranteed
a successful copy if the Exchange server is not actively using the file. If it is, changes are
likely to occur during the course of the transfer that result in a corrupted database.
It is for both of these reasons that extra technology is required at one or more levels of the
infrastructure to manage the transfer between primary and secondary sites. This
technology commonly uses one of many different snapshotting technologies to watch for
and transfer changes to virtual machines and their data as they occur.
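To make the problem concrete, the following sketch (a hypothetical Python simulation, not any vendor's replication code; the block layout and function names are invented for illustration) shows why a block-by-block copy of a live file can produce a torn replica, while a point-in-time snapshot cannot:

```python
# Hypothetical sketch: why copying a live disk file block-by-block can
# yield an inconsistent replica, and why a point-in-time snapshot does not.

def naive_copy(disk, writer_steps):
    """Copy blocks in order while a writer mutates the disk mid-copy."""
    copy = []
    for i in range(len(disk)):
        copy.append(disk[i])
        if i in writer_steps:          # writer commits a change mid-copy
            writer_steps[i](disk)
    return copy

def snapshot_copy(disk):
    """Freeze a point-in-time image first, then copy at leisure."""
    frozen = list(disk)                # the "snapshot"
    return list(frozen)

# A two-block "transaction": blocks 0 and 3 must agree on the txn id.
disk = ["txn1-head", "data-a", "data-b", "txn1-tail"]

def commit_txn2(d):
    d[0], d[3] = "txn2-head", "txn2-tail"

# The naive copy reads block 0 before the commit and block 3 after it,
# so head and tail disagree: a torn copy that won't mount cleanly.
torn = naive_copy(disk, {1: commit_txn2})
print(torn[0], torn[3])      # txn1-head txn2-tail  (inconsistent)

# The snapshot captures all blocks at one instant, so they agree.
disk2 = ["txn1-head", "data-a", "data-b", "txn1-tail"]
clean = snapshot_copy(disk2)
print(clean[0], clean[3])    # txn1-head txn1-tail  (consistent)
```

This is the essence of what VSS-style snapshot integration provides: a frozen, application-quiesced image that can be replicated safely while the live file keeps changing.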
Data consistency technologies often require the installation of extension software on
either the Hyper‐V cluster or the individual virtual machines. This software commonly
integrates with the built‐in Volume Shadow Copy Service (VSS) along with its
application‐specific providers to create and work with dynamic snapshots of virtual
machines and their
installed applications. The result is much the same as what is seen with traditional
application backup agents that integrate with applications like Exchange, SQL, and others,
to successfully gather backups from running application instances. The difference here is
that instead of gathering backups for transfer to tape, these solutions are gathering changes
for replication to a backup site.
Other solutions exist purely at the level of the storage device. These solutions leverage on‐
device technology for ensuring that data is replicated consistently and in the proper order.
It should be obvious that storage device‐centric solutions can be less complex: with these
solutions, installing agents on each virtual host or machine may not be required. Also,
fewer “moving parts” are exposed to the administrator, allowing
administrators to enable replication on a per‐device or per‐volume basis with the
assurance that it will operate successfully with minimal further interaction. Depending on
your environment, one or both of these solutions may be necessary for accomplishing your
needs for replication.
Note
When considering a secondary storage device for disaster recovery purposes,
you must take into account the extra technologies required to ensure data
consistency. In essence, if your backup site cannot automatically fail over
without extra effort, you don’t have a complete disaster recovery solution.
Architecting Disaster Recovery for Hyper‐V
All of this introductory discussion brings this conversation to the main topic of how to
actually enable disaster recovery in Hyper‐V. You’ll find that the earlier discussion on
storage devices and replication is fundamentally important for this architecture. Why?
Because creating disaster recovery for Hyper‐V involves stretching your Hyper‐V cluster to
two, three, or even many sites and implementing the necessary replication. The first half of
accomplishing this is very similar to the cluster creation first introduced in Chapter 2.
Note
As in Chapter 2, this guide will not detail the exact click‐by‐click steps
necessary to build such a cluster. That information is better left for the step‐
by‐step guide that is available on Microsoft’s Web site at
http://technet.microsoft.com/en‐us/library/cc732488(WS.10).aspx.
Microsoft’s terminology for a Hyper‐V cluster that supports disaster recovery is a multisite
cluster, although the terms stretch cluster and geocluster have both been used to describe
the same architecture. By definition, a Microsoft multi‐site cluster is a traditional Windows
Failover Cluster that has been extended so that different nodes in the same cluster reside in
separate physical locations.
Figure 4.4 shows a network diagram of the same cluster that was first introduced in Figure
4.1. In Figure 4.4, you can see how the high‐availability elements that were added into the
single‐site cluster have been mirrored within the backup site.
Figure 4.4: A network diagram of a multisite cluster that includes high‐availability
elements.
Full Redundancy Isn’t Always Necessary at the Backup Site
This mirroring of high‐availability elements is present for completeness;
however, it is not uncommon for backup site servers to leverage fewer
redundancy features than are present in the production site.
The reason for this reduction in redundancy lies in the backup site’s purpose:
backup sites are most commonly used for disaster operations only, often a small
percentage of total operations, so the
cost for full redundancy often outweighs its benefit. As you factor in the
amount of time you expect to operate with virtual machines at the backup
site, your individual architecture may also reveal that fewer features are
necessary.
Important to recognize in this figure is the additional iSCSI storage location that exists
within the backup site. Multi‐site Hyper‐V clusters leverage the use of local and replicated
storage within each site. Although each Windows Failover Cluster generally requires this
storage to be local to the site, its services provide no built‐in mechanisms for accomplishing
the replication. You must turn to a third‐party provider—commonly either through your
storage vendor or an application provider—to provide replication services between
storage devices.
Note
Although Microsoft has a replication solution in its Distributed File System
Replication (DFS‐R) solution, this solution is neither appropriate nor
supported for use as a cluster replication mechanism. DFS‐R only performs
replication as a file is closed, an action that does not often happen with
running virtual machines. Thus, it cannot operate as a cluster replication
solution.
Choosing the Right Quorum
In Windows Server 2008, Microsoft eliminated the earlier restriction that cluster nodes all
reside on the same subnet. This restriction complicated the installation of multi‐site
clusters because the process of extending subnets across sites was complex or even
impossible in many companies. Today, the click‐by‐click process of creating a cluster across
sites requires little more than installing the Windows Failover Clustering service onto each
node and configuring the node appropriately.
Although clicking the buttons might be a trivial task, the greatest complexity lies in
designing the architecture of that cluster. One of the first decisions that must be made
has to do with how the cluster determines whether it is still a cluster. This determination is
made through a process of obtaining quorum.
Obtaining quorum in a Hyper‐V cluster is not unlike how your local Kiwanis or Rotary club
obtains quorum in their weekly meetings. If you’ve ever been a part of a club where
decisions were voted on, you’re familiar with this process. Consider the analogy: Decisions
that are important to a Kiwanis club should probably be voted on by a large enough
number of club members. In the bylaws of that club, a process (usually based on the rules
of Parliamentary Procedure) is documented that explains how many members must be
present for an important item to be voted on. That number is commonly 50% of the total
members plus one. Without that number of members present, the club cannot vote on
important matters, because it does not see itself as a fully functioning club.
The same holds true in Hyper‐V clusters. Remember first that a cluster is by definition
always prepared for the loss of one or more hosts. Thus, it must always be on the lookout
for conditions where there are not enough surviving members for it to remain a cluster.
This count of surviving members is referred to as the cluster’s quorum. And just like
different Kiwanis clubs can use different mechanisms to identify how they measure
quorum, there are different ways for your Hyper‐V cluster to identify whether it has
quorum. In Windows Server 2008, four are identified.
Node and Disk Majority
In the Node and Disk Majority model, each node gets a quorum vote, as does a witness disk
on the shared storage. Here, a single‐site four‐node cluster would have five votes: one for
each of the nodes plus one for the witness disk. Although useful for single‐site clusters
that have an even number
of nodes, Node and Disk Majority is not a recommended quorum model for multi‐site
clusters. This is the case because the replicated shared storage introduces a number of
challenges with multi‐site clusters. The process of replication can cause problems with SCSI
commands across multiple nodes. Also, storage must be replicated in real‐time
synchronous mode across all sites for the disks to retain the proper awareness.
Disk Only Majority
In the Disk Only Majority model, only the shared quorum disk has a vote in the quorum
determination. This model was used extensively in Windows Server 2003, and
although it is still available in Windows Server 2008, it is not a recommended configuration
for either single‐site or multi‐site clusters today.
Node Majority
In the Node Majority model, only the individual cluster nodes have votes in the quorum
determination. It is strongly suggested that environments using this model do so with at
least three nodes in single‐site clusters, and only with an odd
number of nodes in multi‐site clusters. Clusters that leverage this model should also be
configured such that the primary site contains a greater number of nodes than the
secondary site. Further, the Node Majority model is not recommended when a multi‐site
cluster is spread across more than two sites.
The reason for these recommendations has to do with how votes can be counted by the
cluster in various failure conditions. Consider a two‐site cluster that has five nodes, three in
the primary site and two in the secondary site. In this configuration, the cluster will remain
active even with the loss of any two of the nodes. Even if the two nodes in the secondary
site are lost, the three nodes in the primary site will remain active because three out of five
votes can be counted.
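The vote arithmetic behind these recommendations can be sketched in a few lines (a deliberately simplified model; `has_quorum` is an invented helper for illustration, not the actual Cluster service logic):

```python
# Simplified model of Node Majority quorum: the cluster stays up only if
# strictly more than half of all configured votes can still be counted.

def has_quorum(surviving_votes, total_votes):
    """True if the surviving members form a majority of all votes."""
    return surviving_votes > total_votes // 2

# Five-node, two-site cluster: three nodes in the primary site,
# two in the secondary site.
total = 5

# Losing the entire secondary site leaves 3 of 5 votes: still a cluster.
print(has_quorum(3, total))   # True

# Losing the primary site leaves only 2 of 5 votes: no quorum.
print(has_quorum(2, total))   # False
```

The asymmetry in the second result is exactly why the primary site should hold the greater number of nodes: the site you most want to survive should be the one that can still muster a majority.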
Node and File Share Majority
The Node and File Share Majority adds a separate file share witness to the Node Majority
Model. Here, a file share on a server separate from the cluster is given one additional vote
in the quorum determination. It is recommended that the file share be located in a site that
is not one of the sites occupied by any of the cluster nodes. If no additional site exists, it is
possible to locate the witness file share within the primary site; however, its location there
does not provide the level of protection gained through the use of a completely separate
site.
This introduction of the file share witness to the cluster quorum determination provides a
very specific assist to multi‐site clusters in helping to arbitrate the quorum determination
when entire sites are down. Because an entire‐site loss also results in the loss of network
connectivity to all hosts on that site, the cluster can experience a situation known as “split
brain” where multiple sites each believe that they have enough votes to remain an active
cluster. This is an undesirable situation because each isolated and independent site will
continue operating under the assumption that the other nodes are down, creating
problems when those nodes are again available. Introducing the file share witness to the
quorum determination ensures that an entire site loss cannot create a split brain condition,
no matter how many nodes are present in the cluster.
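Extending the same simplified vote model (again, `has_quorum` is an invented helper, not the real Cluster service), the witness's extra vote shows why a partitioned even-node cluster cannot split-brain: at most one partition can reach the file share, so at most one partition can ever hold a majority:

```python
# Simplified model: in Node and File Share Majority, each node has one
# vote and the witness file share has one more. During a network
# partition, at most one partition can reach the witness, so at most
# one partition can ever hold a majority of the votes.

def has_quorum(votes, total_votes):
    return votes > total_votes // 2

nodes_site_a, nodes_site_b, witness = 2, 2, 1
total = nodes_site_a + nodes_site_b + witness   # 5 votes in all

# Sites lose connectivity to each other; site A still reaches the witness.
print(has_quorum(nodes_site_a + witness, total))  # True  -> site A stays up
print(has_quorum(nodes_site_b, total))            # False -> site B halts

# Without the witness (4 total votes), a 2/2 partition leaves neither
# side with a majority, so neither could safely claim the cluster.
print(has_quorum(2, 4))                           # False
```

Because the witness is a single vote that only one partition can count, two sites can never simultaneously satisfy the majority test, which is precisely the split-brain guarantee described above.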
Further, the Node and File Share Majority also makes possible the extension of clusters to
more than two sites. A single file share in an isolated site can function as the witness for
multiple clusters. Figure 4.5 shows a network diagram for how a file share witness can be used
to ensure complete resiliency across a multi‐site cluster even with the loss of any single
site.
Figure 4.5: Introducing a Witness Server further protects a multisite cluster from a
site failure.
Obtaining Quorum
If you are considering a multi‐site cluster for disaster recovery, you will need
to select one of the two recommended quorum options (Node Majority or
Node and File Share Majority). That decision will most likely be based on the
availability of an isolated site for the witness file share but can be based on other
factors as well.
The actual process of obtaining quorum is an activity that happens entirely
under the covers within the Windows Failover Cluster service. To give you
some idea of the technical details of this process, on its Web site at
http://technet.microsoft.com/en‐us/library/cc730649(WS.10).aspx
Microsoft identifies the high‐level phases that are used by cluster nodes to
obtain quorum. Those phases have been reproduced here:
1. As a given node comes up, it determines whether there are other cluster
members that can be communicated with (this process may be in
progress on multiple nodes simultaneously).
2. Once communication is established with other members, the members
compare their membership “views” of the cluster until they agree on one
view (based on timestamps and other information).
3. A determination is made as to whether this collection of members “has
quorum” or, in other words, has enough members that a “split” scenario
cannot exist. A “split” scenario would mean that another set of nodes that
are in this cluster was running on a part of the network not accessible to
these nodes.
4. If there are not enough votes to achieve quorum, the voters wait for more
members to appear. If there are enough votes present, the Cluster service
begins to bring cluster resources and applications into service.
5. With quorum attained, the cluster becomes fully functional.
Ensuring Network Connectivity and Resolution
The final step in architecting your Hyper‐V cluster relates to the assurance that proper
networking and name resolution are both present at any of the potential sites to which a
virtual machine may fail over. This process is made significantly easier through the
introduction of multi‐subnet support for Windows Failover Clusters. That support
eliminates the complex (and sometimes impossible) networking configurations that are
required to stretch a subnet across sites.
This is very obviously a powerful new feature. However, at the same time, the use of
multiple subnets in a failover cluster means that virtual machines must be configured in
such a way that they retain network connectivity as they move between sites. For example,
the per‐virtual machine addressing for each virtual machine must be configured such that
its IP address, subnet mask, gateway, and DNS servers all remain acceptable as it moves
between any of the possible sites. Alternatively, DHCP and dynamic DNS can be used to
automatically re‐address virtual machines when a failover event occurs.
Either approach will involve some level of downtime for clients that attempt to connect
to virtual machines as those machines move between sites. The primary delay in
reconnection has to do with re‐convergence of proper DNS settings on both servers and
clients after a failover event. It may be necessary to reduce the Time To Live (TTL)
setting on DNS entries, or to flush local caches on clients after DNS entries have been
updated, to reconnect clients with moved servers.
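As a rough back-of-the-envelope model (the numbers and the `worst_case_reconnect` helper are illustrative assumptions, not measurements), the worst-case client reconnection delay after a failover is bounded by whichever is longer: the failover itself or the expiry of the stale cached DNS record:

```python
# Back-of-the-envelope estimate (illustrative numbers only): a client
# that cached the old DNS record just before the failure must wait for
# that record to expire before it re-resolves the moved server, and the
# failover itself must also have completed.

def worst_case_reconnect(failover_seconds, dns_ttl_seconds):
    """Both the failover and the stale cached record must finish/expire."""
    return max(failover_seconds, dns_ttl_seconds)

# With a default one-hour TTL, DNS dominates the outage for clients.
print(worst_case_reconnect(failover_seconds=120, dns_ttl_seconds=3600))  # 3600

# Lowering the TTL to five minutes makes DNS much less of a bottleneck...
print(worst_case_reconnect(failover_seconds=120, dns_ttl_seconds=300))   # 300

# ...and flushing client caches removes the DNS wait entirely.
print(worst_case_reconnect(failover_seconds=120, dns_ttl_seconds=0))     # 120
```

The model makes the trade-off visible: shortening TTLs before a planned failover, or flushing caches after one, shifts the bottleneck from DNS propagation back to the failover process itself.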
Disaster Recovery Is Finally Possible with Hyper‐V Virtualization
Although this chapter’s discussion on disaster recovery might at first blush appear to be a
complex solution, consider the alternatives of yesteryear. In the days before virtualization,
disaster recovery options were limited to creating mirrored physical machines in
alternative sites, replicating their data through best‐effort means, and manually updating
backup servers in lock‐step with their primary brethren.
Today’s solutions for Hyper‐V disaster recovery are still not installed through any Next,
Next, Finish process. These architectures remain solutions rather than any simple product
installation. However, with a smart architecture and planning in place, their actual
implementation and ongoing management can be entirely feasible by today’s IT
professionals. Doing so atop iSCSI‐based storage solutions further enhances the ease of
implementation and management due to iSCSI’s network‐based roots.
Your next step is to actually implement what you’ve learned in this guide. With the
knowledge you’ve discovered in its short count of pages, you’re now ready to augment
Hyper‐V’s excessively simple installation with high‐powered high‐availability and disaster
recovery. No matter whether you need a few servers to host a few virtual machines or a
multi‐site infrastructure for complete resiliency, the iSCSI tools are available to build the
production environment you need.
Download Additional eBooks from Realtime Nexus!
Realtime Nexus—The Digital Library provides world‐class expert resources that IT
professionals depend on to learn about the newest technologies. If you found this eBook to
be informative, we encourage you to download more of our industry‐leading technology
eBooks and video guides at Realtime Nexus. Please visit
http://nexus.realtimepublishers.com.