XAXD Disaster Recovery

Design Considerations for Citrix
XenApp/XenDesktop 7.6 Disaster Recovery

Citrix Solutions Lab White Paper
This paper examines the issues and concerns around building a disaster recovery plan and solution, the
possible use cases that may occur, and how a team of engineers within the Citrix Solutions Lab
approaches building a disaster recovery solution.
November 2015
Prepared by: Citrix Solutions Lab
Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery
Table of contents
Section 1: Overview ................................................................................... 5
Executive summary ...................................................................................................... 5
Audience ................................................................................................................... 5
Disaster Recovery vs. High Availability ........................................................................ 6
Defining types of Disaster Recovery ......................................................................... 6
Defining what is critical .............................................................................................. 7
Section 2: Defining the Environment .......................................................... 8

Service Descriptions..................................................................................................... 9
User to Site Assignment ............................................................................................. 10
User Counts by Region ........................................................................................... 10
Regional Site 1 Network Diagram ........................................................................... 11
Regional Site 2 Network Diagram ........................................................................... 12
Cold DR Site Network Diagram ............................................................................... 13
Software ..................................................................................................................... 14
Hardware .................................................................................................................... 14
Servers .................................................................................................................... 14
Network ................................................................................................................... 15
Storage....................................................................................................................... 16
Use Cases.................................................................................................................. 17
Section 3: Deployment ............................................................................. 18

Configuration Considerations ..................................................................................... 19
Region Server Pools .................................................................................................. 22
Failover Process ........................................................................................................ 25
Section 4: Conclusion............................................................................... 27
Section 5: Appendices.............................................................................. 28
Appendix A ................................................................................................................. 28
citrix.com
References .............................................................................................................. 28
Appendix B ................................................................................................................. 29
High Level Regional Diagrams ................................................................................ 29
Appendix C ................................................................................................................. 31
Identifying Services and Applications for DR/HA ..................................................... 31
citrix.com
Section 1: Overview
Executive summary
There is much conversation around executing disaster recovery for a data center, and utilizing high
availability wherever possible. However, what are the requirements around disaster recovery, and how
does it differ from high availability? How do they work together to ensure your systems and applications
are up and available, no matter what?
This white paper looks at understanding disaster recovery and high availability. As with most things in life,
there are trade-offs. The more resilient to failure you want to be, the more it is going to cost. How do
these trade-offs affect you? There is the old-fashioned approach of writing everything of importance to
tape, storing the tape off-site, and waiting for a disaster to occur. Tape is a very low cost option, but it
could take days or weeks to rebuild your environment. The other end of the spectrum comes from utilizing
todays technology and making everything active/active, essentially running two complete data centers in
two different locations. The two data centers option is an extremely resilient, but also extremely costly
option. Simply, you are betting you are going to have a disaster that affects at least one of your sites.
What exactly needs to be up and running as quickly as possible after a failure of your data center? Where
does high availability come into play to help? This document looks at some of these questions, and asks
a few more, to help you understand and make good decisions in building a disaster recovery plan.
This project is not looking at sizing, scaling, or performance, but at design considerations for disaster
recovery. In the Solutions Lab, a team of engineers including lab hardware specialists, network
specialists, storage specialists, architects, and Citrix experts were challenged to build a disaster recovery
solution for a fictitious company defined by Solutions Lab Management. This document shows how the
company was defined, how the team architected and then implemented a solution, and some of the
issues and problems they uncovered as flaws in their plan or things they did not expect or anticipate. The
end result plan was compared to how companies such as Citrix handle disaster recovery, and it was
found to be very similar. The team had an advantage in that they were able to build the company data
center to fit their design, not try to fit a design to an existing data center. Hopefully what they learned and
uncovered will assist you as you think about building your own disaster recovery plan.
Note that a major component of any disaster recovery solution is the storage and storage vendor used.
The concerns are around the amount of data to be moved between the sites and the acceptable delta
between data synchronizations. For this paper, we worked with EMC, utilizing their storage solution to
achieve our defined goals.
Audience
This paper was written for IT experts, consultants, and architects tasked with designing a disaster
recovery plan.
citrix.com
Disaster Recovery vs. High Availability

Before we can proceed, we need to align on some definitions and terms. For this paper, High Availability
(HA) is focused more on the server level, and is configured in such a manner that the end user
experiences little to no down time. Recovery can be automatic, simply failing over to another host, server,
or instance of the application. HA is often thought of in terms of N+1, the addition of one more server
(physical or virtual) or application than is required. If five physical servers were required to support the
workload, then six would be configured with the load distributed across all six servers. If any single server
fails, the remaining five can pick up the workload without significantly affecting the user experience. With
software like Citrix XenDesktop, the same approach applies. If one delivery controller/broker, provisioning
server, or SQL server is not sufficient to support the workload, a second one is deployed. Depending on
the software, this can be either Active/Active where all are actively processing, or Active/Passive where
the HA only becomes active in failure of the first system. In XenDesktop, we always recommend an
Active/Active deployment.
Disaster Recovery (DR) implies a complete disaster, no access to the site or region, total failure. The
recovery will require manual intervention at some point, and the response times for being operational
again are defined by the disaster recovery specifics. We will talk more about this later in this paper.
Defining types of Disaster Recovery

For HA, we talked in terms of Active/Active and Active/Passive, where these terms define how the HA
components act, either all are up and supporting users or one is awaiting a failure event and then
proceeds to pick up the load. These terms can be applied to DR as well:
Active/Passive (A/P) referred to as planned or cold

o
Once a disaster strikes the second site must be brought up entirely
Only as current as the last back up
Could have hardware sitting idle waiting for disaster
Active/Active (A/A) referred to as hot sites

o
Everything replicated in the disaster site
Duplicate hardware
Everything that occurs on the primary site also occurs on the secondary site
Load balanced
Active/Warm (A/W) referred to as reactive or warm

o
Some components online, ready
Must define priority recovery
When disaster occurs, provision capacity as needed
In A/P, depending on how quickly you need to be back up and running, it may be as simple as backing up
to tape and in a disaster restoring from tape to available hardware. This is the lower cost solution, but not
very resilient or quick for recovery. A/A has duplicate hardware and software running and supporting
users. In a multi-site scenario, each site must have enough additional hardware to support the user
failover. A/A is much quicker to recover from a disaster, but much more expensive from a Capital
Expenditure (CAPEX) cost with hardware. Essentially, each site has a complete duplicate set of underutilized hardware waiting for a disaster. With A/W, the plan is to define that which is critical to the
company and what must be recovered as quickly as possible and having enough bandwidth at the other
site(s) to support the requirement. Once the most critical environment is defined, the rest of the company
can be dealt with. This does require some extra hardware in each region, but we can better manage the
resources and costs.
citrix.com
Defining what is critical

In an A/A deployment, the thought is that everything is critical, and must be up and running. In an A/P
deployment, critical uptime is not important. However, for A/W we must define what is critical, and which
users are critical. The following terms are used going forward:
Mission Critical (MC) Highest Priority

o
Requires continuous availability
Breaks in service are very impactful on the company business
Availability required at almost any price
Mission critical users are highest priority in event of a failure
Business Critical (BC) High Priority

o
Requires continuous availability, though short breaks in service are not catastrophic
Availability required for effective business operation
Business critical users have a less stringent recovery time
Business Operational / Productivity (PR) Medium Priority

o
Contributing to efficient business operation but doesn't greatly affect business
Regular users, may not fail over, or done so as final steps
As stated earlier, we created a fictitious company for this disaster recovery plan scenario. This company
has a single Mission Critical application and a single Business Critical application, and associated users.
The company president defined the acceptable response times and requirements, including a desire to
have a warm failover for mission- and business-critical users, and a passive failover for the rest of the
company. The following sections highlight the development and implementation of the plan.
citrix.com
Section 2: Defining the Environment

For this setup, the fictitious business was structured as one business with two regional sites. The
business requires both company database (which is considered Mission Critical) and Exchange (which is
considered Business Critical) availability. Region 1 focuses on company infrastructure and Region 2
focuses on a call center. MC and BC users are spread across multiple groups in each region. This setup
must also be able to handle the total failure of both Regions 1 and 2 at the same time.
In a single region failure, the recovery goals for our setup are for MC applications and users to be back up
and running within two hours with minimal data loss. BC applications and users must be back up and
running within four hours with up to 60 minutes of acceptable data loss. If Regions 1 and 2 both fail, the
third site must be up and running within five days with no more than 24 hours of acceptable data loss.
For a closer look at this diagram by region, see Appendix B at the end of the paper.
citrix.com
Service Descriptions
This table defines our MC, BC and PR services and applications and our considerations in handling them
in our setup.
Service
Type
Service
Description
Configuration
Requirements
Mission
Critical
Microsoft SQL
Sample
Database Northwind
SQL Sample
Northwind Database
is used along with a
web server. This
represents the Call
Center mission
critical application
database.
SQL Sample database is

deployed at all locations.
Replication is handled by
storage backend.
In case of major failure, the

database must be delivered
from the DR data center.
Microsoft
Exchange /
Outlook
Access to email data

for business critical
users from Exchange
database
The database is replicated

between primary and
secondary locations using
Exchange database copies.
Business
Critical
A maintenance message must

be presented to external users
when the database is not
available.
The database is backed up

every 4 hours to the storage
in DR location.
Business
Microsoft Office
Operational and file shares
/
Productivity
All users use

Microsoft Office to
create and review
documents.
Documents are
stored on file server
shares synced
between regions.
DFS Replication is
In case of disaster, a limited set
configured between primary of users must have access to
sites and file-based backup the DR file share location.
is performed to the DR
location every 8 hours.
Published Microsoft Office
must be unavailable to users
when the file share is not
available.
Microsoft Office is
published on
XenApp.
citrix.com
User to Site Assignment

Each regional site in our setup has different type of users. Region 1 is focused on HR and engineering.
Region 2 is focused on call center users. A majority of the users are hosted shared desktops, the
remaining users are VDI users, either pooled or dedicated.
User Counts by Region

The table below shows the breakdown of users by region and how they are organized within the regions.
Region 1 User Counts
Mission Critical
Business
Critical
Business
Operational / PR
Engineering
30
60
560
HR
10
10
20
Management
Region 1 Grand Total
45
75
580
Region 2 User Counts
Mission Critical
Business
Critical
Business
Operational / PR
Call Center
20
60
520
Engineering
10
HR
25
Management
Region 2 Grand Total
40
90
citrix.com
50
570
10
Regional Site 1 Network Diagram
For Region 1, the server configuration consisted of:
Three physical servers running XenServer, hosting infrastructure VMs.
Four physical XenApp hosts in a single delivery group, as a 3+1 HA model supporting the
business operational users.
Four physical hosts running XenServer configured as a pool, in a 3+1 HA model supporting the
mission- and business-critical users. This pool supported the following configuration:
30 Windows 8.1 Dedicated VDI VMs
90 Windows 8.1 Random Pooled VDI VMs
5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users
The Region 2 failover pool in Region 1 is four XenServer hosts in a 3+1 model supporting the
following configuration:
citrix.com
3 SQL 2014 VMs in a cluster (Call center database failover)

11
Regional Site 2 Network Diagram
For Region 2, the server configuration consisted of:
Three physical servers running XenServer, hosting infrastructure VMs, including the SQL call
center cluster.
Four physical XenApp hosts in a single delivery group, as a 3+1 HA model supporting the
business operational users.
Four physical hosts running XenServer configured as a pool, in a 3+1 HA model supporting the
mission- and business-critical users. This pool supported the following configuration:
The Region 1 failover pool in Region 2 is four XenServer hosts in a 3+1 model supporting the
following configuration:
citrix.com

12
Cold DR Site Network Diagram
For the DR site, the Region 1 disaster recovery site was set up with four XenServer hosts in a 3+1 HA
model supporting the following configuration:
o
Windows 8.1 Dedicated VDI VMs
Windows 2012 R2 Multi-user XA/HSD VMs
Infrastructure VMs
The Region 2 disaster recovery site was set up with four XenServer hosts in a 3+1 HA model supporting:
o
Windows 8.1 Dedicated VDI VMs
Windows 2012 R2 Multi-user XA/HSD VMs
Infrastructure VMs
Note: The networks for Region 1 and Region 2 in this site are set up with the same IP ranges as in the
original regional sites.
citrix.com
13
Software
The following is a list of software components deployed in the environment:
Component
Version
Virtual Desktop Broker
XenDesktop 7.6 Platinum Edition FP2
VDI Desktop Provisioning
Provisioning Services 7.6
Endpoint Client
Citrix Receiver for Windows 4.2 (ICA)
Web Portal
Citrix StoreFront 3.0
License Server
Citrix License Server 11.12.1
Office
Microsoft Office 2013
Virtual Desktop OS (Pooled VDI)
Microsoft Windows 8.1 x64
Virtual Desktop OS (Hosted Shared Desktops)
Microsoft Windows Server 2012 R2 Datacenter
Database Server
Microsoft SQL Server 2014
Hypervisor
XenServer 6.5 SP1
Network Appliance
NetScaler VPX, NS11.0: Build 62.10.nc
WAN Optimization
CloudBridge WAN Accelerator

CBVPX 7.4.1
Storage Network
Brocade 5100 switch
Storage DR
For XtremIO: EMC RecoverPoint 4.1 SP2 P1

For Isilon: OneFS 7.2 SyncIQ
Note: All software is updated to run the latest hotfixes and patches
Hardware
Servers
The hardware used in this configuration were blade servers with 2-socket Intel Xeon E5-2670 @
2.60GHz, with 192 GB of RAM and two internal hard drives.
citrix.com
14
Network
VMs were utilized as site edge devices that helped route traffic between regions. The perimeter network
(also known as a DMZ) had a firewall between itself and the internet and another firewall between the
perimeter network and production network.
NetScaler Global Site Load Balancing (GSLB) was used to determine which region the user is sent. If
available, users are sent to their primary region. When the primary region is not available, users are sent
to their secondary region. A pair of NetScaler VPX appliances per region were utilized for authentication,
access, and VPN communications. Additionally, a pair of NetScaler Gateway VPX appliances were
utilized per region to allow connectivity into the XenApp/XenDesktop environment. CloudBridge VPX
appliances were utilized for traffic acceleration and optimization between regions. NetScaler CloudBridge
Connector was configured for IPSec tunneling.
The following diagram is a detailed architectural design of our network implementation.
citrix.com
15
Storage
Storage was configured using EMC XtremIO All-Flash Storage and Isilon Clustered NAS systems.
Storage Network for EMC XtremIO was configured with Brocade Fibre Channel SAN switches. The
following diagram gives a high level view for Region 1. As stated previously, failover to a DR site requires
manual intervention, so the concern in syncing data comes down to a math problem. How much data do
you need to sync between sites and what size pipe between the sites? That determines how long it will
take to sync. Can you sync in the time allowed? If not, what do you have to correct the problem, reduce
the amount of data or increase the pipe speed?
One thing to look at is the LUNs, or storage repositories. Our design created multiple volumes for mission
critical data and business critical data, and scheduled syncs accordingly. It is crucial that you work with
the storage vendor to get the proper configuration.
citrix.com
16
Use Cases
The following use cases define the possible scenarios that must be considered and, for our case study,
the users that must be supported. The minimum implies the mission critical and business critical users
that need to be supported.
Use Case 1
The sites are configured as Active/Active using NetScaler GSLB.
If the Region 1 site fails, mission- and business-critical users will be able to connect and log on to
the Region 2 site with the same data resources as were available in the Region 1 site.
With the Region 1 site back online, NetScaler GSLB will direct users to the correct site, as Region
1 site users log off from the Region 2 site and then log back into the Region 1 site.
A maximum of 120 users will have warm HA failover capability from Region 1 to Region 2.
Use Case 2
The sites are configured as Active/Active using NetScaler GSLB.
If the Region 2 site fails, mission- and business-critical users will be able to connect and log on to
the Region 1 site with the same data resources as were available in the Region 2 site.
With the Region 2 site back online, NetScaler GSLB will direct users to the correct site, as Region
2 site users log off from the Region 1 site and then log back into the Region 2 site.
A maximum of 130 users will have warm HA failover capability from Region 2 to Region 1.
Use Case 3 Cold DR
The sites configured as Active/Passive, with the goal of failing over only the mission critical users
from the Region 1/Region 2 sites to the DR site.
This site will be based on backup data from Region1 and Region 2 and will go live within 5 days.
Manual process to switch to the DR site
When users login to the DR site, they should have any changes/modifications in their
dedicated environment in the DR site environment. There is potential of data loss between
the last site to site copy and the failover. Once failed over to DR site, when Region 1/Region
2 return online, and after allowing appropriate time for replication between sites, login should
connect to Region 1/Region 2 and the changes should be reflected there.
The cold DR site will contain subset of the regional sites including networking, infrastructure
and dedicated VDIs.
o
citrix.com
This approach allows us to both easily recover from disaster with backups, and later
rebuild regional sites from the DR site data.
Mission Critical users will have primary access to the cold DR site, followed by Business
Critical, and then the rest of the company depending on timelines and disaster impact.
o
A maximum of 45 users will have cold DR access from Region 1.
A maximum of 40 users will have cold DR access from Region 2.
17
Section 3: Deployment
In building this configuration, this document is not a step by step manual, but a guide to help understand
what needs to be done. Wherever possible, Citrix documentation was followed around deployment and
configuration. The following configuration sections highlight any deviations or areas of importance to help
with a successful deployment.
Implementing the software breaks down to two major areas. First, putting the correct software into each
region. Second, configuring NetScaler for GSLB.
The process followed for deployment was:
1. Deploy XenServer pools.
2. Create required AD groups and DHCP scopes.
3. Prepare SQL Environment (SQL AlwaysOn). PVS 7.6 adds support for AlwaysOn.
4. Deploy XenDesktop environment.
5. Deploy Storefront servers and connect to XenDesktop.
6. Deploy PVS environment and create required vDisks.
7. Configure NetScaler GSLB, create site and service.
8. Configure NetScaler Gateway in Active/Passive mode and update Storefront configuration.
9. Deploy Microsoft Exchange Environment.
The NetScaler configurations are straightforward, there was nothing special done with configuring
StoreFront. This was a typical XenDesktop and NetScaler Gateway configuration. Two StoreFront servers
were configured to be load balanced by NetScaler.
NetScaler GSLB is where the focus is:
Using LB Method StaticProximity: Region 1 users will be sent to Region 1 if it is online,

otherwise the users will be set to Region 2 and vice versa.
Using location settings in NetScaler to define the primary regions of the clients local DNS
Servers and for the GSLB sites and services.
Users regardless of region use the same Fully Qualified Domain Name (FQDN) (i.e.
desktop.domain.com) NetScaler running ADNS will answer authoritatively with the IP of
primary site.
Once the user is redirected to the proper site, the user authenticates at AG, and is then
redirected to local StoreFront to get access to resources.
Additionally, NetScaler CloudBridge Connector is configured for IPSec tunneling:
citrix.com
An IPSec tunnel for AD replication, server/client communication is created using the

outbound connection.
A second IPSec tunnel is created for site to site data replication.
18
Configuration Considerations
The following defines some of the specific configurations applied to environment:
XenApp/XenDesktop
Regional Sites R1 and R2

o
2 Delivery Controllers primary regional site
FMA Services to have SSL on Controllers and change XML Service ports from HTTP
to HTTPS ports to secure traffic communication
XD/XA Database on the Always On SQL group
SSL to VDA feature of XenApp and XenDesktop 7.6
Hosted shared desktops
5 Machine Catalogs
Physical XA HSD
XA HSD MC
XA HSD BC
XA HSD MC Failover
XA HSD BC Failover
5 Delivery Groups matching the catalogs
Pooled VDI desktops
4 Machine Catalogs
PR
BC
PR Failover
BC Failover
4 Delivery Groups
Dedicated VDI Desktops
citrix.com
Must have unique Site Database naming
4 Machine Catalogs
MC
BC
MC Failover
BC Failover
4 Delivery Groups
19

VDI Virtual Desktops
o
Pooled Random VDI Desktops
VDI VMs Streamed from PVS vDisk
Dedicated VDI Desktops
Static VMs
My Documents must be redirected to a network location on File Share
XenApp / HSD
o
Deployed in two models
Physical Hosts in N+1 HA Model manually installed on hardware
Virtualized XA HSD VMs in N+1 HA model streamed from PVS vDisks
User Profile Manager

o
Hosted Shared Desktop: User Profile Data: \\FS01\ProfileData\HSD\

#SAMAccountName#
Hosted Virtual Desktop: User Profile Data: \\FS01\ProfileData\HVD\

#SAMAccountName#
Hosted Virtual Desktop: User Profile Data: \\FS01\ProfileData\MC\

#SAMAccountName#
User Profile and Folder redirection Policies
StoreFront VMs
o
SSL configured to secure traffic communication
2 StoreFront Servers (HA) and LB by NetScaler VPX
Authentication is configured on NetScaler Gateway.
License Server VM
citrix.com
2 HA license servers
SSL configured to secure traffic communication
Windows 2012 RDS Licenses
Citrix Licensing Server
20

Isilon Scale-out NAS for each Site
o
4 - X410 Nodes
34 TB HDD + 1.6 TB SSD
128 GB RAM 8x16 GB
Provisioning Services
o
2 PVS Server VMs in HA
PVS DB Server configured on SQL AlwaysOn
Utilizing remote storage location for vDisks on each PVS remote storage attached
to PVS VMs as 2nd drive via File Server and SMB/CIFS.
Separate locations for vDisk store for Mission Critical and Business Critical
vDisks on File Server via SMB/CIFS
Regular vDisks located on local File Servers
Multihomed
Utilizing Guest VLAN as Management interface
Utilizing the PXE VLAN for Streaming interface
DHCP for PVS network/PXE VLAN
Cache in device RAM with overflow on hard disk
256MB for Windows 8.1 VDI
2048MB for XA HSD
NetScaler VPX VMs

o
citrix.com
2 LB VPX in HA mode
LDAP Authentication
AG VIP
VPN
GSLB for regional sites
2 - VPXs for LB of StoreFront and XML services
21
Region Server Pools

The following defines the VM breakdown per region for the different pools required within the
infrastructure environment. In all cases, the VMs were balanced across XenServer hosts, and VMs were
configured in an HA model; a minimum of two VMs for each required application.
Region 1:
2 XenDesktop Brokers
2 - StoreFront VMs
2 - License Server VMs
2 - Provisioning Services
2 - File Server VMs
3 - SQL 2014 Database Server VM Always On
2 AD DC VMs
4 - Exchange server VMs
2 Mailbox
2 Client Access
Perimeter Network
1 Firewall / Router VM
2 - NetScaler VPX VMs HA Model User Access
2 CloudBridge VPX VMs - HA Model - Active/Passive Site to Site user access WAN
optimization
2 CloudBridge VPX VMs - HA Model - Active/Passive Site to Site data replication
2 - NetScaler VPX VMs HA Model Data Replication
R2 HA Fail-Over Pool
citrix.com
5 XA HSD VMs
25 Pooled VDI VMs
25 Dedicated VDI VMs
3 SQL Server VMs (Call Center Cluster)
22

Region 2:
2 XenDesktop Brokers
2 - StoreFront VMs
2 - License Server VMs
2 - Provisioning Services
3 SQL 2014 Database Server VMs SQL Cluster
3 - SQL 2014 Database Server VM Always On
2 - File Server VMs
2 AD DC VMs
4 - Exchange server VMs
2 Mailbox
2 Client Access
Perimeter Network
2 - NetScaler VPX VMs HA Model User Access
2 CloudBridge VPX VMs - HA Model - Active/Passive Site to Site user access WAN
optimization
2 CloudBridge VPX VMs - HA Model - Active/Passive Site to Site data replication
2 - NetScaler VPX VMs HA Model Data Replication
R1 HA Fail-Over Pool
citrix.com
5 XA HSD VMs
10 Pooled VDI VMs
30 Dedicated VDI VMs
23

Region 3:
Infrastructure Pool to support Region 3
2 AD DC VMs
1 VM to handle backups from regions 1 and 2
Region 1 Infrastructure Pool
2 AD DC VMs
2 Delivery Controllers
2 StoreFront VMs
2 License Server VMs
1 File Server VMs
2 SQL 2012 Database Server VM Always On
4 Exchange server VMs
2 Mailbox
2 Client Access
Region 2 Infrastructure Pool
2 AD DC VMs
2 Delivery Controllers
2 StoreFront VMs
2 License Server VMs
1 File Server VMs
2 SQL 2014 Database Server VM Always On
2 SQL 2014 Database Server VM SQL Cluster
4 Exchange server VMs
2 Mailbox
2 Client Access
Perimeter Network
2 NetScaler VPX VMs R1/R2 Access
VIP per Region 2 VIPs
Note: The infrastructure VMs for regions 1 and 2 were duplicated in region 3 for networking purposes. By
setting the networks correctly in region 3, once regions 1 and 2 were brought up, no network changes
were required in their infrastructure or VHD files.
citrix.com
24
Failover Process
The dedicated VMs present the biggest challenge in a failure. To address this, VMs are created in both
regions for the failover dedicated VMs from the other region. However, no storage is attached to these
VMs. In the event of a failure, these VMs will be assigned the proper VHD file from the backup storage
location. It should also be noted that for fail-back after the failed region is back online, the dedicated VM
VHD files will be deleted in the failed region and copied back from the failover region and attached to the
proper VM. This ensures the latest version of the dedicated VMs will be restarted after the fail-back.
Note: In dealing with dedicated VMs, we realized that we had to carefully name the VHD files and
associated files to ensure connecting the correct VHD file to the correct VM in failover and fail-back.
If there is a failure in either Region 1 or Region 2 (whats called a warm failover), a few steps need to be
taken. The actions differ depending on the failure. If it is a network access issue, or the Internet is down,
the dedicated VMs in the failed region are placed in Maintenance Mode in Citrix Studio and shut down.
The latest storage backup of the dedicated VMs in the new region must be made available and the
storage for each VM needs to be attached individually to the pre-created VMs already present. Group
policy is applied to the dedicated VMs OU which import the registry value, listing the delivery controllers
host names, allowing VDA registration with the local delivery controllers. The pooled VDI and XA HSD
VMs on the local delivery site are also taken off Maintenance Mode and brought online.
For Region 2, the SQL database for the call center application is brought online as well. Depending on
the type of failure, you may need to power down the failed region firewall to force failover to the other
region.
Once those steps are completed, you boot Mission Critical User VMs and Business Critical User VMs.
Mission- and Business-Critical data is kept in sync between the sites. You can then communicate the
availability to your users. The end users use the same URL as always, with GSLB redirecting as required.
For fail-back after recovery of the failed region has completed, the steps are to sync all storage back to
the failed site, perform the necessary steps for the dedicated VMs, bring the applications back online, and
bring up the users.
In a full loss of both Regions 1 and Regions 2, the DR site, or Region 3, needs to be brought online. The
physical servers are powered up, making the XenServer pools accessible. The latest database and
Exchange information are imported and the infrastructure for user VDI VMs should be restored and
brought online. A new URL is required to log in. Once the site has been brought online, any new
information, like a new URL for access, needs to be given to your users.
citrix.com
25

The following defines steps required to recover and bring Region 3 as defined back online:
Active Directory, DNS and DHCP

o
Import Domain Controllers from backup and restore Active Directory functionality
Update DNS Records for Storefront / Access Gateway / Exchange MX
Create DHCP Scopes
NetScaler
o
XenServer
o
Restore access to file services, user data and UPM.
XenDesktop Environment
o
Import SQL VMs and restore XenDesktop, PVS and Call Center application databases
Import StoreFront, XenDesktop and PVS VMs and test connectivity to databases
Exchange Environment
o
Turn existing XenServer Pools on
File Services
o
Rebuild NetScaler components, NetScaler Gateway
Import Client Access and Mailbox Servers and restore databases
External DNS
citrix.com
Update External DNS records for Access Gateway URLs
Update External MX records for email
Update Outlook Anywhere, Active Sync, etc. DNS records
26
Section 4: Conclusion
As stated in the beginning, the goal of this project was to challenge a group of engineers with creating a
disaster recovery plan for a fictitious company. This meant understanding what was mission critical,
business critical, and normal day-to-day work, and what applications and data needed to be ready in case
of a disaster. This also meant understanding user needs for issues like dedicated VMs. This paper
highlights and defines some of the issues around creating a disaster recovery environment. This is not a
how-to, step-by-step manual, but a guide to help you understand the issues and concerns in doing
disaster recovery, and things to consider when defining your disaster plan. It shows you how the Citrix
Solutions Lab team of engineers defined, designed, and implemented a DR plan for a fictitious company.
This may not be the optimal solution overall for your company, but it is one that you can utilize as a base
line of considerations and operational steps to be used when you create your disaster recovery plan for
you company.
During the process of deploying and testing, there were some realizations and changes made. One of the
first was around failing back after a failover; how to handle the data. Do you sync back, or delete and
copy back? Our decision was to delete and copy back, ensuring the original site is clean and up to date.
Another realization was around the configuration of GSLB and the failed site. Since preparing the fail
over site for access requires manual intervention, there is potential for GSLB to re-direct users to the fail
over site before it is ready, users could hit a StoreFront before any personal desktops or applications are
available for them, they would have access to any common applications or desktops.
We used two different SQL approaches, Always-on for our infrastructure environment and clustering for
our data base application. This was done by design in the lab to show issues and considerations around
both.
To support high availability between the two main regions and having a third region for total failover the
one thing that our company president was less than thrilled with was the Cap-Ex cost of hardware not
being fully utilized. This is a cost of doing business.
However, with the recent introduction of Citrix Workspace Cloud, an alternate may have come up that we
are reworking our fictitious company toward. Rather than having additional hardware in Regions 1 and
2, what if there was a cloud site running at a minimum waiting for a region to fail, and spin up what is
needed to support the failure? Essentially, what is needed in the cloud is a NetScaler VPX for
connectivity, an AD server, a SQL Always on server, and an Exchange server. This keeps the mission
critical and business critical environments in sync. You can then determine what else may be required to
support each region. The one current caveat of the cloud is that currently no cloud supports desktop
operating systems; VDI users get server operating systems running in a desktop mode. This is not a
major issue for pooled VDI users, but does become something to be solved for dedicated VDI users.
Will the cloud work for you? Should you use additional hardware in your regions? What are your recovery
times? How much of your environment is actually mission critical? These are questions we hope you are
now considering as you build a disaster recovery plan for your company.
citrix.com
27
Section 5: Appendices
Appendix A
References
EMC Storage
http://www.emc.com/en-us/storage/storage.htm?nav=1
Brocade Storage Network
http://www.brocade.com/en/products-services/storage-networking/fibre-channel.html
XenApp
http://www.citrix.com/products/xenapp/overview.html
XenDesktop
http://www.citrix.com/products/xendesktop/overview.html
NetScaler
http://www.citrix.com/products/netscaler-application-delivery-controller/overview.html
CloudBridge
http://www.citrix.com/products/cloudbridge/overview.html
Citrix CloudBridge Data Sheet:
https://www.citrix.com/content/dam/citrix/en_us/documents/products-solutions/cloudbridge-data-sheet.pdf
citrix.com
28
Appendix B
High Level Regional Diagrams
citrix.com
29
citrix.com
30
Appendix C
Identifying Services and Applications for DR/HA
This section identifies all the applications, services and data items for planning within our setup.
Call Center
Type: Database and App
Description: Main application for call center activity required for company mission critical function
Level: Mission Critical
Primary Location: Region 2 (West Coast), Region 1, R3/DR in case of failover or disaster
Access Methods:
Local Web Browser
Published App Web Browser
Data: SQL Database
actual test database - Microsoft SQL Sample StoreFront
Data Location: SQL 2014 Cluster

Systems:
SQL 2014 Database servers
Web Servers
Notes:
Database servers and database must be made accessible in R1 and R3/DR in case of fail-over or
disaster
Both database and web site for it would need to be created
http://businessimpactinc.com/install-northwind-database/
https://msdn.microsoft.com/en-us/library/vstudio/tw738475%28v=vs.100%29.aspx
Exchange
Type: Service
Description: Email service, required for internal and external communication
Level: Business Critical
Primary Location: Region 1 & 2, R3/DR in case of disaster
Some Exchange databases are region specific
Access Methods:
Local Outlook Application
Published Outlook Application
Web Outlook
Data: Exchange Databases

citrix.com
31

Data Location: Exchange Servers
Systems:
Exchange Mailbox Servers
Exchange Client Access Servers
Notes
Exchange will need to be accessible in DR scenario in R3/DR for mission-critical users
Microsoft Office
Type: Application
Description: Productivity applications for regular office work
Level:
Outlook - Business Critical
other office apps - Productivity

Access Methods:
Local Outlook Application
Published Outlook Application
Web Outlook
Data:
Outlook Data File
Outlook Address Book
Exchange Mailbox
Exchange Address Book
Data Location:
Exchange Servers
User Outlook file location (redirected from My Documents to UPM storage?)
Systems:
Exchange Mailbox Servers
Exchange Client Access Servers
Notes:
Outlook needs to be available in all regions in case of failover for business critical users.
XenDesktop
Type: Service
Description: Virtual Desktop Brokering and management system, required for virtual desktop access and
assignment
citrix.com
32

Data: XD Site Databases, region specific.
Data Location: SQL Always On HA Group
Systems:
XD Delivery Broker Server VMs
Citrix Licensing Server VMs
Notes:
Must be available in all regions for mission- and business-critical users to be able to access
desktops.
For R3/DR the XenDesktop database and SQL servers supporting it are required to be brought
up before the XD Deliver Controllers
Licensing server must be available for XenDesktop functionality to allow user connections
StoreFront
Type: Service
Description: Web Portal into the XenDesktop environment, required for user session access
Access Methods: Web Browser, Citrix Receiver
Data: SF configuration
Data Location: SF servers
Systems: Storefront Server VMs
Notes:
Must be available in all regions for mission- and business-critical users to be able to access
desktops.
Provisioning Services
Type: Service
Description: Virtual Desktop VM streaming and deployment system, required for the virtual desktop VMs
launch
Access Methods: PXE and DHCP for the Virtual Desktop VMs
Data:
PVS Farm Databases
vDisks
Data Location:
Farm Database - SQL Always On HA Group
citrix.com
33
vDisks File Servers
Systems:
PVS Server VMs
File Servers (for vDisks)
Notes:
Licensing server must be available for PVS functionality to allow virtual desktop launch
User Profiles
Type: Data
Description: User data required for all users work on virtual desktops
Access Methods: SMB
Data: User personal data, including redirected My Documents
Data Location: UPM File Servers
Systems: File Server VMs
citrix.com
34
Corporate Headquarters
India Development Center
Latin America Headquarters
Fort Lauderdale, FL, USA
Bangalore, India
Coral Gables, FL, USA
Online Division Headquarters
UK Development Center
Santa Barbara, CA, USA
Chalfont, United Kingdom
Silicon Valley Headquarters

Santa Clara, CA, USA
EMEA Headquarters
Schaffhausen, Switzerland
Pacific Headquarters
Hong Kong, China
About Citrix
Citrix (NASDAQ:CTXS) is leading the transition to software-defining the workplace, uniting virtualization, mobility management, networking
and SaaS solutions to enable new ways for businesses and people to work better. Citrix solutions power business mobility through secure,
mobile workspaces that provide people with instant access to apps, desktops, data and communications on any device, over any network
and cloud. With annual revenue in 2014 of $3.14 billion, Citrix solutions are in use at more than 330,000 organizations and by over 100
million users globally. Learn more at www.citrix.com
citrix.com
35

Copyright 2015 Citrix Systems, Inc. All rights reserved. XenApp, XenDesktop, XenServer, CloudBridge, and NetScaler are trademarks of
Citrix Systems, Inc. and/or one of its subsidiaries, and may be registered in the U.S. and other countries. Other product and company names
mentioned herein may be trademarks of their respective companies.

XAXD Disaster Recovery

Transféré par

Informations du document

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

XAXD Disaster Recovery

Transféré par

Droits d'auteur :

Formats disponibles

Design Considerations for Citrix

XenApp/XenDesktop 7.6 Disaster Recovery

Prepared by: Citrix Solutions Lab

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Section 2: Defining the Environment .......................................................... 8

Section 3: Deployment ............................................................................. 18

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Disaster Recovery vs. High Availability

Defining types of Disaster Recovery

Active/Passive (A/P) referred to as planned or cold

Once a disaster strikes the second site must be brought up entirely

Only as current as the last back up

Could have hardware sitting idle waiting for disaster

Active/Active (A/A) referred to as hot sites

Everything replicated in the disaster site

Active/Warm (A/W) referred to as reactive or warm

Some components online, ready

Must define priority recovery

When disaster occurs, provision capacity as needed

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Defining what is critical

Mission Critical (MC) Highest Priority

Requires continuous availability

Breaks in service are very impactful on the company business

Availability required at almost any price

Mission critical users are highest priority in event of a failure

Business Critical (BC) High Priority

Availability required for effective business operation

Business critical users have a less stringent recovery time

Business Operational / Productivity (PR) Medium Priority

Contributing to efficient business operation but doesn't greatly affect business

Regular users, may not fail over, or done so as final steps

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Section 2: Defining the Environment

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

SQL Sample database is

In case of major failure, the

Access to email data

The database is replicated

A maintenance message must

The database is backed up

All users use

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

User to Site Assignment

User Counts by Region

Region 1 User Counts

Region 1 Grand Total

Region 2 User Counts

Region 2 Grand Total

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Regional Site 1 Network Diagram

For Region 1, the server configuration consisted of:

Three physical servers running XenServer, hosting infrastructure VMs.

30 Windows 8.1 Dedicated VDI VMs

90 Windows 8.1 Random Pooled VDI VMs

5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users

25 Windows 8.1 Dedicated VDI VMs

25 Windows 8.1 Random Pooled VDI VMs

5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users

3 SQL 2014 VMs in a cluster (Call center database failover)

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Regional Site 2 Network Diagram