
Achieving Cost-Effective, Flexible Disaster Recovery Systems

Solving the Disaster Recovery Challenge: Understanding Today's Options

Focus on Pillar Data Systems

By Tony Asaro
Senior Analyst
Enterprise Strategy Group

September 2006

Copyright ©2006. The Enterprise Strategy Group, Inc. All Rights Reserved.
ESG White Paper
Achieving Cost-Effective, Flexible
Disaster Recovery Systems

Introduction
Disasters happen at all levels with varying degrees of impact to the business. One of the most common,
everyday events involves files that get corrupted or accidentally deleted. Servers are vulnerable to software
bugs and viruses, storage systems suffer disk drive failures, and networks can become unavailable. And
the worst scenario is a data center that suffers an extreme disaster – fire, flood, power failure, natural
disaster or terrorist act.

If we could easily protect all of our data at a low cost, life would be simpler. However, there is always a
cost associated with implementing data protection solutions (or any solutions for that matter). Therefore
companies make hard choices about how much data they are willing to lose (recovery point objective, or
RPO) and how fast they need to get mission-critical applications back online (recovery time objective, or
RTO) on an application by application basis.
Figure One: RPO and RTO

Data Protection and Disaster Recovery


Data protection is made up of the series of actions we take to protect data, which include implementing
technologies and processes. This protection comes in various forms, including implementing high
availability server and storage systems, implementing snapshots, remote mirroring and backing up data.
Disaster recovery is made up of the actions we take to keep our businesses operational in the event of a
disaster. Disaster recovery actions include getting systems back online and recovering data. RPO and
RTO are important considerations when determining how to best implement disaster recovery for your
company.

Disasters happen at all levels. We tend to associate a disaster with the most extreme circumstances, such
as lost systems or data centers. However, the loss of even a single file has the potential to be a disaster.
Therefore, implementing multiple tiers of data protection is critical to a successful disaster recovery plan.

Enterprise Strategy Group Page 1


There are a number of ways to protect data. High availability server and storage architectures are
essential. Snapshot technologies are used to rapidly recover files. Backing up to tape is still perhaps the
most widely used method for data protection and disaster recovery and is, in many cases, the only thing
data center managers have at their disposal. Remote mirroring is often closely associated with disaster
recovery but generally meets with the highest level of resistance in most organizations, due to a perception
that it is complex and expensive.

“For the last eight years we have been trying to get approval to implement remote mirroring as part of our
disaster recovery plan,” one Director of IT told us. “And my CIO said, ‘you have been requesting this for
years and nothing bad has ever happened. What is the justification for us doing this?’ ”

Organizations like the one cited have a hard time approving remote mirroring for two reasons: the
technology is expensive and complex, and the financial impact of
downtime isn’t always well understood. What is an organization willing to pay to mitigate risk? That
indeed is the question.

The answer to that particular CIO’s question is simple. Remote mirroring and other data protection
solutions have become easier to manage and more cost-effective. Therefore, the inflection point between
risk and cost may have moved enough to justify the investment. The risk is higher and the cost is lower.

The 2005 Atlantic hurricane season was the worst in recorded history. There were also over 46 tornadoes
in the United States that same year, causing billions of dollars' worth of damage. Large power outages
have occurred in recent years. In 2003, the Northeast power grid went down, taking out a huge section of
the US and parts of Canada, with an estimated financial loss of $6 billion. In July 2006, a large section of
Queens, New York was without power for nearly 10 days. And the threat of terrorism continues to loom
over us.

Our research has found that over 80% of customers that we surveyed, whether they are large or mid-sized
organizations, cannot tolerate 24 hours of downtime without an adverse business impact [1].

Figure Two: Downtime Tolerance

Source: Enterprise Strategy Group

[1] ESG Research: The Evolution of Data Protection, 2004


Even armed with this information, a prudent CIO and director of IT might do the math and still conclude that
the risk is not great enough to justify implementing any additional data protection technology. However,
there is another major fact to consider beyond the risk of losing data and the cost of ensuring that you
don’t. The consequences of lost data or unavailable applications have also changed. The risk of going out
of business if a major disaster occurs has always existed. And yet, that alone has not been enough for
most companies to justify implementing remote mirroring technologies. Today, another real risk that
companies and organizations must consider is failing to meet federal regulatory mandates and/or the
impact of litigation on the business. However, it isn’t just some cold corporate entity that may suffer the
harsh consequences. The corporation’s executive management may also be accountable for ensuring the
continuity of the business. Failure to provide a single piece of critical data may be even more
damaging to your business than a Category 5 hurricane.

Another major factor is that what are currently considered mission-critical applications aren’t necessarily
back office financial applications running on a Unix server that cost the company half a million dollars each.
E-mail is considered mission-critical by most, if not all, companies today. In a recent study, ESG found that
42 percent of the 485 companies that we surveyed had been required to respond to an electronic data
discovery (EDD) process. The most commonly requested data types were e-mail (77%), office documents
(50%), invoices and other customer records (46%) and financial statements (42%). Of the large enterprise
companies that we surveyed, 18% were subject to fines or other sanctions.
Figure Three: Electronic Data Discovery

Source: ESG Research 2006

When you add it all up, the risk has become greater and the consequences more dire, more personal and
more likely. Weigh this against the fact that there are a greater number of lower-cost and easier-to-use
solutions today than ever before, and rationalizing more investment in DR seems straightforward and on
the path to being a “no-brainer.”


DR: No Longer an All or Nothing Proposition

DR is no longer an all-or-nothing proposition. It encompasses various approaches and solutions that can be
implemented individually and/or in stages. Determining your objectives is essential. If you primarily use
tape for recoveries, your RPO will be within a 24 hour period (based on a nightly backup) and your RTO
could be minutes, hours or even days. Is that acceptable to your business? Consider the impact on your
resources and the effect on your business productivity. You must also factor in the fact that you may not be
able to get the data you want to recover from tape. ESG Research has found that human error and
mechanical failure make tape recoveries somewhat unreliable. Ironically, tape is still the most widely used
technology for disaster recovery. To some degree, this is like eating ice cream as a way to lose weight.

ESG believes that all recoveries of data should be from disk. The reasons are self-evident. When you
need to recover data, it is most likely due to an urgent and important matter. Therefore you want to
recover data quickly and reliably. Tape is inherently slow and unreliable. Disk-based storage systems are
much faster, more reliable and intelligent.

There are various methods for protecting data, including snapshots, virtual tape libraries (VTL;
disk-based systems that emulate tape libraries), continuous data protection (CDP) technologies, and
remote mirroring solutions. These methods are complementary, yet each can also be
used as a standalone solution. Each has its own RPO/RTO metrics and associated costs. A key factor in
determining what to use and when to use it is a concept that ESG refers to as the Total Cost of Recovery
(TCR).

If an RPO of zero and an instant RTO were free, everyone would have DR environments that met these
objectives. However, moving toward a more granular RPO and a faster RTO impacts cost. Therefore the
TCR must be considered. Tape has an arguably low TCR, but its RPO is typically nightly and its RTO can
range from minutes to hours, days or even weeks. VTL has a similar RPO to tape but an RTO that can be
instant. The TCR of a VTL solution is not much higher than tape and in some cases is on par. CDP can
have an RPO of zero but the cost of keeping every new write can be expensive. Again, the TCR must be
considered. The same holds true with remote mirroring solutions.
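
The trade-off between RPO, RTO and TCR can be sketched as a small selection routine. This is a minimal illustration rather than an ESG model: the method names come from this paper, but every RPO, RTO and relative cost figure below is a hypothetical placeholder chosen only to show the mechanics, and `ProtectionMethod` and `cheapest_method` are names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProtectionMethod:
    name: str
    rpo_hours: float      # worst-case window of lost data
    rto_hours: float      # worst-case time to get back online
    relative_cost: float  # hypothetical Total Cost of Recovery index


# All figures are illustrative assumptions, not measured or published values.
METHODS = (
    ProtectionMethod("tape", rpo_hours=24.0, rto_hours=48.0, relative_cost=1.0),
    ProtectionMethod("VTL", rpo_hours=24.0, rto_hours=1.0, relative_cost=1.2),
    ProtectionMethod("snapshots", rpo_hours=1.0, rto_hours=0.5, relative_cost=1.5),
    ProtectionMethod("CDP", rpo_hours=0.0, rto_hours=0.5, relative_cost=3.0),
)


def cheapest_method(rpo_hours: float, rto_hours: float,
                    methods: Sequence[ProtectionMethod] = METHODS
                    ) -> Optional[ProtectionMethod]:
    """Return the lowest-cost method that meets both objectives, or None."""
    candidates = [m for m in methods
                  if m.rpo_hours <= rpo_hours and m.rto_hours <= rto_hours]
    return min(candidates, key=lambda m: m.relative_cost, default=None)
```

With these placeholder figures, an application that tolerates a day of data loss but needs to be back within two hours would land on VTL, while one with an RPO of zero would land on CDP. Running the same routine application by application is one way to make the "hard choices" explicit.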

The DR Eco-Systems
DR begins with building a system around high availability and avoiding single points of failure, which is
similar to the process of building an airplane. If one engine stops, the airplane continues to fly; if an
instrument fails, there is another to back it up; and if something should ever happen to the pilot, the co-pilot
can take over the controls. Best-of-breed storage systems maintain high data availability through
active-active storage controllers, clustering, RAID, multiple RAID controllers, redundant power supplies,
rapid RAID rebuild times and multi-path connectivity.

Snapshots and CDP are additional local protection methods. Differential snapshots are taken at scheduled
times, providing protection copies of data at specific points-in-time that are used to recover files that have
been corrupted, deleted or overwritten. Differential snapshots only copy changes and therefore minimize
the amount of storage capacity required – typically about 20 percent of the full data set. Snapshots are
scheduled by the system administrator and can happen at different time intervals, providing RPOs that
range from minutes or hours to days or weeks, etc. CDP is typically used for mission critical applications
that have zero tolerance for data loss. CDP captures each unique write, allowing administrators to create a
perfect image of a volume continuously with a true RPO of zero.

Snapshots consume less capacity than CDP, and therefore are typically retained over longer periods of
time. Snapshots can be kept for weeks, months and theoretically for years, whereas CDP journals are typically
kept only for hours or a limited number of days. These solutions can be complementary. A company may
determine that for a period of time certain data needs to be protected with an RPO of zero. They have set
up their snapshot policies to be once a day (midnight) and use CDP to continuously capture data for an
RPO of zero. At midnight, the storage system creates a snapshot, the CDP data is deleted and the
process starts all over again.
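
The daily cycle described above can be sketched in a few lines. This is a toy model under stated assumptions, not any vendor's implementation: `ProtectedVolume` is a hypothetical class, the snapshot is a full in-memory copy rather than a differential one, and timestamps are plain numbers.

```python
from typing import Dict, List, Tuple


class ProtectedVolume:
    """Toy volume protected by a scheduled snapshot plus a CDP journal."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.snapshot: Dict[int, bytes] = {}
        # Journal entries are (time, block number, data), in arrival order.
        self.journal: List[Tuple[float, int, bytes]] = []

    def write(self, time: float, block: int, data: bytes) -> None:
        # CDP captures every write made since the last snapshot.
        self.blocks[block] = data
        self.journal.append((time, block, data))

    def take_snapshot(self) -> None:
        # At the scheduled time the snapshot becomes the new baseline
        # and the CDP journal is discarded.
        self.snapshot = dict(self.blocks)
        self.journal.clear()

    def image_at(self, time: float) -> Dict[int, bytes]:
        # Rebuild the volume as of `time`: start from the snapshot and
        # replay journaled writes up to and including that moment.
        image = dict(self.snapshot)
        for stamp, block, data in self.journal:
            if stamp <= time:
                image[block] = data
        return image
```

For example, if a good write lands at time 2.0 and a corrupting write at time 3.0, `image_at(2.5)` returns the volume as it stood before the corruption. Once `take_snapshot()` runs, the journal is empty and that fine-grained history is gone, which is why the journal only needs capacity for one snapshot interval.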

VTL improves backup and recovery performance, streamlines media management and improves overall
reliability. ESG has found that customers tend to keep data retained longer on VTL systems for the
purposes of timely recovery. Our experience is that end-user companies that have implemented VTL use it
to enhance their backup environments, allowing them to run nightly backups to disk and weekly backups to
tape. This enables them to provide faster backup performance and a reliable platform from which to
recover quickly, and also minimizes the burdensome tape management process.

Figure Four: The DR Eco-system

Synchronous and asynchronous remote mirroring allow organizations to create continuous copies of
primary data in real or near real-time. The method used for transferring or mirroring the data between the
primary and secondary sites will depend upon the application and the business. There are examples of
business applications in the financial industry that can’t afford to lose any data without the risk of incurring
enormous fines and losing revenues. Many of these companies use synchronous remote mirroring.

Synchronous remote mirroring writes new data to both the primary and secondary sites and only
acknowledges that a write is completed after receiving confirmation from both systems that the data has
been written. While this process ensures data integrity, and that the data at Site A is the same as the
data at Site B, it limits the distance that a synchronous mirrored relationship can support. Synchronous
mirroring is valuable and often required in a zero data loss environment, but with distance limited to a
maximum of approximately 100 km, it does not protect companies whose primary and secondary sites fall
within the same geographic disaster zone.
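
One commonly cited reason for the distance limit is propagation delay: every synchronous write must wait for a round trip to the remote site. The sketch below estimates that floor. It assumes light travels through optical fiber at roughly 200,000 km per second and ignores switching, protocol and storage overhead, so real figures will be higher; the function name is hypothetical.

```python
# Assumption: about 200,000 km/s in optical fiber, i.e. 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def sync_round_trip_ms(distance_km: float) -> float:
    """Minimum latency added to each write by waiting for a remote acknowledgement."""
    if distance_km < 0:
        raise ValueError("distance must not be negative")
    return 2.0 * distance_km / FIBER_KM_PER_MS
```

Under this assumption a 100 km link adds at least 1 ms to every write and a 1,000 km link at least 10 ms, a penalty that a write-intensive application pays on each transaction.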

Asynchronous remote mirroring can replicate data over much longer distances. This is achievable since
asynchronous remote mirroring only requires the receipt of one write acknowledgement. While this method
solves the distance limitations, there is a potential for data corruption. Companies must make a
determination and consider the risk based on their requirements. In some cases, companies use both
remote mirroring methods to ensure recoverability. Synchronous remote mirroring is used to a remote DR
site at a limited distance and an asynchronous remote mirroring relationship is created to a DR site that is
separated by a long distance.
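
The difference between the two acknowledgement models can be illustrated with a toy simulation. It is a simplification under stated assumptions: `MirroredVolume` is a hypothetical class, the network is replaced by an in-memory queue, and "lost" simply counts writes that were acknowledged locally but had not yet reached the remote site.

```python
from collections import deque
from typing import Deque, List


class MirroredVolume:
    """Toy model of a primary volume mirrored to one remote site."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: List[bytes] = []
        self.secondary: List[bytes] = []
        # Writes acknowledged locally but not yet applied remotely.
        self.in_flight: Deque[bytes] = deque()

    def write(self, data: bytes) -> None:
        """Return at the point the host would receive its acknowledgement."""
        self.primary.append(data)
        if self.synchronous:
            # Acknowledge only after the remote copy is also written.
            self.secondary.append(data)
        else:
            # Acknowledge after the local write; ship to the remote site later.
            self.in_flight.append(data)

    def drain(self, count: int) -> None:
        """Deliver up to `count` queued writes to the remote site."""
        for _ in range(min(count, len(self.in_flight))):
            self.secondary.append(self.in_flight.popleft())

    def writes_lost_on_primary_failure(self) -> int:
        return len(self.primary) - len(self.secondary)
```

In this model a synchronous volume never reports lost writes, while an asynchronous one reports however many are still queued when the primary fails. That backlog is the effective RPO of the asynchronous link.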

Snapshot and CDP technology can be used to complement remote mirroring solutions. One of the
problems with remote mirroring is that it will make an exact replica of primary data. That means that it will
copy corrupted data. Companies can make local or remote snapshots of CDP images in order to go back
to a point-in-time before the corruption. Consider the most stringent environments. The combination of
synchronous mirroring and CDP provides a remote disaster recovery solution with a true RPO of zero.

Other Key Considerations

Bandwidth Optimization
WAN bandwidth is a costly recurring expense, and it can greatly affect the efficiency of
remote mirroring. The amount of WAN bandwidth used can be reduced using hardware compression
techniques and by defining service policies that restrict remote mirroring bandwidth consumption on an
application by application basis. Businesses can adjust the scheduling of remote replication around peak
business times to ensure that other applications that rely on the WAN are not negatively impacted. The
compression rate will depend on the application and data type.
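
A rough sizing calculation shows how compression and scheduling interact. The sketch below is back-of-the-envelope arithmetic, not a vendor sizing tool: the function name and parameters are hypothetical, units are decimal (1 GB = 8,000 megabits), and protocol overhead is ignored.

```python
def required_mbps(changed_gb_per_day: float,
                  compression_ratio: float,
                  window_hours: float) -> float:
    """Average link speed (megabits per second) needed to ship one day's changes.

    compression_ratio is original size divided by compressed size (2.0 means 2:1).
    window_hours is how long each day the link may be used for replication.
    """
    if compression_ratio <= 0 or window_hours <= 0:
        raise ValueError("compression ratio and window must be positive")
    megabits = changed_gb_per_day * 8000.0 / compression_ratio
    return megabits / (window_hours * 3600.0)
```

For instance, replicating 90 GB of daily changes in an 8-hour off-peak window needs about 25 Mbps uncompressed and about 12.5 Mbps at 2:1 compression. Widening the window or improving the ratio lowers the link speed that has to be purchased.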

Scalable DR
Will your business be the same size in one, two or three years? Will the demands on your business be the
same? Successful businesses are not static entities but are dynamic and growing all the time. Capacity
rates are growing rapidly and additional projects are brought online on a regular basis with more waiting in
the wings. Additionally, the rate of growth is ultimately unpredictable. IT groups need to adapt quickly at a
reasonable cost.

At the heart of any data protection strategy are the storage systems. In order to adapt quickly, cost-
effectively and transparently without sacrifice, storage systems are required to support scalable
architectures that offer value in the present and over a continuum.

Heterogeneity
Increasingly, end-user customers want data protection solutions and specifically remote mirroring
technologies to support heterogeneous storage systems. First, customers do not want to have to manage
multiple remote mirroring technologies in their primary data center. Second, they want lower cost storage
systems at the DR site. They do not want to pay a premium for a storage system they believe will be used
only in extreme circumstances. The goal is to minimize complexity, not add more technology that
customers need to be experts in.


Pillar DR Solutions
The Pillar Axiom is a high availability storage system that provides independent scaling of performance and
capacity, built in intelligence for management and control, and a suite of enhanced data protection
software. The Pillar Axiom has an internally clustered architecture that easily accommodates growth and
minimizes the impact on application performance in the case of a disaster. Up to four Slammers, each with
two storage controllers (eight controllers per system), can be clustered for
environments demanding high performance. This is an important advantage. Compare a storage system
that has only two storage controllers in an active-active architecture. In the rare event that a single
storage controller fails, the remaining storage controller must handle the entire workload, which
has a negative impact on performance of 50 percent or greater. A four-node Pillar storage cluster would
suffer only a 25 percent performance hit if a single controller fails. It isn’t enough to survive a disaster;
business continuity must also be robust.
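
The arithmetic behind these percentages is simple enough to write down. The sketch assumes identical controllers and an evenly balanced workload, so it gives the lower bound on the performance hit; the function name is hypothetical.

```python
def performance_hit(controllers: int, failed: int = 1) -> float:
    """Fraction of aggregate controller capacity lost when `failed` nodes fail."""
    if controllers < 1 or not 0 <= failed <= controllers:
        raise ValueError("invalid controller counts")
    return failed / controllers
```

With two controllers a single failure removes half the capacity (0.5), and with four it removes a quarter (0.25), which matches the 50 percent and 25 percent figures above.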

Pillar offers rapid RAID rebuilds. Disk drives do three things – read, write and break. ESG Lab tested the
Pillar Axiom and found that it can rebuild a 250 GB drive in 1.5 hours. Other storage systems can take up
to 24 hours to build a drive of this size, essentially leaving you vulnerable to data loss during that time.
Additionally, rebuilding a drive can have a real impact on primary application performance and therefore,
the longer it takes, the more of an inconvenience it is to your business.
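
The rebuild figures can be turned into throughput and exposure numbers. This is plain arithmetic on the values quoted above, using decimal units (1 GB = 1,000 MB); the function names are hypothetical.

```python
def rebuild_rate_mb_per_s(drive_gb: float, rebuild_hours: float) -> float:
    """Average rebuild throughput in MB/s (1 GB = 1,000 MB)."""
    return drive_gb * 1000.0 / (rebuild_hours * 3600.0)


def exposure_ratio(slow_hours: float, fast_hours: float) -> float:
    """How many times longer the array runs degraded with the slower rebuild."""
    return slow_hours / fast_hours
```

A 250 GB drive rebuilt in 1.5 hours implies roughly 46 MB/s, against roughly 3 MB/s for a 24-hour rebuild. The slower system spends 16 times as long in a degraded state, during which a second drive failure could mean data loss.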

Pillar natively supports SAN and NAS in a single storage system. As such, it provides block-based (SAN)
and file-based (NAS) snapshot technology. Snap FS is Pillar’s snapshot technology for its NAS file
systems. Pillar also provides a feature called Snap LUN, which is its block-level snapshot technology.
Snap LUN supports both read-only and read/write capabilities. Writeable snapshots provide space efficient
copies of data that can be used for running tests against or for data mining. Pillar supports up to 255
snapshots per volume and can be configured to save snapshots at different timeframes, including monthly,
weekly, daily, hourly, or even down to every minute.
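
The snapshot limit and the snapshot interval together determine how far back a volume can be recovered. The sketch below assumes a single schedule that uses every snapshot slot and discards the oldest snapshot once the limit is reached; that retention behavior is an assumption for illustration, not a documented Pillar policy.

```python
MAX_SNAPSHOTS_PER_VOLUME = 255  # limit stated in this paper


def retention_days(interval_minutes: float,
                   max_snapshots: int = MAX_SNAPSHOTS_PER_VOLUME) -> float:
    """Days of history kept if one schedule fills every snapshot slot."""
    return interval_minutes * max_snapshots / (60.0 * 24.0)
```

Under that assumption, hourly snapshots cover a little over ten days, daily snapshots cover 255 days, and snapshots taken every minute cover only about four and a quarter hours. In practice a mix of schedules would trade fine-grained recent history against longer reach.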

For those organizations wishing to experience the advantage of disk speeds for backups while minimizing
the impact to their backup infrastructure, Pillar provides a VTL solution. Pillar’s AxiomONE VTL offers a
wide variety of hardware and software platforms to meet a broad range of environments and includes IP-
based replication and hardware compression options. To meet the needs of organizations with very
demanding RPOs, Pillar also offers a CDP option as part of its block-based replication product.

Pillar supports remote mirroring for both NAS, with AxiomONE File Replicator (AFR), and SAN, with
AxiomONE Volume Replicator (AVR). Both AVR and AFR offer high availability through
clustered options. AxiomONE Volume Replicator supports synchronous and asynchronous remote
mirroring [2]. Pillar replication solutions support heterogeneous storage systems. Customers can use
AxiomONE AFR and AVR to manage Pillar Axiom and other vendors’ storage systems. Customers with
existing storage systems from the major storage vendors can use Pillar to provide a lower cost DR
alternative. They can mirror data from their primary storage systems from EMC, NetApp or IBM to a
lower-cost Pillar Axiom. Since the AxiomONE AFR and AVR are appliance-based, they are not in-band and
therefore can keep sending write data to remote sites if the primary storage system is not available.
Additionally, if the AxiomONE AFR or AVR appliances go down, they do not impact read/write access to the
primary storage system. Also, both AFR and AVR support bandwidth optimization technologies in order to
cost-effectively implement remote mirroring over long distances.

Pillar provides a comprehensive suite of
data protection solutions that address a wide range of RPO and RTO requirements and Total Cost of
Recovery metrics. Finally, the Pillar Axiom’s fundamental architecture enables high availability, scalability
and ease of use.

[2] See ESG Lab Validation Report, Pillar Axiom, February 2006


One of the most compelling attributes of the Pillar approach to DR is that users can consolidate DR for
both block and file on one Axiom system, concurrently use the same Axiom system for primary
applications, and optimize the performance of all the applications running on that platform. Several current
Pillar customers use their systems for primary applications and for backup to disk. This prevents them from
having to buy special storage solutions for each different storage tier in their data centers. Consolidating
data onto a single intelligent storage system significantly reduces the management workload. Supporting,
troubleshooting and maintaining a unified storage platform increases IT efficiencies and lowers TCO. When
you combine Pillar’s ability to support heterogeneous platforms at the data center with the Axiom’s ability to
consolidate DR and primary applications at remote sites, it is easy to see that deploying Pillar for DR can
result in minimizing cost and complexity for DR infrastructures.

Pillar DR Case Study – Mid-Sized Financial Institution in EMEA


ESG spoke with a mid-sized financial institution headquartered near London whose IT department needed
to replicate their data to a site approximately 100 km from their main data center. The customer faced several
key problems as it considered different disaster recovery options:

• They had a limited budget for the DR storage at the remote site and the proprietary primary storage
replication offering didn’t meet that budget requirement.
• The distance between the remote and primary sites was approximately 100 km and the synchronous
replication option of their existing primary storage couldn’t meet that distance requirement.
• They needed a very granular level of recovery at the remote site.
• They couldn’t test their DR capability without stopping their replication process.

Several well-known storage vendors attempted to win this business but this customer chose Pillar’s DR
solution. The reasons for their selection are as follows:

• Pillar’s SATA-based, cost-effective hardware, plus their single-instance system software licensing
meant that Pillar’s system could meet the customer’s stringent budget requirements.
• Pillar’s AxiomONE AVR solution provided heterogeneous replication, which meant that it could support
their primary storage system and others down the road.
• Pillar’s DR solution allowed near-synchronous replication over IP without distance limitations.
• Pillar’s AxiomONE AVR solution featured CDP functionality that gave them the ability to roll back in
3-second increments for recovery.
• Pillar’s solution allowed full recovery testing at the DR site without impacting replication.
• The customer wanted to have a VTL capability in the remote site and the Axiom allowed them to host
DR, VTL, plus one or two other applications on the same platform.
• The customer liked the idea of optimizing the performance of each application hosted by the Axiom in
the remote site according to its particular profile.


Figure Five: Pillar DR

This Pillar customer is leveraging the Axiom’s ability to consolidate both DR and other applications on one
platform and it doesn’t hurt that the system is natively HA and scales easily in both capacity and
performance. Future projects planned for the Axiom are the optimization of the backup and recovery
process and the deployment of Pillar’s secure WORM storage for Exchange journal e-mail.


ESG’s View

Disaster Recovery happens at several levels within the data center. Server and storage architectures
that support mission-critical applications should provide high availability and redundancy that ultimately
provides no single point of failure. Backup is a well-worn process that protects data typically with an
  RPO of 24 hours. Disk-to-disk backup, including VTL, is replacing or augmenting tape as part of the
  nightly backup process. Snapshot technology creates capacity-efficient local copies for rapid recoveries
  with an RPO that can be less than an hour (this will also depend on how often you want to quiesce your
  database). CDP can provide continuous protection of data by retaining each new write as it occurs
(most CDP solutions can be scheduled for periodic capture if you don’t require perpetual capture of
data). CDP can theoretically provide a true RPO of zero, since you can go back to any point-in-time
versus mirroring data, which will replicate primary data exactly, including corruptions. Synchronous
remote mirroring extends data protection beyond a local data center with 100 percent write integrity.
Asynchronous remote mirroring does not provide 100 percent write integrity but can extend remote
mirroring relationships with no distance limitations.

Most of these technologies have been in place for several years, and many have improved over time.
However, there is a great deal of confusion about how they all fit together. These technologies can be
implemented as standalone solutions or they can be integrated in concert to provide a complementary data
protection schema. Companies must weigh risk and cost factors to determine their strategies. However, it is
important to consider that the risks may be greater and different from the traditional way of looking at data
protection, and the costs may be far lower than initial perceptions based on industry-leading solutions.
Additionally, there are easier-to-use solutions (which also affects cost and risk) that provide viable
alternatives to the traditional methods.

The rules of the game are changing. Certainly, data access problems occur in some cases on a daily basis.
Even major disasters seem to be more likely in this day and age, which creates a greater risk, or, at the very
least, greater perceived risk. Data loss once only meant an impact on the business to varying degrees.
Nowadays, when data is lost, on the one hand there may be little to no effect while, at the other extreme, the
company could go out of business with the loss of millions of dollars along the way. However, the big
difference is that there is more of a personal stake today than ever: the people working at
these companies and organizations can be held personally liable and accountable, with potentially harsh
consequences. The good news is that DR is becoming easier and more cost-effective.

Pillar provides a scalable storage system with a high level of redundancy as well as a comprehensive suite of
data protection software that includes snapshots, CDP, VTL and remote mirroring. Pillar supports
asynchronous and synchronous remote mirroring for its SAN storage using its AxiomONE AVR solution and
file-based replication for its NAS solution with AxiomONE AFR. These remote mirroring solutions are easy to
use, highly scalable and support not only Pillar storage but heterogeneous systems as well. The Pillar
approach is to provide a single storage platform that scales up and down based on your environment. It
can be used equally well with entry, midsize and large environments, SAN and NAS, Tier One and Tier Two
(and Tier Three) applications, and primary and secondary storage. As such, it provides ease of use and
cost-effectiveness and, at the same time, balances that with high-end advanced functionality, including robust,
comprehensive and sophisticated (but easy-to-use) data protection software.

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources the
Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are
subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of
this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the
express consent of the Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages
and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at (508) 482-0188.
