SQL Server High Availability

MICROSOFT SQL SERVER
HIGH AVAILABILITY
AND DISASTER RECOVERY
Michael Poremba // October 2008
Database HA & DR Experience
Work with business to determine HA or DR requirements for applications and data?
Design HA or DR solutions?
Administer HA or DR process?
Still learning MS SQL Server HA & DR capabilities?
Scope of this Presentation
Presentation Focus
Data Availability
Data recovery
High availability
Disaster recovery
Technology Focus
MS SQL Server
Physical servers
SANs
Beyond Scope of Presentation
In-depth how-to (available elsewhere)
Partitioned views (federated)
Advanced DBA techniques
Custom application logic
3rd-party software solutions
Alternate DBMS engines (e.g. Oracle; DB2)
HA on virtual machines
Complex scenarios & solutions
Load balancing
Introduction to Data Availability
So, you need to make your production database bulletproof
Data Availability Continuum
Degrees of protection for information systems:
Degree of Protection  Business Risk                    Solution
Data Recovery         Data loss                        Redundant data
High Availability     Downtime of database service     Redundant system components
Disaster Recovery     Downtime of business operations  Redundant systems and facilities
Business Case for Availability
High Availability
Keep business-critical applications available
Secondary: Server maintenance
Disaster Recovery
Protect against loss of data center
Secondary: Application upgrades; Infrastructure upgrades
Service Level Agreement (SLA)
Permitted downtime (planned vs. unplanned?)
Uptime SLA  Downtime per Year  Downtime per Month
99.9%       8.76 hours         43.8 minutes
99.99%      52.6 minutes       4.38 minutes
99.999%     5.26 minutes       0.438 minutes
Acceptable data/transaction loss
Application response times
Mean time to recovery
Note: Database uptime is not equivalent to application availability
Failures of other application services
Network outages
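The downtime figures in the SLA table follow directly from the uptime percentage. A minimal sketch of that arithmetic, assuming a 365-day year and twelve equal months (the function name is illustrative, not from the presentation):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def permitted_downtime_minutes(uptime_percent: float) -> tuple[float, float]:
    """Return (minutes per year, minutes per month) of permitted downtime."""
    down_fraction = 1.0 - uptime_percent / 100.0
    per_year = down_fraction * HOURS_PER_YEAR * 60
    # Assumption: a "month" is simply one twelfth of a year.
    return per_year, per_year / 12


for sla in (99.9, 99.99, 99.999):
    year_min, month_min = permitted_downtime_minutes(sla)
    print(f"{sla}%: {year_min:.2f} min/year, {month_min:.3f} min/month")
```

Whether planned maintenance counts against these minutes is an SLA decision, which is why the slide raises the planned vs. unplanned question.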
Protect What?
Application data stores
Databases
Files
Other data repositories
Database services
DBMS availability for applications
Application services
Application availability for users and external systems
Databases are the heart of most information systems; they deserve the highest affordable protection.
Database Failure Scenarios
Physical Infrastructure Failures
Storage subsystem
Disk
Controller
Network
Server
Power
Logical Data Failures
Operator errors
DBMS interruption
Drops / deletes
Application defects
DBMS defects
Data corruption
Service Recovery Strategies
Standby Mode  Failover Behavior                                          SQL Server Feature
Cold standby  Manual intervention required to restore offline data copy  Backup and restore
Warm standby  Data copy online and ready; manual failover required       Transaction log shipping; database mirroring
Hot standby   Automatic failover                                         Database mirroring; failover clustering
Data Recovery
Terminology
Terminology varies for source vs. copy
High Availability Strategy  Data Source      Data Copy
Backup and Restore          Database         Backup
Log Shipping                Primary          Secondary; Standby
Database Mirroring          Principal        Mirror
Failover Clustering         Primary; Active  Secondary; Passive; Standby; Inactive
Data Recovery
[Briefly]
Database Backups
Traditional backup types
Full backup
Differential backup
Transaction log backup
Disk is better than tape
First backup to disk (separate physical disk volume)
Detect exceptions encountered during backup
Verify backup files
Copy backup files to tape or remote disk
Data retention policy for backup files
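The three backup types combine into a restore sequence: the most recent full backup, then the most recent differential taken after it, then every later transaction log backup in order. A minimal sketch of that selection, assuming integer timestamps and a hypothetical `Backup` record; it recovers only up to the last log backup completed by the target time and does not model point-in-time restore inside a log backup:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    kind: str      # "full", "diff", or "log"
    finished: int  # completion time as an illustrative integer timestamp


def restore_chain(backups: list[Backup], target: int) -> list[Backup]:
    """Pick the backups to restore, in order, to get as close to `target` as possible."""
    usable = sorted((b for b in backups if b.finished <= target),
                    key=lambda b: b.finished)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available before the target time")
    base = fulls[-1]
    # Only the latest differential after the base is needed; it is cumulative.
    diffs = [b for b in usable if b.kind == "diff" and b.finished > base.finished]
    chain = [base] + diffs[-1:]
    start = chain[-1].finished
    # Every log backup after the last restored full/differential, in order.
    chain += [b for b in usable if b.kind == "log" and b.finished > start]
    return chain
```

A missing or unreadable file anywhere in this chain stops recovery at that point, which is one reason to verify backup files.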
Database Backup Strategy
Backup of user databases not sufficient for recovery
System databases
Master database
MSDB database
Model database
External data stores
Synch with External Data Stores
Synchronize recovered database with external data stores:
Identity column seeds
Full-text indexes (SQL Server 2000)
LDAP entries
File system objects
Other databases
Backup Retention Policy
Location of backup files
Access to backups from offsite data storage
Duration of retention
Protection of sensitive data
Sarbanes-Oxley (SOX)
HIPAA
Internal policies for data management and protection
Data Recovery Process
Backup file sets
Full baseline, differential, and transaction logs
Retrieving backup files
Offsite storage
Tape
Network copy
Dependency on multiple people to get access to backup files
Recovery strategy depends on failure scenario
Create comprehensive failure matrix
Devise recovery strategy for each scenario
Recovery time; SLA
Does worst-case recovery scenario fit within SLA parameters?
Include future data growth in recovery plan
Fully test recovery strategies; practice is essential
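Checking the worst-case recovery against the SLA, with data growth included, is simple arithmetic. A rough sketch, assuming restore time scales linearly with database size plus a fixed overhead for retrieving files; all names and figures are illustrative and should be replaced with numbers measured in restore tests:

```python
def restore_hours(size_gb: float, throughput_gb_per_hour: float,
                  fixed_overhead_hours: float = 0.0) -> float:
    """Estimated restore time: fixed overhead plus size divided by throughput."""
    return fixed_overhead_hours + size_gb / throughput_gb_per_hour


def first_year_exceeding_sla(size_gb: float, annual_growth: float,
                             throughput_gb_per_hour: float, sla_hours: float,
                             fixed_overhead_hours: float = 0.0,
                             horizon_years: int = 10):
    """Return the first year (0 = today) whose estimated restore exceeds the SLA, else None."""
    for year in range(horizon_years + 1):
        projected_gb = size_gb * (1 + annual_growth) ** year
        estimate = restore_hours(projected_gb, throughput_gb_per_hour,
                                 fixed_overhead_hours)
        if estimate > sla_hours:
            return year
    return None
```

For example, a 200 GB database growing 50% per year, restored at 100 GB per hour with one hour of overhead, fits a four-hour recovery window today but no longer fits it in year 2.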
High Availability
Minimize or avoid service downtime
Whether planned or unplanned
When components fail, service interruption is brief or non-existent
Automatic failover
Eliminate single points of failure (as affordable)
Redundant components
Fault-tolerant servers
Redundant Components
Objective: Avoid single points of failure (where affordable)
Approach: Use redundant components for database service
Database server nodes
Server components
ECC RAM; failure-tolerant HW & OS
DBMS instance
User databases
Storage devices
Storage unit components
MPIO: Interfaces; paths; switches; controllers
RAID: Disks
Networking
MPIO: Interfaces; paths; switches
Data copies
E.g. Recovering torn page from mirror in SQL Server 2008
Transaction Log Shipping
Warm standby solution
Duplicate user database
Copy transaction logs to standby server & restore
Database available for read-only access
Users must disconnect for logs to be applied
Two database licenses required if querying standby
Manual application failover
Possible data loss (unapplied transactions)
Supported on standard hardware
Database Mirroring
Redundancy at user database level
Duplicate copy of user database
Independent storage devices
Multiple copies of instance databases
Mirrored over private network channel
Multiple mirroring modes:
High-availability: commit when logged on mirror; automatic failover
High-protection: commit when logged on mirror; manual failover
High-performance: commit when logged on principal
Requires witness server (for automatic failover)
Very fast automatic failover (seconds)
Mirror always redoing transactions from principal
Negligible impact on transaction throughput
Mirror-aware application client connection
Provided by client library
Database connection string must specify both servers
Mirror may be available for read-only access (snapshots)
Works with standard hardware
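A mirror-aware client is told about both servers in its connection string. The sketch below only assembles such a string; the server and database names are hypothetical, and the `Failover Partner` spelling is the ADO.NET SqlClient form as far as I know (other client libraries spell it differently, for example `Failover_Partner`), so treat the keyword as an assumption to check against your driver's documentation:

```python
def mirrored_connection_string(principal: str, mirror: str, database: str,
                               partner_keyword: str = "Failover Partner") -> str:
    """Build a connection string that names both the principal and the mirror."""
    parts = {
        "Server": principal,        # initial principal server
        partner_keyword: mirror,    # server to try if the principal is unavailable
        "Database": database,
        "Integrated Security": "True",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical server names, for illustration only.
print(mirrored_connection_string("sqlA", "sqlB", "Sales"))
```

The client library, not the application, performs the redirect; the application still has to reconnect and retry work that was in flight at the moment of failover.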
Mirror Witness
With mirroring, more than one server is required to decide on failover
Prevents split brain scenario
Witness automates failover from principal to mirror
Runs in separate SQL Server instance (Express is OK)
Very low resource consumption
Can be witness for multiple databases
Watches database availability
Reports observations back to principal and mirror
Not a single point of failure
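The reason more than one server must agree on failover can be shown with a simplified two-of-three quorum model. This is an illustrative sketch, not SQL Server's actual implementation: a server may serve the database only while it can see at least one of the other two members, and the mirror takes over only when it and the witness have both lost the principal.

```python
def has_quorum(sees_partner: bool, sees_witness: bool) -> bool:
    """A server holds quorum when it is connected to at least one other member (2 of 3)."""
    return sees_partner or sees_witness


def principal_may_keep_serving(sees_mirror: bool, sees_witness: bool) -> bool:
    """An isolated principal must stop serving, so two principals never coexist."""
    return has_quorum(sees_mirror, sees_witness)


def mirror_should_take_over(sees_principal: bool, sees_witness: bool,
                            witness_sees_principal: bool,
                            synchronized: bool) -> bool:
    """Automatic failover: mirror and witness agree the principal is gone and no data is missing."""
    return (not sees_principal and sees_witness
            and not witness_sees_principal and synchronized)
```

In this model, losing only the witness leaves the principal connected to the mirror, so it keeps serving; this matches the slide's point that the witness is not a single point of failure.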
SQL Server Failover Clustering
Two clustered nodes
MS SQL services
Active/Passive config
Running on virtual server
Shared storage device
User databases
System databases
Quorum drive
Redundant internal components
Active/Passive Failover Clustering
Redundancy at database instance level
All database services are clustered
SQL Agent; Analysis Services; Full-Text engine; MS DTC
All databases fail over together
Shared copy of system databases
Single data copy on shared storage device
Storage is controlled by one cluster node at a time
No I/O overhead reducing throughput
Storage unit is single point of failure for cluster
Automatic failover (up to minutes)
DBMS accessed over virtual IP
Database not available from inactive node for DB client connections
Requires hardware certified by Microsoft for Microsoft Cluster Service
HA Comparison
Database Mirroring                                     Failover Clustering
Scope: user DB                                         Scope: DBMS instance
Standard hardware                                      Certified hardware
One SQL license (unless querying snapshots on mirror)  One SQL license (only one node can access database)
Very fast failover (seconds)                           Automatic failover (up to minutes)
OS flexible (e.g. 32/64)                               Enterprise OS
Independent storage                                    Shared storage
Independent services                                   Clustered services
Reporting on mirror                                    Standby not available
Geographic separation OK                               Servers are usually co-located
Considerations for HA
HA complements backup and recovery strategy
Does not replace data recovery plan
Application service availability is often determined by a network of interdependent services
Availability can be difficult to define (e.g. partial failures)
Failure probability difficult to measure or compute
Increased system complexity could lead to lower service availability!
Increased number/types of system components
More complex to configure and administer
Operator error a leading cause of availability issues
Data Recovery
Requirements
Disaster Recovery
Minimize downtime of business operations
Redundant systems and facilities
SQL Server features:
Transaction log shipping
Database mirroring
Failover clustering
Other technologies
Storage-based mirroring
Disaster Recovery Planning
Data security requirements
Clarify SLA, data loss allowance
Evaluate system cost vs. data protection
Failure analysis
System redundancy
Process validation
Training for personnel
Prevention practices
Executing disaster recovery and business continuity
Practice, practice, practice
Business Continuity Facility
System redundancy
Alternate facilities
Systems: Web servers; app servers; databases, etc.
Data: Databases; data files on OS; security info, etc.
Networking: Domain, routing, subnet, VIPs, etc.
Network bandwidth
Physical or network access by operations staff
Failover
Often a deliberate decision, using manual failover
Data Redundancy
Synchronous redundancy
Network bandwidth cost
Network latency and application performance
Network reliability
Asynchronous redundancy
Risk of data loss
More cost-effective
Resilient to network latency issues
Candidate Technologies
SQL Server database mirroring
Failover clustering with SAN-based mirroring
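The trade-off between synchronous and asynchronous redundancy can be made concrete with a rough latency model. This is a simplified sketch under stated assumptions: a synchronous commit waits for the local log write, one network round trip, and the remote log write in sequence (a pessimistic bound, since real systems overlap some of this work), while an asynchronous commit waits only for the local log write. All figures are illustrative.

```python
def commit_latency_ms(local_log_write_ms: float, round_trip_ms: float,
                      remote_log_write_ms: float, synchronous: bool) -> float:
    """Estimated time a transaction waits before its commit is acknowledged."""
    if synchronous:
        # The commit is not acknowledged until the remote copy has hardened the log.
        return local_log_write_ms + round_trip_ms + remote_log_write_ms
    # Asynchronous: the remote copy catches up later, so recent commits may be lost.
    return local_log_write_ms


def max_serial_commits_per_second(latency_ms: float) -> float:
    """Upper bound on back-to-back commits from a single session."""
    return 1000.0 / latency_ms
```

With a 1 ms local write, a 20 ms round trip between sites, and a 1 ms remote write, the synchronous estimate is 22 ms per commit against 1 ms asynchronously, which is why network latency dominates application performance for synchronous redundancy across sites.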
DR Using Database Mirroring
Two sites: Primary and DR location
Separate failover clusters at each site
SQL Server database mirroring between sites
DR Using SAN-Based Mirroring
Two sites: Primary and DR location
Four-node failover cluster; one virtual IP address
SAN-based mirroring between sites
Manual cluster failover
Complementary Technologies
[Skip if time is running short.]
SAN-Based Data Mirroring
Data blocks duplicated at storage level
Copy performed in sequence and coordinated with database checkpoint
Ensures consistency of mirrored data files
Synchronous or asynchronous mirroring
Co-located or geographically dispersed (both are OK)
Similar to transaction log shipping
SAN link bandwidth must support database I/O rate
May require extra feature support from SAN vendor
Could rely on Failover Clustering for HA
SQL Server Database Snapshots
Read-only point-in-time database snapshot
No data is copied; creation is instantaneous
Historical snapshot pages tracked separately from changing pages
Snapshots can be maintained indefinitely
Limited only by available storage
Snapshot copy can be used for reporting
Read-only, so no locking issues
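Creating a snapshot is instantaneous because nothing is copied up front; a page's original contents are saved only when that page is first changed afterwards. A toy copy-on-write model of the idea (the class and method names are illustrative and do not correspond to any SQL Server API):

```python
class Snapshot:
    """Read-only view of a database as of the moment the snapshot was created."""

    def __init__(self, source: "Database") -> None:
        self._source = source
        self._original_pages: dict = {}  # filled lazily, one entry per changed page

    def preserve(self, page_no: int, old_contents) -> None:
        # Keep only the first pre-change image; later writes must not overwrite it.
        self._original_pages.setdefault(page_no, old_contents)

    def read(self, page_no: int):
        if page_no in self._original_pages:
            return self._original_pages[page_no]
        return self._source.pages.get(page_no)  # unchanged pages are shared


class Database:
    def __init__(self, pages: dict) -> None:
        self.pages = dict(pages)
        self.snapshots: list = []

    def create_snapshot(self) -> Snapshot:
        snapshot = Snapshot(self)  # no page data is copied here
        self.snapshots.append(snapshot)
        return snapshot

    def write(self, page_no: int, contents) -> None:
        for snapshot in self.snapshots:
            snapshot.preserve(page_no, self.pages.get(page_no))
        self.pages[page_no] = contents
```

The snapshot's storage grows with the number of distinct pages changed since it was taken, which is the sense in which snapshots are limited only by available storage.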
SQL Server Replication
Transactional replication
High transaction volume
Low data latency required
Mixed technologies: Integrates with other DBMS
Merge replication
Bi-directional data changes
Typically server-to-client
Snapshot replication
Large, infrequent data changes
Data change latency OK
Best for smaller data sets
Subscriber databases available for reporting
Replicate data subsets
Some data loss is possible
Periodically validate replicated data
App Development and Admin
Considerations for App Developers
App services tolerant to database service interruptions
Application transactions must be handled in code (data consistency)
Exception handling for transaction retry, connection recovery
Requires coding standards, code reviews, and testing
Bulk data operations
Transaction volume impacts rollback time during failover
Batch jobs must be run on alternate nodes
Don't bypass transaction logging
Synchronization with external data sources?
Be aware of database recovery model
Mirroring uses FailoverPartner in connection string
Use TCP/IP as client protocol
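Tolerating a failover in application code usually means retrying the whole transaction after the connection is lost. A minimal sketch with exponential backoff; `TransientDatabaseError` is a hypothetical stand-in for whatever exception your database driver raises on a dropped connection, and `transaction` is assumed to be a callable that opens a connection, does its work, and commits:

```python
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver's connection-lost or failover error."""


def run_with_retry(transaction, attempts: int = 3, base_delay_s: float = 0.5,
                   sleep=time.sleep):
    """Run `transaction`, retrying from the start when the database is briefly unavailable."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except TransientDatabaseError:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            # Back off 0.5 s, 1 s, 2 s, ... to give failover time to complete.
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Retrying is only safe when the transaction is atomic and can be repeated without side effects, which is where the slide's point about handling transactions in code for data consistency comes in.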
Considerations for Admins
Use identical server hardware, when possible
Design network redundancies, when feasible
Always manage through virtual cluster, not individual cluster nodes
Retest failover/failback after HA maintenance
Diagnose after failover
Repair alternate node
Resynchronize data, as necessary
Ensure application services are connected and functioning properly
Be aware of primary/secondary locations
Consider network latency for geographic separation
Keep server node configurations synchronized:
Service pack and patch levels
Duplicate non-redundant resources
Jobs; logins and permissions; OS & sys objects
HA Risks
System performance degradation
HA system complexity leads to availability issues
Some system failures not planned for
Backup and recovery planning incomplete
Administrators not fully trained or informed
User databases not synchronized with other data sources
Common Admin Use Cases
Maintain HA nodes
Hardware maintenance
Rolling upgrades and software patches
Resynchronize the redundant copy
Re-synch mirror
Restart log shipping
Diagnose and repair
Diagnose cause of failover
Repair failed node and restore failover capabilities
Test failover and failback
Common Admin Actions
Train administrators, and have them practice, to:
Initiate a database mirror
Manually fail over mirror database or cluster node
Add/remove passive node from mirror or cluster
Upgrade/patch server nodes
Restart or redirect application services
More Information
References: Books
High Availability
Microsoft SQL Server 2008 High Availability with Clustering & Database Mirroring, by Michael Otey, 2009.
Microsoft SQL Server High Availability, by Paul Bertucci, 2004.
Pro SQL Server 2005 High Availability, by Allan Hirt, 2007.
Related Topics
Pro SQL Server 2005 Replication, by Sujoy Paul, 2006.
Pro SQL Server 2005 Service Broker, by Klaus Aschenbrenner, 2007.
The Rational Guide to SQL Server 2005 Service Broker, by Roger Wolter, 2006.
References: Presentations
Microsoft Load Balancing and Clustering
http://ce.sharif.edu/courses/84-85/2/ce317/resources/root/lecture%20slides/14.%20Microsoft%20Load%20Balancing%20and%20Clustering.ppt
SQL Server 2005 High Availability
http://www.atlantamdf.com/Presentations/AtlantaMDF_111207HA.ppt
High Availability Technologies In SQL Server 2000 And SQL Server 2005
http://202.181.238.2/hk/teched2004/ppt/Day_2_Rm407/DAT431(13301445).ppt
Meeting the Availability Challenge
http://download.microsoft.com/download/E/D/C/EDCF54DB-19CD-4882-9FC44F7D46FCEAA6/HighAvailability.ppt
Disaster Recovery Mistakes
http://www.sqlsig.org/Oct%2011%20DASSUG%20-%20Jason%20Hall%2010-1107%20MM.ppt
SQL Server 2005 High Availability
http://blogs.msdn.com/sql2005event/attachment/564303.ashx
Effective Usage of SQL Server 2005 Database Mirroring
http://www.sqlserver-qa.net/SSQA-Effective%20Usage%20of%20SQL%20Server%202005%20Database%20Mirroring_show.ppt
References: Articles
Achieve High Availability for SQL Server
http://technet.microsoft.com/en-us/magazine/cc162477.aspx
Geographically Dispersed Clusters in Windows Server 2003
http://www.microsoft.com/windowsserver2003/techinfo/overview/clustergeo.mspx
Restoring file and filegroup backups
http://support.microsoft.com/kb/281122/en-us
Restoring specific tables or rows from backups
http://support.microsoft.com/kb/321836/en-us
Maintaining Availability During Upgrades
http://msdn.microsoft.com/en-us/library/ms191449.aspx
