Switchovers and Failovers

1/22/2015
Switchovers and Failovers: Exchange 2010 Help

Exchange 2010
Applies to: Exchange Server 2010 SP3, Exchange Server 2010 SP2
Topic Last Modified: 2013-11-01
Switchovers and failovers are the two forms of outages in Microsoft Exchange Server 2010. A switchover is a scheduled outage of a database or server that's explicitly initiated by an administrator, typically in preparation for performing a maintenance operation. Switchovers involve an administrator moving the active mailbox database copy to another server in the database availability group (DAG).
A failover refers to unexpected events that result in the unavailability of services, data, or both. A failover involves the system automatically recovering from the
failure by activating a passive mailbox database copy to make it the active mailbox database copy.
The high availability platform in Exchange 2010 is designed to handle both switchovers and failovers.
Looking for management tasks related to high availability and site resilience? See Managing High Availability and Site Resilience.
Switchovers
There are three types of switchovers in Exchange 2010:
Database switchovers
Server switchovers
Datacenter switchovers
Database Switchovers
A database switchover is the process by which an individual active database is switched over to another database copy (a passive copy), and that database copy is made the new active database copy. Database switchovers can happen both within and across datacenters. A database switchover can be performed by using the Exchange Management Console (EMC) or the Exchange Management Shell. Regardless of which interface is used, the switchover process is the same:
1. The administrator initiates a database switchover to move the current active mailbox database copy to another server. The switchover can be initiated by using the Move-ActiveMailboxDatabase cmdlet or by using the Activate a Database Copy wizard.
2. The client used for the task makes an RPC call to the Microsoft Exchange Replication service on a DAG member.
3. If the DAG member doesn't hold the Primary Active Manager (PAM) role, the DAG member refers the task to the PAM.
4. The task makes an RPC call to the Microsoft Exchange Replication service on the PAM.
5. The PAM reads and updates the database location information that's stored in the cluster database for the DAG.
6. The PAM contacts the Microsoft Exchange Replication service on the DAG member whose passive copy is being activated as the new active mailbox
database copy.
7. The Microsoft Exchange Replication service on the target server queries the Microsoft Exchange Replication services on all other DAG members to
determine the best log source for the database copy.
8. The database is dismounted from the current server and the Microsoft Exchange Replication service on the target server copies the remaining logs to
the target server.
9. The Microsoft Exchange Replication service on the target server requests a database mount.
10. The Microsoft Exchange Information Store service on the target server replays the log files and mounts the database.
11. Any error codes are returned to the Microsoft Exchange Replication service on the target server.
12. The PAM updates the database copy state information in the cluster database for the DAG.
13. Any error codes are returned by the Microsoft Exchange Replication service on the target server to the Microsoft Exchange Replication service on the
PAM.
14. The Microsoft Exchange Replication service on the PAM returns any errors to the administrative interface where the task was called.
15. Remote PowerShell returns the results of the operation to the calling administrative interface.
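The switchover steps above can be modeled in a few lines of Python to make the roles of the PAM, the target server, and the best log source concrete. This is a conceptual sketch only: the names `DagMember`, `switchover`, and `log_generations` are hypothetical, logs are represented as integer generation numbers, and RPC calls, the cluster database, and error reporting (steps 11 through 15) are not modeled. It does not call any Exchange API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DagMember:
    """Hypothetical model of one DAG member's Replication service."""
    name: str
    is_pam: bool = False
    # Highest log generation held for each database (an assumed simplification).
    log_generations: dict[str, int] = field(default_factory=dict)


def find_pam(members: list[DagMember]) -> DagMember:
    # Step 3: a member that isn't the PAM refers the task to the PAM.
    return next(m for m in members if m.is_pam)


def best_log_source(db: str, target: DagMember, members: list[DagMember]) -> DagMember:
    # Step 7: the target asks every other member how far its copy of `db` goes.
    others = [m for m in members if m is not target and db in m.log_generations]
    return max(others, key=lambda m: m.log_generations[db])


def switchover(db, contacted, target, members, active_location):
    pam = contacted if contacted.is_pam else find_pam(members)
    source = best_log_source(db, target, members)
    # Step 8: copy the generations the target is missing from the best source.
    first_missing = target.log_generations.get(db, 0) + 1
    missing = list(range(first_missing, source.log_generations[db] + 1))
    target.log_generations[db] = source.log_generations[db]
    # Steps 9-12: mount on the target; the PAM records the new active location.
    active_location[db] = target.name
    return pam.name, source.name, missing


# Usage: MBX1 holds the active copy, MBX3 is the PAM, MBX2 is being activated.
mbx1 = DagMember("MBX1", log_generations={"DB1": 120})
mbx2 = DagMember("MBX2", log_generations={"DB1": 118})
mbx3 = DagMember("MBX3", is_pam=True, log_generations={"DB1": 117})
location = {"DB1": "MBX1"}
result = switchover("DB1", contacted=mbx1, target=mbx2,
                    members=[mbx1, mbx2, mbx3], active_location=location)
print(result)  # ('MBX3', 'MBX1', [119, 120])
```

In this example the task is sent to MBX1, which isn't the PAM, so it is referred to MBX3; MBX1 still holds the newest logs, so it is chosen as the log source and generations 119 and 120 are copied to MBX2 before the mount.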
For detailed steps about how to perform a database switchover, see Move the Active Mailbox Database.
Server Switchovers
https://technet.microsoft.com/en-us/library/dd298067(d=printer,v=exchg.141).aspx
A server switchover is the process by which all active databases on a DAG member are activated on one or more other DAG members. Like database switchovers, a server switchover can occur both within a datacenter and across datacenters, and it can be initiated by using either the EMC or the Shell. Regardless of which interface is used, the switchover process is the same:
1. The administrator initiates a server switchover to move all current active mailbox database copies to one or more other servers. The switchover can be initiated by using the Move-ActiveMailboxDatabase cmdlet, or by using the Switchover Server UI.
2. The task performs the same steps described earlier in this topic for database switchovers (Steps 2 through 4) for each of the active databases on the current server.
3. The PAM reads and updates the database location information that's stored in the cluster database for the DAG.
4. The PAM contacts the Microsoft Exchange Replication service on each DAG member that has a passive copy being activated.
5. The Microsoft Exchange Replication service on each target server queries the Microsoft Exchange Replication services on all other DAG members to determine the best log source for each database copy.
6. The database is dismounted from the current server and the Microsoft Exchange Replication service on each target server copies the remaining logs.
7. The Microsoft Exchange Replication service on each target server requests a database mount.
8. The Microsoft Exchange Information Store service on each target server replays the log files and mounts the database.
9. Any error codes are returned to the Microsoft Exchange Replication service on each target server.
10. The PAM updates the database copy state information in the cluster database for the DAG.
11. Any error codes are returned by the Microsoft Exchange Replication service on each target server to the Microsoft Exchange Replication service on the PAM.
12. The Microsoft Exchange Replication service on the PAM returns any errors to the administrative interface where the task was called.
13. Remote PowerShell returns the results of the operation to the calling administrative interface.
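Because step 2 repeats the database switchover for every active database on the server, a server switchover can be thought of as a loop that picks a new server for each of those databases. The sketch below only plans the destinations and performs no moves. The function name, its inputs, and the target rule (the member that holds a copy and currently has the fewest active databases, with ties broken by name) are hypothetical choices for illustration, not documented Exchange behavior.

```python
from collections import Counter


def plan_server_switchover(source, active_location, copies):
    """Map each database active on `source` to a hypothetical new server.

    active_location: database -> server hosting the active copy.
    copies: database -> servers holding any copy (active or passive).
    """
    load = Counter(active_location.values())  # active databases per server
    plan = {}
    for db in sorted(d for d, srv in active_location.items() if srv == source):
        candidates = [s for s in copies[db] if s != source]
        if not candidates:
            raise RuntimeError(f"{db} has no passive copy to activate")
        # Assumed rule: least-loaded server first, then alphabetical order.
        target = min(candidates, key=lambda s: (load[s], s))
        plan[db] = target
        load[target] += 1  # the moved database now counts against the target
    return plan


active = {"DB1": "MBX1", "DB2": "MBX1", "DB3": "MBX2"}
copies = {
    "DB1": ["MBX1", "MBX2", "MBX3"],
    "DB2": ["MBX1", "MBX2", "MBX3"],
    "DB3": ["MBX2", "MBX3"],
}
print(plan_server_switchover("MBX1", active, copies))  # {'DB1': 'MBX3', 'DB2': 'MBX2'}
```

Updating the load counter after each assignment keeps both databases from landing on the same idle server; a different placement policy would only change the `key` passed to `min`.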
For detailed steps about how to perform a server switchover, see Perform a Server Switchover.
Datacenter Switchovers
A datacenter or site failure is managed differently from the types of failures that can cause a server or database failover. In a high availability configuration,
automatic recovery is initiated by the system, and the failure typically leaves the messaging system in a fully functional state. By contrast, a datacenter failure is
considered to be a disaster recovery event, and as such, recovery must be manually performed and completed for the client service to be restored and for the
outage to end. The process you perform is called a datacenter switchover. As with many disaster recovery scenarios, prior planning and preparation for a
datacenter switchover can simplify your recovery process and reduce the duration of your outage.
For more information about datacenter switchovers, including detailed steps for performing a datacenter switchover, see Datacenter Switchovers.
For assistance with performing a datacenter switchover, see Guided Walkthrough: Exchange Server 2010 Datacenter Switchover for a Database Availability
Group.
Failovers
A failover is an automatic activation process that can occur at either the database or server level. Failovers occur in response to a failure that affects an individual database (for example, an isolated storage loss) or an entire server (for example, a motherboard failure or a loss of power).
DAGs and mailbox database copies provide full redundancy and therefore rapid recovery of both the data and the services that provide access to the data. The
following table lists the expected recovery actions for a variety of failures. Some failures require the administrator to initiate the recovery, and other failures are
automatically handled by the system.
Extensible Storage Engine (ESE) soft database failure: The drives storing the database are returning errors on some reads (for example, a -1018 error).
  Automatic activation: Possible short outage.
  Automatic repair action: Automatic patching of bad page. Possible automatic failover.
  State during repair (active): Manual switchover, automatic failover, or online repair.
  State during repair (passive): Failed.
  Repair actions: RAID rebuild, database and database copy repair, restore and run recovery then page patching, or page patching from copy.
  Comments: There may be other soft database failure codes. Doesn't include NTFS file system block failures. If failover or switchover is performed, host server is updated.
ESE "semi-soft" database failure: The drives storing the database are returning errors on some writes.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: Automatic volume/disk rebuild after possible drive replacement.
  State during repair (active): Dismounted if it can't be recovered.
  State during repair (passive): Failed.
  Repair actions: RAID rebuild may solve the problem. Copy and repair, restore and run recovery, or volume/disk rebuild after possible replacement.
  Comments: An ESE semi-soft write error means some writes are successful. Doesn't include an NTFS block failure.
ESE "semi-soft" log failure: The drives storing the log data are returning non-recovered errors on some reads or writes.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: Automatic volume/disk rebuild after possible drive replacement.
  State during repair (active): Dismounted if it can't be recovered.
  State during repair (passive): Failed.
  Repair actions: RAID rebuild may solve the problem. Copy and repair, restore and run recovery, or volume/disk rebuild after possible replacement.
  Comments: An ESE semi-soft read/write error means some reads/writes are successful. If the database fails, automated recovery will occur before log data recovery processing starts.
ESE software error or resource exhaustion: An error where ESE terminates the instance (for example, Event ID 1022, checkpoint depth too deep).
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: None.
  State during repair (active): Dismounted if it can't be recovered.
  State during repair (passive): Failed.
  Repair actions: Fix underlying resource issue.
  Comments: This failure could be the surfaced error of other cases.
NTFS block failures: The drives storing the database or logs experience a read or write error to an NTFS control structure.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: Volume completely rebuilt after possible drive replacement.
  State during repair (active): Dismounted if it can't be recovered.
  State during repair (passive): Failed.
  Repair actions: RAID rebuild may solve the problem. NTFS utilities may solve the NTFS problems. Exchange recovery may be required.
  Comments: This is more likely to occur when RAID isn't in use. If this impacts the active log volume, some recent log files will be lost. Doesn't include errors automatically corrected by NTFS or its underlying software or hardware stack.
Database or log drive failure: A drive storing the database or logs has completely failed and is inaccessible.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: Drive reformatted or replaced, followed by complete volume rebuild.
  State during repair (active): Dismounted if it can't be recovered.
  State during repair (passive): Failed.
  Repair actions: Drive replacement followed by possible RAID rebuild, or drive replacement followed by complete volume rebuild.
  Comments: Not applicable.
Database or log volume failure: The volume fails due to NTFS or lower-level volume issues.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: Drive reformatted or replaced.
  State during repair (active): Dismounted if it can't be recovered.
  State during repair (passive): Failed.
  Repair actions: Drive replacement followed by possible RAID rebuild, or drive replacement followed by complete volume rebuild.
  Comments: Not applicable.
Database or log volume out of space: The NTFS file system with the database or log files is out of space.
  Automatic activation: Automatic failover if the other copy isn't in a similar state.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Failed.
  Repair actions: Run full or incremental backups, manually delete logs, let time pass, resume database copy, or repair failed database copy.
  Comments: Not applicable.
Administrator dismounts the wrong database.
  Automatic activation: If automatic failover isn't blocked by the administrator, there will be a short outage. If automatic failover is prevented, there will be an outage until the database is mounted.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Not applicable.
  Repair actions: Administrator corrects the error.
  Comments: Not applicable.
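Administrator suspends the wrong database copy.
  Automatic activation: Depending on configuration and impacted copy, automatic recovery may be prevented.
  Automatic repair action: None.
  State during repair (active): Not applicable.
  State during repair (passive): Suspended.
  Repair actions: Administrator corrects the error.
  Comments: Not applicable.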
Administrator dismounts a database for storage, NTFS, or volume maintenance.
  Automatic activation: If automatic failover isn't blocked by the administrator, there will be a short outage. If automatic failover is blocked, there will be an outage until the administrator completes the task.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Not applicable.
  Repair actions: Administrator completes the task.
  Comments: Not applicable.
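Administrator suspends a database copy for storage, NTFS, or volume maintenance.
  Automatic activation: Depending on configuration and impacted copy, automatic recovery may be prevented.
  Automatic repair action: None.
  State during repair (active): Not applicable.
  State during repair (passive): Suspended.
  Repair actions: Administrator completes the actions.
  Comments: Not applicable.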
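Administrator dismounts a database for offline database maintenance.
  Automatic activation: Outage until repaired.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Suspended.
  Repair actions: Administrator completes the actions.
  Comments: Active and passive database copies are diverged. Administrator must suspend copies.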
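Storage area network (SAN), disk, or storage controller failure.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Repair hardware.
  Comments: A passive database copy will be in the state that existed at the time when the system failed.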
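Server hardware maintenance.
  Automatic activation: Short outage during automatic failover unless blocked by an administrator.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Complete actions.
  Comments: A passive database copy will be in the state that existed at the time when the system was shut down.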
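Server software maintenance.
  Automatic activation: Short outage during automatic failover unless blocked by an administrator.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Complete actions.
  Comments: A passive database copy will be in the state that existed at the time when the system was shut down.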
Microsoft Exchange Information Store service is stopped or paused by an administrator.
  Automatic activation: None.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Restart the Microsoft Exchange Information Store service.
  Comments: A passive database copy will be in the state that existed at the time when the service was stopped.
Microsoft Exchange Information Store service fails; operating system is still running.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: Service Control Manager restarts the Microsoft Exchange Information Store service.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Manually or automatically restart the Microsoft Exchange Information Store service.
  Comments: A passive database copy will be in the state that existed when the Microsoft Exchange Information Store service failed.
Partial Microsoft Exchange Information Store service failure: Some part of the Exchange store stops functioning, but it's not identified as completely failed.
  Automatic activation: Possible short outage during automatic failover.
  Automatic repair action: None.
  State during repair (active): Mounted and partially functional.
  State during repair (passive): Any, but may be only partially functional.
  Repair actions: Restart server, operating system, or Microsoft Exchange Information Store service.
  Comments: Not applicable.
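Server failure: The server fails for one of the following reasons: complete power failure; unrecovered failure of the processor chip, motherboard, or backplane; operating system stop error; operating system stops responding; or complete communication failure. Some functionality may be operational.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: Restart computer.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Restore power, change operating system settings, change hardware settings, replace hardware, restart operating system, service operating system, service hardware, or repair communication problems.
  Comments: Not applicable.
DAG experiences a quorum failure.
  Automatic activation: Outage until repaired.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Repair failed quorum, assign new quorum, or restore the network that's causing quorum failure.
  Comments: A passive database copy will be in the state that existed at the time when the system failed.
MAPI network communication failure: The server is no longer available on the MAPI network.
  Automatic activation: Short outage during automatic failover; must be lossless.
  Automatic repair action: None. Communication continues to be attempted.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Fix communication problem by correcting hardware or software issues.
  Comments: Not applicable.
Replication network failure: The server can't receive heartbeats, log copies, or seed through the failed replication network.
  Automatic activation: Possible short copying or seeding outage while the workload is switched to another network.
  Automatic repair action: None. Communication continues to be attempted.
  State during repair (active): None.
  State during repair (passive): Any.
  Repair actions: Fix communication problem by correcting hardware or software issues.
  Comments: Resiliency impacted by failure.
Multiple network failure: The server can't receive heartbeats, log copies, or seed through multiple networks.
  Automatic activation: Short outage during automatic failover; must be lossless.
  Automatic repair action: None. Communication continues to be attempted.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Fix communication problem by correcting hardware or software issues.
  Comments: At least one network is still functional.
Partial failure of one or more networks: Networks experience high error rates.
  Automatic activation: Failure not detected; no action.
  Automatic repair action: None.
  State during repair (active): Mounted, but possible performance issues.
  State during repair (passive): Any.
  Repair actions: Fix communication problem by correcting hardware or software issues.
  Comments: Network experiences higher than normal error rates.
Undetected operating system hang: The operating system stops responding, but it's not detected by monitoring or clustering.
  Automatic activation: None.
  Automatic repair action: None.
  State during repair (active): Any.
  State during repair (passive): Any.
  Repair actions: Restart or terminate the resources that aren't responding.
  Comments: Hang isn't detected, so no action is taken.
Operating system drive experiences a failure.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Replace drive and rebuild server, or rebuild volume by using RAID.
  Comments: Not applicable.
Operating system drive out of space.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Manually free space on the volume.
  Comments: Not applicable.
Drive containing Exchange binaries experiences a volume or drive failure.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Replace drive and reinstall application, or rebuild volume by using RAID.
  Comments: Not applicable.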
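Drive containing the Exchange binaries is out of space.
  Automatic activation: Short outage during automatic failover.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Any.
  Repair actions: Manually free space on the volume.
  Comments: Not applicable.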
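Invalid new log detected: The log sequence is disrupted by an existing file.
  Automatic activation: Short outage during automatic failover, assuming other copies don't have the same problem.
  Automatic repair action: None.
  State during repair (active): Dismounted.
  State during repair (passive): Failed.
  Repair actions: Remove disruptive logs after determining the source.
  Comments: The disruptive logs shouldn't replicate.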
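Continuous replication detects invalid log: Replay detects an inappropriate log during copy or replay.
  Automatic activation: Not applicable.
  Automatic repair action: Discard log.
  State during repair (active): Not applicable.
  State during repair (passive): Failed.
  Repair actions: Discard invalid log; move impacting log stream.
  Comments: Not applicable.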
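Keeping the failure table in a machine-readable form makes it easy to query, for example to list which failures recover without administrator action. The sketch below encodes four rows as a Python dictionary. The short keys, the `RecoveryEntry` type, and the string-matching heuristic in `recovery_is_automatic` are assumptions for illustration; the heuristic fits these four rows but would misclassify rows whose automatic-activation text is conditional, such as the administrator-dismount rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryEntry:
    automatic_activation: str
    active_state: str
    repair_actions: str


# Short hypothetical keys; the values are copied from the matching table rows.
RECOVERY_TABLE = {
    "server failure": RecoveryEntry(
        "Short outage during automatic failover.", "Dismounted.",
        "Restore power, change operating system settings, change hardware settings, "
        "replace hardware, restart operating system, service operating system, "
        "service hardware, or repair communication problems."),
    "dag quorum failure": RecoveryEntry(
        "Outage until repaired.", "Dismounted.",
        "Repair failed quorum, assign new quorum, or restore the network "
        "that's causing quorum failure."),
    "san or storage controller failure": RecoveryEntry(
        "Short outage during automatic failover.", "Dismounted.", "Repair hardware."),
    "undetected os hang": RecoveryEntry(
        "None.", "Any.", "Restart or terminate the resources that aren't responding."),
}


def recovery_is_automatic(failure: str) -> bool:
    """Naive check: does the automatic-activation text mention an automatic failover?"""
    return "automatic failover" in RECOVERY_TABLE[failure].automatic_activation.lower()


print([name for name in RECOVERY_TABLE if recovery_is_automatic(name)])
# ['server failure', 'san or storage controller failure']
```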
Database Failovers
A database failover occurs when a database copy that was active is no longer able to remain active. The following occurs as part of a database failover:
1. The database failure is detected by the Microsoft Exchange Information Store service.
2. The Microsoft Exchange Information Store service writes failure events to the crimson channel event log.
3. The Active Manager on the server that contains the failed database detects the failure events.
4. The Active Manager requests the database copy status from the other servers that hold a copy of the database.
5. The other servers return the requested database copy status to the requesting Active Manager.
6. The PAM initiates a move of the active database to another server in the DAG using a best copy selection algorithm.
7. The PAM updates the database mount location in the cluster database to refer to the selected server.
8. The PAM sends a request to the Active Manager on the selected server to become the database master.
9. The Active Manager on the selected server requests that the Microsoft Exchange Replication service attempt to copy the last logs from the previous
server and set the mountable flag for the database.
10. The Microsoft Exchange Replication service copies the logs from the server that previously had the active copy of the database.
11. The Active Manager reads the maximum log generation number from the cluster database.
12. The Microsoft Exchange Information Store service mounts the new active database copy.
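Step 6 refers to a best copy selection algorithm without spelling it out. The sketch below is a deliberately simplified stand-in, not the algorithm Active Manager actually uses: it filters out the failed and unreachable servers and then ranks the remaining copies by the two inputs that the server failover description names (log generation number and search index catalog health). The ranking order, the tie-break by server name, and the names `CopyStatus` and `select_best_copy` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyStatus:
    server: str
    reachable: bool           # did the server answer the status request?
    last_log_generation: int  # highest log generation held by this copy
    catalog_healthy: bool     # search index catalog state


def select_best_copy(copies: list, failed_server: str) -> Optional[CopyStatus]:
    """Simplified stand-in for best copy selection (not Exchange's real rules)."""
    candidates = [c for c in copies if c.reachable and c.server != failed_server]
    if not candidates:
        return None  # nothing can be activated; the database stays dismounted
    # Most complete log stream first, then a healthy catalog, then server name.
    return min(candidates,
               key=lambda c: (-c.last_log_generation, not c.catalog_healthy, c.server))


copies = [
    CopyStatus("MBX1", reachable=False, last_log_generation=500, catalog_healthy=True),
    CopyStatus("MBX2", reachable=True, last_log_generation=498, catalog_healthy=False),
    CopyStatus("MBX3", reachable=True, last_log_generation=498, catalog_healthy=True),
    CopyStatus("MBX4", reachable=True, last_log_generation=495, catalog_healthy=True),
]
best = select_best_copy(copies, failed_server="MBX1")
print(best.server)  # MBX3
```

Here MBX2 and MBX3 both hold generation 498, so catalog health decides between them. The two generations held only by the failed MBX1 are the logs that step 10 attempts to copy from the previous server before the mount.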
Server Failovers
A server failover occurs when the DAG member is no longer able to service the MAPI network, or when the Cluster service on a DAG member is no longer able
to contact the remaining DAG members. The following occurs as part of a server failover:
1. The Cluster service on the PAM sends a notification to the PAM for one of two conditions:
a. Node Down: The server is reachable but is unable to participate in DAG operations.
b. MAPI Network Down: The server can't be contacted over the MAPI network and therefore can't participate in DAG operations.
2. If the server is reachable, the PAM contacts the Active Manager on the affected server and requests that all databases be immediately dismounted.
3. For each affected database copy:
a. The PAM requests the database copy status from all servers in the DAG.
b. The PAM receives a response from all reachable and active DAG members.
c. The PAM tries to determine the best log source among all responding servers by querying the most recent log generation number from each of
the responders.
d. Each of the servers responds with the log generation number.
4. The PAM retrieves the current search index catalog status from the cluster database.
5. Based on the log generation number and catalog health of each database copy, the PAM selects the best copies to activate.
6. The PAM updates the mounted location of the database in the cluster database.
7. The PAM initiates database failover by communicating with the Active Manager on one or more other servers.
8. The Active Manager on the selected servers requests that the Microsoft Exchange Replication service attempt to copy the last logs from the previous