
1/22/2015

Switchovers and Failovers: Exchange 2010 Help

Switchovers and Failovers


Exchange 2010

Applies to: Exchange Server 2010 SP3, Exchange Server 2010 SP2
Topic Last Modified: 2013-11-01
Switchovers and failovers are the two forms of outages in Microsoft Exchange Server 2010. A switchover is a scheduled outage of a database or server that's explicitly initiated by an administrator, typically in preparation for performing a maintenance operation. Switchovers involve an administrator moving the active mailbox database copy to another server in the database availability group (DAG).
A failover refers to unexpected events that result in the unavailability of services, data, or both. A failover involves the system automatically recovering from the failure by activating a passive mailbox database copy to make it the active mailbox database copy.
The high availability platform in Exchange 2010 is designed to handle both switchovers and failovers.
Looking for management tasks related to high availability and site resilience? See Managing High Availability and Site Resilience.

Switchovers
There are three types of switchovers in Exchange 2010:
Database switchovers
Server switchovers
Datacenter switchovers

Database Switchovers
A database switchover is the process by which an individual active database is switched over to another database copy (a passive copy), and that database copy is made the new active database copy. Database switchovers can happen both within and across datacenters. A database switchover can be performed by using the Exchange Management Console (EMC) or the Exchange Management Shell. Regardless of which interface is used, the switchover process is the same:
1. The administrator initiates a database switchover to move the current active mailbox database copy to another server. The switchover can be initiated by using the Move-ActiveMailboxDatabase cmdlet or by using the Activate a Database Copy wizard.
2. The client used for the task makes an RPC call to the Microsoft Exchange Replication service on a DAG member.
3. If the DAG member doesn't hold the Primary Active Manager (PAM) role, the DAG member refers the task to the PAM.
4. The task makes an RPC call to the Microsoft Exchange Replication service on the PAM.
5. The PAM reads and updates the database location information that's stored in the cluster database for the DAG.
6. The PAM contacts the Microsoft Exchange Replication service on the DAG member whose passive copy is being activated as the new active mailbox
database copy.
7. The Microsoft Exchange Replication service on the target server queries the Microsoft Exchange Replication services on all other DAG members to
determine the best log source for the database copy.
8. The database is dismounted from the current server and the Microsoft Exchange Replication service on the target server copies the remaining logs to
the target server.
9. The Microsoft Exchange Replication service on the target server requests a database mount.
10. The Microsoft Exchange Information Store service on the target server replays the log files and mounts the database.
11. Any error codes are returned to the Microsoft Exchange Replication service on the target server.
12. The PAM updates the database copy state information in the cluster database for the DAG.
13. Any error codes are returned by the Microsoft Exchange Replication service on the target server to the Microsoft Exchange Replication service on the
PAM.
14. The Microsoft Exchange Replication service on the PAM returns any errors to the administrative interface where the task was called.
15. Remote PowerShell returns the results of the operation to the calling administrative interface.
For detailed steps about how to perform a database switchover, see Move the Active Mailbox Database.
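The referral logic in steps 2 through 5 can be modeled in a few lines. The following Python sketch is illustrative only: the `Dag` and `Member` classes, their fields, and `move_active_copy` are hypothetical names invented for this example, not Exchange APIs. It assumes the cluster database can be represented as a dictionary that maps each database name to the server hosting its active copy.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """A DAG member (hypothetical model, not an Exchange type)."""
    name: str
    is_pam: bool = False


@dataclass
class Dag:
    """Toy DAG: members plus a dict standing in for the cluster database."""
    members: list
    active_location: dict = field(default_factory=dict)  # database -> server

    def pam(self) -> Member:
        # Assumption: exactly one member holds the PAM role.
        return next(m for m in self.members if m.is_pam)


def move_active_copy(dag: Dag, contacted: Member, database: str, target: str) -> list:
    """Return a trace of a switchover request, following steps 2 through 5."""
    # Step 2: the client calls the replication service on some DAG member.
    trace = [f"RPC to replication service on {contacted.name}"]
    handler = contacted
    if not contacted.is_pam:
        # Step 3: a member without the PAM role refers the task to the PAM.
        handler = dag.pam()
        trace.append(f"referred to PAM {handler.name}")
        # Step 4: the task calls the replication service on the PAM.
        trace.append(f"RPC to replication service on {handler.name}")
    # Step 5: the PAM reads and updates the database location information.
    source = dag.active_location[database]
    dag.active_location[database] = target
    trace.append(f"{database}: {source} -> {target}")
    return trace


if __name__ == "__main__":
    dag = Dag([Member("MBX1", is_pam=True), Member("MBX2")], {"DB1": "MBX1"})
    for line in move_active_copy(dag, dag.members[1], "DB1", "MBX2"):
        print(line)
```

The sketch stops at the location update. Log copying, the mount request, and error propagation (steps 6 through 15) are left out to keep the referral path easy to follow.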

Server Switchovers
https://technet.microsoft.com/enus/library/dd298067(d=printer,v=exchg.141).aspx

A server switchover is the process by which all active databases on a DAG member are activated on one or more other DAG members. Like database switchovers, a server switchover can occur both within a datacenter and across datacenters, and it can be initiated by using both the EMC and the Shell. Regardless of which interface is used, the switchover process is the same:
1. The administrator initiates a server switchover to move all current active mailbox database copies to one or more other servers. The switchover can be initiated by using the Move-ActiveMailboxDatabase cmdlet, or by using the Switchover Server UI.
2. The task performs the same steps described earlier in this topic for database switchovers (Steps 2 through 4) for each of the active databases on the current server.
3. The PAM reads and updates the database location information that's stored in the cluster database for the DAG.
4. The PAM contacts the Microsoft Exchange Replication service on each DAG member that has a passive copy being activated.
5. The Microsoft Exchange Replication service on each target server queries the Microsoft Exchange Replication services on all other DAG members to determine the best log source for the database copy.
6. The database is dismounted from the current server and the Microsoft Exchange Replication service on each target server copies the remaining logs.
7. The Microsoft Exchange Replication service on each target server requests a database mount.
8. The Microsoft Exchange Information Store service on each target server replays the log files and mounts the database.
9. Any error codes are returned to the Microsoft Exchange Replication service on the target server.
10. The PAM updates the database copy state information in the cluster database for the DAG.
11. Any error codes are returned by the Microsoft Exchange Replication service on the target server to the Microsoft Exchange Replication service on the
PAM.
12. The Microsoft Exchange Replication service on the PAM returns any errors to the administrative interface where the task was called.
13. Remote PowerShell returns the results of the operation to the calling administrative interface.
For detailed steps about how to perform a server switchover, see Perform a Server Switchover.
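A server switchover is essentially the database switchover repeated for every database that is active on the source server. The Python sketch below shows that loop under simplifying assumptions: `server_switchover`, its parameters, and the two dictionaries are hypothetical stand-ins for the DAG's state, and the target is simply the first other server that holds a copy, which is not how Exchange chooses a target.

```python
def server_switchover(active_location, copies, source):
    """Move every database that is active on `source` to another server.

    active_location: dict mapping database -> server hosting the active copy
    copies:          dict mapping database -> servers holding any copy of it
    Returns a dict mapping each moved database to its new active server.
    """
    moved = {}
    for database, server in sorted(active_location.items()):
        if server != source:
            continue  # Not active on the server being drained.
        # Candidate targets are the servers holding a passive copy.
        targets = [s for s in copies[database] if s != source]
        if not targets:
            raise RuntimeError(f"no passive copy available for {database}")
        # Assumption: take the first candidate; real selection is richer.
        moved[database] = targets[0]
    # Record the new locations only after every database has a target.
    active_location.update(moved)
    return moved


if __name__ == "__main__":
    active = {"DB1": "MBX1", "DB2": "MBX1", "DB3": "MBX2"}
    copies = {
        "DB1": ["MBX1", "MBX2"],
        "DB2": ["MBX1", "MBX3"],
        "DB3": ["MBX2", "MBX1"],
    }
    print(server_switchover(active, copies, "MBX1"))
```

Databases can land on different targets, which matches the description of activating databases on one or more other DAG members.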

Datacenter Switchovers
A datacenter or site failure is managed differently from the types of failures that can cause a server or database failover. In a high availability configuration,
automatic recovery is initiated by the system, and the failure typically leaves the messaging system in a fully functional state. By contrast, a datacenter failure is
considered to be a disaster recovery event, and as such, recovery must be manually performed and completed for the client service to be restored and for the
outage to end. The process you perform is called a datacenter switchover. As with many disaster recovery scenarios, prior planning and preparation for a
datacenter switchover can simplify your recovery process and reduce the duration of your outage.
For more information about datacenter switchovers, including detailed steps for performing a datacenter switchover, see Datacenter Switchovers.
For assistance with performing a datacenter switchover, see Guided Walkthrough: Exchange Server 2010 Datacenter Switchover for a Database Availability
Group.

Failovers
A failover is an automatic activation process that can occur at either the database or server level. Failovers occur in response to a failure that affects an individual database (for example, an isolated storage loss) or an entire server (for example, a motherboard failure or a loss of power).
DAGs and mailbox database copies provide full redundancy and therefore rapid recovery of both the data and the services that provide access to the data. The
following table lists the expected recovery actions for a variety of failures. Some failures require the administrator to initiate the recovery, and other failures are
automatically handled by the system.

Description: Extensible Storage Engine (ESE) soft database failure: The drives storing the database are returning errors on some reads (for example, a -1018 error).
Automatic activation: Possible short outage.
Automatic repair action: Automatic patching of bad page. Possible automatic failover.
State during repair (active): Manual switchover, automatic failover, or online repair.
State during repair (passive): Failed
Repair actions: RAID rebuild, database and database copy repair, restore and run recovery then page patching, or page patching from copy.
Comments: There may be other soft database failure codes. Doesn't include NTFS file system block failures. If failover or switchover is performed, host server is updated.

Description: ESE "semisoft" database failure: The drives storing the database are returning errors on some writes.
Automatic activation: Short outage during automatic failover.
Automatic repair action: Automatic volume/disk rebuilt after possible drive replacement.
State during repair (active): Dismounted if can't be recovered.
State during repair (passive): Failed
Repair actions: RAID rebuild may solve the problem. Copy and repair, restore and run recovery, or volume/disk rebuilt after possible replacement.
Comments: An ESE semisoft write error means some writes are successful. Doesn't include an NTFS block failure.

Description: ESE "semisoft" log failure: The drives storing the log data are returning non-recovered errors on some reads or writes.
Automatic activation: Short outage during automatic failover.
Automatic repair action: Automatic volume/disk rebuilt after possible drive replacement.
State during repair (active): Dismounted if can't be recovered.
State during repair (passive): Failed
Repair actions: RAID rebuild may solve the problem. Copy and repair, restore and run recovery, or volume/disk rebuilt after possible replacement.
Comments: An ESE semisoft read/write error means some reads/writes are successful. If the database fails, automated recovery will occur before log data recovery processing starts.

Description: ESE software error or resource exhaustion: An error where ESE terminates instance (for example, Event ID 1022, checkpoint depth too deep).
Automatic activation: Short outage during automatic failover.
Automatic repair action: None
State during repair (active): Dismounted if can't be recovered.
State during repair (passive): Failed
Repair actions: Fix underlying resource issue.
Comments: This failure could be the surfaced error of other cases.

Description: NTFS block failures: The drives storing the database or logs experience a read or write error to an NTFS control structure.
Automatic activation: Short outage during automatic failover.
Automatic repair action: Volume completely rebuilt after possible drive replacement.
State during repair (active): Dismounted if can't be recovered.
State during repair (passive): Failed
Repair actions: RAID rebuild may solve the problem. NTFS utilities may solve the NTFS problems. Exchange recovery may be required.
Comments: This is more likely to occur when RAID isn't in use. If this impacts the active log volume, some recent log files will be lost. Doesn't include errors automatically corrected by NTFS or its underlying software or hardware stack.

Description: Database or log drive failure: A drive storing the database or logs has completely failed and is inaccessible.
Automatic activation: Short outage during automatic failover.
Automatic repair action: Drive reformatted or replaced, followed by complete volume rebuild.
State during repair (active): Dismounted if can't be recovered.
State during repair (passive): Failed
Repair actions: Drive replacement followed by possible RAID rebuild. Drive replacement followed by complete volume rebuild. Complete volume rebuild.
Comments: Not applicable

Description: Database or log volume failure: The volume fails due to NTFS or lower level volume issues.
Automatic activation: Short outage during automatic failover.
Automatic repair action: Drive reformatted or replaced.
State during repair (active): Dismounted if can't be recovered.
State during repair (passive): Failed
Repair actions: Drive replacement followed by possible RAID rebuild. Drive replacement followed by complete volume rebuild. Complete volume rebuild.
Comments: Not applicable

Description: Database or log volume out of space: The NTFS file system with the database or log files is out of space.
Automatic activation: Automatic failover if other copy isn't in similar state.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Failed
Repair actions: Run full or incremental backups, manually delete logs, let time pass, resume database copy, or repair failed database copy.
Comments: Not applicable

Description: Administrator dismounts the wrong database.
Automatic activation: If automatic failover isn't blocked by the administrator, there will be a short outage. If automatic failover is prevented, there will be an outage until the database is mounted.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Not applicable
Repair actions: Administrator corrects the error.
Comments: Not applicable

Description: Administrator suspends the wrong database copy.
Automatic activation: Depending on configuration and impacted copy, auto recovery may be prevented.
Automatic repair action: None
State during repair (active): Not applicable
State during repair (passive): Suspended
Repair actions: Administrator corrects the error.
Comments: Not applicable

Description: Administrator dismounts a database for storage, NTFS, or volume maintenance.
Automatic activation: If automatic failover isn't blocked by the administrator, there will be a short outage. If automatic failover is blocked, there will be an outage until the administrator completes the task.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Not applicable
Repair actions: Administrator completes the task.
Comments: Not applicable

Description: Administrator suspends a database copy for storage, NTFS, or volume maintenance.
Automatic activation: Depending on configuration and impacted copy, auto recovery may be prevented.
Automatic repair action: None
State during repair (active): Not applicable
State during repair (passive): Suspended
Repair actions: Administrator completes the actions.
Comments: Not applicable

Description: Administrator dismounts a database for offline database maintenance.
Automatic activation: Outage until repaired.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Suspended
Repair actions: Administrator completes the actions.
Comments: Active and passive database copies are diverged. Administrator must suspend copies.

Description: Storage area network (SAN), disk, or storage controller failure.
Automatic activation: Short outage during automatic failover.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Repair hardware.
Comments: A passive database copy will be in the state that existed at the time when the system failed.

Description: Server hardware maintenance.
Automatic activation: Short outage during automatic failover unless blocked by an administrator.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Complete actions.
Comments: A passive database copy will be in the state that existed at the time when the system was shut down.

Description: Server software maintenance.
Automatic activation: Short outage during automatic failover unless blocked by an administrator.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Complete actions.
Comments: A passive database copy will be in the state that existed at the time when the system was shut down.

Description: Microsoft Exchange Information Store service is stopped or paused by an administrator.
Automatic activation: None
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Restart the Microsoft Exchange Information Store service.
Comments: A passive database copy will be in the state that existed at the time when the service was stopped.

Description: Microsoft Exchange Information Store service fails; operating system is still running.
Automatic activation: Short outage during automatic failover.
Automatic repair action: Service Control Manager restarts the Microsoft Exchange Information Store service.
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Manually or automatically restart the Microsoft Exchange Information Store service.
Comments: A passive database copy will be in the state that existed when the Microsoft Exchange Information Store service failed.

Description: Partial Microsoft Exchange Information Store service failure; some part of the Exchange store stops functioning, but it's not identified as completely failed.
Automatic activation: Possible short outage during automatic failover.
Automatic repair action: None
State during repair (active): Mounted and partially functional.
State during repair (passive): Any, but may be only partially functional.
Repair actions: Restart server, operating system, or Microsoft Exchange Information Store service.
Comments: Not applicable

Description: Server failure: The server fails for one of the following reasons: complete power failure; unrecovered failure of the processor chip, motherboard, or backplane; operating system stop error; operating system stops responding; or complete communication failure.
Automatic activation: Short outage during automatic failover.
Automatic repair action: Restart computer.
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Restore power, change operating system settings, change hardware settings, replace hardware, restart operating system, service operating system, service hardware, or repair communication problems.
Comments: Not applicable

Description: DAG experiences a quorum failure.
Automatic activation: Outage until repaired.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Repair failed quorum, assign new quorum, or restore the network that's causing quorum failure.
Comments: A passive database copy will be in the state that existed at the time when the system failed.

Description: MAPI network communication failure: The server is no longer available on the MAPI network.
Automatic activation: Short outage during automatic failover; must be lossless.
Automatic repair action: None. Communication continues to be attempted.
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Fix communication problem by correcting hardware or software issues.
Comments: Not applicable

Description: Replication network communication failure: The server can't receive heartbeats, log copies, or seed through the failed replication network.
Automatic activation: Possible short copying or seeding outage while the workload is switched to other network.
Automatic repair action: None. Communication continues to be attempted.
State during repair (active): None
State during repair (passive): Any
Repair actions: Fix communication problem by correcting hardware or software issues.
Comments: Resiliency impacted by failure.

Description: Multiple network communication failure: The server can't receive heartbeats, log copies, or seed through multiple networks.
Automatic activation: Short outage during automatic failover; must be lossless.
Automatic repair action: None. Communication continues to be attempted.
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Fix communication problem by correcting hardware or software issues.
Comments: At least one network is still functional.

Description: Partial failure of one or more networks: Networks experience high error rates.
Automatic activation: Failure not detected; no action.
Automatic repair action: None
State during repair (active): Mounted, but possible performance issues.
State during repair (passive): Any
Repair actions: Fix communication problem by correcting hardware or software issues.
Comments: Network experiences higher than normal error rates.

Description: Undetected operating system hang: Operating system stops responding but it's not detected by monitoring or clustering.
Automatic activation: None
Automatic repair action: None
State during repair (active): Any
State during repair (passive): Any
Repair actions: Restart or terminate the resources that aren't responding.
Comments: Hang isn't detected so no action is taken. Some functionality may be operational.

Description: Operating system drive experiences a failure.
Automatic activation: Short outage during automatic failover.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Replace drive and rebuild server or rebuild volume by using RAID.
Comments: Not applicable

Description: Operating system drive out of space.
Automatic activation: Short outage during automatic failover.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Manually free space on the volume.
Comments: Not applicable

Description: Drive containing Exchange binaries experiences a volume or drive failure.
Automatic activation: Short outage during automatic failover.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Replace drive and reinstall application or rebuild volume by using RAID.
Comments: Not applicable

Description: Drive containing the Exchange binaries is out of space.
Automatic activation: Short outage during automatic failover.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Any
Repair actions: Manually free space on the volume.
Comments: Not applicable

Description: Invalid new log detected: The log sequence is disrupted by an existing file.
Automatic activation: Short outage during automatic failover; assume other copies don't have the same problem.
Automatic repair action: None
State during repair (active): Dismounted
State during repair (passive): Failed
Repair actions: Remove disruptive logs after determining source.
Comments: The disruptive logs shouldn't replicate.

Description: Continuous replication detects invalid log: Replay detects an inappropriate log during copy or replay.
Automatic activation: Not applicable
Automatic repair action: Discard log.
State during repair (active): Not applicable
State during repair (passive): Failed
Repair actions: Discard invalid log; move impacting log stream.
Comments: Not applicable

Database Failovers
A database failover occurs when a database copy that was active is no longer able to remain active. The following occurs as part of a database failover:
1. The database failure is detected by the Microsoft Exchange Information Store service.
2. The Microsoft Exchange Information Store service writes failure events to the crimson channel event log.
3. The Active Manager on the server that contains the failed database detects the failure events.
4. The Active Manager requests the database copy status from the other servers that hold a copy of the database.
5. The other servers return the requested database copy status to the requesting Active Manager.
6. The PAM initiates a move of the active database to another server in the DAG using a best copy selection algorithm.
7. The PAM updates the database mount location in the cluster database to refer to the selected server.
8. The PAM sends a request to the Active Manager on the selected server to become the database master.
9. The Active Manager on the selected server requests that the Microsoft Exchange Replication service attempt to copy the last logs from the previous
server and set the mountable flag for the database.
10. The Microsoft Exchange Replication service copies the logs from the server that previously had the active copy of the database.
11. The Active Manager reads the maximum log generation number from the cluster database.
12. The Microsoft Exchange Information Store service mounts the new active database copy.
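Step 6 refers to a best copy selection algorithm without describing it. As a rough illustration only, the Python sketch below ranks candidate copies by two signals: the most recent log generation and the health of the search index catalog. The `CopyStatus` type, its fields, and the ranking order (log generation first, catalog health as a tie-breaker) are assumptions made for this example; the actual algorithm weighs additional criteria and isn't reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CopyStatus:
    """Status reported for one passive copy (hypothetical model)."""
    server: str
    log_generation: int      # Most recent log generation held by the copy.
    catalog_healthy: bool    # Whether the search index catalog is healthy.
    reachable: bool = True   # Whether the server answered the status request.


def select_best_copy(copies: list) -> Optional[CopyStatus]:
    """Pick the copy to activate, or None if no server responded."""
    candidates = [c for c in copies if c.reachable]
    if not candidates:
        return None
    # Assumed ranking: fewest missing logs first, healthy catalog as tie-breaker.
    return max(candidates, key=lambda c: (c.log_generation, c.catalog_healthy))


if __name__ == "__main__":
    statuses = [
        CopyStatus("MBX2", 120, catalog_healthy=False),
        CopyStatus("MBX3", 120, catalog_healthy=True),
        CopyStatus("MBX4", 125, catalog_healthy=True, reachable=False),
    ]
    best = select_best_copy(statuses)
    print(best.server if best else "no copy available")
```

Unreachable servers are filtered out before ranking, so a copy with more logs is ignored if its server didn't return a status.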

Server Failovers
A server failover occurs when the DAG member is no longer able to service the MAPI network, or when the Cluster service on a DAG member is no longer able
to contact the remaining DAG members. The following occurs as part of a server failover:
1. The Cluster service on the PAM sends a notification to the PAM for one of two conditions:
a. Node Down: The server is reachable but is unable to participate in DAG operations.
b. MAPI Network Down: The server can't be contacted over the MAPI network and therefore can't participate in DAG operations.
2. If the server is reachable, the PAM contacts the Active Manager on the affected server and requests that all databases be immediately dismounted.
3. For each affected database copy:
a. The PAM requests the database copy status from all servers in the DAG.
b. The PAM receives a response from all reachable and active DAG members.
c. The PAM tries to determine the best log source among all responding servers by querying the most recent log generation number from each of
the responders.
d. Each of the servers responds with the log generation number.
4. The PAM retrieves the current search index catalog status from the cluster database.
5. Based on the log generation number and catalog health of each database copy, the PAM selects the best copies to activate.
6. The PAM updates the mounted location of the database in the cluster database.
7. The PAM initiates database failover by communicating with the Active Manager on one or more other servers.
8. The Active Manager on the selected servers requests that the Microsoft Exchange Replication service attempt to copy the last logs from the previous
