The Definitive Guide To

Active Directory
Troubleshooting
and Auditing

Don Jones

Introduction to Realtimepublishers
By Sean Daily, Series Editor

The book you are about to enjoy represents an entirely new modality of publishing and a major
first in the industry. The founding concept behind Realtimepublishers.com is the idea of
providing readers with high-quality books about today’s most critical technology topics—at no
cost to the reader. Although this feat might sound impossible to achieve, it is made
possible through the vision and generosity of a corporate sponsor who agrees to bear the book’s
production expenses and host the book on its Web site for the benefit of its Web site visitors.
It should be pointed out that the free nature of these publications does not in any way diminish
their quality. Without reservation, I can tell you that the book that you’re now reading is the
equivalent of any similar printed book you might find at your local bookstore—with the notable
exception that it won’t cost you $30 to $80. The Realtimepublishers publishing model also
provides other significant benefits. For example, the electronic nature of this book makes
activities such as chapter updates and additions and the release of a new edition possible in a far
shorter timeframe than is the case with conventional printed books. Because Realtimepublishers
publishes our titles in “real-time”—that is, as chapters are written or revised by the author—you
benefit from receiving the information immediately rather than having to wait months or years to
receive a complete product.
Finally, I’d like to note that our books are by no means paid advertisements for the sponsor.
Realtimepublishers is an independent publishing company and maintains, by written agreement
with the sponsor, 100 percent editorial control over the content of our titles. It is my opinion that
this system of content delivery not only is of immeasurable value to readers but also will hold a
significant place in the future of publishing.
As the founder of Realtimepublishers, I consider it my raison d’être to create “dream team” projects: that
is, to locate and work only with the industry’s leading authors and sponsors, and to publish books
that help readers do their everyday jobs. To that end, I encourage and welcome your feedback on
this or any other book in the Realtimepublishers.com series. If you would like to submit a
comment, question, or suggestion, please do so by sending an email to
feedback@realtimepublishers.com, leaving feedback on our Web site at
http://www.realtimepublishers.com, or calling us at 707-539-5280.
Thanks for reading, and enjoy!

Sean Daily
Founder & CTO
Realtimepublishers.com, Inc.

Table of Contents

Introduction to Realtimepublishers.................................................................................................. i
Chapter 1: Introducing Active Directory .........................................................................................1
The Importance of Directories and Directory Management ................................................3
Many Eggs, One Basket...........................................................................................3
New Tools for New Times.......................................................................................4
Meet AD...............................................................................................................................4
The AD Database.................................................................................................................6
Logical Architecture of AD .................................................................................................6
Objects and Attributes..............................................................................................6
The Schema..............................................................................................................7
LDAP .......................................................................................................................9
Domains, Trees, and Forests..................................................................................10
Organizational Units ..............................................................................................14
The Global Catalog ................................................................................................15
Physical Structure of AD ...................................................................................................17
Domain Controllers................................................................................................17
Directory Replication.............................................................................................17
The Operations Masters .........................................................................................18
Sites........................................................................................................................19
AD’s Backbone: DNS........................................................................................................20
Introduction to AD and Windows Monitoring...............................................................................21
AD, Win2K, and WS2K3 Monitoring Considerations ......................................................24
Change Monitoring and Auditing ......................................................................................26
Problem Resolution, Automation, and Alerting ................................................................26
Other Considerations .........................................................................................................27
Summary ........................................................................................................................................27
Chapter 2: Designing an Effective Active Directory.....................................................................28
AD’s Logical and Physical Structures ...........................................................................................28
Logical Structures ..............................................................................................................29
Namespace .............................................................................................................29
Naming Context .....................................................................................................30
Physical Structures.............................................................................................................30
Designing AD ................................................................................................................................31
Designing the Forest and Trees......................................................................................................31
Determining the Number of Forests ..................................................................................34
Setting Up and Managing Multiple Forests ...........................................................35
Determining the Number of Trees .....................................................................................36
Designing the Domains..................................................................................................................38
Determining the Number of Domains................................................................................39
Choosing a Forest Root Domain........................................................................................40
Using a Dedicated Root Domain ...........................................................................41
Assigning a DNS Name to Each Domain ..........................................................................43
Using an Internet-Registered Name for the Top-Level Domain ...........................44
Using Internet Standard Characters .......................................................................44
Using Locations to Name Child Domains .............................................................45
Never Using the Same Name Twice......................................................................46
Dividing the Forest ............................................................................................................46
Placing the Domain Controllers for Fault Tolerance.........................................................48
Determining Trust Relationships .......................................................................................48
Using Bi-Directional Transitive Trusts..................................................................48
Using One-Way Trusts ..........................................................................................50
Using Cross-Link Trusts ........................................................................................51
Designing OUs for Each Domain ..................................................................................................53
Creating OUs to Delegate Administration.........................................................................54
Creating OUs to Reflect Your Company’s Organization ..................................................55
Creating OUs for Group Policy .........................................................................................56
Creating OUs to Restrict Access........................................................................................56
Designing the Sites for the Forest..................................................................................................56
Creating Sites and Site Links Based on Network Topology..............................................57
About Sites.............................................................................................................57
About Site Links ....................................................................................................58
Creating the Site Topology ....................................................................................59
Using Sites to Determine the Placement of Domain Controllers ......................................60
Using Sites to Determine the Placement of DNS Servers .................................................61
Summary ........................................................................................................................................61
Chapter 3: Monitoring and Tuning the Windows Server 2003 System and Network ...................62
Monitoring WS2K3 Domain Controllers.......................................................................................62
Monitoring the Overall System......................................................................................................64
Using Task Manager ..........................................................................................................66
Using the Performance Console.........................................................................................67
Event Viewer .....................................................................................................................69
Events Tracked in Event Logs ...............................................................................69
Types of Event Logs ..............................................................................................69
Starting Event Viewer............................................................................................70
Types of Events Logged by Event Viewer ............................................................71
Sorting and Filtering Events ..................................................................................71
Exporting Events....................................................................................................72
Monitoring Memory and Cache.....................................................................................................73
Using Task Manager to View Memory on a Domain Controller ......................................75
Using the Performance Console to Monitor Memory on a Domain Controller ................76
Available Memory Counters..................................................................................77
Page-Fault Counters...............................................................................................78
Paging File Usage ..................................................................................................79
System Cache.........................................................................................................80
Monitoring Processors and Threads...............................................................................................83
Using Process Viewer to Monitor Processes and Threads.................................................83
Using Task Manager to View Processes on a Domain Controller.....................................85
Working with the List of Processes .......................................................................87
Viewing Information About Processes ..................................................................88
Using the Performance Console to View Processes on a Domain Controller ...................88
% Processor Time Counter ....................................................................................89
Interrupts/sec Counter............................................................................................90
Processor Queue Length Counter ..........................................................................91
Monitoring the Disk.......................................................................................................................92
Using the Performance Console to Monitor the Disk Subsystem......................................92
% Disk Time and % Idle Time Counters ...........................................................................92
Disk Reads/sec and Disk Writes/sec Counters ..................................................................93
Current Disk Queue Length Counter .................................................................................94
% Free Space Counter........................................................................................................95
Monitoring the Network ................................................................................................................96
Using Task Manager to Watch Network Traffic ...............................................................96
Using Network Monitor to Watch Network Traffic ..........................................................97
Using the Performance Console to Monitor Network Components on a Domain Controller ..........97
Domain Controller Network Throughput ..............................................................97
Network Interface Throughput ..............................................................................98
All That Monitoring…So Little Information.................................................................................99
Auditing as a Troubleshooting Tool ............................................................................................100
Planning for Auditing ......................................................................................................100
Other Auditing Techniques..............................................................................................101
Summary ......................................................................................................................................102
Chapter 4: Monitoring and Auditing Active Directory................................................................103
Using the Monitoring and Auditing Tools...................................................................................103
Third-Party Tools.............................................................................................................103
DirectoryAnalyzer................................................................................................104
ChangeAuditor for Active Directory ...................................................................105
DirectoryTroubleshooter......................................................................................106
Insight for Active Directory.................................................................................107
AppManager Suite ...............................................................................................108
Microsoft Operations Manager ............................................................................109
Built-In Tools...................................................................................................................109
System Monitor....................................................................................................109
Event Viewer .......................................................................................................110
Replication Diagnostics .......................................................................................111
Monitoring the AD Infrastructure................................................................................................113
Monitoring the Domain Controllers.............................................................................................114
Using DirectoryAnalyzer .................................................................................................114
Using NT Directory Service Performance Counters .......................................................118
Monitoring the Domain Partitions ...............................................................................................121
Using DirectoryAnalyzer .................................................................................................122
Using Domain Database Performance Counters..............................................................123
Installing the Counters .........................................................................................125
Monitoring the Global Catalog ....................................................................................................125
Monitoring Operations Masters ...................................................................................................127
Monitoring Replication................................................................................................................130
Using Directory Partition Replicas ..................................................................................131
Schema Partition ..................................................................................................131
Configuration Partition ........................................................................................131
Domain Partition..................................................................................................131
Using Directory Updates..................................................................................................132
Using the Replication Topology ......................................................................................133
Using DirectoryAnalyzer .................................................................................................135
Monitoring via Auditing ..............................................................................................................138
Setting up Auditing ..........................................................................................................138
Reviewing Auditing Messages ........................................................................................139
Using ChangeAuditor for Active Directory.....................................................................140
Summary ......................................................................................................................................142
Chapter 5: Troubleshooting Active Directory and Infrastructure Problems................................143
Following a Specific Troubleshooting Methodology ..................................................................143
Troubleshooting Network Connectivity ......................................................................................144
Testing for Network Connectivity ...................................................................................144
Testing the IP Addresses..................................................................................................145
Testing the TCP/IP Connection .......................................................................................147
Performing Other Troubleshooting Tests Using DirectoryAnalyzer...............................148
Domain Controller Connectivity Test..................................................................148
Domain Connectivity Test ...................................................................................149
Site Connectivity Test..........................................................................................150
Troubleshooting Name Resolution ..............................................................................................152
Understanding Name Resolution .....................................................................................152
Checking that DNS Records Are Registered ...................................................................152
Using Event Viewer.............................................................................................153
Using PING..........................................................................................................154
Using NSLOOKUP..............................................................................................154
Checking the Consistency and Properties of the DNS Server .........................................155
When the DNS Server Doesn’t Resolve Names Correctly..............................................156
How the Caching DNS-Resolver Service Works ................................................156
Using Other Techniques ......................................................................................157
Troubleshooting the Domain Controllers ....................................................................................158
Understanding the AD Database and Its Associated Files...............................................158
Comparing Directory Information ...................................................................................159
First: What Changed? ......................................................................................................160
Analyzing the State of the Domain Controllers...............................................................160
Using NTDSUTIL ...........................................................................................................162
Locating the Directory Database Files.................................................................163
Checking for Low-Level Database Corruption....................................................165
Checking for Inconsistencies in the Database Contents ......................................166
Cleaning Up the Meta Data..................................................................................168
Moving the AD Database or Log Files ................................................................171
Repairing the AD Database .................................................................................172
Troubleshooting Secure Channels and Trust Relationships ............................................174
Troubleshooting the Operations Masters .....................................................................................175
When Operations Masters Fail.........................................................................................176
Schema Master.....................................................................................................176
Domain Naming Master.......................................................................................176
RID Master...........................................................................................................176
Infrastructure Master............................................................................................177
PDC Emulator......................................................................................................177
Determining the Operations Master Role Holders Locations..........................................177
Using the DSA and Schema MMC Snap-Ins.......................................................178
Using NTDSUTIL ...............................................................................................179
Using the Resource Kit’s Dumpfsmos.cmd.........................................................179
Using DCDIAG ...................................................................................................179
Using AD Replication Monitor............................................................................180
Using Third-Party Utilities ..................................................................................180
Seizing an Operations Master Role..................................................................................181
Checking for Inconsistencies Among Domain-Wide Operations Masters ......................182
Troubleshooting the Replication Topology .................................................................................183
Viewing the Replication Partners for a Domain Controller.............................................183
Forcing Domain Controllers to Contact Replication Partners .............................184
Tracking Replicated Changes ..............................................................................184
Forcing Replication Among Replication Partners ...........................................................184
Viewing Low-Level AD Replication Status ....................................................................185
Checking for KCC Replication Errors.............................................................................186
Troubleshooting by Using Change Management ........................................................................187
Summary ......................................................................................................................................188
Chapter 6: Creating an Active Directory Design that You Can Audit and Troubleshoot ...........189
Design Goals................................................................................................................................189
Performance Considerations ........................................................................................................190
Overauditing ....................................................................................................................190
Overmonitoring................................................................................................................192
Design Considerations .................................................................................................................194
Who Will Troubleshoot?..................................................................................................194
How Will Auditing Be Utilized? .....................................................................................195
For How Long Will Data Be Maintained?.......................................................................197
Design Guidelines........................................................................................................................199
Selecting Appropriate Tools ............................................................................................199
Configuring the Environment ..........................................................................................201
Maintaining the Proper Configuration .............................................................................202
Monitoring Core Areas ....................................................................................................202
Preventing Trouble.......................................................................................................................203
A Process for Change.......................................................................................................204
Tools to Manage Change .................................................................................................205
Summary ......................................................................................................................................207

Copyright Statement
© 2005 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that
have been created, developed, or commissioned by, and published with the permission
of, Realtimepublishers.com, Inc. (the “Materials”) and this site and any such Materials are
protected by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web
site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be
held liable for technical or editorial errors or omissions contained in the Materials,
including without limitation, for any direct, indirect, incidental, special, exemplary or
consequential damages whatsoever resulting from the use of any information contained
in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal, non-
commercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, services marks and logos that are the property of
third parties. You are not permitted to use these trademarks, services marks or logos
without prior written consent of such third parties.
Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent
& Trademark Office. All other product or service names are the property of their
respective owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtimepublishers.com, please contact us via e-mail at
info@realtimepublishers.com.

Chapter 1: Introducing Active Directory


As computer networks have evolved over the years, the focus in enterprise computing has shifted
away from a PC network operating system (NOS)-centric model to one based on the concept of
directories, or directory services. A directory service is a network service that stores information
about network resources and makes those resources available to network users and applications.
Directories also provide an environment that allows for the uniform naming, location, access,
management, and security of network resources. These days, nearly all companies with large
enterprise-level networks, and even many of those with small to midsized networks, employ one
or more directories within their organizations. Although the concept of directories has been
around for some time, it is only in recent years that the directory has moved into the limelight of
network computing.
Directories provide a number of advantages, not the least of which are user convenience and
consistent security. Prior to the adoption of directories (most notably in the Novell NetWare 2.x
and 3.x days), users would maintain separate accounts on each server they accessed. When the
time came to change passwords, chaos often ensued. Directories solve this problem (and others)
by providing a single location for information such as security accounts, allowing this
information to be used uniformly across the enterprise.
Although Microsoft’s Windows NT operating system (OS) introduced a very basic directory in
the form of the NT Directory Service (whose heart and soul is the Security Accounts Manager—
SAM—database), this “directory” has several major limitations. Among these are:
• Non-hierarchical structure and namespace—NT’s directory uses a flat, non-hierarchical
directory structure that doesn’t support the naming and structural needs of complex
organizations.
• Lack of extensibility—NT’s directory stores only basic user information and can’t be
inherently extended. This lack of extensibility requires applications—such as Microsoft’s
popular Exchange Server messaging platform—to maintain their own, independent
directories capable of storing application-specific data.
• Lack of scalability—The NT directory is stored inside the NT system registry database;
due to this architecture, the maximum number of security accounts tops out at
roughly 40,000 per domain.
• Poor manageability features—Administration roles aren’t layered and can’t be natively
delegated.
• Poor directory replication performance—Because NT’s architecture is bandwidth- and
network topology-ignorant, the NT OS can’t automatically tune replication frequency and
bandwidth usage to adapt to variable WAN link speeds between multiple physical
locations within a network.
• Single-master, single-point-of-failure architecture—NT’s architecture calls for a single
server in each network domain—the Primary Domain Controller (PDC)—to house the
“master” copy of the directory, thus making it a single point of failure for logon
authentication for the entire domain.

In Windows 2000 (Win2K), NT’s successor OS, Microsoft set out to deliver a directory capable
of addressing each of these limitations. Win2K’s new directory service, dubbed Active Directory
(AD), provides an industrial-strength directory service that can serve the needs of both small and
very large organizations, and everyone in between. Because it stores its data outside the system
registry, AD has virtually unlimited storage capacity (AD databases can contain hundreds of
millions of entries, as compared with the tens of thousands NT is capable of storing). AD allows
administrators to define physical attributes of their network, such as individual sites and their
connecting WAN links, as well as the logical layout of network resources such as computers and
users. Using this information, AD is able to self-optimize its bandwidth usage in multi-site WAN
environments. AD also introduces a new administration model that provides a far more granular
and less monolithic model than is present under NT 4.0. Finally, AD also provides a central point
of access control for network users, which means that users can log on once and gain access to
all network resources.
Although other directories such as Banyan’s StreetTalk and Novell’s NDS have existed for some
time, many NT-centric organizations have opted to wait and use Microsoft’s entry in the
enterprise directory arena as the foundation for their organization-wide directory environment.
Consequently, AD represents the first foray into the larger world of directories and directory
management for many organizations and network administrators.
Windows Server 2003 (WS2K3) introduces a new version of AD that is essentially a more
mature, refined version of the AD introduced in Win2K. Several minor enhancements have been
made to improve performance, improve the experience of interacting with the directory, and
enhance the directory’s manageability. AD as implemented in WS2K3 is completely
backward-compatible with Win2K’s version of AD; a series of “functional levels” disables
functionality that isn’t backward-compatible until the entire organization is running the latest version.

WS2K3 AD is definitely an evolutionary product, meaning it represents small but important changes
over prior versions. Win2K’s AD, however, could reasonably be called a revolutionary product, as it
represented a complete and total change over the prior “directory” offered in Windows.

With widespread adoption of AD finally a reality, the directory is taking on new and unforeseen
roles within most organizations. The concept of the directory as a single repository for user
accounts has been vastly expanded. Today’s directories are expected to serve as centralized
identity management applications. Everything related to the identity of security principals—user
names, digital certificates, application information, and more—is being stored in the directory. In
its relatively short life, AD has become one of the most mission-critical applications in
organizations that have deployed it, serving as the lynchpin for a variety of enterprise
applications and providing centralized identity management across the organization.

The Importance of Directories and Directory Management


Directories provide a logically centralized repository for all critical information within an
enterprise network. Rather than spreading information around between many different databases,
organizations can use a centralized directory such as AD to consolidate all critical company
information into a single shared network resource. In addition to improving organizational
efficiency, this move also allows for significant reductions in the total cost of ownership (TCO)
of the corporate network. The concept of wholesale migration from NT (or non-Microsoft)
directories to AD has also become more feasible with both existing and announced support from
major application vendors, including those producing enterprise resource planning (ERP),
groupware, Human Resources (HR), and accounting packages.

Many Eggs, One Basket


Although the large-scale centralization and consolidation of critical data is one of the most
significant benefits of migrating to a directory-based NOS such as AD, this also represents one
of its greatest potential weaknesses. Whenever critical information is moved from a distributed
model to one that is highly centralized, the tolerance for downtime and problems is greatly
reduced, and at the same time, the risk of loss due to downtime is increased. One excellent
example of this potential problem is the Windows registry: Prior to NT and Windows 95,
Windows and Windows-based applications stored information in independent text files carrying
an INI filename extension. NT and Windows 95 introduced the centralized registry, which
contained all the information previously stored in independent files. Prior to the registry, a single
corrupted INI file would affect, at most, one application or aspect of Windows; a corrupted
registry, however, can render the entire system useless.
Furthermore, many organizations planning AD migrations have chosen to focus the majority of
their preparatory efforts and budgets on issues such as legacy hardware and software
compatibility and application interoperability in the Win2K or WS2K3 environment. Although
these are certainly worthwhile and important considerations, they are by no means the only steps
required to guarantee a successful AD deployment. In addition to compatibility and capacity
issues, IT departments within these organizations must determine which additional tools,
information, and training will be required to properly support their Windows network
environments on a day-to-day basis.
The moral is that by consolidating so much information into AD, you run the risk of creating a
single point of failure within your organization—the single point of failure being AD itself. It’s
therefore critical that you invest the necessary planning resources to ensure that AD is a robust,
fault-tolerant directory. AD provides all the necessary technologies and features to be a very
robust directory service, provided you design it to be so.

New Tools for New Times


To effectively support AD-based networks, administrators need to engage in additional network
management activities beyond those taken with previous versions of NT in order to maintain the
same levels of network availability they had in the past. With any computer network, it is
imperative that critical statistics—such as server CPU, memory, and disk utilization, as well as
network connectivity statistics—be monitored on an ongoing basis. Win2K and WS2K3
introduce additional components, services, and dependencies that must also be regularly
monitored alongside these other metrics. These elements, which collectively make up
Windows’ core infrastructure, include items such as domain controllers, AD databases and
services, the Global Catalog (GC), intra- and inter-site replication, site links, and DNS servers.
Because Windows and Windows-centric applications rely heavily on these services and
components for proper operation, administrators must be able to guarantee not only their general
availability but also an acceptable baseline of performance. Failure to do so can result in severe,
network-wide problems including slow or failed user logon authorizations, failed convergence of
directory data, the inability to access critical applications, printing problems, and similar
maladies. These problems are of particular concern for IT shops that offer service-level
agreements (SLAs) to their corporate parents or clients. To be able to properly maintain their
Windows infrastructure, IT shops will not only need Win2K- and WS2K3-aware monitoring and
management tools but also specific knowledge about what needs to be monitored, what
thresholds must be set to maintain acceptable levels of performance, and what needs to be done
in the event that problems should occur.
Win2K, WS2K3, and More
With the recent downturn in IT budgets and staff, today’s companies are less likely to undertake
wholesale migrations to new versions of Windows. Functioning Win2K machines are likely to be left in
place as file servers, print servers, application servers, and even domain controllers for the foreseeable
future. Although WS2K3 may be adopted for new deployments, companies aren’t necessarily going to
replace working Win2K machines. WS2K3’s version of AD is specifically designed to accommodate this
mixed-version environment, although you will obviously not be able to take advantage of every new
WS2K3 feature when operating in a mixed-version environment.
Throughout this book, the assumption will be that administrators are working in a mixed-version
environment containing both Win2K and WS2K3 servers in varying roles. Features, problems, tools, and
techniques specific to WS2K3 will be identified as such, providing administrators with an easy way of
identifying things which will, or will not, work in their specific environments. Any discussion specific to an
all-WS2K3 environment will also be called out as such.

Meet AD
Of all of the elements that comprise a Win2K or WS2K3 network, the most important by far is
AD, Windows’ centralized directory service. However, before we delve into the specifics of AD,
let’s first define some of the fundamental terms and concepts related to directory-enabled
networks. A directory (which is sometimes also referred to as a data store) maintains data about
objects that exist within a network, in a hierarchical structure, making the information easier to
understand and access. These objects include traditional network resources such as user and
machine accounts, shared network resources (such as shared directories and printers), and
resources such as network applications and services, security policies, and virtually any other
type of object an administrator or application wants to store within the directory data store.

As mentioned earlier, a directory service is a composite term that includes both the directory data
store and the services that make the information within the directory available to users and
applications. Directory services are available in a variety of different types and from different
sources. OS directories, such as Microsoft’s AD and Novell’s NDS, are general purpose
directories included with the NOS and are designed to be accessible by a wide array of users,
applications, and devices. There are also some applications, such as ERP systems, HR systems,
and email systems (for example, Microsoft Exchange) that provide their own directories for
storing data specific to the functionality of those applications.

Microsoft Exchange Server 200x is a notable exception to this and is completely integrated with AD.
Exchange Server’s installation process extends AD’s structure to accommodate Exchange-specific
data and subsequently uses AD to store its own directory information.

AD is Microsoft’s directory service implementation in the Win2K and WS2K3 server OSs. AD
is hosted by one or more domain controllers, and is replicated in a multi-master fashion between
those domain controllers to ensure greater availability of the directory and network as a whole.
Any Windows server running AD is considered to be a domain controller; domain controllers,
then, are the only servers that implement the various services and features that comprise AD. In
addition to providing a centralized repository for network objects and a set of services for
accessing those objects, AD provides security in the form of access control lists (ACLs) on
directory objects that protect those objects from being accessed by unauthorized parties.

There are many features of other applications—such as Microsoft Exchange Server—that take
advantage of AD, although those applications themselves do not need to be running on a domain
controller.

The term multi-master indicates that multiple read/write copies of the database exist simultaneously,
one on each domain controller. Thus, each domain controller is effectively an equal peer of the other
controllers, and any controller can write directory updates and propagate those updates to other
controllers. This functionality is in notable contrast to NT 4.0’s single-master PDC/BDC replication
topology wherein a single domain controller, the PDC, houses the only read/write copy of the database.
AD includes a complex and robust replication infrastructure that is designed to accommodate this
multi-master model. For example, conflicts, which occur when two controllers change the same thing
at nearly the same time, are resolved automatically.
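
The note above says only that conflicts are resolved automatically. As a rough illustration of what a deterministic rule can look like, the following Python sketch assumes a tiebreak that compares a per-attribute version number, then the time of the write, then an identifier for the originating controller. The AttributeWrite type, its field names, and the ordering itself are assumptions made for this example and are not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeWrite:
    """One controller's change to a single attribute (hypothetical structure)."""
    value: str
    version: int       # incremented on each originating write to the attribute
    timestamp: float   # time of the originating write, in seconds
    originator: str    # identifier of the controller that made the write


def resolve(a, b):
    """Pick a winner so that every controller independently reaches the same answer."""
    def key(write):
        # Assumed ordering: higher version wins, then later time, then originator ID.
        return (write.version, write.timestamp, write.originator)

    return a if key(a) >= key(b) else b
```

Because the comparison depends only on data carried with each write, two controllers that see the same pair of writes in a different order still converge on the same value.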

The AD Database
At a file-system level, AD uses Microsoft’s Extensible Storage Engine (ESE) to store the
directory database. Administrators familiar with Microsoft Exchange Server may recognize this
engine as the same database technology used in that product. Like Exchange Server, AD’s
database employs transactional log files to help ensure database integrity in the case of power
outages and similar events that interfere with the successful completion of database transactions.
In addition, AD shares Exchange’s ability to perform online database maintenance and
defragmentation. At the file level, AD stores its database in a single database file named Ntds.dit,
a copy of which can be found on every domain controller.
Although the building blocks that make up AD are largely masked by the directory’s high-level
management interfaces and APIs, the physical aspects of the directory are nonetheless an
important consideration for Windows administrators. For example, it is critical that all volumes
on domain controllers hosting the AD database and its transaction logs maintain adequate levels
of free disk space at all times. For performance reasons, it is also important that the AD
databases on these machines not become too heavily fragmented.
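
As a minimal sketch of the free-space check described above, the following Python function reports whether the volume holding a given path still has a chosen fraction of its capacity free. The 20 percent default is a hypothetical threshold chosen for illustration, not a recommendation from the text, and the path you pass in would be wherever the database and log files live on your domain controller.

```python
import shutil


def free_space_ok(path, min_free_fraction=0.20):
    """Return True when the volume holding `path` has at least the given fraction free.

    The default threshold is an illustrative assumption, not a documented limit.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / usage.total >= min_free_fraction
```

A scheduled job could call this function for each volume of interest and raise an alert when it returns False.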
AD is a database, which effectively turns Windows domain controllers into critical database
servers on the network. These servers should therefore be treated no differently than any other
important database server in terms of fault tolerance preparation (for example, disk redundancy,
backups, and power protection) and capacity planning.

Logical Architecture of AD
To gain an appreciation for and understanding of AD and AD management concepts, it’s
important to first understand AD’s logical architecture. In this section, we’ll discuss the most
important concepts associated with AD, concepts which form the foundation of all Windows
networks.

Objects and Attributes


Just as the primary item of storage in a file system is a file, the primary item of storage in AD is
an object. Objects can take many different forms; for example, users, computers, and printers all
exist as objects within the directory. However, other items you might not immediately think of
are also stored as objects; for example, policies that define which applications a particular group
or user should have on their computer(s).
AD uses an object-oriented approach to defining directory objects. That is to say there exists a
set of classes, which define the kinds of objects one can create (or instantiate, as in “creating an
instance of...”) within the directory. Each class—such as user, computer, and so forth—has a set
of attributes that define the properties associated with that class. For example, AD has a user
class with attributes such as First Name, Address, and so on.

There are special types of objects in AD known as container objects that you should be familiar with.
Put simply, container objects are objects that may contain other objects. This design allows you to
organize a tree or hierarchy of objects. Examples of container objects include organizational unit (OU)
and domain objects. Container objects may hold regular objects, other container objects, or both. For
example, an OU object can contain both regular objects such as users and computers and other OU
container objects.

Although it’s perfectly acceptable to say “create” in lieu of “instantiate” when referring to the
generation of a new object within the directory, we’ll use the latter more frequently in this book. The
reason is that “instantiate” is more appropriate when you consider the underlying event that actually
occurs—that being the “creation of an instance of” an object.

The Schema
As you might imagine, all of the object classes and attributes discussed thus far have some kind
of underlying reference that describes them—a sort of “dictionary” for AD. In Windows
parlance, this “dictionary” is referred to as the schema. The AD schema contains the definitions
of all object types that may be instantiated within the directory. The AD schema is also
extensible, meaning it can be extended to include additional classes and attributes to support
future features, other applications, and so forth.

Even the AD schema itself is stored in the directory as objects. That is, AD classes are stored as
objects of the class “classSchema” and attributes are stored as objects of class “attributeSchema.”
The schema, then, is just a number of instances of the classes “classSchema” and “attributeSchema,”
with properties that describe the relationship between all classes in the AD schema.
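
To make the "dictionary" idea concrete, here is a toy Python sketch of a schema that lists which attributes each class permits, checks new objects against it, and can be extended. The class names, attribute names, and function names are illustrative assumptions only; they are not real AD schema entries or APIs.

```python
# Toy schema: each class definition lists the attributes it permits.
# All names here are hypothetical and chosen only for illustration.
SCHEMA = {
    "user": {"firstName", "address", "homeDirectory"},
    "computer": {"hostName", "operatingSystem"},
}


def instantiate(object_class, **attributes):
    """Create a directory-style object, rejecting anything the schema does not define."""
    if object_class not in SCHEMA:
        raise ValueError(f"unknown class: {object_class}")
    unknown = set(attributes) - SCHEMA[object_class]
    if unknown:
        raise ValueError(f"attributes not in schema for {object_class}: {sorted(unknown)}")
    return {"objectClass": object_class, **attributes}


def extend_schema(object_class, new_attribute):
    """Extend the schema by permitting a new attribute on an existing class."""
    SCHEMA[object_class].add(new_attribute)
```

In this sketch, an attribute such as badgeNumber is rejected for a user object until extend_schema("user", "badgeNumber") has been called, which mirrors how an application extends the schema before storing its own data.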

To understand the relationship between object classes, objects, and the schema, let’s go back to
the object-oriented model upon which the AD schema is based. As is the case with object-
oriented development environments (such as C++ and Java), a class is a kind of basic definition
of an object. When I instantiate an object of a certain class, I create an instance of that particular
object class. That object instance has a number of properties associated with the class from
which it was created. For example, suppose I create a class called “motorcycle” that has
attributes such as “color,” “year,” and “enginesize.” I can instantiate the class “motorcycle” and
create a real object called “Yamaha YZF600R6” with properties such as “red” (for the color
attribute), 2000 (for the year attribute), and 600 (for the motorcycle engine’s size in CCs).
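
The motorcycle analogy can be written out directly. This short Python sketch defines the class with the three attributes named above and then instantiates one object from it; the variable name yamaha is just a label chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Motorcycle:
    """The class: defines which attributes every motorcycle object has."""
    color: str
    year: int
    enginesize: int  # engine displacement in cc


# Instantiating the class creates one concrete object with its own attribute values.
yamaha = Motorcycle(color="red", year=2000, enginesize=600)
print(yamaha)
```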

Similarly, an AD implementation within your enterprise is just the instantiation of the AD
schema classes and attributes into hundreds or thousands of different objects and their
associated attribute values. For example, I might create an object of the class user called Craig Daily,
which has properties such as password, address, and home directory location. You can view the
AD schema through the Active Directory Schema Microsoft Management Console (MMC) snap-
in, which Figure 1.1 shows.

Figure 1.1: Viewing the AD schema by using the Active Directory Schema MMC snap-in.

Editing the schema is a potentially dangerous activity—you need to know exactly what you’re doing
and why you’re doing it. Before you make schema changes, be sure to back up the current AD
database contents and schema (for example, by using ntbackup.exe or a third-party utility’s System
State backup option on an up-to-date domain controller).

Viewing the AD Schema


Some of you may be curious about how to use the Active Directory Schema MMC snap-in to view the AD
schema. It’s not immediately obvious how to do so using the MMC console because the Active Directory
Schema console isn’t available in the default list of snap-ins. To use this snap-in, you need to manually
register it by selecting Start, Run (or entering a command-prompt session), and typing
regsvr32 schmmgmt.dll
You’ll receive a message stating that the OS successfully registered the .dll file. You can now load and
use the Active Directory Schema snap-in through the MMC utility. For example, you can open an MMC
session and choose Add/Remove Snap-in from the Console menu, then select Active Directory Schema
from the Add Standalone Snap-In dialog box (Figure 1.1 shows the Active Directory Schema snap-in’s
view of the AD schema).
To modify the AD schema, you need to use a different utility: the MMC ADSI Edit snap-in. ADSI Edit is
essentially a low-level AD editor that lets you view, change, and delete AD objects and object attributes.
In terms of usefulness and potential danger, ADSI Edit is to AD what the regedit or regedt32 registry
editors are to the system registry.
To use the ADSI Edit utility to make schema modifications, you first need to be a member of the Schema
Admins group. The Schema Admins group is a universal group in native-mode Win2K domains (and in
WS2K3 domains running in Win2K native mode or later) and a global group in other AD domains (that is,
those that are still running NT 4.0 domain controllers or have no more NT domain controllers but
haven’t yet been converted to Win2K’s native mode or a later WS2K3 functional level). To use the snap-
in, first register the associated adsiedit.dll file at the command line:
regsvr32 adsiedit.dll
The ADSI Edit snap-in will then be available when you choose Add/Remove Snap-in from the MMC’s
Console menu. Once you’ve added the snap-in, you can use the ADSI Edit console to make changes to
AD objects and attributes.

LDAP
One of the early design decisions that Microsoft made regarding AD was the use of an efficient
directory access protocol known as the Lightweight Directory Access Protocol (LDAP). LDAP also
benefits from its compatibility with other existing directory services. This compatibility, in turn,
provides for the interoperability of AD with these other directory services.

AD supports LDAP versions 2 and 3.

LDAP specifies that every AD object be represented by a unique name. These names are formed
by combining information about domain components, OUs, and the name of the target object,
known as a common name. Table 1.1 lists these LDAP name components and their
descriptions.
Attribute Type | DN Abbreviation | Description
-------------- | --------------- | -----------
Domain-Component | DC | An individual element of the DNS domain name of the object’s domain (for example, com, org, edu, realtimepublishers, microsoft)
Organizational-Unit-Name | OU | An OU container object within an AD domain
Common-Name | CN | Any object other than domain components and OUs (such as printers, computers, and users)
Organization-Name | O | The name of a single organization, such as a company; although part of the X.500 and LDAP standards, Organization is generally not used in directories such as AD that use domain components to organize the tree structure
Locality-Name | L | The name of a physical locale, such as a region or a city; although part of the X.500 and LDAP standards, Locality is generally not used in directories such as AD that use domain components to organize the tree structure
Country-Name | C | The name of a country; although part of the X.500 and LDAP standards, Country is generally not used in directories such as AD that use domain components to organize the tree structure

Table 1.1: LDAP name components.

For example, the LDAP name for the user object for a person named Don Jones in the
realtimepublishers.com domain’s Marketing OU would be as follows:
CN=Don Jones,OU=Marketing,DC=realtimepublishers,DC=com
This form of an object’s name as it appears in the directory is referred to as the object’s
distinguished name (DN). Alternatively, an object can also be referred to using its relative
distinguished name (RDN). The RDN is the portion of the DN that refers to the target object within its
container. In the previous example, the RDN of the user object would be CN=Don Jones.
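
The structure of a DN is easy to see by taking one apart. The following Python sketch splits the example DN into its components, returns the leftmost component as the RDN in its full type=value form, and joins the DC components into the DNS name of the object's domain. The function names are hypothetical, and the parsing assumes a simple DN with no escaped commas or equals signs in any value.

```python
def split_dn(dn):
    """Split a DN into (type, value) pairs. Assumes no escaped commas or equals signs."""
    pairs = []
    for component in dn.split(","):
        attr_type, _, value = component.strip().partition("=")
        pairs.append((attr_type.upper(), value))
    return pairs


def rdn(dn):
    """Return the leftmost component, which names the object within its container."""
    attr_type, value = split_dn(dn)[0]
    return f"{attr_type}={value}"


def dns_domain(dn):
    """Join the DC components, in order, into the DNS name of the object's domain."""
    return ".".join(value for attr_type, value in split_dn(dn) if attr_type == "DC")


dn = "CN=Don Jones,OU=Marketing,DC=realtimepublishers,DC=com"
print(rdn(dn))         # CN=Don Jones
print(dns_domain(dn))  # realtimepublishers.com
```

Production code should rely on an LDAP library's DN parser rather than string splitting, because real DNs may contain escaped special characters.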

Domains, Trees, and Forests


A significant advantage of AD is that it allows for a flexible, hierarchical design. To facilitate
this design, the AD structure employs several logical components. The first of these components
is the domain. A domain serves as the core unit in AD’s logical structure and is defined as a
collection of computers that share a common directory database. In fact, this definition is
basically identical to that of NT domains. Like NT domains, AD domains have unique names.
However, unlike the NetBIOS-based domain names used in NT, AD domains use a DNS naming
structure (for example, realtimepublishers.com or mydomain.org).

Domains also have several other important characteristics. First, they act as a boundary for
network security: each domain has its own separate and unique security policy that defines items
such as password expiration and similar security options. Domains also act as an administrative
boundary, because administrative privileges granted to security principals within a domain do
not automatically transfer to other domains within AD. Finally, domains act as a unit of
replication within AD—as all servers acting as domain controllers in an AD domain replicate
directory changes to one another, they contain a complete set of the directory information related
to their domain.

AD domain names don’t need to be Internet-registered domain names ending in Internet-legal top-
level domains (such as .com, .org, and .net). For example, it is possible to name domains with
endings such as .pri, .msft, or some other ending of your choosing. This of course assumes that the
domain’s DNS servers aren’t participating in the Internet DNS namespace hierarchy (which is by far
the most common scenario, due to security considerations with exposing internal DNS servers to the
Internet). If you do elect to use standard Internet top-level domains in your AD domain names, you
should register these names on the Internet even if they don’t participate in the Internet DNS
namespace. The reason is that most organizations are connected to the Internet, and using
unregistered internal domain names that may potentially be registered on the Internet could cause
name conflicts.

AD’s design also integrates the concepts of forests and trees. A tree is a hierarchical arrangement
of domains within AD that forms a contiguous namespace. For example, assume a domain
named xcedia.com exists in your AD structure. The two subdivisions of xcedia.com are europe
and us, which are each represented by separate domains. Within AD, the names of these domains
would be us.xcedia.com and europe.xcedia.com. Together with xcedia.com, these domains form a
single domain tree because they share one contiguous namespace, which demonstrates the
hierarchical structure of AD and its namespace. The name of the tree is the name of its root
domain, in this case, xcedia.com. Figure 1.2 shows the single domain tree
described in this example.

Figure 1.2: An AD forest with a single tree.

A forest is a collection of one or more trees. A forest can be as simple as a single AD domain, or
more complex, such as a collection of multi-tiered domain trees.
Let’s take this single-tree example scenario a step further. Assume that within this AD
environment, the parent organization, Xcedia, also has a subsidiary company with a domain
name of Realtimepublishers.com. Although the parent company wants to have both
organizations defined within the same AD forest, it wants their domain and DNS names to be
unique. To facilitate this configuration, you would define the domains used by the two
organizations within separate trees in the same AD forest. Figure 1.3 illustrates this scenario. All
domains within a forest (even those in different trees) share a schema, configuration, and GC
(we’ll discuss the GC in a later section). In addition, all domains within a forest automatically
trust one another due to the transitive, hierarchical Kerberos trusts that are automatically
established between all domains in an AD forest.
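
The rule that a tree is a set of domains sharing a contiguous namespace can be expressed as a small grouping function. This Python sketch, offered only as an illustration, sorts a forest's domain names into trees by checking whether each name ends with the name of an existing tree root. The function name is hypothetical, and the logic assumes that every tree root appears in the input list.

```python
def group_into_trees(domains):
    """Map each tree root to the domains that share its contiguous DNS namespace."""
    # Process shorter names first so that roots are seen before their children.
    names = sorted((d.lower() for d in domains), key=lambda d: d.count("."))
    trees = {}
    for name in names:
        # A domain joins a tree when its name ends with ".<root>".
        root = next((r for r in trees if name.endswith("." + r)), None)
        if root is None:
            trees[name] = [name]  # no parent namespace found: start a new tree
        else:
            trees[root].append(name)
    return trees


forest = ["xcedia.com", "us.xcedia.com", "europe.xcedia.com", "realtimepublishers.com"]
print(group_into_trees(forest))
```

For the example forest, the function yields two trees: one rooted at xcedia.com that contains its two child domains, and one that contains only realtimepublishers.com.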

The Kerberos version 5 authentication protocol is a distributed security protocol based on Internet
standards and is the default security mechanism used for domain authentication within or across AD
domains. Kerberos replaces NT LAN Manager (NTLM) authentication used in NT Server 4.0 as the
primary security protocol for access to resources within or across AD domains. AD domain controllers
still support NTLM to provide backward compatibility with NT 4.0 machines.

Figure 1.3: Example of a multi-tree AD forest.

In the case of a forest with multiple trees, the name of the forest is the name of the first domain
created within the forest (the root domain of the first tree created in the forest).

Although cohabitation of different organizations within the same AD forest is appropriate in
some circumstances, in others, it is not. For example, unique security or schema needs may
require two companies to use entirely different AD forests. In these situations, Kerberos trusts
aren’t established between the two forests, but you can create explicit trusts between individual
domains in different forests.
Each AD forest contains a group named Enterprise Admins. Members of this special group have
full control over each domain in the forest. In many companies, then, political reasons make it
necessary to create multiple forests. Suppose in the previous example that the folks running
Xcedia.com and the folks running Realtimepublishers.com need to maintain their own,
independent IT environments. In that case, having the two trees in one forest might not be
advisable, because members of the forest-wide Enterprise Admins group would have control
over both domain trees.
Trusts, therefore, are important in any AD design. Within a single domain tree, each domain has
a two-way, transitive trust with other domains in the tree. For example, the us.xcedia.com
domain trusts the xcedia.com domain and vice-versa. The us.xcedia.com domain and the
europe.xcedia.com domains also enjoy an implicit two-way trust through their shared trust of the
xcedia.com domain. The trust relationship simply means that security principals—such as
users—in one trusted domain can be assigned access permissions in another trusting domain; the
mere existence of the trust, however, doesn’t automatically confer any special permissions.


Win2K provides the ability to create one-way, nontransitive trusts. For example, the xcedia.com
domain might be configured to trust the realtimepublishers.com domain. As the trusted domain,
realtimepublishers.com users could be given permissions to resources in the trusting xcedia.com
domain. However, the reverse would not be true because the trust is one-way. What’s more,
xcedia.com’s child domains wouldn’t participate in the trust because this manual, inter-domain
trust is nontransitive.
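
The difference between transitive and nontransitive trusts can be sketched in a few lines of Python. This is an illustrative model only, not the way Windows itself evaluates trusts; the TRUSTS table and the trusts() helper are hypothetical names, and the domains are the ones used in the examples above.

```python
from collections import deque

# Each entry is (trusting domain, trusted domain, is_transitive).
# Hypothetical data that mirrors the examples in the text.
TRUSTS = [
    ("xcedia.com", "us.xcedia.com", True),
    ("us.xcedia.com", "xcedia.com", True),
    ("xcedia.com", "europe.xcedia.com", True),
    ("europe.xcedia.com", "xcedia.com", True),
    # Manual one-way, nontransitive trust: xcedia.com trusts realtimepublishers.com.
    ("xcedia.com", "realtimepublishers.com", False),
]


def trusts(trusting, trusted, links=TRUSTS):
    """Return True if `trusting` trusts `trusted`, either through a direct
    trust of any kind or through a chain made up only of transitive trusts."""
    # A direct trust applies whether or not it is transitive.
    if any(a == trusting and b == trusted for a, b, _ in links):
        return True
    # Otherwise, walk outward along transitive trusts only.
    seen = {trusting}
    queue = deque([trusting])
    while queue:
        current = queue.popleft()
        for a, b, transitive in links:
            if a == current and transitive and b not in seen:
                if b == trusted:
                    return True
                seen.add(b)
                queue.append(b)
    return False


print(trusts("us.xcedia.com", "europe.xcedia.com"))       # implicit trust via xcedia.com
print(trusts("xcedia.com", "realtimepublishers.com"))     # direct one-way trust
print(trusts("realtimepublishers.com", "xcedia.com"))     # reverse direction
print(trusts("us.xcedia.com", "realtimepublishers.com"))  # child domain
```

In this model the last two calls return False: the manual trust is one-way, and because it is nontransitive the child domains of xcedia.com do not inherit it.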
WS2K3 forests running at the highest forest functional level can also establish trusts with
other entire forests. These forest trusts can be one-way or two-way; they are transitive across
the domains of the two forests involved but do not extend to any third forest. Microsoft provides
this capability to correct a problem that the Enterprise Admins group created in Win2K. Because
this group has overriding control
over every domain in a forest, many organizations were forced to create multiple forests to
maintain their desired security boundaries. However, without trusts between forests, providing
users in other forests with access to resources was difficult, often requiring users to maintain
accounts in each of an organization’s forests, and partially defeating one of the directory’s
primary purposes, which is to have one user account per person. WS2K3 forest trusts allow
forests to be used as an ultimate security boundary, while still providing cross-forest access when
needed.

There are several resources you might find helpful when planning your organization’s AD structure
and namespace, such as the Microsoft white papers that contain valuable information about AD
design and architectural concepts, including “Active Directory Architecture” and “Domain Upgrades
and Active Directory.” These and other technical documents related to AD can be found on
Microsoft’s Web site at http://www.microsoft.com/windows2000/server.

Organizational Units
An OU is a special container object that is used to organize other objects—such as computers,
users, and printers—within a domain. OUs can contain all these object types, and even other
OUs (this type of configuration is referred to as nested OUs). OUs are a particularly important
element of AD for several reasons. First, they provide the ability to define a logical hierarchy
within the directory without creating additional domains. OUs allow domain administrators to
subdivide their domains into discrete sections and delegate administrative duties to others. More
importantly, this delegation can be accomplished without necessarily giving the delegated
individuals administrative rights to the rest of the domain. As such, OUs facilitate the
organization of resources within a domain. Figure 1.4 shows an example of OUs within a
domain.

There are several models used for the design of OU hierarchies within domains, but the two most
common are those dividing the domain organizationally (for example, by business unit) or
geographically.


Figure 1.4: OUs within a domain.

Broadly speaking, your OU structures should reflect the way you plan to delegate control over
your domain’s objects. If every object will be administered by one small group of administrators,
one OU might be all you need. If each office in your organization will be managed at least
somewhat independently (perhaps giving a local office administrator the ability to reset
passwords, for example), having one OU per office will facilitate your administrative model.
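
As a small illustration of how nested OUs appear in the directory, the following Python sketch builds the distinguished name (DN) of an OU from its position in the hierarchy. The ou_dn() helper and the OU names are hypothetical, and the sketch assumes names that contain no characters requiring escaping in a DN.

```python
def ou_dn(domain, ou_path):
    """Build the distinguished name of a nested OU.

    `domain` is the DNS-style domain name; `ou_path` lists the OUs from the
    outermost to the innermost. A DN lists the innermost component first.
    """
    ou_part = [f"OU={name}" for name in reversed(ou_path)]
    dc_part = [f"DC={label}" for label in domain.split(".")]
    return ",".join(ou_part + dc_part)


# One OU per office, nested under a hypothetical top-level "Offices" OU.
print(ou_dn("us.xcedia.com", ["Offices", "Chicago"]))
# OU=Chicago,OU=Offices,DC=us,DC=xcedia,DC=com
```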

The Global Catalog


Because AD is the central component of a Windows network, network clients and servers
frequently query it. In order to increase the availability of AD data on the network as well as the
efficiency of directory object queries from clients, AD includes a service known as the Global
Catalog (GC). The GC is hosted on designated domain controllers and contains a partial, read-only
replica of all the directory
objects in the entire AD forest.


Only Windows servers acting as domain controllers can be configured as GC servers. By default,
the first domain controller in a Windows forest is automatically configured to be a GC server
(this designation can be moved later to a different domain controller if desired; however, every
forest must contain at least one GC). Like AD, the GC uses replication in order to ensure updates
between the various GC servers within a domain or forest. In addition to being a repository of
commonly queried AD object attributes, the GC plays two primary roles on a Windows network:
• Network logon authentication—In native-mode domains (networks in which all domain
controllers have been upgraded to Win2K or later, and the domain’s functional level has
been manually set to the appropriate level), the GC facilitates network logons for AD-
enabled clients. It does so by providing universal group membership information to the
account sending the logon request to a domain controller. This applies not only to regular
users but also to every type of object that must authenticate to AD (including computers).
In multi-domain networks, at least one domain controller acting as a GC must be
available in order for users to log on. Another situation that requires a GC server occurs
when a user attempts to log on with a user principal name (UPN) other than the default. If
a GC server is not available in these circumstances, users will only be able to log on to the
local computer (the one exception is members of the Domain Admins group, who
do not require a GC server in order to log on to the network).
• Directory searches and queries—With AD, read requests such as directory searches and
queries far outweigh write-oriented requests such as directory updates (for
example, by an administrator or during replication). The majority of AD-related network
traffic consists of requests from users, administrators, and applications about objects
in the directory. As a result, the GC is essential to the network infrastructure because it
allows clients to quickly perform searches across all domains within a forest.

Although mixed-mode Win2K domains do not require the GC for the network logon authentication
process, GCs are still important in facilitating directory queries and searches on these networks and
should therefore be made available at each site within the network.
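
The logon rules described above can be summarized as a small decision function. This is only one reading of the text, written as a Python sketch; the function and parameter names are hypothetical, and the real logon process involves considerably more than these few conditions.

```python
def logon_requires_gc(native_mode, multi_domain, non_default_upn):
    """Return True if, per the rules above, a domain logon needs a GC server.

    Assumption: the GC is needed in native-mode, multi-domain networks and
    whenever the user logs on with a UPN other than the default.
    """
    return (native_mode and multi_domain) or non_default_upn


def can_log_on_to_domain(gc_available, is_domain_admin,
                         native_mode, multi_domain, non_default_upn):
    """Return True if the domain logon can proceed; False means the user
    is limited to logging on to the local computer."""
    if gc_available or is_domain_admin:
        # Domain Admins members are the stated exception to the GC requirement.
        return True
    return not logon_requires_gc(native_mode, multi_domain, non_default_upn)


# An ordinary user in a native-mode, multi-domain network with no GC reachable.
print(can_log_on_to_domain(gc_available=False, is_domain_admin=False,
                           native_mode=True, multi_domain=True,
                           non_default_upn=False))
```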

Domain Modes and Functional Levels


Let’s establish some correlations between Win2K domain “modes” and WS2K3 domain “functional
levels.” A Win2K domain in mixed mode is similar to a WS2K3 domain in the Win2K mixed functional
level—both allow for the presence of NT BDCs. Once all domain controllers are running at least Win2K, a
WS2K3 domain can be raised to Win2K native functional level; Win2K domains can be placed in their
native mode. Doing so enables new functionality such as universal security groups. When all domain
controllers are running WS2K3, the domain can be raised to WS2K3 functional level, and even more
features—all WS2K3-specific—become available, including enhanced replication capabilities.
WS2K3 forests have functional levels, too. In Win2K functional levels, forests can accommodate AD
domains of any kind. At the WS2K3 functional level, a forest can contain only domains also running in
their WS2K3 functional level.
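
The correlation between domain controller versions and domain functional levels can be expressed as a short Python sketch. The version labels and the function name are hypothetical, and the function reports only the highest level the domain could be raised to; as noted earlier, the level itself must still be raised manually.

```python
KNOWN_VERSIONS = {"NT4", "Win2K", "WS2K3"}


def highest_domain_functional_level(dc_versions):
    """Given the OS version of every domain controller in a domain, return
    the highest functional level (as described in the text) that the domain
    could be raised to."""
    versions = set(dc_versions)
    if not versions:
        raise ValueError("a domain needs at least one domain controller")
    unknown = versions - KNOWN_VERSIONS
    if unknown:
        raise ValueError(f"unknown versions: {sorted(unknown)}")
    if "NT4" in versions:
        return "Win2K mixed"    # NT BDCs are still present
    if versions == {"WS2K3"}:
        return "WS2K3"          # every domain controller runs WS2K3
    return "Win2K native"       # all domain controllers run at least Win2K


print(highest_domain_functional_level(["WS2K3", "Win2K", "NT4"]))
print(highest_domain_functional_level(["WS2K3", "Win2K"]))
print(highest_domain_functional_level(["WS2K3", "WS2K3"]))
```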


Physical Structure of AD
Thus far, our discussion of AD has focused on the logical components of the directory’s
architecture; that is, the components used to structure and organize network resources within the
directory. However, an AD-based network also incorporates a physical structure, which is used
to configure and manage network traffic.

Domain Controllers
The concept of a domain controller has been around since the introduction of NT. As is the case
with NT, a Win2K or WS2K3 domain controller is a server that houses a replica of the directory
(in the case of Win2K or WS2K3, the directory being AD rather than the NT SAM database).
Domain controllers are also responsible for replicating changes to the directory to other domain
controllers in the same domain. Additionally, domain controllers are responsible for user logons
and other directory authentication as well as directory searches.

Fortunately, Win2K and WS2K3 do away with NT’s restriction that converting a domain controller to a
member server or vice-versa requires reinstallation of the server OS. Servers may be promoted or
demoted to domain controller status dynamically (and without reinstallation of Windows itself) by
using the Dcpromo.exe domain controller promotion wizard.

At least one domain controller must be present in a domain, and for fault tolerance reasons it’s a
good idea to have more than one domain controller at any larger site (for example, a main office
or large branch office).

Directory Replication
As we’ve discussed, domain controllers are responsible for propagating directory updates they
receive (for example, a new user object or password change) to other domain controllers. This
process is known as directory replication, and can be responsible for a significant amount of
WAN traffic on many networks.
AD is replicated in a multi-master fashion between all domain controllers within a domain to
ensure greater availability of the directory and network as a whole. The term multi-master
indicates that multiple read/write copies of the database exist simultaneously, one on each domain
controller. Thus, each domain controller is effectively a peer of the other controllers,
and any domain controller can write directory updates and propagate those updates to other
domain controllers. This is in notable contrast to NT 4.0’s single-master PDC/BDC replication
topology wherein a single domain controller, the PDC, houses the only read/write copy of the
database; other domain controllers—BDCs—contain a read-only copy replicated from the PDC.

AD’s replication design means that different domain controllers within the domain may hold different
data at any given time—but usually only for short periods of time. As a result, individual domain
controllers may be temporarily out of date at any given time and unable to authenticate a logon
request. AD’s replication process has the characteristic of bringing all domain controllers up to date
with each other; this characteristic is called convergence.
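
To make the idea of convergence concrete, here is a deliberately simplified Python simulation. It is not AD's actual replication algorithm: it assumes each attribute carries a single version number and that the higher version always wins, and it ignores ties, topology, and scheduling. All names and data are hypothetical.

```python
def replicate(source, target):
    """Copy to `target` every attribute for which `source` holds a newer
    version. Each replica maps attribute name -> (version, value).
    Return True if `target` changed."""
    changed = False
    for attr, (version, value) in source.items():
        if attr not in target or target[attr][0] < version:
            target[attr] = (version, value)
            changed = True
    return changed


def converge(replicas):
    """Replicate between every pair of replicas until nothing changes.
    Return the number of rounds in which at least one update was applied."""
    rounds = 0
    changed = True
    while changed:
        changed = False
        for source in replicas:
            for target in replicas:
                if source is not target and replicate(source, target):
                    changed = True
        if changed:
            rounds += 1
    return rounds


# Three domain controllers that temporarily hold different data.
dc1 = {"jsmith.phone": (2, "555-0100")}
dc2 = {"jsmith.phone": (1, "555-0199"), "jdoe.title": (1, "Editor")}
dc3 = {}

converge([dc1, dc2, dc3])
print(dc1 == dc2 == dc3)  # True once the replicas have converged
```

Any domain controller can accept a write in this model, which is the multi-master property; convergence is simply the state in which another pass of replication changes nothing.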


The Operations Masters


Although multi-master replication is a central feature of AD networks, the potential for collisions
and conflict between multiple servers makes this functionality inappropriate for some network
operations and roles. AD accommodates these special cases by electing specific domain
controllers to serve as operations masters (also referred to as flexible single master operations—
FSMOs) for each of these network roles. There are five different types of operations masters in
AD—two that are forest-specific and three that are domain-specific. AD automatically elects the
operation master servers during the creation of each AD forest and domain, assigning them to the
first domain controller installed into a new forest.
When you use the Active Directory Installation Wizard to create the first domain in a new forest,
all five of the FSMO roles are automatically assigned to the first domain controller in that
domain. In a small AD forest with only one domain and one domain controller, that domain
controller continues to own all the operations master roles. In a larger network, whether with one
or multiple domains, you can re-assign these roles to one or more of the other domain
controllers. The following list highlights the two forest-wide operations master roles:
• Schema master—The domain controller that serves the schema master role is responsible
for all updates and modifications to the forest-wide AD schema. The schema defines
every type of object and object attribute that can be stored within the directory.
Modifications to a forest’s schema can only be done by members of the Schema
Admins group, and can be done only on the domain controller that holds the
schema master role.

This is not to say that schema changes require physical access to the domain controller holding the
schema master role; AD’s administrative tools are smart enough to seek out the schema master and
connect to it remotely when necessary.

• Domain naming master—The domain controller elected to the domain naming master
role is responsible for making changes to the forest-wide domain name space of AD. This
domain controller is the only one that can add or remove a domain from the directory or
add/remove references to domains in external directories.
The three domain-specific operations master roles are as follows:
• PDC emulator—If an AD domain contains non-AD-enabled clients or is a mixed-mode
domain containing NT BDCs, the PDC emulator acts as an NT PDC for these systems. In
addition to replicating the NT-compatible portion of directory updates to all BDCs, the
PDC emulator is responsible for time synchronization on the network (which is important
for Windows’ Kerberos security mechanism as well as some aspects of AD replication),
and processing account lockouts and client password changes.

Win2K and later clients synchronize their system clocks with the domain controller that authenticates
them to the domain. Domain controllers synchronize their time with the domain’s PDC emulator. The
PDC emulators in child domains synchronize their time with the PDC emulators of their parent; the
forest root domain’s PDC emulator should be configured to synchronize with some authoritative
external time source, such as the US Naval Observatory’s atomic clock.


• RID master—The RID (relative ID) master allocates sequences of RIDs to each domain
controller in its domain. Whenever a domain controller creates an object such as a user,
group, or computer, that object must be assigned a unique security identifier (SID). A
SID consists of a domain security ID (this ID is identical for all SIDs within a domain)
and a RID. When a domain controller has exhausted its internal pool of RIDs, it requests
another pool from the RID master domain controller.
• Infrastructure master—When an object in one domain is referenced by an object in
another domain, it represents the reference by the Globally Unique Identifier (GUID), the
SID (for objects that reference security principals), and the DN of the object being
referenced. The infrastructure master is the domain controller responsible for updating an
object’s SID and DN in a cross-domain object reference. The infrastructure master is also
responsible for updating all inter-domain references any time an object referenced by
another object moves (for example, whenever the members of groups are renamed or
changed, the infrastructure master updates the group-to-user references). The
infrastructure master distributes updates using multi-master replication.
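
The RID master's role in keeping SIDs unique can be sketched as follows. This is an illustrative Python model, not Windows code: the class names, the pool size, the starting RID, and the domain SID value are all assumptions made for the example.

```python
class RidMaster:
    """Hands out non-overlapping pools of RIDs to domain controllers."""

    def __init__(self, first_rid=1000, pool_size=500):
        self.next_rid = first_rid
        self.pool_size = pool_size

    def allocate_pool(self):
        start = self.next_rid
        self.next_rid += self.pool_size
        return iter(range(start, start + self.pool_size))


class DomainController:
    """Creates SIDs by combining the domain SID with a RID from its pool."""

    def __init__(self, domain_sid, rid_master):
        self.domain_sid = domain_sid
        self.rid_master = rid_master
        self.pool = rid_master.allocate_pool()

    def new_sid(self):
        try:
            rid = next(self.pool)
        except StopIteration:
            # Local pool exhausted: request another pool from the RID master.
            self.pool = self.rid_master.allocate_pool()
            rid = next(self.pool)
        return f"{self.domain_sid}-{rid}"


# Made-up domain SID; a tiny pool size makes pool renewal easy to observe.
DOMAIN_SID = "S-1-5-21-1111111111-2222222222-3333333333"
master = RidMaster(pool_size=2)
dc_a = DomainController(DOMAIN_SID, master)
dc_b = DomainController(DOMAIN_SID, master)

sids = [dc_a.new_sid() for _ in range(3)] + [dc_b.new_sid() for _ in range(3)]
print(len(sids) == len(set(sids)))  # every SID is unique within the domain
```

Because only one server allocates pools, two domain controllers can create objects independently without ever issuing the same RID, which is why this role is a single-master operation.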

Except where there is only one domain controller in a domain, never assign the infrastructure master
role to the domain controller that is also acting as a GC server. If the infrastructure master
runs on a GC server, it will not function properly. Specifically, the effect will be that cross-domain object
references in the domain will not be updated. In a situation in which all domain controllers in a domain
are also acting as GC servers, the infrastructure master role is unnecessary because all domain
controllers will have current data.
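
The placement rule in this note can be written as a simple check. The following Python sketch is an assumption-laden illustration; the function name, the data layout, and the server names are hypothetical.

```python
def infrastructure_master_placement_ok(holder, gc_by_dc):
    """Check the placement rule for the infrastructure master role.

    `holder` is the name of the domain controller that holds the role;
    `gc_by_dc` maps every domain controller in the domain to True if it is
    also a GC server.
    """
    if len(gc_by_dc) == 1:
        return True             # a single domain controller has no alternative
    if all(gc_by_dc.values()):
        return True             # every domain controller is a GC; the role is unnecessary
    return not gc_by_dc[holder]  # otherwise the holder must not be a GC


domain_dcs = {"dc1": True, "dc2": False, "dc3": False}
print(infrastructure_master_placement_ok("dc1", domain_dcs))  # False: dc1 is a GC
print(infrastructure_master_placement_ok("dc2", domain_dcs))  # True
```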

Because the operations masters play such critically important roles on a Windows network, it’s
essential for proper network operation that all the servers hosting these roles are continually
available.

Sites
The final, and perhaps most important, component of AD’s physical structure is the site. Sites
allow administrators to define the physical topology of a Windows network, something that
wasn’t possible under NT. Sites can be thought of as areas of fast connectivity (for example,
individual office LANs), but are defined within AD as a collection of one or more IP subnets.
When you look at the structure of IP, this begins to make sense—different physical locations on
a network are typically going to be connected by a router, which in turn, necessitates the use of
different IP subnets on each network. It’s also possible to group multiple, non-contiguous IP
subnets together to form a single site.
So why are sites important? The primary reason is that the definition of sites makes it possible
for AD to gain some understanding of the underlying physical network topology, and tune
replication frequency and bandwidth usage accordingly (under NT, this could only be done via
manual adjustments to the replication service). This “intelligence” conferred by the knowledge
of the network layout has numerous other benefits. For example, it allows AD-enabled
computers hosting users who are logging on to the network to automatically locate their closest
domain controller and use that controller to authenticate, rather than crossing the WAN to do so.
In a similar fashion, sites give other components within a Windows network new intelligence.
For example, a client computer connecting to a server running the Distributed File System (Dfs)
feature in Windows can use sites to locate the closest Dfs replica server.
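
The mapping from IP subnets to sites can be illustrated with Python's standard ipaddress module. The site names and subnets below are hypothetical, and choosing the most specific matching subnet is an assumption of this sketch rather than a description of the client's actual lookup logic.

```python
import ipaddress

# Hypothetical site definitions. A site may group non-contiguous subnets.
SITE_SUBNETS = {
    "SantaRosa": ["10.1.0.0/16", "10.3.0.0/16"],
    "London": ["10.2.0.0/16"],
}


def site_for(ip, site_subnets=SITE_SUBNETS):
    """Return the name of the site whose subnet contains `ip`, preferring
    the most specific (longest-prefix) match, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    best = None  # (site name, prefix length)
    for site, subnets in site_subnets.items():
        for cidr in subnets:
            net = ipaddress.ip_network(cidr)
            if addr in net and (best is None or net.prefixlen > best[1]):
                best = (site, net.prefixlen)
    return best[0] if best else None


print(site_for("10.3.4.5"))     # SantaRosa
print(site_for("10.2.0.9"))     # London
print(site_for("192.168.1.1"))  # None
```

Once a client knows its site, it can prefer domain controllers and Dfs replicas registered for that site instead of crossing the WAN.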


It’s important to remember that sites are part of the physical structure of AD and are in no way
related to the logical constructs we’ve already discussed, such as domains and OUs. It’s possible
for a single domain to span multiple sites, or conversely, for a single site to encompass multiple
domains. The proper definition of sites is an essential aspect of AD network design planning.

For sites that house multiple domains (for example, an organization that divides business units into
domains rather than OUs, thus hosting multiple business unit domains on a single site), it’s important
to remember to place at least one, and possibly two, domain controllers for each domain that users
will authenticate to from that site (because users can only authenticate to a domain controller from
their domain). This outlines the biggest disadvantage of the business unit domain model: the potential
for requiring many domain controllers at each and every site.

AD’s Backbone: DNS


The TCP/IP network protocol plays a far larger role in Win2K and WS2K3 than with previous
versions of NT. Although other legacy protocols such as IPX continue to be supported, most of
the internal mechanics of modern Windows networks and AD are based on TCP/IP.
In Windows, as with all TCP/IP-based networks, the ability to resolve names to IP addresses is
an essential service. A bounded area within which a given name can be resolved is referred to as
a namespace. In NT-based networks, NetBIOS is the primary namespace and WINS is the
primary name-to-IP address resolution service. With Win2K and later, Microsoft has abandoned
the use of NetBIOS as the primary network namespace and replaced it with DNS, which is also
used on the Internet (although Win2K and later clients continue to support WINS as a secondary,
backward-compatible name resolution mechanism).
Like AD, DNS provides a hierarchical namespace. Both systems also make use of the word
domains, although they define them somewhat differently. Computer systems (called “hosts”) in
a DNS domain are identified by their fully qualified domain name (FQDN), which is formed by
appending the host’s name to the domain name within which the host is located. Multi-part
domain names (that is, domains that are several levels deep in the hierarchy of the DNS
namespace) are listed with the most important domain division (.com, .org, .edu, and so on) at right
and the least important—the host name—at left. In this way, a host system’s FQDN indicates its
position within the DNS hierarchy. For example, the FQDN of a computer named mercury
located in the domain realtimepublishers.com would be mercury.realtimepublishers.com.
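
The way an FQDN encodes a host's position in the hierarchy can be shown with two small Python helpers. Both function names are hypothetical.

```python
def fqdn(host, domain):
    """Form a fully qualified domain name from a host name and its domain."""
    return f"{host}.{domain}"


def hierarchy(name):
    """Return the labels of a DNS name from the most important division
    (rightmost) down to the host name (leftmost)."""
    return list(reversed(name.split(".")))


name = fqdn("mercury", "realtimepublishers.com")
print(name)             # mercury.realtimepublishers.com
print(hierarchy(name))  # ['com', 'realtimepublishers', 'mercury']
```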
Although it is possible to incorporate a DNS namespace within an NT network for name-to-IP
address resolution, the use of DNS is optional and mainly of interest to enterprises running
Internet-based applications or in heterogeneous environments. However, DNS plays a far more
critical role in AD. In Win2K and later networks, DNS replaces NetBIOS as the default name
resolution service. In addition, AD domains use a DNS-style naming structure (an AD domain
might have a name such as santarosa.realtimepublishers.com or mydomain.net), which means
that the namespace of AD domains is directly tied to that of the network’s DNS namespace.

This namespace duplication may be limited to the internal DNS namespace for companies using the
Microsoft-recommended configuration of separate DNS configurations for the internal LAN and the
Internet. It is possible, however, to use a “split-brain” DNS design in which a publicly resolvable DNS
name—such as realtimepublishers.com—is used for both the external and internal namespaces,
without exposing internal DNS servers to the public Internet.


Finally, AD uses DNS as the default locator service; that is, the service used to convert items
such as AD domain, site, and service names to IP addresses. It’s important to remember that
although the DNS and AD namespaces in a Windows network are identical in regards to domain
names, the namespaces are otherwise unique and used for different purposes. DNS databases
contain domains and the record contents (host address/A records, server resource/SRV records,
mail exchanger/MX records, and so on) of the DNS zone files for those domains, whereas AD
contains a wide variety of different objects including domains, OUs, users, computers, and Group
Policy objects (GPOs).
Another notable connection between DNS and AD is that Windows DNS servers can be
configured to store their DNS domain zone files directly within AD rather than in external text
files. Although DNS doesn’t rely on AD for its functionality, the converse is not true: AD relies
on the presence of DNS for its operation.
Windows includes an implementation of Dynamic DNS (DDNS, defined in RFC 2136), which domain
controllers use to register special DNS resource records called SRV records. AD-enabled clients
query these records to locate important network resources, such as domain controllers. The
accuracy of these SRV records is therefore critical to the proper functioning of a Windows
network (not to mention the availability of the systems and services they reference).
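
As an illustration of what these SRV records look like, the following Python sketch builds the record names a client could query to find the domain controllers for a domain, optionally within a particular site. It only constructs the names; actually resolving them would require a DNS client library, which is not shown. The function name and the site name are hypothetical.

```python
def dc_srv_names(domain, site=None):
    """Return the SRV record names used to locate domain controllers,
    with the site-specific name (if a site is given) listed first."""
    names = [f"_ldap._tcp.dc._msdcs.{domain}"]
    if site:
        names.insert(0, f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    return names


for name in dc_srv_names("realtimepublishers.com", site="SantaRosa"):
    print(name)
# _ldap._tcp.SantaRosa._sites.dc._msdcs.realtimepublishers.com
# _ldap._tcp.dc._msdcs.realtimepublishers.com
```

A monitoring routine could compare the answers to these queries against the list of domain controllers that are expected to be registered.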

Introduction to AD and Windows Monitoring


As you’ve already learned, Windows introduces a number of new and important infrastructure
components that do not exist in NT networks. As a result, ensuring the health and availability
of your Windows servers means that you will need to account for these additional components in
your network monitoring routine. Monitoring will provide early warning indicators that will help
mitigate the risk of loss associated with network downtime.
The network monitoring procedures employed by most organizations tend to fall into one of the
following categories:
• Limited or no proactive monitoring procedures in place—Unfortunately, IT departments
in some organizations are purely reactive when it comes to network infrastructure
problems, and either do not regularly monitor critical network resources and components
or have limited monitoring in place. Some may conduct regular reviews of server event
logs or generate reports based on these logs, but because such information is delivered in
an on-demand fashion, it is of diminished value when compared with the information
provided by real-time monitoring systems. These organizations will be at high risk of
downtime and financial loss in a Windows environment.


• Existing monitoring procedures in place using home-grown or built-in tools—A second
category is where the need for proactive network monitoring is recognized but has been
implemented by the organization using basic, low-cost tools. This includes tools such as
Event Viewer and Performance Monitor, resource kit utilities (NLTEST, BROWMON,
NETDOM, DOMMON, DATALOG, REPADMIN, REPLMON, DFSCHECK, and
similar utilities), and freeware/shareware utilities (utilities that test machine and service
availability using PING, NT service status queries, and queries to well-defined ports such
as DNS, HTTP, and FTP). Although all of these tools can be helpful in ensuring network
health, many require high levels of attention from administrators and suffer from
significant limitations when it comes to scalability and identifying the various types of
problems that may exist on the network.
• Existing monitoring procedures in place with full-featured network monitoring tools—
The third category is organizations with network monitoring routines built on
sophisticated, full-featured network monitoring software. In addition to many of the basic
services provided by the tools that come with Windows, the resource kits, and
freeware/shareware utilities, these utilities typically include intelligent scripting to
provide sophisticated testing as well as corrective actions in the event of failure. In
addition, many network-monitoring tools include a knowledge base that helps
administrators understand why a problem is happening and offer suggestions as to how to
resolve it. For organizations running large or multi-site Windows networks, this type of
tool is highly recommended.
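
A home-grown availability test of the kind described in the second category might look like the following Python sketch, which checks whether a server accepts TCP connections on a given port. The function name is hypothetical, and the check confirms only that something is listening; it says nothing about whether the service behind the port is answering correctly.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within
    `timeout` seconds, and False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("mercury.realtimepublishers.com", 80) would test the HTTP port on the example host used earlier in this chapter. The limitations noted above apply: a script like this has to be scheduled, its results interpreted, and its list of targets maintained by hand.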
For administrators of networks that have existing monitoring tools and procedures, the migration
to Win2K and WS2K3 will mainly involve an upgrade of existing tools and staff knowledge
about the vulnerabilities of the new environment. Administrators familiar with Win2K will find
WS2K3 to be remarkably similar, as it represents a more evolutionary set of changes to
Windows (compared with the revolutionary changes seen between NT and Win2K). However, if
your organization has employed a more reactive stance (fix it only when it breaks) with regard
to resolving network problems, you’ll quickly find that this methodology can be especially
troublesome in a Win2K or WS2K3 environment.


Although it is true that newer versions of Windows provide a far greater level of reliability and
performance than their predecessors, they also involve a higher number of “moving parts” and
dependencies that need to be accounted for. For example, newer versions of Windows have an
integrated Web browser, integrated media player, integrated Web server, additional networking
services and tools, and so forth. Although legacy NT networks have their own set of
dependencies and vulnerabilities, they are far fewer in number due to NT’s simpler (and less
capable) network architecture. Let’s quickly review the primary monitoring considerations in an
NT environment:
• PDC availability and performance—Due to the single-master nature of NT domains,
there is a high dependence (and thus, a high availability requirement) on the PDC of each
NT domain. Although BDCs exist to create fault-tolerance and load-balancing for client
logon authentication, an NT domain without a PDC essentially grinds to a halt until the
PDC is brought back online or replaced via the manual promotion of a BDC to PDC
status by a network administrator. In addition, network logon traffic loads on domain
controllers should be monitored to assess domain controller performance and the ability
to respond to client network logon authentication requests within an acceptable period of
time.
• Domain trust relationships—On multi-domain NT networks, there typically exists a
complex array of trust relationships between domains in order to accommodate network
access requirements for the business. NT trust relationships (formed between domain
controllers) are notoriously fragile and prone to failure, and thus require continual
monitoring and testing in order to assure the availability of network resources to users.
• Name servers—Another aspect of NT networks requiring continual monitoring is the
availability of network name servers. For the majority of NT-based networks (including
those with Windows 95/2K/XP clients), NetBIOS is the predominant namespace and
Windows Internet Name Service (WINS) the predominant name-to-IP address resolution
service. WINS databases and replication are also notoriously fragile elements of NT
networks, and must be regularly monitored to ensure their functionality. Even for
networks using DNS as the primary name resolution service, the availability of the DNS
name servers is just as important as the availability of WINS servers.


• Network browser service—NT, Windows 9x, and other members of the Windows
product family rely on a network browsing service to build lists of available network
resources (servers, shared directories, and shared printers). The architecture of this
service, which calls for each eligible network node to participate in frequent elections to
determine a browse master and backup servers for each network segment, is another
infamously unreliable aspect of Microsoft networks and requires frequent attention and
maintenance.
• Other critical services and applications—In addition to name resolution services such as
WINS and DNS, NT environments may house other mission-critical services required for
proper operation of the network or the business in question. For example, critical
applications such as backup, antivirus, mail, FTP, Web, and database servers should be
polled using intelligent service-level queries to verify that they are functioning properly
and at acceptable levels of performance.
• Basic network and system metrics—All networks, NT or otherwise, should be monitored
to protect against problems stemming from resource allocation problems on individual
servers or the network itself. For example, any good network monitoring regimen will
include the monitoring of CPU, memory, disk space resource usage, and network
connectivity and bandwidth usage on all critical servers.
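
As one example of a basic system metric, the following Python sketch reports whether the volume that holds a given path is running low on free disk space. The function name and the 10 percent default threshold are assumptions chosen for illustration, not recommended values.

```python
import shutil


def low_disk_space(path, min_free_fraction=0.10):
    """Return True if the volume containing `path` has less than
    `min_free_fraction` of its capacity free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return True  # treat an unreadable or empty volume as a problem
    return usage.free / usage.total < min_free_fraction


print(low_disk_space("."))
```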

AD, Win2K, and WS2K3 Monitoring Considerations


A functioning, modern Windows network is a complex mesh of relationships and dependencies
involving a variety of different systems and services, including AD, DNS, the GC, and
operations master servers. Running an effective Windows network means having a handle on
every aspect of your network environment at all times.
It’s no surprise that the primary monitoring consideration in Windows is AD and its related
services and components. This includes responsiveness to DNS and LDAP queries, AD inter-site
and intra-site replication, and a special Windows service called the Knowledge Consistency
Checker (KCC). In addition, the health and availability of services such as DNS, the GC, and Dfs
are also important.

The KCC is a special Windows service that automatically generates AD’s replication topology and
ensures that all domain controllers on the network participate in replication.

However, knowing what metrics to monitor is only a first step. By far, the most important and
complex aspect of monitoring network health and performance isn’t related to determining what
to monitor but rather how to digest the raw data collected from the array of metrics and make
useful determinations from that data. For example, although it would be possible to collect data
on several dozen metrics (via Performance Monitor) related to AD replication, simply having
this information at hand doesn’t tell you how to interpret the data or what you should consider
acceptable tolerance ranges for each metric. A useful monitoring system not only collects raw
data but also understands the inter-relation of that data and how to use the information to identify
problems on the network. This kind of artificial intelligence represents the true value of network
monitoring software.
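
As a minimal sketch of the point that raw metrics only become useful once they are compared with tolerance ranges, the following Python fragment checks collected values against per-metric limits. The metric names and threshold values are hypothetical placeholders chosen for illustration; they are not real Performance Monitor counter names or recommended settings.

```python
from typing import Dict, List, Tuple

# Hypothetical tolerance ranges: metric name -> (lowest acceptable, highest acceptable).
TOLERANCES: Dict[str, Tuple[float, float]] = {
    "cpu_percent": (0.0, 85.0),
    "free_disk_mb": (500.0, float("inf")),
    "replication_latency_s": (0.0, 900.0),
}


def out_of_tolerance(samples: Dict[str, float]) -> List[str]:
    """Return a message for each sampled metric that falls outside its range."""
    problems = []
    for name, value in sorted(samples.items()):
        if name not in TOLERANCES:
            continue  # no range defined, so the raw value cannot be judged
        low, high = TOLERANCES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems
```

A metric with no defined range is skipped rather than flagged, which mirrors the observation above: a value without an agreed tolerance range cannot be interpreted.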


In order to ensure the health and availability of AD as well as other critical Windows network
services, organizations will need to regularly monitor a number of different services and
components, which are listed in Table 1.2.
| Category | Potential Problems |
|---|---|
| Domain controllers/AD | Low CPU or memory resources on domain controllers |
| | Low disk space on volumes housing the Sysvol folder, the AD database (NTDS.DIT) file, and/or the AD transactional log files |
| | Slow or broken connections between domain controllers |
| | Slow or failed client network logon authentication requests |
| | Slow or failed LDAP query responses |
| | Slow or failed Key Distribution Center (KDC) requests |
| | Slow or failed AD synchronization requests |
| | NetLogon (LSASS) service not functioning properly |
| | Directory Service Agent (DSA) service not functioning properly |
| | KCC not functioning properly |
| | Excessive number of SMB connections |
| | Insufficient RID allocation pool size on local server |
| | Problems with transitive or external trusts to Win2K or down-level NT domains |
| | Low AD cache hit rate for name resolution queries (as a result of inefficient AD design) |
| Replication | Failed replication (due to domain controller or network connectivity problems) |
| | Slow replication |
| | Replication topology invalid/incomplete (lacks transitive closure/consistency) |
| | Replication using excessive network bandwidth |
| | Too many properties being dropped during replication |
| | Update Sequence Number (USN) update failures |
| | Other miscellaneous replication-related failure events |
| GC | Slow or failed GC query responses |
| | GC replication failures |
| DNS | Missing or incorrect SRV records for domain controllers |
| | Slow or failed DNS query responses |
| | DNS server zone file update failures |
| Operation masters (FSMOs) | Inaccessibility of one or more operation master (FSMO) servers |
| | Forest or domain-centric operation master roles not consistent across domain controllers within domain/forest |
| | Slow or failed role master responses |
| Miscellaneous problems | Low-level network connectivity problems |
| | TCP/IP routing problems |
| | DHCP IP address allocation pool shortages |
| | WINS server query or replication failures (for legacy NetBIOS systems and applications) |
| | Naming context lost + found items exist |
| | Application or service failures or performance problems |

Table 1.2: Common problems in AD-based Win2K networks.
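
One of the DNS problems in the table, missing SRV records for domain controllers, lends itself to a simple check. The sketch below builds a small subset of the SRV owner names that domain controllers typically register and reports any that are absent from a supplied collection. It assumes the list of registered names has already been gathered by some other means (for example, a DNS query tool); the function names are hypothetical and the record list is deliberately incomplete.

```python
from typing import Iterable, List


def expected_srv_names(domain: str) -> List[str]:
    """Build a subset of the SRV owner names typically registered for a domain."""
    domain = domain.strip(".").lower()
    return [
        f"_ldap._tcp.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
    ]


def missing_srv_names(domain: str, registered: Iterable[str]) -> List[str]:
    """Return expected SRV names that are absent from the registered set."""
    # Normalize case and trailing dots so equivalent names compare equal.
    present = {name.strip(".").lower() for name in registered}
    return [name for name in expected_srv_names(domain) if name not in present]
```

An empty result means that every name in this (partial) list was found; it does not prove that the records point to the correct servers.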


Change Monitoring and Auditing


In addition to monitoring and troubleshooting problems within the Windows network
infrastructure, another distinct advantage of monitoring software is the ability to monitor and
audit changes made to the AD database. In many organizations, there may be dozens or even
hundreds of administrators making daily changes to AD. In order to manage the potential chaos
this situation presents, it’s essential that a system be in place to identify all recent changes made
to objects within the directory, and to be able to ascertain who did what—and when. Examples of
the types of changes that you might want to track include changes to the AD schema, OUs,
contacts, computers, and printers as well as directory recovery actions taken by administrators
(for example, a site administrator restoring AD on a local domain controller). In today’s
security-sensitive environments, this form of change management and auditing may even be a
regulatory requirement.
One problem faced by administrators is Windows’ relatively poor built-in monitoring and
auditing tools, which make it difficult to assemble a cohesive change management and auditing
solution. Commercial tools—including those from Microsoft as well as third-party tools from
companies such as NetPro, NetIQ, Winternals Software, and more—help to fill the gap left by
Windows’ built-in capabilities, offering varying features to capture, log, and audit changes that
occur within the directory.
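
To illustrate what "who did what, and when" might look like as data, here is a small Python sketch of a change record and a query for recent changes. The record layout and the function name are assumptions made for this example; they do not describe the log format of Windows or of any particular commercial tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class ChangeRecord:
    """One audited directory change (hypothetical record layout)."""
    when: datetime
    who: str
    object_dn: str
    action: str


def recent_changes(log: List[ChangeRecord], now: datetime,
                   window: timedelta) -> List[ChangeRecord]:
    """Return changes made within the window before 'now', newest first."""
    cutoff = now - window
    hits = [rec for rec in log if cutoff <= rec.when <= now]
    return sorted(hits, key=lambda rec: rec.when, reverse=True)
```

Making the record immutable (`frozen=True`) is a small nod to the requirement that audit data should not be silently altered after it is written.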

Problem Resolution, Automation, and Alerting


Monitoring and troubleshooting critical network infrastructure components is an important
starting point, but it is by no means the only proactive measure that you can take to increase the
availability of your network. Good network monitoring software provides a wide assortment of
alerting options, such as console alerts, network pop-up messages, event log entries, email alerts,
pager notifications, and Simple Network Management Protocol (SNMP) traps.
In addition to providing problem identification and alerting features, many third-party products
offer automatic problem resolution features. For example, it is possible to configure many
products to take specific corrective actions when a problem is detected, such as restarting a
particular service when it is found to be unresponsive. Many tools use scripting and/or the ability
to call external utilities to accomplish these tasks. The most comprehensive utilities base their
decisions on rule sets derived from an internal database and/or intelligent escalation routines that
emulate what an administrator might do. For example, you might configure a system such that on
the first failure of a given service, that service is restarted; the computer is restarted in the event
that the service restart fails; a different machine is promoted to replace that system in the event
that the computer restart attempt fails, and so on.
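
The escalation routine described above can be sketched as an ordered list of corrective steps, each tried only if the previous one failed. This is a minimal illustration in Python; the step descriptions and the actions behind them are placeholders that you would supply, not features of any specific monitoring product.

```python
from typing import Callable, List, Optional, Tuple

# Each step is a (description, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def escalate(steps: List[Step]) -> Optional[str]:
    """Run corrective steps in order and stop at the first one that succeeds.

    Returns the description of the successful step, or None if every step failed.
    """
    for description, action in steps:
        if action():
            return description
    return None
```

For the example in the text, the list would hold three steps in order: restart the service, restart the computer, and promote a replacement machine. A return value of None signals that automated recovery has been exhausted and an administrator should be alerted.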


Other Considerations
There are several considerations you should keep in mind when creating a Windows network
monitoring and troubleshooting solution. One is the overall architecture of the application(s)
being used in the solution. It’s important to understand how the product collects its data and what
impact this collection will have on your network and servers. For example:
• Does the product employ local agents to gather metrics or does it use remote queries?
• Do throttling features exist to control network bandwidth and system resource usage?
• Is there a machine/site/domain hierarchy that allows data to be passed to the central
collection database in an efficient manner?
• Does the product provide Web-based management?
All of these questions are important because the answers can have a significant impact on your
network environment and your overall satisfaction with the product.
Another differentiating feature about network monitoring software packages is whether they
provide a support knowledge base of common problems and solutions. This kind of knowledge is
invaluable from both a technical and financial standpoint because it serves to reduce the learning
curve of the supporting IT staff as well as the amount of time and money administrators must
expend researching and resolving problems. Some utilities augment this feature by allowing
administrators to add their own experiences to the knowledge base or a problem tracking and
resolution database, thereby leveraging internal IT staff expertise and creating a comprehensive
problem resolution system.
Organizations facing regulatory compliance issues may also seek software that provides specific
functionality for their specific regulatory issues. For example, tools providing real-time change
auditing and efficient, securable logs and databases may be more useful than tools that provide
batch notification of changes, simple text files instead of a robust database, or easily manipulated
logs that are subject to untraceable tampering. A final feature provided by some applications, and
one that may be of interest to IT shops engaged in SLAs, is the ability to generate alerts and
reports that address exceptions to, or compliance with, SLA obligations.

Summary
Although AD represents a quantum leap forward in the NT product line, it also introduces new
levels of network infrastructure complexity that must be properly managed in order to maintain
an efficient and highly available network. Real-time, proactive monitoring and management of
AD and other critical services is an essential part of managing Windows-based networks. In this
chapter, we discussed the most important features and components of Windows and AD, their
roles within the enterprise, differences between managing NT 4.0-based networks and Win2K or
WS2K3 AD-based networks, and some of the basic metrics and statistics that modern Windows
network administrators need to watch to help them ensure high availability on their networks. In
the remaining chapters of this guide, we’ll drill down and explore each of the vital areas of AD
and Windows networks in detail, providing the information, tools, and techniques you’ll need to
employ to maintain a healthy and highly available Windows network.

Chapter 2

Chapter 2: Designing an Effective Active Directory


The main function of AD is to allow the network resources for Windows to be identified and
accessed. AD accomplishes this goal by providing a single namespace where users and
applications can go to register and gain access to the information they need. For example, a file
server can list available shared folders in AD, and a print server can list available shared printers.
In this way, AD acts as a sort of Yellow Pages of available resources and services; thus, it is
referred to as a directory. AD can be set up and designed in many ways to meet the needs of
users and administrators. It’s your job as an administrator to properly set up and design your AD
for maximum efficiency.
The best way to troubleshoot AD problems is to avoid problems in the first place. To do so, you
need to start with an effective design. The design of AD not only includes the layout of the
forests, trees, domains, and organizational units (OUs) but also the site and site links that
represent the physical network.
In this chapter, I’ll give you a solid understanding of how to design AD for your environment
and network. Because the information in AD can be distributed across the network, there may be
unique aspects of your design and implementation that apply only to your site. Regardless of
these customizations, my goal is to give you enough information to ensure that the design serves
the needs of your users and administrators. Also keep in mind that AD design goals and
techniques have changed over the past few years since AD was introduced. If you already have
an AD environment, reviewing this information may help you make better decisions for future
growth or for design changes. I’ll focus on the design features of AD 2003, which is included
with Windows Server 2003 (WS2K3), and point out specific points that are appropriate to a
Win2K environment or to a mixed-version environment (which must typically operate in a
Win2K backward-compatible mode).

AD’s Logical and Physical Structures


As I mentioned in Chapter 1, AD has internal structures that can be categorized as logical and
physical. These structures are the building blocks you’ll use to design and build your AD service.
Your challenge is to understand each building block and use it to build an efficient AD. The
concepts behind these structures are sometimes complex, but understanding them and using them
correctly are the keys to a good design.


Logical Structures
Table 2.1 provides a list of logical structures used in AD.
| Logical Structure | Description |
|---|---|
| Namespace | AD is a namespace because it resolves an object’s name to the object itself |
| Naming context | Represents a contiguous subtree of AD |
| Organizational Unit | A container object that allows you to organize your objects and resources |
| Domain | A partition in AD that provides a place to group together users, groups, computers, printers, servers, and other resources |
| Tree | A grouping of domains that have a parent-child relationship with one another |
| Forest | A collection of one or more trees |
| Trust relationship | A logical connection between two domains that forms one administrative unit |
| Global catalog | A central source for AD queries for users and other objects |
Global catalog A central source for AD queries for users and other objects

Table 2.1: The logical structures of AD, which are used to design and build the object hierarchy.

Two important logical structures that you need to understand to design an AD are the namespace
and naming context. Although these two concepts seem similar, they’re actually different. To
help you understand how they differ, I’ll give you a quick overview of each. These structures are
also discussed throughout the chapter.

Namespace
Another term for a directory is namespace. A namespace refers to a logical space in which you
can uniquely resolve a given name to a specific object in the directory. AD is a namespace
because it resolves a name to the object itself and to the set of domain servers that store it.
Domain Name System (DNS) is a namespace because it translates easy-to-remember
names (such as www.company.com) into IP addresses (for example, 124.177.212.34).

AD depends on DNS and the DNS-type namespace that names and represents the domains in the
forest. It’s important to design your domain tree in a DNS-friendly way and to provide clients with
reliable DNS services. Although AD uses DNS to create its structure, DNS and AD are totally
separate namespaces.

One way to think about a namespace in non-technical terms is to compare it with a phone book.
The phone book for Las Vegas, Nevada is only capable of resolving phone numbers for names in
the Las Vegas area. In other words, you can’t use it to look up a phone number for New York
City. Therefore, the namespace of the directory is said to be Las Vegas. If you wanted to look up
phone numbers in New York, you would need to obtain a directory for that namespace.
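
The phone book analogy can be expressed in a few lines of Python. In this sketch each directory resolves names only within its own namespace, so a lookup in the wrong directory finds nothing. The names and phone numbers are invented sample data.

```python
from typing import Dict, Optional

# Each directory resolves names only within its own namespace (sample data is invented).
DIRECTORIES: Dict[str, Dict[str, str]] = {
    "Las Vegas": {"Pat Example": "702-555-0100"},
    "New York": {"Sam Example": "212-555-0199"},
}


def resolve(namespace: str, name: str) -> Optional[str]:
    """Look up a name in one namespace; names in other namespaces are not found."""
    return DIRECTORIES.get(namespace, {}).get(name)
```

Calling `resolve("Las Vegas", "Sam Example")` returns `None`, because that name belongs to the New York namespace.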


Naming Context
The naming context represents a contiguous subtree of AD in which a given name is resolved to
an object. If you look at the internal layout of AD, you see a structure that looks similar to a tree
with branches. If you expand the tree, you see the containers, the objects that reside in them, and
the attributes associated with the objects. In AD, a single domain controller always holds at least
three naming contexts:
• Domain—Contains the object and attribute information for the domain of which the
domain controller is a member
• Configuration—Contains the objects that define the logical and physical structure of the
AD forest
• Schema—Contains the rules for creating new objects and attributes
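
As a rough illustration of the three naming contexts listed above, the following Python sketch derives their distinguished names from DNS domain names. It assumes the conventional layout in which a DNS name maps to DC= components, the configuration naming context sits under the forest root domain, and the schema naming context sits under the configuration container. The function names are hypothetical.

```python
from typing import Dict


def dns_to_dn(dns_name: str) -> str:
    """Convert a DNS domain name such as company.com to DC=company,DC=com."""
    labels = [label for label in dns_name.strip(".").split(".") if label]
    return ",".join(f"DC={label}" for label in labels)


def naming_contexts(domain: str, forest_root: str) -> Dict[str, str]:
    """Return the three naming contexts held by a domain controller in 'domain'."""
    root_dn = dns_to_dn(forest_root)
    config_dn = f"CN=Configuration,{root_dn}"
    return {
        "domain": dns_to_dn(domain),
        "configuration": config_dn,
        "schema": f"CN=Schema,{config_dn}",
    }
```

Note that only the domain naming context changes from one domain to the next; the configuration and schema names depend on the forest root alone, which is consistent with their being shared across the forest.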

Physical Structures
In addition to the logical structures in AD, several physical structures help you implement the
logical structures on your network. Table 2.2 describes these physical structures.
| Physical Structure | Description |
|---|---|
| Object and attributes | An object is defined by the set of attributes or characteristics assigned to it. Objects include users, printers, servers, groups, computers, and security policies. |
| Domain controller | A domain controller is a network server that hosts the AD service in a domain. Many computers can belong to a domain without being a domain controller, but only domain controllers actually run the software that makes AD operate. All members of a domain must contact a domain controller in order to work with the domain. |
| Directory server role | A server that takes the role of Flexible Single Master Operation (FSMO). Directory server roles are single-master servers that perform special roles for AD, such as managing domains, managing schemas, and supporting down-level clients (Windows NT clients, for example). |
| Site | A location on the physical network that contains AD servers. A site is defined as one or more well-connected Transmission Control Protocol/Internet Protocol (TCP/IP) subnets. |
| Global Catalog (GC) server | Stores the GC information for AD. |

Table 2.2: The physical structures of AD, which are used to implement the logical directory structures on the
network


Designing AD
Your primary objective in designing AD is to build a system that reflects the network resources
in your company. You need to arrange the forest and trees to reflect the location and placement
of your network resources. You need to design the domains and OUs to implement an
administrative and security structure for both users and administrators. When designing the
layout of AD, you also need to design the users’ groups and security policies as well as the
administrative methods that will be used.
From the list of logical and physical structures that you have to work with, four structures are
critical to the design of AD: forests and trees, domains, OUs, and sites. The process of designing
and implementing each of these four structures builds on the previous one. Implementing these
structures properly is crucial to a successful design. Design your AD structure in the following
order:
• Design the forest and trees
• Design the domains for each tree
• Design the OUs for each domain
• Design the sites for the forest and domains
In the next four sections, I’ll describe how to design each of these main structures.

Designing the Forest and Trees


A forest is a collection of one or more trees. The trees in a forest don’t have to form a
contiguous namespace. For example, a forest can contain a domain named
braincore.net and a domain named netpro.com, two domains with different namespaces. The
trees in a forest share the same directory schema and configuration but don’t need to share the
same namespace. Figure 2.1 illustrates how two companies named company1.com and
company2.com form a single forest.

Figure 2.1: Two organizations named company1.com and company2.com can form a forest in AD.


The forest serves two main purposes. First, it simplifies workstation interaction with AD because
it provides a GC through which the client can perform all searches. Second, the forest simplifies
administration and management of multiple trees and domains. A forest has the following key
characteristics and components:
• Global schema—The directory schema for the forest is a global schema, meaning that the
schema is exactly the same for each domain controller in the forest. The schema exists as
a naming context and is replicated to every domain controller. The schema defines the
object classes and the attributes of object classes. In other words, every domain within the
same forest will share the same schema, giving them all access to the same classes and
attributes. This feature is especially important if you plan to deploy schema-altering
applications such as Microsoft Exchange Server, because every domain in the forest will
be updated to have the new Exchange classes and attributes—even if only one of those
domains will actually contain Exchange servers.
• Global configuration container—The configuration container exists as a naming context
that is replicated to every domain controller in the forest. Thus, it’s exactly the same
across the domain controllers in the forest. The configuration container contains the
information that defines the structure of the forest. This information includes the
domains, trust relationships, sites, site links, and the schema. By replicating the
configuration container on every domain controller, each domain controller can reliably
determine the structure of the forest, allowing it to replicate to the other domain
controllers.
• Complete trust—AD automatically creates bi-directional transitive trust relationships
among all domains in a forest. This relationship allows the security principals, such as
users and groups of users, to authenticate from any computer in the forest. However, such
is only the case if the users’ access rights have been set up correctly. This concept is
important: Trusts do not instantly confer access permissions. For example, a user in
Domain A cannot immediately access resources in Domain B just because the two
domains are in the same forest. The forest simply makes such access possible, enabling
an administrator to select the Domain A user’s account when assigning permissions to the
resources in Domain B.
• GC—The GC contains a copy of every object from every domain in the forest. However,
it only stores a select set of attributes from the objects; that subset is referred to as the
universally interesting information for the objects. By default, the GC isn’t placed on
every domain controller in the forest; instead, you determine which domain controllers
should hold a copy. The purpose of the GC is to provide a sort of cross-domain lookup
service.
To re-use the phone book analogy, imagine that you’re holding a Las Vegas phone book
and need to look up a number for a New York resident. The phone book you have might
contain a GC section, which lists a number for New York Directory Assistance. This
reference allows you to contact a directory in that other, New York namespace, to handle
your query. More specifically, the GC as implemented in AD will list each name
available in New York, and refer you to a New York directory for the number. In other
words, the GC is aware of every object in the forest and knows where to go to find more
information about each object.
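
The GC behavior described in the list above (every object from every domain, but only a subset of attributes) can be sketched as follows. The attribute names in `GC_ATTRIBUTES` and the sample object layout are hypothetical and chosen only to show the idea; they are not the actual attribute set that AD publishes to the GC.

```python
from typing import Dict, List

# Hypothetical subset of attributes copied to the GC.
GC_ATTRIBUTES = ("name", "mail")

Directory = Dict[str, List[Dict[str, str]]]  # domain name -> full objects


def build_gc(domains: Directory) -> List[Dict[str, str]]:
    """Copy every object from every domain, keeping only the GC attributes."""
    catalog = []
    for domain, objects in sorted(domains.items()):
        for obj in objects:
            entry = {key: obj[key] for key in GC_ATTRIBUTES if key in obj}
            entry["domain"] = domain  # where to go for the full object
            catalog.append(entry)
    return catalog


def find_in_gc(catalog: List[Dict[str, str]], name: str) -> List[str]:
    """Return the domains that hold an object with the given name."""
    return [entry["domain"] for entry in catalog if entry.get("name") == name]
```

A search against the catalog tells the client which domain holds the object; attributes that were not copied must still be read from a domain controller in that domain.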


The other attribute shared across the entire forest is the built-in Enterprise Admins group.
Members of this group have the highest possible level of administrative control over every
domain within the forest. Whoever belongs to this group, then, must be trusted by every domain
within the forest to manage the entire forest and all of its domains. There are occasions, however,
in which no one group of users can be given this trust, in which case two or more forests will be
necessary.
For example, consider an organization that has just acquired another company. If the two forests
are merged into a single forest, all the users can view the entire AD. However, the forests might
not be merged because the two autonomous administrative groups might not agree on how to
manage the forest. The winner of this dispute depends on your priority: Do your users have a
higher priority than your administrators?
If the administrators win, the users inherit two forests and no longer have a single, consistent
view of AD. Each administrative group manages its own forest independently. This scenario is
common and is the reason that the forest is often referred to as AD’s ultimate security
boundary—because you can create a strict border between the two forests but cannot create a
border within a forest because of the built-in Enterprise Admins group.
The answer to the administrator vs. user priority question also depends on which type of
organization your company is. If it isn’t important for the users to have a consistent view of AD,
it might be appropriate to have multiple forests with separate administrators. For example,
consider an application service provider (ASP) company, which hosts AD services on behalf of
other companies. The users from those companies have no reason to view the host company’s
information. In addition, each administrative group wants its independence.
WS2K3 introduces a new capability called the forest trust. As with the domain trusts present
within a forest, a forest trust makes it possible for accounts in one trusted forest to be granted
access to resources in another, trusting forest. However, unlike the automatic, two-way,
transitive trusts within a domain tree or forest, inter-forest trusts must be manually created and
are one-way. In other words, the administrators in Forest A could decide to grant access to Forest
B user accounts, but the reverse will not be true unless the Forest B administrators implement
their own trust to Forest A. This inter-forest trust capability is only possible between forests
running at the WS2K3 functional level, which means all domain controllers within the forest
must be running WS2K3 and all domains within the forest must be running at the WS2K3
functional level as well.
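
The direction of a one-way trust is easy to get backward, so here is a small Python sketch of the rule as described in this chapter: access requires both a trust in which the resource's forest trusts the account's forest and an explicit permission on the resource. The forest, account, and resource names are illustrative, and the model ignores every other factor that affects real access decisions.

```python
from typing import Set, Tuple

# A trust is recorded as (trusting, trusted): the trusting side accepts accounts
# from the trusted side. All names here are illustrative.
Trust = Tuple[str, str]
Grant = Tuple[str, str]  # (resource, account)


def can_access(account: str, account_forest: str,
               resource: str, resource_forest: str,
               trusts: Set[Trust], grants: Set[Grant]) -> bool:
    """Access needs a trust in the right direction and an explicit permission."""
    trusted = (account_forest == resource_forest
               or (resource_forest, account_forest) in trusts)
    return trusted and (resource, account) in grants
```

With a single trust recorded as `("ForestA", "ForestB")`, Forest B accounts can be granted access to Forest A resources, but a Forest A account cannot reach Forest B resources until a trust is created in the other direction.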
The Types of Trusts
Remember that WS2K3 offers a variety of trust relationships. There are, for example, the automatic,
two-way, transitive trusts that exist between parent and child domains in a domain tree. There are also
one-way, non-transitive external trusts that you can create between, for example, a WS2K3 domain and
an NT domain. Forest trusts are similar to these external trusts, although they are WS2K3-specific. You
can also establish external trusts with UNIX Kerberos realms, a domain-like structure that exists in UNIX
systems, by using the Kerberos authentication protocol.


Determining the Number of Forests


When determining the number of forests for your company, consider the requirements of the
organization itself. In smaller, centrally managed organizations, you typically need one forest.
However, if the company is large and has multiple locations in one or multiple countries, you
may need multiple forests. To properly determine the number of forests for your company, you
need to understand the maintenance and overhead of having one forest compared with having
multiple forests.
An environment with a single forest is simple to create and maintain. All users view one AD by
using the GC. Maintaining a single forest is easy because you need to apply configuration
changes only once to affect all the domains in the forest. For example, when you add a domain to
the forest, all the trust relationships are set up automatically. In addition, the new domain
receives any additional changes made to the forest.
When deciding on the number of forests you need, remember that a forest has shared elements:
the schema, configuration container, and GC. Thus, all the administrators need to agree on the
content and management of forests and their elements. Managing these elements becomes more
complicated when you add a forest because it incurs a management cost. The following brief list
highlights many of the management issues surrounding multiple forests:
• Each additional forest must contain at least one domain, domain controller, and someone
to manage the forest. Practically, you should have at least two domain controllers for
redundancy.
• Each additional forest creates a schema. Maintaining consistency among schemas is
difficult and creates overhead.
• Each additional forest creates a configuration container. Maintaining consistency among
configuration containers when the network configuration changes is difficult and creates
overhead.
• If you want user access among forests, you must create and maintain explicit one-way
trusts for every relationship you establish. If every domain and forest is WS2K3, you can
create one trust between the forests; otherwise, you must create direct domain-to-domain
trusts, which can result in a large number of trust relationships that must be monitored.
• Users who want access to resources outside of their forest need to make explicit queries;
this task is difficult for the ordinary user because the syntax for doing so isn’t
straightforward and the Windows user interface (UI) doesn’t make this function readily
accessible. Ideally, then, users should be able to find the vast majority of the resources they
need within their own forest.
• Any synchronization of components among multiple forests must be performed manually
or by using a metadirectory service or other synchronization solution.
• Users cannot easily access the network resources contained in other forests.
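
To put a number on the trust-management cost mentioned in the list above, the sketch below counts the one-way trusts needed between two forests. It assumes the worst case, in which every domain in one forest must be reachable from every domain in the other; in practice you would limit the domains involved. The function names are hypothetical.

```python
def domain_trusts_needed(domains_a: int, domains_b: int,
                         both_directions: bool = True) -> int:
    """One-way domain-to-domain trusts needed to cover every cross-forest domain pair."""
    if domains_a < 1 or domains_b < 1:
        raise ValueError("each forest contains at least one domain")
    per_direction = domains_a * domains_b
    return per_direction * 2 if both_directions else per_direction


def forest_trusts_needed(both_directions: bool = True) -> int:
    """One-way forest trusts needed between two forests."""
    return 2 if both_directions else 1
```

For two forests with three and four domains, full access in both directions would take 3 x 4 x 2 = 24 domain-to-domain trusts, compared with two one-way forest trusts when both forests meet the WS2K3 requirements.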


One situation in which you might consider managing multiple forests occurs when two organizations
merge or participate in a joint venture. This merger puts the administration of your network into the
hands of two autonomous groups. For this reason, multiple forests are typically more costly to
manage. To reduce this cost, organizations such as partnerships and conglomerates need to form a
central group that can drive the administrative process. In contrast, in short-lived organizations such
as joint ventures, it might not be realistic to expect administrators from each organization to
collaborate on forest administration.
There is another good example of a situation in which multiple forests may be required. Many
enterprise organizations elect to maintain separate, parallel AD forests for testing purposes. Other
organizations maintain multiple forests because they have a disjointed organizational structure with
no common infrastructure among business units. Although you’ll certainly want to keep your network
and AD design as simple as possible, your forest structure should follow the organizational,
administrative, and geographical structure of your organization.

Setting Up and Managing Multiple Forests


Setting up and managing two forests might be necessary if two organizations in a company don’t
trust one another or cannot agree on administrative policies. For example, suppose that a
company has two locations in cities in different countries—New York and London. Each
location has its own administrative group, which needs to manage its network resources
according to its own policies. In this case, two different forests can be used to separate the
administrative requirements. The requirements of such a situation are the most common reason
for having multiple forests—and you need to understand that it is a purely business, or political,
reason; there are very few technical reasons to have multiple forests.
In other situations, such as creating a testing environment separate from the production
environment, your company might have multiple forests, but you want to have a central
administrative group. To set up central management of multiple forests, you need to add
administrators to the Enterprise Admins and Schema Admins groups of each forest. Because each
forest has only one Enterprise Admins group and one Schema Admins group, you must agree on a
central group of administrators who can be members of these groups.
As mentioned previously, it’s difficult to manage user access between two or more forests. The
simplest method in any Win2K or mixed-version forest is to create explicit one-way trusts
among the domains that must trust one another. The one-way trust allows access only among the
domains in the direction in which the trust is set up. Figure 2.2 illustrates this approach of
connecting forests.


Figure 2.2: Two forests can allow user access between domains by establishing explicit one-way trusts. Only
the domains connected by the trusts can allow access between the forests.

Explicit one-way trusts aren’t transitive; the one-way trusts in Win2K are the same as the
one-way trusts that exist in NT. Creating one-way trusts among multiple forests or trees can be
complicated, so it’s important to keep it simple by limiting the domains that trust one another.
In WS2K3, as I’ve mentioned, you have the option to create inter-forest trusts. This setup allows
all domains in the trusted forest to potentially access resources in all domains of the trusting
forest. This feature greatly simplifies access management across forest boundaries, although only
forests running at the WS2K3 functional level have this capability. Thus, every domain
controller in every domain must be running WS2K3, every domain must have been raised to the
WS2K3 functional level, and both forests must have been raised to the WS2K3 functional level.
If so much as one Win2K domain controller exists in a single domain in either forest, inter-forest
trusts are not an option.

Determining the Number of Trees


A tree is simply a grouping of domains with parent-child relationships. The domains that form a
tree are arranged hierarchically and share a common and contiguous namespace. Trees can be
viewed in one of two ways. The first view is of the trust relationships among domains. The second
view is of the namespace of the domain trees (see Figure 2.3).


Figure 2.3: The namespace of the domain tree shows the hierarchical structure of the tree.

In a single forest, in which all domains trust one another, the tree relationship is defined by the
namespace that is necessary to support the domain structure. For example, the root domain called
company.com might have two subdomains (or child domains) named seattle.company.com and
chicago.company.com. The relationship between the root domain and the two child domains is
what forms the tree, or namespace.
In the previous section, I emphasized that multiple forests in an organization are generally not
recommended if there is a way to avoid it. However, there are situations in which multiple trees
are appropriate or even recommended. For one, multiple trees allow you to have multiple
namespaces that coexist in a single directory. Multiple trees give you additional levels of
separation of the namespaces—something that domains don’t provide. Although multiple trees
work well in most situations, I recommend that you start by creating one tree until the
circumstances arise that call for more. For example, if your company has a domain named
company.com, then launches a subsidiary named corporation.com, it might make sense to have
both domain trees in the same forest. Doing so would provide you with the necessary
namespaces, while maintaining the convenience of a single forest’s central administrative
structure, common schema, and so forth.
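
To make the idea of a tree as a contiguous namespace concrete, here is a minimal Python sketch that groups a forest's domain names into trees. The function names are hypothetical, and the sketch assumes that a domain's parent is always the name formed by dropping its leftmost label.

```python
def tree_root(domain, all_domains):
    """Walk up the DNS name until no parent exists in the forest; that is the tree root."""
    known = {d.lower() for d in all_domains}
    current = domain.lower()
    while True:
        parent = current.partition(".")[2]  # drop the leftmost label
        if parent in known:
            current = parent
        else:
            return current


def group_into_trees(domains):
    """Map each tree root to the domains that share its contiguous namespace."""
    trees = {}
    for d in domains:
        trees.setdefault(tree_root(d, domains), []).append(d.lower())
    return trees


forest = ["company.com", "seattle.company.com", "chicago.company.com", "corporation.com"]
print(group_into_trees(forest))
```

With the sample names used in this section, company.com and its two child domains fall into one tree and corporation.com forms a second tree in the same forest.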
You might be wondering if there are any extraordinary benefits to having multiple trees. For
example, do multiple trees reduce the replication or synchronization that occurs among domain
servers? The answer is no, because the schema and configuration container are replicated to all
domain controllers in the forest. In addition, the domain partitions are replicated only among the
domain controllers that are in the domain. Having multiple trees doesn’t reduce replication.
Likewise, you may be wondering whether multiple trees cause any problems. For example, does
having multiple trees require you to establish and maintain explicit one-way trust relationships?
Again, the answer is no because the transitive trust relationships are automatically set up among
all domains in the forest. This trust includes all domains that are in separate trees but in the same
forest.


Designing the Domains


The next task in planning AD is creating and designing the domains. A domain gives you a place
to group users, groups, computers, printers, servers, and other resources that belong together. In
addition, a domain is a security boundary for these objects, and it defines the set of information
that is replicated among domain controllers. The domain in AD works like the domain that exists
in NT.
A domain is a physical piece of AD that contains the object information. Figure 2.4 shows a
domain structure with its contents of network resources.

Figure 2.4: The domain structure is a piece of AD. It contains the users, groups, computers, printers, servers,
and other resources.

The purpose of domains is to logically partition the overall forest. Most large implementations of
AD need to divide the forest into smaller pieces. Domains enable you to partition AD into
smaller, more manageable units that you can distribute across your network servers.
Domains are the basic building blocks of AD. As you’ve seen, they can be connected to form
trees and forests; domains are connected by trust relationships, which are automatically
established and maintained within the forest. These trusts allow the users of one domain to
access the information contained in the other domains. When multiple domains are connected by
trust relationships and share a common schema, you have a domain tree. Every AD installation
consists of at least one domain tree, even if that tree has only a root domain.


It’s your role as an administrator to decide the structure of domains and which objects, attributes,
groups, and computers are created. The design of a domain includes a determination of DNS
naming, security policies, administrative rights, and how replication will be handled. When you
design domains, follow these steps:
• Determine the number of domains
• Choose a forest root domain
• Assign a DNS name to each domain
• Partition the forest
• Place the domain controllers that will be used for fault tolerance and high network
availability
• Determine the explicit trust relationships that need to be established, if any

Determining the Number of Domains


I recommend that you start with one domain in your environment—even if there are two or more
physical locations; AD has a separate way of understanding your physical network structure, and
domains aren’t necessarily intended to reflect that physical structure. A single-domain design
keeps the layout of your domain simple and easy to maintain, and you can then add other
domains as needed. The mistake some people make is to initially create many domains, then not
know what to do with them. One domain will be adequate for many companies, especially
smaller ones.
Although one domain will work in most circumstances, other circumstances necessitate having
more than one domain for an entire organization. The following list highlights such situations;
you must decide which of these fit your needs:
• Administrative rights—If your organization has multiple administrative groups that want
some level of autonomy, you might need to create additional domains and give each
group its individual rights. For example, if two companies merge together, one group
might need to operate and maintain autonomous activities. Understand, however, that the
forest-wide Enterprise Admins group can always control every domain within the forest.
If one administrative group within your company needs absolute and total control over
their resources, only a separate forest will provide that level of autonomy.
• International setting—If your organization is international, you might need to create
additional domains to support other languages. (Administrators, users, and others may
need to access AD in their first language, and the schema contains language-specific
attribute display names.)


• Replication traffic—Because the AD database can be distributed, you might want to
create additional domains to hold the distributed partitions. The need to create additional
domains typically arises when you have a single domain trying to replicate across wide
area network (WAN) links. If the replication is too slow, you can alleviate the problem
by splitting the domain into two. You can then place the two domains on each side of the
WAN so that they can be closest to the users. Understand, however, that individual
domains still replicate with one another. Specifically, the GC servers—which must exist
in each domain—replicate information about every domain in the forest. Thus, placing
domains on either side of a WAN link will reduce replication traffic across that link, but
will by no means eliminate it.
• Account security settings—Account security settings apply to the entire domain. Account
security settings include password length and expiration period, account lockout and
intruder detection, and Kerberos ticket policy. These settings cannot be changed in the
domain for individual OUs or groups. If you need to have unique settings, you will need
to create another domain.
• Preserve an existing NT domain—If your company already has an existing NT domain,
you might want to keep it instead of consolidating it into an AD domain. This
requirement could produce more domains than planned.
Determining the number of domains for your organization is an individual effort. No one can tell
you definitively how many domains to have and how to split them without knowing more about
your company’s organization and network. However, using these simple guidelines, you can
establish parameters that enable you to effectively design domains and determine the appropriate
number for your company. Be prepared to recognize that many of the reasons for having multiple
domains—like the reasons for having multiple forests—are primarily political or business-related
in nature, and not technical. Sometimes a single domain makes sense from almost every possible
technical standpoint (unlike NT domains, which had a technical limit on the number of objects
that could reside in the domain, for example), but multiple domains are still required for business
or political reasons.

Choosing a Forest Root Domain


The first domain that you create becomes the forest root domain (or root domain), which is the
top of the forest and the top of the first tree. The forest root domain is extremely important
because it determines the beginning of the namespace and establishes the forest. Because the AD
forest is established with the first domain, you need to make sure that the name of the root
domain matches the top level in the namespace. For example, root domains are domains with
names such as company.com and enterprise.com. These domain names are the roots of the DNS
structures and the root of AD. Any subsequent domains you create or add to the tree form the
tree hierarchy.


The first domain you create in an AD forest contains two forest-wide groups that are important to
administering the forest: the Enterprise Administrators group (often referred to as Enterprise
Admins) and the Schema Administrators group (or Schema Admins). Containing these two
groups makes the root domain special. You cannot move or re-create these groups in another
domain. Likewise, you cannot move, rename (at least not easily), or reinstall the root domain. In
addition to these groups, the root domain contains the configuration container, or naming
context, which also includes the schema naming context.
After you install the root domain, I recommend that you back up the domain often and do
everything you can to protect it. For example, if all the servers holding a copy of the root domain
are lost in a catastrophic event and none of them can be restored, the root domain is permanently
lost. The reason for this loss is that the permissions in the Enterprise Administrators and Schema
Administrators groups are also lost. There is no method for reinstalling or recovering the root
domain and its groups in the forest other than completely backing up and restoring it.

Always have at least two domain controllers in every domain, and always have at least two
geographic locations, if possible, containing domain controllers for each domain. If you have two
offices, try to have domain controllers from your root domain in every office. This setup might not be
possible in every situation due to replication traffic and availability of technical personnel at each
location, but it’s a worthy goal because it helps reduce the likelihood that a single catastrophic event
will destroy the root domain. At the very least, ensure that the root domain is backed up frequently
and that copies of the backup are maintained offsite.

Using a Dedicated Root Domain


As I described earlier, the first domain you create in AD becomes the forest root domain. For
smaller implementations of AD, you might only need to create the root domain, nothing more.

For more information about determining the number of domains, see “Determining the Number of
Domains” earlier in this chapter.

For a larger implementation with multiple locations around the world, however, you’ll probably
want to use a dedicated root domain. A dedicated root domain is a root domain that is kept small,
with only a few user account objects. Keeping the root domain small allows you to replicate it to
other locations at low cost (that is, with little impact on network usage and bandwidth). Figure
2.5 illustrates how you can replicate a dedicated root domain to the other locations on your
network.


Figure 2.5: A dedicated root domain is small enough to efficiently replicate copies to the other locations on
your network.

A dedicated root domain focuses on the overall operations, administration, and management of
AD. There are at least two advantages to using a dedicated root domain in a larger
implementation of AD:
• By keeping the user and printer objects out of the root domain, you enhance security by
restricting access to only a few administrators.
• By keeping the root domain small, you can replicate it to other domain controllers on the
network at distant geographic locations. This approach helps increase the availability of
the network.
Because domain administrators can access and change the contents of the Enterprise
Administrators and Schema Administrators groups, having a dedicated root domain limits
normal access. Membership in these built-in groups should only be given to the enterprise
administrators, and they should only access the domain when doing official maintenance. In
addition, membership in the Domain Administrators group of the root domain should be granted
only to the enterprise administrators. Taking these steps allows you to avoid any accidental
changes to the root domain. You should also create a regular user account for each of your
administrators so that they don’t carry administrative privileges while doing regular work.
As I mentioned earlier, always replicate the root domain to multiple servers in an effort to
provide fault tolerance for this domain. Because a dedicated root domain is small (no user or
printer objects), it can be replicated across the network more quickly and easily. In addition to
replicating the root domain across the local area network (LAN), you can replicate the root
domain across the WAN to reduce the trust-traversal traffic among trees.


Trust-traversal traffic can be complicated to understand. Essentially, anytime a user accesses
resources outside of his or her domain, the user (or the user’s client computer) must contact a
domain controller within the domain that contains the resources. If a direct trust does not exist
between the domains, additional domain controllers may need to be contacted.
Having a local domain controller for trusting domains helps reduce this trust-traversal traffic across
WAN links. However, locating a domain controller in a site may increase replication traffic across the
WAN link, which can, in some cases, be substantially higher than trust traffic. You’ll need to evaluate
your circumstances to decide on the best approach for your organization.

Assigning a DNS Name to Each Domain


After you’ve determined the number of domains and installed the root domain, you need to
determine the DNS names for each domain. DNS is a globally recognized, industry-standard
system for naming computers and network services that are organized in a hierarchy. AD clients
make queries to DNS in an attempt to locate and log on to domains and domain controllers.
Network users are better at remembering name-based addresses, such as www.company.com,
than they are at remembering number-based addresses, such as 124.177.212.34. DNS translates
an easy-to-remember name address (www.company.com) into a number address
(124.177.212.34).
As I’ve mentioned, the domain is identified by a DNS name. You use DNS to locate the physical
domain controller that holds the objects and attributes in the domain. DNS names are
hierarchical (like AD domains). In fact, the DNS name for a domain indicates the position of the
domain in the hierarchy. For example, in the domain name company.com, the DNS name tells us
that the domain must be at the top of the forest and is the root domain. Another example is
marketing.chicago.company.com. From this domain name, we know that the domain is the
marketing department’s domain in the Chicago location of the company. The domain is two
levels from the root domain, or top of the tree. The Chicago domain is a child domain of the root
domain, or company, and the marketing domain is a child domain under Chicago.
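
Reading a domain's position from its DNS name is mechanical enough to sketch in Python. The function name below is hypothetical; it simply peels labels off the name until it reaches the root.

```python
def domain_lineage(domain, root):
    """Return the chain of domains from the root down to `domain`."""
    domain, root = domain.lower(), root.lower()
    if domain != root and not domain.endswith("." + root):
        raise ValueError(f"{domain} is not in the {root} namespace")
    # Everything to the left of the root name, without the separating dot
    extra = domain[: len(domain) - len(root)].rstrip(".")
    labels = extra.split(".") if extra else []
    chain = [root]
    # Build downward: the label nearest the root is the first child
    for label in reversed(labels):
        chain.append(f"{label}.{chain[-1]}")
    return chain


chain = domain_lineage("marketing.chicago.company.com", "company.com")
print(chain)
print("levels below the root:", len(chain) - 1)
```

For the example in the text, the chain runs from company.com through chicago.company.com to marketing.chicago.company.com, which places the marketing domain two levels below the root.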
When you create DNS names for the domains in AD, I recommend that you follow these
guidelines:
• Use an Internet-registered name for the top-level domain
• Use Internet standard characters
• Use locations to name child domains
• Never use the same name twice


Using an Internet-Registered Name for the Top-Level Domain


When you name your top-level domain, I recommend that you use only a DNS name that has
been registered on the Internet and is thus globally unique. For example, realtimepublishers.com
is a top-level domain name and is registered with the Internet Corporation for Assigned Names
and Numbers (ICANN). You don’t need to register the names of underlying domains because
they fall under the control and jurisdiction of the top-level domain owner/registrant. For
example, the domain name research.realtimepublishers.com doesn’t need to be registered.
You don’t need to use an Internet-registered name; AD itself doesn’t care. However, if you name
your domain Microsoft.com, your users will have difficulty accessing the real Microsoft.com. If
you must use a non-registered domain name, use one with the special “pri” top-level domain
name, as in company.pri. Domain names ending in “pri” cannot be registered on the Internet, so
you’re guaranteed not to conflict.

You may not be responsible for your company’s Internet access. In that case, it’s important that you
coordinate your AD naming efforts with the person or group that is responsible for that access to
ensure that you’re not creating any unmanaged name resolution problems. For example, choosing
to use your company’s registered Internet domain name requires additional internal and external
configuration steps to ensure uninterrupted name resolution for your domain clients and other
network clients.

Choosing to use an Internet-registered name, however, does not mean you must
expose that name on the Internet. For example, you might register companyinternal.org as your
internal DNS name for AD’s use, and use company.com on the Internet. Doing so will help
avoid exposing your internal DNS infrastructure to the Internet. You can also use something
called a split-brain DNS. In this technique, you would use a single registered name—say,
company.com—for both your internal AD domain name and your external Internet presence. An
external DNS server, connected to the Internet, would resolve names for public hosts in your
domain, such as “www.” A separate internal DNS server would resolve internal host names and
support AD; it would also contain static records for external names such as “www.” This
technique allows the same domain name to serve both internal and external uses, while ensuring
that Internet users have no access to your internal DNS infrastructure.

Using Internet Standard Characters


When you assign DNS names, you’re restricted to using only Internet standard characters to
ensure compatibility with AD and the Internet. The basic standards for naming are as follows:
• Domain names contain only letters, numbers, and hyphens (-). The underscore character
(_) is specifically not allowed, although Microsoft’s DNS implementation will accept it.
• Domain names cannot begin or end with a hyphen.
• The domain names of .com, .net, and .org cannot exceed 67 characters.
• Each label (that is, each component between the dots in a fully qualified domain name,
or FQDN) cannot exceed 63 characters, and the complete FQDN cannot exceed 255
characters.
• Domain names aren’t case sensitive (although they are traditionally typed in lowercase).
• Domain names cannot include spaces.
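
The character rules in the list above can be checked with a short Python function. This is a sketch only: the function name is hypothetical, and the length limits are parameters rather than fixed values. The defaults shown (63 characters per label, 255 overall) are the limits from the DNS specification, RFC 1035, and are an assumption about what you want to enforce.

```python
import re

# Letters, digits, and hyphens only; no leading or trailing hyphen
LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def domain_name_problems(name, max_label=63, max_total=255):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    if len(name) > max_total:
        problems.append(f"longer than {max_total} characters")
    # Names aren't case sensitive, so compare in lowercase
    for label in name.lower().split("."):
        if not label:
            problems.append("contains an empty label")
        elif len(label) > max_label:
            problems.append(f"label '{label}' is longer than {max_label} characters")
        elif not LABEL_RE.fullmatch(label):
            problems.append(
                f"label '{label}' has invalid characters or a leading/trailing hyphen"
            )
    return problems


for candidate in ["research.company.com", "my_domain.company.com", "-bad.company.com"]:
    print(candidate, domain_name_problems(candidate))
```

Note that the sketch rejects the underscore even though, as mentioned above, Microsoft's DNS implementation will accept it.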


Using Locations to Name Child Domains


When you determine the name of the child domains, I recommend that you use a name that
describes the administrative layer that the domain represents. For example, if you’re creating a
domain so that individuals in your company’s research division will have more direct control
over their own resources, name the domain “research.” A domain intended for a European
subsidiary might be named “europe,” and so forth. This naming process applies to the first layer
of child domains that you create under the root domain. For example, the first layer of domains
that you create under the root might have names based on physical locations, if that’s how it
makes sense to divide the administrative responsibilities in the forest. Figure 2.6 portrays the first
layer of domain names, representing the physical locations on a network.

Figure 2.6: The first layer of domains directly under the root domain is named after the physical locations on
the network.

One strong argument for sticking with a geographic naming scheme is that other,
organizationally based naming schemes are prone to constant change. For example, the research
division might be absorbed into a larger operations division, and it wouldn’t make sense to have
the domain named “research” anymore. However, it’s unlikely that New York, for example, will
be changing its name. The AD domain hierarchy isn’t nearly as fluid or adaptable as the business
itself. Once you create and name domains, you cannot move or rename them easily. In addition,
you cannot move or rename the root domain.

WS2K3 provides the ability to rename domains, which is a new feature. However, renaming domains
is a complex process and there is no guarantee that the rename won’t break other applications that
rely on AD. For planning purposes, work under the assumption that domains can’t be renamed.

Using locations to name child domains is more flexible because physical locations on a network
seldom change. The organization at the specific site may change but not the physical location
itself. This design allows the tree to be more flexible to everyday changes. However, if the
physical location is changed or removed, the resources are moved (including the physical
resources, such as domain controllers, printers, and other equipment supporting the site).


If your company is smaller and contained in one physical location, you could name domains after
the company or organization. These domains then hold all the objects and attributes for your
company. This design is easy and efficient. However, if your company has multiple physical
locations with network resources spread across them, you’ll want to create a second layer of
domains (under the root domain), and give the domains location names. The organizational
structures of business units, divisions, and departments will then be placed under each of these
location domains.
The caveat you knew was coming: Nobody can tell you the perfect domain naming scheme
without knowing more about your organization. It might very well be that an organizational-
based naming scheme makes absolute sense for your company. For example, some companies
have many divisions, all of which share office space in various cities. However, these divisions
are independently managed and maintain their own network resources. In such a case, it makes
the most sense to name the domains after the divisions rather than the locations.

Never Using the Same Name Twice


I recommend that you never use the same DNS name twice, even if the names are used on
different networks. This simple guideline will help eliminate any confusion down the road. For
example, let’s say you decide to use the domain name engineering.company.com. Don’t use this
name for any other domain, even if the domain is on a different network. A client may connect to
both networks and query engineering.company.com. Depending on the layout of the network, the
client may locate the wrong domain in the wrong forest.
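
If you keep an inventory of the domain names used on each network, a few lines of Python can flag names that appear more than once. The inventory format and the function name here are hypothetical.

```python
from collections import defaultdict


def duplicate_domain_names(forests):
    """forests: mapping of a forest or network label to its domain DNS names.

    Returns the names used in more than one place, with the places that use them.
    """
    seen = defaultdict(set)
    for forest, domains in forests.items():
        for name in domains:
            seen[name.lower()].add(forest)  # DNS names aren't case sensitive
    return {name: sorted(where) for name, where in seen.items() if len(where) > 1}


inventory = {
    "corp": ["company.com", "engineering.company.com"],
    "lab": ["Engineering.Company.com", "lab.company.com"],
}
print(duplicate_domain_names(inventory))
```

In this sample inventory, engineering.company.com is reported because both networks use it, differing only in letter case.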

Dividing the Forest


In larger organizations, the implementation of AD can become quite large. As it grows, I
recommend that you break it into smaller pieces, each of which I’ll call a partition. A partition, in this
context, is a domain or portion of the directory tree that extends from the beginning of a branch
(or naming context) to the bottom of the tree.

The term partition can be a confusing one in the world of directory services. AD defines the concept
of a partition as a segment of a particular domain or forest. In WS2K3, not every domain controller
may contain all of the domain database. For example, an AD-integrated DNS zone in WS2K3 might
be included in a partition that is only stored on domain controllers that are also DNS servers.
In the context of this discussion, I’m using partition in a somewhat different sense, mainly because
there really isn’t a better term to use. In all cases, partition refers to a portion of the directory services,
such as a portion of the forest, a portion of a domain, and so forth.

The domain physically stores the containers, objects, and attributes in that branch. Several rules
control the creation of partitions in AD and how they operate:
• The topmost partition is the root domain
• Partitions don’t overlap (one object cannot be held in two partitions)
• The partitions contain all the information for the naming context
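
The rule that partitions don't overlap means every object belongs to exactly one of them: the partition whose naming context is the longest matching suffix of the object's distinguished name. The following Python sketch illustrates that lookup. The function names are hypothetical, and the parsing is deliberately naive (it assumes no escaped commas in the names).

```python
def dn_parts(dn):
    """Split a distinguished name into normalized components (naive: no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",")]


def owning_partition(object_dn, naming_contexts):
    """Return the one naming context that holds the object, or None if none matches."""
    obj = dn_parts(object_dn)
    best = None
    for nc in naming_contexts:
        parts = dn_parts(nc)
        # The naming context must be a suffix of the object's name
        if len(parts) <= len(obj) and obj[-len(parts):] == parts:
            # Prefer the deepest (longest) match so a child domain wins over its parent
            if best is None or len(parts) > len(dn_parts(best)):
                best = nc
    return best


contexts = ["DC=company,DC=com", "DC=chicago,DC=company,DC=com"]
print(owning_partition("CN=Jane,OU=Users,DC=chicago,DC=company,DC=com", contexts))
```

An object under the Chicago child domain is assigned to that domain's partition only, even though its name also ends with the root domain's naming context.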


In AD, the basic unit of partitioning is the domain. Thus, when you create your first partition,
you’re actually creating a child domain under the root domain. The domains in AD act as
partitions in the database; for example, a child domain helps split up the total user objects
(among other things) that are contained within the directory. Thus, each domain represents a
portion, or partition, in the overall AD database. Partitioning this database increases its
scalability. As you partition AD, you break it into smaller, more manageable pieces that can be
distributed across the domain controllers, or network servers. Figure 2.7 illustrates how you can
divide the AD database into smaller pieces that can be distributed to the domain controllers.

Figure 2.7: You can partition the AD database into smaller pieces, then distribute them among network
servers or domain controllers.

Breaking AD into smaller pieces and distributing them among multiple servers places a smaller
load and less overhead on any one server. This approach also allows you to control the amount
and path of traffic generated to replicate changes among servers. Once you create a partition,
replication occurs among servers that hold copies.
In AD, you can create many partitions at multiple levels in the forest. In addition, copies of the
domain can be distributed to many different servers on the network. Although AD is distributed
using partitions, any user can access the information completely transparently. Users can access
the entire AD database regardless of which server holds which data. Of course, users must have
been granted the proper permissions.
Although a single domain controller may not contain the entire AD database (that is, the entire
forest-wide database), users can still receive whatever information they request. AD queries the
GC on behalf of a user to identify the requested object, then resolves the name to a server
(domain controller) address using DNS. Again, this process is entirely transparent to the user.


I want to point out that the proper way to think about AD is in terms of the entire forest. The directory,
or the AD database, contains everything in the entire forest; domains represent a portion—or
partition—of that larger forest-wide database. Thinking about databases in terms of a single domain
or even a single domain controller makes it too easy to miss the larger scope of AD. Per-server
databases went away with NetWare 3.x; per-domain databases went away with NT. AD is a larger
database, consisting of many domains.
Similarly, too many partitions—domains—is something to be avoided. As I’ve mentioned previously,
you generally want as few domains as possible to ease both management and operational activities.
There are reasons, which I’ve covered, to have multiple domains, but simply creating more domains
to arbitrarily partition the forest isn’t one of those reasons.

Placing the Domain Controllers for Fault Tolerance


After you’ve partitioned AD, you need to decide how to distribute each new domain or partition
across the network servers. The domain controllers are the servers that store the domains and
their distributed copies. One domain controller holds only one copy of an AD partition or domain
unless it’s a GC. The domain controller stores the objects for the domain or partition to which it
belongs.
The availability of domain information is strictly determined by the availability of the domain
controllers. It’s obvious that the domain controllers must be available so that users can log on
and access AD information. For this reason, never have only one domain controller for any
domain. I recommend that you have at least two domain controllers for each domain to provide
redundancy and fault tolerance for every domain in your organization. Of course, locations with
more users or greater availability needs will have additional domain controllers to meet those
needs.
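
A simple inventory check can flag domains that fall short of this guideline. The Python sketch below assumes a hypothetical inventory that maps each domain to a list of (domain controller, location) pairs. It also applies the earlier advice about keeping domain controllers in at least two geographic locations where possible.

```python
def redundancy_warnings(dc_inventory):
    """dc_inventory: mapping of domain name -> list of (dc_name, site) tuples."""
    warnings = []
    for domain, dcs in sorted(dc_inventory.items()):
        if len(dcs) < 2:
            warnings.append(f"{domain}: only {len(dcs)} domain controller(s)")
        sites = {site for _, site in dcs}
        # Only worth reporting once the domain has enough controllers to spread out
        if len(dcs) >= 2 and len(sites) < 2:
            warnings.append(f"{domain}: all domain controllers are in one site")
    return warnings


inventory = {
    "company.com": [("dc1", "Seattle"), ("dc2", "Chicago")],
    "chicago.company.com": [("dc3", "Chicago")],
    "seattle.company.com": [("dc4", "Seattle"), ("dc5", "Seattle")],
}
for line in redundancy_warnings(inventory):
    print(line)
```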

Determining Trust Relationships


Trust relationships are logical connections that combine two or more domains into one
administrative unit. Trust relationships allow permissions to be associated and passed from one
domain to another. Without some sort of trust among domains, users cannot communicate or
share resources. In this section, I’ll describe the advantages of using bi-directional trusts, one-
way trusts, and cross-link trusts.

Using Bi-Directional Transitive Trusts


In AD, trust relationships are automatically established between every domain and its parent
domain in the tree or forest. This setup greatly reduces the overhead of managing trust
relationships. The types of trusts that are created are called bi-directional transitive trusts. The
best way to understand the concept of transitive trusts is to use an example. Figure 2.8 shows bi-
directional trusts being established among all the domains and their child domains.


Figure 2.8: Each domain has a bi-directional transitive trust relationship between itself and each of its child
domains.

One of the advantages of these new trusts is that they’re automatically established among all
domains; this benefit allows each domain to trust all the other domains in the forest. Another
advantage is that these bi-directional trusts, which are automatically established using Windows’
Kerberos security mechanism, are much easier to set up and administer than NT–style trusts.
Having bi-directional trusts also reduces the total number of trust relationships needed in a tree
or forest. For example, if you tried to accomplish the same thing in NT, you would need to create
two-way trusts between one domain and every other domain. This setup would increase the total
number of trusts roughly with the square of the number of domains.
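
A quick count makes the difference concrete. For n domains, a full mesh in which every domain directly trusts every other one needs n(n-1)/2 two-way trusts (twice that if each direction is counted as a separate one-way trust), whereas a hierarchy in which each domain other than the forest root has a single transitive trust to its parent needs only n-1. The function names in this Python sketch are hypothetical.

```python
def full_mesh_trusts(n):
    """Two-way trusts needed so each of n domains directly trusts every other one."""
    return n * (n - 1) // 2


def hierarchy_trusts(n):
    """Transitive trusts in a forest of n domains: one per domain other than the root."""
    return max(n - 1, 0)


for n in (2, 5, 10, 20):
    print(n, "domains:", full_mesh_trusts(n), "mesh trusts vs", hierarchy_trusts(n))
```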
If you have experience with NT domains, you may know something of trust relationships.
However, the trusts in AD differ from NT trusts because AD trusts are transitive. To help you
understand what this means, I’ll provide an example. AD transitive trusts work much like a
transitive equation in mathematics. A basic mathematical transitive equation reads as follows:
A=B, B=C, therefore A=C
When applying this transitive concept to trust relationships, you get an understanding of how
transitive trusts work among domains. For example, if Domain A trusts Domain B, and Domain
B trusts Domain C, then Domain A trusts Domain C. Figure 2.9 illustrates this idea. Transitive
trust relationships have been set up between Domain A and Domain B and between Domain B
and Domain C. Thus, Domain A trusts Domain C implicitly.
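
The same reasoning can be expressed as a small graph search. In this Python sketch, which uses hypothetical names, each trust is a (trusting, trusted) pair, and the function follows the pairs to find every domain that a starting domain trusts, whether directly or implicitly.

```python
from collections import deque


def trusted_domains(start, trusts):
    """Domains that `start` trusts directly or implicitly through transitive trusts.

    trusts: set of (trusting, trusted) pairs.
    """
    reachable, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for trusting, trusted in trusts:
            if trusting == current and trusted != start and trusted not in reachable:
                reachable.add(trusted)
                queue.append(trusted)
    return reachable


one_direction = {("A", "B"), ("B", "C")}
print(sorted(trusted_domains("A", one_direction)))

# A bi-directional trust is simply a pair of trusts, one in each direction
bidirectional = one_direction | {(b, a) for a, b in one_direction}
print(sorted(trusted_domains("C", bidirectional)))
```

With only the two trusts from the example, Domain A trusts both B and C. Once the trusts are bi-directional, Domain C also trusts A and B.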


Figure 2.9: A domain tree viewed in terms of its transitive trust relationships. Because transitive trust
relationships have been set up between Domain A and Domain B and between Domain B and Domain C,
Domain A trusts Domain C implicitly.

In NT, trusts were non-transitive, so they didn’t allow this implicit trust to exist. For one domain
to trust another domain, an explicit trust relationship had to be created between them.
When domains are created in an AD forest, bi-directional trust relationships are automatically
established. Because the trust is transitive and bi-directional, no additional trust relationships are
required. The result is that every domain in the forest trusts every other domain. Transitive trusts
greatly reduce your overhead and the need to manually configure the trusts. Because trusts are
automatically set up, users have access to all resources in the forest as long as they have the
proper permissions.
Transitive trusts are a feature of the Kerberos authentication protocol. The protocol is used by
AD and provides distributed authentication and authorization. The parent-child relationship
among domains is only a naming and trust relationship. Thus, the trust honors the authentication
of the trusted domain. However, having all administrative rights in a parent domain doesn’t
automatically make you an administrator of a child domain. Likewise, policies set in a parent
domain don’t automatically apply to child domains simply because the trust is in place.

Using One-Way Trusts


One-way trusts aren’t transitive and are used among domains that aren’t part of the same forest.
If you’re familiar with the one-way trusts in NT, the one-way trusts that exist in newer versions
of Windows are just the same. However, they’re only used in a handful of situations.
First, one-way trusts are often used when new trust relationships must be established among
domains of different forests (when, for example, an inter-forest trust isn’t possible). You can use
them among domains to isolate permissions. For example, you can use one-way trusts to allow
access among forests and among the domains of the same tree. Figure 2.10 shows how you can
create a one-way trust between two domains in two different forests. Setting up a one-way trust
allows users to access network resources in the direction of the trust. The actual user rights
depend on the access control lists (ACLs) governing the domains.

Figure 2.10: A one-way trust is established between a domain in Forest 1 and a domain in Forest 2. The trust
allows access to network resources in each domain.

The second use of one-way trusts is to create a relationship from an AD domain to backward-
compatible domains, such as an NT domain. Because NT domains cannot naturally participate in
AD transitive trusts, you must establish a one-way trust to them. You must manage one-way
trusts manually, so try to limit the number you use.
In both of these situations, you can create two one-way trusts among the domains. However, two
one-way trusts don’t equal a bi-directional transitive trust in AD.
A third type of one-way trust is, as I’ve mentioned, the inter-forest trust. This type works exactly
like a one-way trust between two domains, except that it establishes trust between every domain
in the respective forests. Again, this type of trust is available only in an environment in which
every domain controller is running WS2K3.

Using Cross-Link Trusts


Cross-link trusts are used to increase performance among domains. Cross-link trust relationships
help increase the speed at which users authenticate among domains. However, cross-link trusts
are needed only between two domains that are both far from the root domain. To completely
understand the need for the cross-link trusts, you first need to understand how user authentication
works in AD.
When a user needs to authenticate to a resource that doesn’t reside in the user’s own domain, the
client first must determine where the resource is and locate it. If the resource isn’t in the local
domain, the domain controller will pass back a referral list of other domain controllers that might
have the resource. The workstation then contacts the appropriate servers in the referral list to find
the resource. This process continues until the requested resource is found. This process is often
referred to as chasing referrals and can take time, especially on large or complex AD networks.

Walking up and down the domain tree branches lengthens the time it takes to query each domain
controller and respond to the user. To speed this process, you can establish a cross-link, or
shortcut, trust relationship between two domains. If you decide to use a cross-link trust, I
recommend that you place it between the two domains that are farthest from the root domain.
For example, suppose you have a domain tree that has domains 1, 2, 3, 4, and 5 in one branch
and domains 1, A, B, C, and D in another branch. Domains 5 and D are located farthest from the
root domain (see Figure 2.11).

Figure 2.11: The domain tree has two branches: domains 1, 2, 3, 4, and 5 form one branch, and domains 1, A,
B, C, and D form the second branch. The cross-link trust can be established between domains 5 and D.

Let’s say that a user in Domain 5 needs to access a resource in Domain D. To accomplish this
request, the authentication process must traverse up the first branch and down the second branch
while talking to each domain controller. Continuous authentications such as this create a
significant amount of network traffic. To alleviate this problem, you can establish a cross-link
between Domain 5 and Domain D.
The cross-link between Domain 5 and Domain D will serve as an authentication bridge between
the two domains. The result is better authentication performance between the domains.
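
To see how much of the trust path a cross-link removes, you can count the hops in the example tree. The following is a minimal sketch that treats each trust as a link in a graph and finds the shortest path with a breadth-first search. The function and variable names are hypothetical, and real referral chasing involves more than counting hops.

```python
from collections import deque


def hop_count(links, start, goal):
    """Number of trust links on the shortest path between two domains."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return distance[node]
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    raise ValueError(f"no trust path from {start} to {goal}")


# The two branches of the example tree, rooted at domain 1.
TREE_LINKS = [
    ("1", "2"), ("2", "3"), ("3", "4"), ("4", "5"),
    ("1", "A"), ("A", "B"), ("B", "C"), ("C", "D"),
]

print(hop_count(TREE_LINKS, "5", "D"))                 # without a cross-link
print(hop_count(TREE_LINKS + [("5", "D")], "5", "D"))  # with a cross-link
```

Without the cross-link, the path from Domain 5 to Domain D crosses eight trust links. With it, the path is a single link.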

The need for cross-link trusts is a good reason to keep your domain hierarchy as flat as possible.
Cross-link trusts should only be used when absolutely necessary, and a good design won’t require
them.

Designing OUs for Each Domain


An OU is a container object that allows you to organize your objects and tie a Group Policy
Object (GPO) to it. Using the OU, you can group similar objects into logical structures in a
domain. OUs can also be nested to build a hierarchy in a domain. This hierarchy of containers is
typically named after divisions, departments, and groups in your company. When you’re
designing and creating the hierarchical structure in each domain, it’s important to understand the
following characteristics of OUs:
• OUs can be nested—An OU can contain other OUs, enabling you to build a hierarchy
inside each domain.
• OUs can help delegate administration—You can delegate administrative tasks to
subordinate administrators by creating subordinate OUs. Using nested OUs, you can fine-
tune the level of control you need.
• OUs aren’t security principals—You cannot make an OU a member of a group. You
cannot grant users permissions to resources because they reside in a particular OU.
Because OUs are used to delegate administration, they can specify who manages the
resources in the OUs, but they don’t indicate the resources a user can access.
• OUs can be associated with a GPO—A GPO enables you to define configurations for
users and computers in OUs. For example, you can create a desktop policy that every
user in the OU will use.

AD comes with two built-in containers, Users and Computers, that look superficially like OUs but
aren’t. They can’t have GPOs linked to them, for example.

• OUs don’t need to be viewed by users—It isn’t necessary for you to design OUs with
user navigation in mind. Although users can view the OU structure, this structure isn’t an
efficient method for finding resources. The preferred method for users to find resources is
by querying the GC.
Now that you understand a few of the basic characteristics for OUs, consider the following
guidelines for designing an efficient and effective OU structure:
• Create OUs to delegate administration
• Create OUs to reflect your company’s organization
• Create OUs for Group Policy
• Create OUs to restrict access

Creating OUs to Delegate Administration


I’ve mentioned that OUs can be used to create administrative areas in a domain. Using OUs, you
can delegate administrative tasks to subordinate administrators. For example, suppose the
engineering department wants to administer its own objects and resources in the Chicago
domain. You can accomplish this setup by creating the following OU:
engineering.chicago.company.com. After you’ve created the new OU and placed all the objects
and resources into it, you can grant explicit permissions to the administrators of the engineering
department so that they can control their own objects. Figure 2.12 illustrates how you can create
the Engineering OU in the Chicago domain.

Figure 2.12: You can create the engineering OU in the Chicago domain, then assign permissions to the
engineering department administrators to manage all the objects.

Another useful feature that I mentioned earlier is that OUs can be nested. This feature enables
you to build a hierarchy in each domain. For example, suppose that the testing group in the
engineering department wants full administrative control over all its resources, such as users,
printers, and computers. To accommodate this request, you simply create a new OU directly
under the Engineering OU in the Chicago domain. The hierarchical structure now looks like the
following: testing.engineering.chicago.company.com. After you’ve created the new OU and
placed the resources, you can give full privileges to the testing group’s administrator. If an OU is
nested, it inherits the properties of the parent OU by default. For example, if the Engineering OU
has certain security settings or GPOs applied, they’re passed down to the Testing OU. The Testing OU is
considered nested under the Engineering OU.
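
The default inheritance described here can be modeled with a small class. This is a sketch under stated assumptions: the `OU` class and the GPO names are hypothetical, and the model covers only the default behavior, ignoring any options that block or enforce inheritance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OU:
    """A simplified OU that knows its parent and its directly linked GPOs."""
    name: str
    parent: Optional["OU"] = None
    linked_gpos: List[str] = field(default_factory=list)

    def effective_gpos(self) -> List[str]:
        """GPOs inherited from parent OUs first, then this OU's own links."""
        inherited = self.parent.effective_gpos() if self.parent else []
        return inherited + self.linked_gpos

    def depth(self) -> int:
        """Number of OU layers above this one."""
        return 0 if self.parent is None else self.parent.depth() + 1


# Hypothetical GPO names for the Engineering and Testing OUs.
engineering = OU("Engineering", linked_gpos=["Engineering Desktop"])
testing = OU("Testing", parent=engineering, linked_gpos=["Test Lab Settings"])

print(testing.effective_gpos())
print(testing.depth())
```

The Testing OU receives the Engineering GPO in addition to its own. The `depth` method shows how each extra layer adds one more level to walk when policies are collected.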

Be careful to limit the number of OU layers you create. Creating too many layers can increase the
administrative overhead. Limiting the number of OU layers also increases user logon performance.
When a user logs on to AD, the security policies take effect. To find all these policies, the workstation
must search all layers of the OU structure. Having fewer OU layers allows the client to complete this
search more quickly.

Creating OUs to Reflect Your Company’s Organization


If you don’t create OUs in your domain, all users, printers, servers, computers, and other
resources are displayed in a single list. This type of layout makes it difficult to search for
resources. This problem increases as the number of objects in the domain grows.
One of the many benefits of creating OUs in a domain is the organization of this flat layout. OUs
allow you to create an organization that reflects your company’s divisions, departments, and
groups. In fact, you can use your company’s organizational chart or a similar document to help
you. Figure 2.13 illustrates how you can create OUs based on an organizational chart.

Figure 2.13: OUs have been created in a domain based on an organizational chart.

Creating OUs for Group Policy


Group Policy enables you to define desktop configurations for users and computers. Desktop
configurations have settings that govern and control the users’ experience at their workstations.
GPOs can also be associated with an OU. This association allows all users and computers in the
OU and any nested OUs to receive the settings defined in Group Policy. These settings configure
several items, including installed software, registry settings, and logon scripts—just to name a
few.
The ability to set Group Policy on OUs allows you to control a large set of users and computers
from a central point. If you have a special need for certain users and computers, you can create
an OU and establish Group Policy. For example, if the Accounting Department needs specific
settings on its desktops, you can create an OU=Accounting and establish the specific policy.
Group Policy will then apply to every user and computer in the new OU.
As I mentioned earlier, GPOs can be associated with OUs as well as the domain and site objects
in AD. Because GPOs can be associated with each of these objects, you can create
implementations using GPOs to generate various combinations. If you aren’t careful, these
combinations can become very complicated and cause you headaches.

Creating OUs to Restrict Access


The quickest and easiest way to restrict total access to network resources is to create a new OU
and place the network resources into it. You can then restrict access to the OU, thereby removing
access to the network resources. In addition, the objects representing the network resources are
no longer visible.
Users who don’t have the right to read an object can normally still see it in AD. This visibility
may be a problem if you have highly secure network resources that you don’t want anyone else
to see. You can restrict and hide the resources by creating a new OU in the domain and limiting
access to only the few who need it.

Designing the Sites for the Forest


Sites are locations on the physical network that contain AD servers. A site is stored in AD as an
object and is defined as one or more well-connected TCP/IP subnets. By “well-connected,” I
mean that the network connectivity among the subnets is highly reliable and supports a data-
transfer rate of at least 10 megabits per second (Mbps)—in other words, your typical LAN at
your typical office location.
Designing sites and site links in AD takes advantage of the physical network layout. The basic
assumption is that servers and workstations with the same subnet address are connected to the
same network segment and have LAN speeds. Defining a site as a set of subnets allows
administrators to easily configure AD access and replication topology to take advantage of the
physical network. Sites also help you locate network servers so that they’re physically close to
the users who depend on them.

For an explanation of site links, see “Creating Sites and Site Links Based on Network Topology” later
in this chapter.

It’s your role as an administrator to design the site objects and site links for your tree or forest
that assure the best network performance. It’s also your job to determine what speed assures this
performance and reduces server downtime as a result of network outages. Establish site objects
and site links based on network and subnet speed. Although many subnets can belong to a single
site, a single subnet can’t span multiple physical sites. To help you establish a design for the sites
in your forest, you need to consider the following guidelines:
• Create sites and site links based on network topology
• Use sites to determine the placement of domain controllers
• Use sites to determine the placement of GC servers

Creating Sites and Site Links Based on Network Topology


When you create sites and site links for your tree or forest, use the physical layout of your
network, or topology. Before you can properly create sites and site links, you need a solid
understanding of what they are.

About Sites
Sites are groups of computers (or subnets) that share high-speed bandwidth connections on one
or more TCP/IP subnets. Subnets are groups of local segments on the network that are physically
located in the same place. Multiple site objects create a site topology. Figure 2.14 portrays a site
with TCP/IP subnets that exist between the servers and workstations. A LAN—as opposed to a
WAN, MAN, or other lower-speed connection—always connects a site.

Figure 2.14: A site is one or more TCP/IP subnets or LAN networks that exist between the servers and
workstations.

One domain can span more than one site, and one site can contain multiple domains. However, for
design purposes, it’s important to remember that sites define how replication occurs among domain
controllers and which domain controller a user’s workstation contacts for initial authentication.
Normally, the workstation first tries to contact domain controllers in its site.

About Site Links


Site links are objects that represent the WAN links on your network. They also represent any
low-bandwidth connections between two locations. Site links connect two or more sites. They
help you determine the replication schedule and latency, and they help you determine where to
place network servers. The rule is to create a site link when a connection is slower than a LAN-
speed connection. Defining site links allows administrators to configure AD and replication to
take advantage of the network.
The site link object has four settings:
• Cost—Helps the replication process determine the path of the communication among
domain controllers; the cost of every site link along a potential replication path is added
up, and the least expensive path is utilized
• Replication schedule—Determines what time of day the replication process can execute
• Replication interval—Helps the replication process determine how often to poll the
domain controllers on the other side of the link
• Transport—Helps the replication process determine which transport protocol to use
during communications
Site and site link objects are stored in a special container called the configuration container. The
configuration container is stored and replicated to every AD domain controller, providing each
server with complete details of the physical network topology. A change to any of the
information in the site or site link objects causes replication to every domain controller in the
forest.
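
The way the cost setting influences the replication path can be illustrated with a shortest-path search that adds up the cost of each site link. This is a minimal sketch of the idea, not the actual algorithm AD uses. The site names and cost values are hypothetical.

```python
import heapq


def cheapest_path(site_links, start, goal):
    """Return (total_cost, path) for the least expensive route between sites."""
    graph = {}
    for a, b, cost in site_links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {start: 0}
    heap = [(0, start, [start])]
    while heap:
        cost, site, path = heapq.heappop(heap)
        if site == goal:
            return cost, path
        if cost > best.get(site, float("inf")):
            continue  # a cheaper route to this site was already found
        for neighbor, link_cost in graph.get(site, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor, path + [neighbor]))
    raise ValueError(f"no site link path from {start} to {goal}")


# Hypothetical site links as (site, site, cost).
SITE_LINKS = [
    ("Chicago", "Denver", 100),
    ("Denver", "Seattle", 100),
    ("Chicago", "Seattle", 300),
]

print(cheapest_path(SITE_LINKS, "Chicago", "Seattle"))
```

In this example, the two-link route through Denver costs 200, so it's preferred over the direct link, which costs 300.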

Creating the Site Topology


Sites and site links create the site topology, as Figure 2.15 illustrates. The site topology helps the
replication process determine the path, cost, and protocol among domain controllers.

Figure 2.15: The site topology is created from the site objects and the site links. The site topology helps the
replication process determine the path, cost, and protocol among domain controllers.

When you create the site topology, it’s useful to have a complete set of physical LAN and WAN
maps. If your company has campus networks at one or more locations, you’ll need to have the
physical maps of those locations. These maps should include all the physical connections, media
or frame types, protocols, and speed of connections.
When defining the sites, begin by creating a site for every LAN or set of LANs that are
connected by high-speed bandwidth connections. If there are multiple physical locations, create a
site for each location that has a LAN subnet. For each site that you create, keep track of the IP
subnets and addresses that comprise the site. You’ll need this information when you add the site
information to AD.

Site names are registered in DNS by the domain locator, so they must be legal DNS names. You
must also use Internet standard characters—letters, numbers, and hyphens. (For more information,
see “Using Internet Standard Characters” earlier in this chapter.)

After you’ve created the sites, you need to connect them with site links to truly reflect the
physical connectivity of your network. To do so, you need to first assign each site link a name.
By default, site links are transitive, just like trust relationships in AD. Thus, if Site A is
connected to Site B, and Site B is connected to Site C, it’s assumed that Site A can communicate
with Site C. This transitive connectivity is called site link bridging, and by default, AD creates
bridges throughout all site links.

The practical upshot of bridging is this: Imagine that you have three sites, A, B, and C. A is
connected to B, and B is connected to C. When you create the two site links to represent these
connections, they’ll be named something like AtoB and BtoC (for example). AD will
automatically bridge these connections, creating a transitive link from A to C. Domain
controllers in site A will therefore replicate with domain controllers in site B and in site C. The
idea behind site bridging is to reduce replication latency by allowing domain controllers at
various sites to replicate with one another directly. You can disable site bridging, forcing A to
replicate only with B, and preventing any changes made at site A from reaching site C until site
C replicates them from site B over its private site link. You can also manually create site link
bridges, if desired, to “shortcut” the site topology and reduce replication latency.
The process of generating this site replication topology is automatic, and it’s handled by a special
service called the Knowledge Consistency Checker (KCC). If you don’t like the topology that the
KCC generates for you, you can create the topology manually.
The purpose of creating the site topology is to ensure rapid data communications among AD
servers. The site topology is used primarily when setting up replication of AD. However, the
placement of the domain controllers and partitions governs when and how replication takes place.

Using Sites to Determine the Placement of Domain Controllers


After you’ve properly created site and site link objects, you can use them to help you decide how
to properly distribute AD partitions across the network servers. Network servers are the domain
controllers that store AD domains and their copies. One domain controller holds only one copy
of an AD partition or domain. The server works to authenticate users and provide responses to
queries about the objects and attributes.
Your responsibility is to determine where to place the domain controllers on the network to best
suit the needs of the users. I recommend that the domain controllers be located on or near the
users’ subnet or site. When a workstation connects to the network, it typically receives a TCP/IP
address from DHCP. This TCP/IP address identifies the subnet or site to which the workstation is
attached. If the workstation has a statically assigned IP address, it will also have statically
configured subnet information.
In either case, when users log on to the network, their workstations can reach the closest domain
controller site by knowing the assigned address and subnet information. Because computers in
the same site are physically close to each other, communication among them is reliable and fast.
Workstations can easily determine the local site at logon because they already know what
TCP/IP subnet they’re on, and subnets translate directly to AD sites.
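
The mapping from subnet to site can be sketched with Python's standard `ipaddress` module. The subnets and site names are hypothetical, and the sketch assumes that the subnets don't overlap. It illustrates only the lookup idea, not the actual locator process.

```python
import ipaddress

# Hypothetical subnet-to-site assignments, as an administrator might define them.
SUBNET_TO_SITE = {
    "10.1.0.0/16": "Chicago",
    "10.2.0.0/16": "Denver",
    "10.3.4.0/24": "Seattle",
}


def site_for_address(address, subnet_to_site=SUBNET_TO_SITE):
    """Return the site whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for subnet, site in subnet_to_site.items():
        if ip in ipaddress.ip_network(subnet):
            return site
    return None


print(site_for_address("10.1.20.15"))
print(site_for_address("10.3.4.200"))
print(site_for_address("192.168.0.1"))
```

An address that falls outside every defined subnet maps to no site, which is one reason to keep track of the subnets that make up each site.
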
If no domain controller is available in the local site, user traffic will cross the WAN links and
sites to find other servers. To place the domain controller for best overall connectivity, select the
site where the largest number of users is located. All the users in that site will authenticate to
the local domain controller. This approach guarantees that the users will retrieve their object
information from the GC partition. The location of the server is important because users are
required to access a GC server when they log on.

Using Sites to Determine the Placement of DNS Servers


I’ve already mentioned that DNS and AD are inseparably connected. AD uses DNS to locate the
domain controllers. The DNS service enables users’ workstations to find the IP addresses of the
domain controllers. The DNS server is the authoritative source for the locator records of the
domains and domain controllers on the network. To find a particular domain controller, the
workstation queries DNS for the appropriate service (SRV) and address (A) resource records.
These records from DNS provide the names and IP addresses of the domain controller.
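
As an illustration, the following sketch builds the SRV record names that a workstation would look up to find a domain controller. It assumes the commonly documented locator naming pattern (`_ldap._tcp.dc._msdcs.` followed by the DNS domain name, with a site-specific variant), so verify the names against your own DNS zone. The function name and the site name are hypothetical, and the code performs no DNS query.

```python
def dc_locator_names(dns_domain, site=None):
    """Build SRV record names for locating domain controllers.

    The site-specific name comes first so that a local domain
    controller is preferred; the domain-wide name is the fallback.
    """
    names = []
    if site:
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{dns_domain}")
    names.append(f"_ldap._tcp.dc._msdcs.{dns_domain}")
    return names


for name in dc_locator_names("chicago.company.com", site="Chicago"):
    print(name)
```

You could pass each name to a DNS lookup tool to confirm that the records exist for your domain.
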
The availability of DNS directly affects the availability of AD and its servers. As mentioned,
users rely on DNS as a service. To guarantee DNS as a service, I recommend that you place or
have available at least one DNS server for every site on your network. This setup allows all users
to access the DNS service locally. You don’t want users to have to query DNS servers that are
offsite to locate the domain controllers that are on the users’ own subnet.

The AD domain controllers query DNS to find each other during replication. A new domain controller
participates in replication by registering its locator records with DNS. Likewise, each domain controller
must be able to look up these records. Such is the case even if the domain controllers are on the
same subnet.

If you depend on an outside DNS service, you might need to adjust the number of DNS servers
and physical placement, if possible. You’ll also need to verify that the outside DNS service
supports the required SRV resource record. If it doesn’t, you may need to install and configure
your own implementation of Microsoft’s DNS to support AD.
If you don’t want to depend on an existing DNS service or a DNS service that is offsite, you
might want to install the Microsoft DNS service that is integrated into AD. The Microsoft DNS
service stores the locator records for the domain and domain controllers in AD. You can then
have one or more domain controllers provide the DNS service. Again, I recommend that you
place at least one DNS server for each site object in your environment. Using the Microsoft DNS
service is an optional configuration, and storing the locator records in AD may have a negative
impact on replication traffic on large networks.

Summary
My first recommendation for troubleshooting AD is to make sure that its components are
designed and implemented correctly. In addition, the efficiency of AD depends on the design and
implementation of key structures—forests, trees, domains, and OUs. I also recommend that the
sites and site links be properly established to support the distribution and replication of the
system. In this chapter, we explored these topics as well as the placement of other supporting
servers, such as domain controllers, GC servers, and DNS servers. The design and
implementation of these structures is strictly your responsibility as network administrators.
Before you can effectively troubleshoot AD, make sure you feel confident about your design.

Chapter 3

Chapter 3: Monitoring and Tuning the Windows Server 2003 System and Network
A Windows Server 2003 (WS2K3) network is a system of devices that work together toward the
common goal of providing communication among users, servers, and applications. The most
important of these devices are WS2K3 domain controllers. This guide is primarily interested in
the components and software that are required for WS2K3 Active Directory (AD) services. This
chapter focuses on how you can monitor WS2K3 domain controllers and their subsystems to
help you reduce downtime and improve AD performance.

Monitoring WS2K3 Domain Controllers


Domain controllers are the single most important type of device on an AD-based WS2K3
network. These devices share the responsibility of storing the directory information, and they
interact with each other to replicate the directory information and keep it up to date. In addition,
domain controllers are responsible for authenticating user logons and servicing other requests for
access to directory and network resources. Domain controllers are also the gatekeepers to
changes in the domain, as all domain configuration, security, and operational changes must be
made through a domain controller. Because domain controllers are crucial to the performance
and operation of the directory, it’s critical that you continually monitor these servers. A poorly
performing or misbehaving domain controller can easily cause network downtime and loss of
directory functionality. For example, when the directory slows significantly or is unavailable,
users can’t log on, there is no Address Book for Exchange, and users may not be able to print or
access Web-based applications.
When you consider how you’ll monitor your domain controllers, first remember that no one
domain controller contains all of the directory information. In any well-built WS2K3 network,
each domain partition in the directory typically has two or more domain controllers hosting the
domain to provide fault tolerance for directory services. With this kind of redundancy in place,
you might initially be fooled into thinking that monitoring each domain controller for
performance and downtime isn’t all that important.
However, each domain controller plays a role in supporting your users. For example, if two
domain controllers in the same directory partition are placed in separated sites or subnets, users
in each site will use the domain controller nearest them. However, if one of the domain
controllers goes down, users in that location must traverse the wide area network (WAN) to log
on and access the directory. This traversal is usually undesirable, especially if there are too many
users and/or if the WAN link is slow.

Another example of why you need to monitor domain controllers is that some domain controllers
on a WS2K3 network (no matter how many domain controllers it may have) are unique. For
example, some domain controllers perform special duties called Flexible Single-Master
Operation (FSMO) roles. Although the replication of AD is multimaster, the FSMO roles held
by these domain controllers are single-master (much like a Windows NT 4.0 primary domain
controller—PDC). Thus, these domain controllers don’t have additional copies or replicas to
provide fault tolerance if the domain controller hosting a particular role is down.
These FSMO domain controllers perform special roles for AD, such as managing the domain,
managing the schema, and supporting down-level clients. If any of these critical domain
controllers go down, the directory loses functionality and can no longer update or extend the
schema, or add or remove a domain from the directory.
Failing to monitor domain controllers can adversely affect a network’s performance and
availability. For example, if an entire department is unable to access the domain controller or
directory, users lose time, and the company loses money. To help you ensure that your domain
controllers are available, you can, and should, monitor and analyze Windows in the five
following areas:
• Overall system
• Memory and cache
• Processor and thread
• Disk
• Network

I’ll discuss each of these areas, and the reason for their importance, in the following sections. I’ll
discuss monitoring AD itself in Chapter 4.

Another critical monitoring area is auditing. Although auditing is usually considered a security-
related task, auditing can play a crucial role in operational issues as well. For example, if a
domain controller that has been performing acceptably suddenly slows, the first question
you’ll ask yourself is: What changed? Monitoring elements such as the memory, disk, and
network will tell you that something changed, but not what; only proper auditing can provide the
clue as to what changed.
Unfortunately, Windows auditing leaves a lot to be desired from a reporting and investigative
point of view. For example, each domain controller maintains its own audit logs; in a large
company with many domain controllers, poring through each of them to find out what changed
in the domain can be a time-consuming task. So much so, in fact, that few administrators ever
turn to auditing as a troubleshooting tool. Later in this chapter, we’ll explore examples of how
auditing can be an effective troubleshooting tool and how you can make auditing more efficient
and useable.
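
One way to picture a more efficient approach is to merge the per-domain-controller logs into a single timeline. The following is a minimal sketch that assumes you've already exported each log as a list of (timestamp, message) pairs. The domain controller names, timestamps, and messages are hypothetical sample data, not real Windows event records.

```python
import heapq
from datetime import datetime


def merge_audit_logs(logs_by_dc):
    """Merge per-DC event lists into one timeline of (timestamp, dc, message)."""
    streams = (
        [(timestamp, dc, message) for timestamp, message in sorted(events)]
        for dc, events in logs_by_dc.items()
    )
    # heapq.merge combines already-sorted streams without re-sorting everything.
    return list(heapq.merge(*streams))


# Hypothetical exported events from two domain controllers.
logs = {
    "DC1": [(datetime(2004, 1, 5, 9, 30), "user account modified")],
    "DC2": [
        (datetime(2004, 1, 5, 9, 15), "policy link changed"),
        (datetime(2004, 1, 5, 9, 45), "OU deleted"),
    ],
}

for timestamp, dc, message in merge_audit_logs(logs):
    print(timestamp.isoformat(), dc, message)
```

With every event in one ordered list, you can look at what happened just before a problem began, regardless of which domain controller recorded the change.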

Monitoring the Overall System


Monitoring a Windows domain controller means watching the operation of both the server’s
operating system (OS) and its hardware subsystems. When you monitor domain controllers,
begin by establishing a performance and reliability baseline for each—that is, a nominal and
acceptable level of operation under real-world conditions on your network. Establishing a
baseline allows you to track the operation of the domain controller over time. If a potential
problem or bottleneck occurs, you can recognize it immediately because you can compare that
behavior with the baseline established for that domain controller.
Another way to think about a baseline is in terms of health, a concept increasingly recognized by
products such as Microsoft Operations Manager (MOM). Consider this: Performance data—such
as the fact that the CPU is running at 60 percent capacity—is interesting, but it has no context. Is
60 percent good? Is it bad? For a server that normally averages 40 percent, 60 percent represents
a significant increase; for a server normally averaging 80 percent, 60 percent is a marked
improvement. For a server that normally shows 60 percent, it is the status quo. It’s these
comparisons to what’s normal (the baseline) that add context to performance data and
create a health indicator: good, bad, really bad, or whatever you like to call it. Without a
baseline for comparison, performance data is just interesting numbers.
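As a rough illustration of how a baseline turns a raw number into a health indicator, the following Python sketch compares a single reading with earlier samples. The function name, the labels, and the tolerance of two standard deviations are assumptions chosen for the example, not values taken from any Windows tool.

```python
from statistics import mean, stdev


def health(reading, baseline_samples, tolerance=2.0):
    """Compare one reading with a baseline built from earlier samples.

    `tolerance` is the number of standard deviations treated as normal;
    it is an illustrative assumption, not a recommended setting.
    """
    average = mean(baseline_samples)
    spread = stdev(baseline_samples)
    if reading > average + tolerance * spread:
        return "above baseline"
    if reading < average - tolerance * spread:
        return "below baseline"
    return "normal"


# The same 60 percent CPU reading judged against three different baselines.
usually_40 = [38, 41, 40, 42, 39]
usually_60 = [58, 61, 60, 62, 59]
usually_80 = [78, 81, 80, 82, 79]

for label, samples in (("40%", usually_40), ("60%", usually_60), ("80%", usually_80)):
    print(f"server that normally runs at {label}: {health(60, samples)}")
```

The same reading produces three different answers, which is the point of keeping a baseline per domain controller rather than one global threshold.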
Monitoring domain controllers means watching for problems or bottlenecks in the OS and its
subsystems. A simple example of a bottleneck occurs when a domain controller’s processor is
running at 100 percent usage because one application has tied up the CPU. Almost every
Windows administrator has seen this occur at some point.
Windows provides several utilities that can assist you in monitoring your domain controllers and
their subsystems. These tools provide features that will help you search for bottlenecks and other
problems:
• Task Manager—Gives you a quick view of which applications and processes are running
on the domain controllers. This utility allows you to view a summary of the overall CPU
and memory usage for each of these processes and threads.


• Performance console—Allows you to view the current activity on the domain controller
and select the performance information that you want collected and logged. WS2K3’s
performance-counter architecture is extensible: applications can add their own metrics
in the form of objects and counters, which you can then monitor
using the Performance console. By default, the Performance console has two
applications, System Monitor and Performance Logs and Alerts.
System Monitor enables you to monitor nearly every aspect of a domain controller’s
performance and establish a baseline for the performance of your domain controllers.
Using System Monitor, you can see the performance counters graphically logged and set
alerts against them. The alerts will appear in Event Viewer.
The Performance Logs and Alerts application enables you to collect information for those
times when you can’t detect a problem in real time. This application allows you to collect
domain controller performance data for as long as you want—days, weeks, or even
months.
• Event Viewer—Allows you to view the event logs that gather information about a
domain controller and its subsystems. There are three types of logs: the Application Log,
the System Log, and the Security Log. Although the event logs start automatically when
you start the domain controller, you must start Event Viewer manually.

When monitoring domain controllers using the Performance console’s logging feature, make sure you
don’t actually create a problem by filling the computer’s disk with large log files. Include only
those statistics in the logging process that you absolutely need, and sample no more frequently
than is required to evaluate domain controller performance and usage. To select an appropriate
interval for your computer, establish a baseline of performance and usage. Also, take into account the
amount of free disk space on your domain controller when you begin the logging process. Finally,
make sure that you have some application in place (such as the Performance console) that
continually monitors the domain controller to ensure that it has plenty of free disk space.
In addition to monitoring the local domain controller, you can use the Performance console to monitor
domain controllers remotely and store the log files on a shared network drive. Doing so enables you
to monitor all the domain controllers in a directory from one console or utility.
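To get a feel for how the number of counters and the sampling interval drive log size, a back-of-the-envelope estimate can help. The sketch below is only an approximation: the figure of 16 bytes per logged value and the rule of using at most half the free disk space are assumptions made for the example, and real log formats add their own overhead.

```python
def estimated_log_bytes(counter_count, interval_seconds, duration_hours,
                        bytes_per_value=16):
    """Rough size of a counter log; bytes_per_value is an assumed figure."""
    samples = (duration_hours * 3600) // interval_seconds
    return samples * counter_count * bytes_per_value


def fits_on_disk(log_bytes, free_bytes, reserve_fraction=0.5):
    """True if the log would use no more than the allowed share of free space."""
    return log_bytes <= free_bytes * reserve_fraction


# Twenty counters logged for one week at two different sampling intervals.
for interval in (1, 15):
    size = estimated_log_bytes(20, interval, 24 * 7)
    print(f"{interval:>2}s interval: about {size / 1_000_000:.1f} MB")
```

Sampling every 15 seconds instead of every second cuts the estimate by a factor of 15, which is usually a better trade than dropping counters you actually need.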

At the heart of the Performance console and Task Manager are the performance counters that are
built into the Windows OS. I’ll introduce each of these monitoring utilities briefly in the
upcoming sections and demonstrate how they can help you monitor specific subsystems. Keep in
mind that this chapter isn’t intended to be an in-depth study of all the capabilities of these
utilities. Instead, the intention is to provide a general introduction to them and show you how you
can use them to assist you in monitoring your domain controllers.


Using Task Manager


The easiest and quickest way to view how each application or system process is using the CPU
and memory is by using Windows’ Task Manager. This utility allows you to see which processes
or threads are running on a Windows domain controller at any given moment; it also shows a
summary of overall CPU and memory usage. To launch Task Manager, either right-click the
taskbar (typically at the bottom of the screen) and choose Task Manager or press Ctrl+Alt+Del,
then click Task Manager in the Windows Security dialog box. Figure 3.1 shows an example of Task Manager.

Figure 3.1: Windows’ Task Manager allows you to view and manage the applications and processes running
on a domain controller and manage their performance.

Task Manager supplies five pages of information: Applications, Processes, Performance,
Networking, and Users. Each of these pages will help you understand more about a domain
controller’s processes and memory. I’ll discuss some of these screens in greater detail later in
this chapter.


Using the Performance Console


In WS2K3, one of the main utilities for monitoring a domain controller is the Performance
console. The Performance console allows you to view the current activity on the domain
controller and select the performance information that you want collected and logged.

In NT, the Performance console was known as Performance Monitor, and like most NT administration
utilities, it was a standalone utility rather than a Microsoft Management Console (MMC) snap-in.

The Performance console helps you accurately pinpoint many of the performance problems or
bottlenecks in your system. It monitors your WS2K3 domain controller by capturing the selected
performance counters that relate to the system hardware and software. The performance counters
are programmed by the developer of the related system. The hardware-related counters typically
monitor the number of times a device has been accessed. For example, the physical disk counters
indicate the number of physical disk reads or writes and how fast they were completed. Software
counters monitor activity related to application software running on the domain controller. To
launch the Performance console, choose Start, Programs, Administrative Tools, Performance.
The first application that starts in the Performance console is System Monitor. Using System
Monitor, you can view the current activity on the domain controller and select information to be
collected and logged for analysis. You can also measure the performance of your own domain
controller as well as that of other domain controllers on your network. Figure 3.2 shows System
Monitor.

Figure 3.2: The Performance console includes both System Monitor and Performance Logs and Alerts.


When it starts, System Monitor isn’t monitoring any counters or performance indicators for the
system. You determine which counters System Monitor tracks and displays. To add a counter,
click the addition sign (+) icon on the toolbar or right-click anywhere in the System Monitor
display area and choose Add Counters from the shortcut menu. Using either approach, the Add
Counters dialog box appears, which Figure 3.3 shows, in which you can choose the counters to
monitor.

Figure 3.3: In System Monitor, you can choose which counters you want to track and monitor on the display.

Once you choose the counter that you want to view, System Monitor tracks performance in real
time. When you first start using System Monitor, the number of counters that are available seems
overwhelming because there are counters for almost every aspect of the computer. However, in
the spirit of the age-old 80/20 rule, you’ll probably find that you tend to use about 20 percent of
the available counters 80 percent of the time (or more), using the other counters only when you
need specific monitoring or troubleshooting information.

If you don’t understand the meaning of a particular Performance console counter, highlight it and click
Explain. The informational dialog box that appears provides a description of the selected counter
(and, in some cases, what the various values or ranges might indicate).

Later sections of this chapter discuss how you can use System Monitor to monitor memory, view
processes, and monitor network components on a domain controller as well as monitor the disk
subsystem.


Event Viewer
As with its NT and Win2K predecessors, WS2K3 uses an event logging system to track the
activity of each computer and its subsystems. The events that are logged by the system are
predetermined and tracked by the OS. In addition, WS2K3 provides Event Viewer, which allows
you to view the events that have been logged.

Events Tracked in Event Logs


Windows domain controllers occasionally encounter serious error conditions and halt operation.
This situation is called a stop error (also informally known to some users as the blue screen of
death—BSOD). The error message is displayed on a solid blue background on a domain
controller’s console. Several stop errors are worth monitoring because they affect the reliability
of a computer. Fortunately, stop errors are recorded in a domain controller’s event logs when the
computer restarts. To view the stop errors in the event logs, you need to launch Event Viewer.
In addition to stop errors, every restart of a Windows domain controller is recorded in
the System Log section of the event logs. The reasons for a restart could include OS crashes, OS
upgrades, and hardware maintenance.
Another type of event that a domain controller tracks in the event logs is application crashes.
Windows uses the Dr. Watson utility (Drwtsn32.exe) to record problems and failures in
applications running on the domain controller. Failures are recorded in the Application Log
section of the event logs. Again, you can use Event Viewer and the information in the event logs
to analyze problems with an application.

Types of Event Logs


When you use Event Viewer, the event logs are separated into three logs, as follows:
• Application Log—Contains events logged by applications or programs such as Exchange
or IIS that are running on the computer. The developer of an application decides which
events to record.
• System Log—Contains events logged by the subsystems and components of the domain
controller. For example, if a disk driver has problems or fails, it records the events in the
System Log. You can use this log to determine the general availability and uptime of the
domain controller.
• Security Log—Records security events, such as when a user successfully logs on or
attempts to log on. This log also records events that relate to file access. For example, an
event is recorded when a file is created, opened, or deleted. By default, the Security Log
can only be seen by systems administrators.
Additional logs are often created on domain controllers, including a dedicated log for the DNS
Server service, if it’s installed. These logs provide information specific to particular services,
making it somewhat easier to track the information you need.


Starting Event Viewer


The event logs start automatically when you start the domain controller; you must start Event
Viewer manually. To start or display Event Viewer, choose Start, Run, type eventvwr, then
click OK; or choose Start, Programs, Administrative Tools, Event Viewer. Event Viewer
starts and displays the screen that Figure 3.4 shows.

Figure 3.4: The startup screen or display for Event Viewer.

Only a user with administrative privileges can view the Security Log. Regular users can only view the
Application Log and System Log.


Types of Events Logged by Event Viewer


Event Viewer logs several types of events, each of which has a different severity to help you
analyze a problem:
• Error—Signifies that a severe problem has occurred. This event means that data or
functionality was lost. For example, if a service fails to load during startup or stops
abruptly, an error is logged.
• Warning—Is less significant than an error and indicates that a problem could occur in the
future. For example, a warning is logged if disk space becomes too low.
• Information—Describes important situations that need noting. This event is typically
used to notify when an operation is successful—for example, a disk driver loaded
successfully and without errors.
• Success audit—Logs successful access to a secured system resource such as a file or
directory object. A success audit event is a successful security-access attempt. For
example, if a user attempts to log on to the system and is successful, a success audit event
is logged.
• Failure audit—Is the opposite of the success audit event. For example, if a user attempts
to log on to the system or access a secured resource and fails, a failure audit is logged.

Sorting and Filtering Events


Using Event Viewer, you can sort events on the screen so that you can easily review and analyze
the information. To sort events on the screen, choose View, Newest First (the default) or Oldest
First.
In addition to selecting the sort order for events, you can filter them. Filtering events allows you
to select and view only the events that you want to analyze. To set a filter for events in Event
Viewer, choose View, Filter Events. Figure 3.5 shows the dialog box that appears to help you
specify the filter characteristics.


Figure 3.5: Events can be filtered in Event Viewer to restrict the list of events that are displayed.

Exporting Events
In addition to sorting and filtering events in Event Viewer, you can export events in a variety of
formats to use with applications such as Microsoft Excel. To export events, choose Action,
Export List. When the Save As dialog box appears (see Figure 3.6), you can type a file name
with the .xls extension or choose a file type such as Text (Comma Delimited) (*.csv).
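Once events are in a comma-delimited file, a few lines of script can summarize them. The Python sketch below tallies events by type; the column names and the sample rows are hypothetical stand-ins, so check the header row of your own export before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical export; the column names and rows are illustrative only.
SAMPLE_EXPORT = """Type,Date,Time,Source,Event,Computer
Error,3/1/2005,10:15:02,ExampleService,1001,DC01
Warning,3/1/2005,10:16:40,ExampleDisk,2002,DC01
Information,3/1/2005,10:17:05,ExampleService,1000,DC01
Error,3/1/2005,10:20:11,ExampleService,1001,DC01
"""


def count_by_type(csv_text):
    """Count exported events per value of the Type column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Type"] for row in reader)


def events_of_type(csv_text, wanted):
    """Return only the rows whose Type column matches `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Type"] == wanted]


counts = count_by_type(SAMPLE_EXPORT)
print(counts.most_common())
```

To run this against a real export, read the file's text with `open(path, newline="")` and pass it to the same functions.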


Figure 3.6: The events in Event Viewer can be exported for use with various applications.

A major shortcoming of the Windows event logs, which I’ve already mentioned, is the fact that
each server maintains its own logs. If you’re relying on system-generated events or auditing
events to help provide troubleshooting clues, you might find yourself running around to a dozen
domain controllers to determine whether any of them have anything of use in their logs.
Microsoft provides a basic solution in its Audit Collection Service (ACS), which is designed to
aggregate log data into a central Microsoft SQL Server database. However, ACS is only
designed to work with the Security Log, again focusing on the security uses of auditing and not
the operational and troubleshooting aspects.
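The idea of consolidating per-server logs can be sketched in a few lines of Python. This is not how ACS works internally; it simply assumes you have already collected events from each domain controller (for example, from exported files) and shows them merged into one timeline. The Event fields, server names, and messages are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    time: datetime
    server: str
    kind: str
    message: str


def merge_events(per_server_events, kinds=None):
    """Combine per-server event lists into one timeline, oldest first.

    If `kinds` is given, only events of those types are kept.
    """
    merged = [event
              for events in per_server_events
              for event in events
              if kinds is None or event.kind in kinds]
    return sorted(merged, key=lambda event: event.time)


# Hypothetical events from two domain controllers.
dc01 = [Event(datetime(2005, 3, 1, 10, 15), "DC01", "Error", "sample error A"),
        Event(datetime(2005, 3, 1, 10, 30), "DC01", "Information", "sample info")]
dc02 = [Event(datetime(2005, 3, 1, 10, 20), "DC02", "Error", "sample error B")]

for event in merge_events([dc01, dc02], kinds={"Error"}):
    print(event.time, event.server, event.message)
```

Seeing events from all servers in time order often makes it obvious which domain controller noticed a change first.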

Monitoring Memory and Cache


One of the most common performance problems with Windows domain controllers (and all
servers, for that matter) is excessive paging, which is caused by insufficient random access
memory (RAM). In this situation, one of the greatest performance gains you can achieve is
adding more physical RAM to a system. Therefore, I recommend that the first subsystem you
monitor be the domain controller’s memory and cache. Problems caused by lack of memory can
often appear to be problems in other parts of the system. For instance, a lack of memory can
cause insufficient file system cache, which can lead to and be seen as a performance problem in
the disk subsystem.
Before I give you the details of monitoring your domain controller’s memory, I’ll first briefly
introduce the memory model for Windows. Windows provides a page-based virtual
memory management scheme (implemented by the Virtual Memory Manager, or VMM) that allows
each application to address 4 gigabytes (GB) of memory. Windows is able to do exactly
that by implementing virtual addresses. Each application references a chunk of
memory at the same virtual address throughout its lifetime, and VMM decides, completely
independently of the application, whether the underlying memory should be moved to a new
physical location or swapped to disk.
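To make the idea of page-based addressing concrete, the sketch below splits a 32-bit virtual address into a page number and an offset within that page. It assumes the 4 KB page size used by 32-bit x86 Windows, which this chapter does not state, and the function name is made up for the example.

```python
PAGE_SIZE = 4096            # assumed: 4 KB pages, as on 32-bit x86 Windows
ADDRESS_SPACE = 2 ** 32     # 4 GB of virtual addresses


def split_virtual_address(address):
    """Return (virtual page number, offset within the page)."""
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError("address is outside the 4 GB virtual address space")
    return address // PAGE_SIZE, address % PAGE_SIZE


page, offset = split_virtual_address(0x00401234)
print(f"page {page:#x}, offset {offset:#x}")
print("pages in a 4 GB address space:", ADDRESS_SPACE // PAGE_SIZE)
```

The application only ever sees the virtual address; which physical page (or which slot in a paging file) currently backs that virtual page is the memory manager's concern.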


Because everything in the system is realized using pages of physical memory, it’s easy to see
that pages of memory become scarce rather quickly. VMM uses the hard disk to store unneeded
pages of memory in one or more files called paging files. Paging files represent pages of data
that aren’t currently being used but may be needed spontaneously at any time. By swapping
pages to and from paging files, VMM is able to make pages of memory available to applications
on demand and provide much more virtual memory than the available physical memory.
One of the first monitoring or troubleshooting tasks you’ll carry out is to verify that your domain
controller has enough physical memory. Table 3.1 shows the minimum memory requirements for
a WS2K3 domain controller.
Installation Type                             Memory Requirement
Minimum installation                          256MB
Server running a basic set of services        512MB
Server running an expanded set of services    1GB or more

Table 3.1: Minimum memory requirements for a WS2K3 domain controller.

Your physical memory requirements for actual production servers will typically be much higher
if you expect decent performance. Because your domain controllers will at least be running AD,
I recommend that you always start with at least 512 megabytes (MB) RAM. If you want to load
other applications that come with their own memory requirements, you’ll need to add memory to
support them.
If there isn’t enough memory on your domain controller, it will start running slower as it pages
information to and from its hard drive. When physical memory becomes full and an application
needs access to information not currently in memory, VMM moves some pages from physical
memory to a storage area on the hard drive called a paging file.
As the domain controller pages information to and from the paging file, the application must
wait. The wait occurs because the hard drive is significantly slower than physical RAM. This
paging also slows other system activities such as CPU and disk operations. As I mentioned
earlier, problems caused by lack of memory often appear to be problems in other parts of the
system. To maximize the performance and availability of your domain controller servers, it’s
important for you to understand and try to reduce or eliminate wherever possible the
performance overhead associated with paging operations.
Fortunately, there are a couple of utilities that you can use to track memory usage. Two of the
most common are utilities I’ve already introduced: Task Manager and the Performance console.


Using Task Manager to View Memory on a Domain Controller


You can use Task Manager to view memory usage on a domain controller. To do so, click the
Performance tab. Figure 3.7 shows an example of the Performance page in Task Manager
running on WS2K3.

Figure 3.7: The Performance page of WS2K3’s Task Manager allows you to view a domain controller’s
memory usage.


The Performance page in Task Manager contains eight informational panes. The first two are
CPU Usage and CPU Usage History. These two panes and the Totals pane all deal with usage on
the CPU, or processor. The remaining panes can be used to analyze the memory usage for the
domain controller and include the following:
• PF Usage—A bar graph that shows the amount of paging your domain controller is
currently using. This pane is one of the most useful because it can indicate when VMM is
paging memory too often and thrashing. Thrashing occurs when the OS spends more time
managing virtual memory than it does executing application code. If this situation arises,
you need to increase the amount of memory on the system to improve performance.
• Page File Usage History—A line graph that tracks the size of virtual memory over time.
The history for this pane is only displayed in the line graph and not recorded anywhere.
You can use this information to help determine whether there is a problem with virtual
memory over a longer period of time.

• Physical Memory—This pane tells you the total amount of RAM in kilobytes (KB) that
has been installed on your domain controller. This pane also shows the amount of
memory that is available for processes and the amount of memory used for system cache.
The amount of available memory will never go to zero because the OS will swap data to
the hard drive as the memory fills up. The system cache is the amount of memory used
for file cache on the domain controller.

• Commit Charge—This pane shows three numbers, which all deal with virtual memory on
the domain controller: Total, Limit, and Peak. The numbers are shown in kilobytes. Total
shows the current amount of virtual memory in use. Limit is the maximum possible size
of virtual memory. (This is also referred to as the paging limit.) Peak is the highest
amount of memory that has been used since the domain controller was started.

• Kernel Memory—Shows you the total amount of paged and non-paged memory, in
kilobytes, used by the kernel of the OS. The kernel provides core OS services such as
memory management and task scheduling.
I mentioned that you can easily and quickly check the memory usage on your domain controller
by using Task Manager. Task Manager allows you to see the amount of virtual memory in use.
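The three Commit Charge numbers are easier to interpret as percentages of the limit. The following sketch does that arithmetic; the function name and the sample values are hypothetical, and the inputs are simply the Total, Limit, and Peak figures (in KB) read from the Performance page.

```python
def commit_summary(total_kb, limit_kb, peak_kb):
    """Express the Commit Charge figures relative to the commit limit."""
    return {
        "in_use_pct": 100.0 * total_kb / limit_kb,
        "peak_pct": 100.0 * peak_kb / limit_kb,
        "headroom_kb": limit_kb - total_kb,
    }


# Hypothetical readings copied from Task Manager.
summary = commit_summary(total_kb=400_000, limit_kb=1_600_000, peak_kb=800_000)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A peak that sits close to the limit suggests that the domain controller has come near to exhausting virtual memory at least once since it was started.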

Using the Performance Console to Monitor Memory on a Domain Controller


In addition to using Task Manager, you can use the Performance console to determine whether
the current amount of memory on a domain controller is sufficient. The System Monitor
application in the Performance console allows you to graphically display memory counters over
time. I also recommend that you display the memory cache counters.


Available Memory Counters


To determine whether there is a bottleneck in memory, check one of the three available memory
counters under the Memory object in System Monitor:
• Available Bytes
• Available KBytes (kilobytes or KB)
• Available MBytes (megabytes or MB)
All three report the same value in different units, so you can use any of them to understand
your domain controller’s memory commitment. I recommend that you keep at least 20 percent of
memory available for peak use.
To view one or all of the available memory counters, either click the Plus (+) tool on the toolbar
or right-click anywhere in the display area and choose Add Counters from the shortcut menu.
Once the Add Counters dialog box appears, choose Performance Object, Memory, then choose
one of the available memory counters. Figure 3.8 shows the Available Bytes counter of the
memory Performance Object.

Figure 3.8: Using the Available Bytes memory counter to monitor or track how much memory is left for users
or applications.


The Available Bytes counter shows the amount of physical memory available to processes
running on the domain controller. This counter displays the last observed value only; it isn’t an
average. It’s calculated by summing space on three memory lists:
• Free—Memory that is ready or available for use
• Zeroed—Pages of memory filled with zeros to prevent later processes from seeing data
used by a previous process
• Standby—Memory removed from the working set of a process and en route to disk but
still available to be recalled
If Available Bytes is constantly decreasing over a period of time and no new applications are
loaded, it indicates that the amount of working memory is growing, or it could signal a memory
leak in one or more of the running applications. A memory leak is a situation in which
applications or processes consume memory but don’t release it properly. To determine the
culprit, monitor each application or process individually to see whether the amount of memory it
uses constantly increases. Whichever application or process constantly increases memory
without decreasing it is probably the culprit.
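The per-process check described above can be automated once you have a series of memory samples for each process. The sketch below flags processes whose memory never decreases and grows by at least a chosen amount per sample. The process names, sample values, and growth threshold are all hypothetical, and a flagged process is only a suspect, not proof of a leak.

```python
def slope(samples):
    """Least-squares slope of the samples against their index."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def leak_suspects(process_samples, min_growth_per_sample):
    """Names of processes whose memory only rises, and rises fast enough."""
    suspects = []
    for name, samples in process_samples.items():
        never_decreases = all(b >= a for a, b in zip(samples, samples[1:]))
        if never_decreases and slope(samples) >= min_growth_per_sample:
            suspects.append(name)
    return suspects


# Hypothetical memory usage in MB, sampled at regular intervals.
usage = {
    "steady.exe": [100, 98, 101, 99, 100],
    "idle.exe": [30, 30, 30, 30, 30],
    "leaky.exe": [50, 52, 54, 56, 58],
}
print(leak_suspects(usage, min_growth_per_sample=1.0))
```

Longer sampling periods make the result more trustworthy, because a process that is simply warming up will eventually level off.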

Page-Fault Counters
When a process or thread requests data on a page in memory that is no longer there, a domain
controller issues a page fault. Here, the page has typically been moved out of memory to provide
memory for other processes. If the requested page is in another part of memory, the page fault is
a soft page fault. However, if the page has to be retrieved from disk, a hard page fault has
occurred. Most domain controllers can handle large numbers of soft page faults, but hard page
faults can cause significant delays.
Page-fault counters help you determine the impact of virtual memory and page faults on a
domain controller. These counters can be important performance indicators because they
measure how VMM handles memory:
• Page Faults/sec—Indicates the number of page faults without making a distinction
between soft page faults and hard page faults
• Page Reads/sec—Indicates the number of times the disk was read to resolve hard page
faults; this counter indicates the impact of hard page faults
• Pages Input/sec—Indicates the number of pages read from disk to resolve hard page
faults; this counter also indicates the impact of hard page faults


Figure 3.9 illustrates how you can use System Monitor to track page-fault counters.

Figure 3.9: The Page Faults/sec, Page Reads/sec, and Pages Input/sec counters determine the impact of
virtual memory and paging.

If the numbers recorded by these counters are low, your domain controller is responding quickly
to memory requests. However, if the numbers are high and remain consistently high, it’s time to
add more RAM to the domain controller.
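The distinction between a momentary spike and numbers that remain consistently high can be expressed as a simple test over a window of samples. In this sketch the threshold of 20 and the 90 percent fraction are illustrative assumptions only; derive real values from your own baseline.

```python
def consistently_high(samples, threshold, fraction=0.9):
    """True if at least `fraction` of the samples exceed `threshold`."""
    if not samples:
        return False
    over = sum(1 for value in samples if value > threshold)
    return over / len(samples) >= fraction


# Hypothetical Pages Input/sec readings from two domain controllers.
one_spike = [5, 300, 4, 6, 5, 7, 3, 4, 6, 5]
sustained = [45, 50, 60, 48, 52, 55, 47, 49, 61, 58]

print("one spike:", consistently_high(one_spike, threshold=20))
print("sustained:", consistently_high(sustained, threshold=20))
```

A single large spike is ignored, while a run of moderately high readings is flagged, which matches the advice to act only when the counters stay high.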

Paging File Usage


Another important set of counters helps you determine the size of virtual memory. These
counters are related to paging file usage. Before I discuss how you can effectively use these
counters, it’s important that you better understand the paging file and its function. The paging
file is the space on a domain controller that enables the OS to swap out memory to the hard
drive. As the domain controller loads more applications than it can run in actual memory, it
pages some memory to the hard drive to create room for the new applications.
You can see how much the paging file is being used by watching two counters under the Paging
File object:
• % Usage—Indicates the current usage value that was last recorded
• % Usage Peak—Indicates the high watermark for the paging file


If a domain controller were perfect, the OS would have enough memory for every application that
was loaded and would never page memory out. Both the % Usage counter and the % Usage Peak
counter would be at zero. At the opposite extreme, the domain controller pages memory as fast as
possible, and the usage counters are high. An example of a bad situation is one in which your
domain controller has 128MB of memory, the % Usage Peak counter is at 80 percent, and the %
Usage counter is above 70 percent. In this situation, it’s fairly certain that your domain controller
will be performing poorly.
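The two extremes described above can be turned into a simple classification of the Paging File counters. The limits of 70 and 80 percent come from the example in the text; the function name and the three labels are made up for this sketch.

```python
def paging_file_status(usage_pct, peak_pct, usage_limit=70.0, peak_limit=80.0):
    """Classify paging file pressure from % Usage and % Usage Peak."""
    if usage_pct > usage_limit and peak_pct >= peak_limit:
        return "heavy paging"
    if usage_pct == 0 and peak_pct == 0:
        return "no paging"
    return "moderate paging"


print(paging_file_status(75, 80))
print(paging_file_status(0, 0))
print(paging_file_status(20, 35))
```

Most real servers land in the middle category, so the trend over time matters more than any single reading.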
By default, Windows automatically creates a paging file on the system drive during installation.
Windows bases the size of the paging file on the amount of physical memory present on the
domain controller (in most cases, it’s between 768MB and 1536MB). In addition to this paging
file, I recommend that you create a paging file on each logical drive in the domain controller. In
fact, I recommend that you stripe the paging file across multiple physical hard drives, if possible.
Striping the paging file improves performance of both the file and virtual memory because
disk access can occur on multiple drives simultaneously.

The recommendation for using disk striping on the paging file works best with Small Computer
System Interface (SCSI) drives rather than those based on Integrated Device Electronics (IDE)
interfaces. The reason is that SCSI handles multiple device contention more efficiently than IDE and
tends to use less CPU power in the process. Also, I don’t recommend that you spread the paging file
across multiple logical drive volumes (partitions) located on the same physical drive. Doing so won’t
generally aid paging file performance—and it may actually hinder it.

To change or set the virtual memory setting on your domain controller, right-click My Computer,
then choose Properties from the shortcut menu. In the System Properties dialog box, click the
Advanced tab, then click Performance Options. Notice that the Performance Options dialog box
allows you to see the current setting for Virtual Memory. Next, click Change to display more
information and to change the paging file settings.

Changing the paging file size or location is unfortunately one of those rare setting changes in WS2K3
that require you to restart the domain controller before the change takes effect. Thus, if you decide
to change any settings for the paging file, do so during a scheduled maintenance time when it’s safe
to take the domain controller down and doing so won’t affect your users.

System Cache
In addition to tracking the amount of memory and virtual memory in the domain controller, you
need to keep an eye on the computer’s system cache settings. The system cache is an area in
memory dedicated to files and applications that have been accessed on the domain controller.
The system cache is also used to speed both file system and network input/output (I/O). For
example, when a user program requests a page of a file or application, the domain controller first
looks to see whether it’s in memory (system cache). The reason is that a page in cache responds
more quickly to user requests. If the requested information isn’t in cache, the OS fulfills the user
request by reading the file page from disk.


If the system cache isn’t large enough, bottlenecks will occur on your domain controller. The
Cache object in System Monitor and its counters help you understand caching in Windows. In
addition, several counters under the Memory object help you determine the amount of file cache.
Two of the counters that best illustrate how the file cache is responding to requests are:
• Copy Read Hits %—This counter is under the Cache object and tracks the percentage of
cache-copy read requests that are satisfied by the cache. The requests don’t require a disk
read to give the application access to the page. A copy read is a file-read operation that is
satisfied by a memory copy from a page in the cache to the application’s buffer.
• Cache Faults/sec—This counter is under the Memory object and tracks the number of
faults that occur when a page sought in the system cache isn’t found. The page must be
retrieved from elsewhere in memory (a soft fault) or from the hard drive (a hard fault).

When you’re considering using the Copy Read Hits % counter to assess file-cache performance, you
might also consider tracking the Copy Reads/sec counter, which measures the total number of Copy
Read operations per second. By assessing these numbers together, you’ll have a better sense of the
significance of the data provided by the Copy Read Hits % counter. For example, if this counter were
to spike momentarily while overall Copy Reads/sec stayed flat or even decreased, the data might
not mean much. Ideally, you can identify a cache bottleneck
when there is a steady decrease in the Copy Read Hits % counter with a relatively flat Copy
Reads/sec figure. A steady increase in both counters, or an increase in Copy Read Hits % and a
relatively flat Copy Reads/sec, indicates good file cache performance.

Thus, the Copy Read Hits % counter records the percentage of successful file-system cache hits,
and the Cache Faults/sec counter tracks the number of file-system cache misses. Figure 3.10
shows these counters in System Monitor. Remember that one of the counters is a percentage and
the other is a raw number, so they won’t exactly mirror each other.


Figure 3.10: The Copy Read Hits % and the Cache Faults/sec counters show how the domain controller’s
cache is responding.

Generally speaking, I recommend that a domain controller have at least an 80 percent cache hit
rate over time. If these two counters show that your domain controller has a low percentage of
cache hits and a high number of cache faults (misses), you may want to increase the total amount
of RAM. Increasing the RAM allows the domain controller to allocate more memory for system
cache and should increase the cache hit rate.

Monitoring Processors and Threads


Many people mistakenly believe that the focus of monitoring a domain controller is primarily the
domain controller’s physical processor(s), or CPUs. However, the truth is that the processor
doesn’t do anything unless there are processes and threads to run. In Windows, a process is made
up of one or more threads that run on the CPU. A bottleneck at the processor typically means
that either one or more processes are consuming most of the processor time or there are too many
threads contending for the CPU.
A process is an executable program that follows a sequence of steps. Each process requires a
cycle from the domain controller’s processor as it runs. A thread is the component of a process
that is being executed at any time. Thus, a process must contain at least one thread before it can
perform an operation. A single process executing more than one thread is referred to as being
multithreaded. Windows is a multithreaded OS that can run multiple threads simultaneously,
including across multiple CPUs when they are present.
When an application is developed, the developer determines the number of threads each process
will use. In a single-threaded process, only one thread is executed at one time. In a multithreaded
process, more than one thread can be executed concurrently. Being multithreaded allows a
process to accomplish many tasks at the same time and avoid unnecessary delay caused by
thread wait time. To change threads, the OS uses a mechanism called context switching, which
interrupts one thread, saves its state, then loads and runs another thread.
In addition to the multithreaded and multitasking approach to handling processes and threads,
Windows allows priorities to be assigned to each process and thread. The kernel of the Windows
OS controls access to the processor using priority levels.

Using Process Viewer to Monitor Processes and Threads


The Process Viewer utility (Pviewer.exe) is available for every domain controller. This utility is part
of the Win2K Support Tools, which are located in the Support folder on the Windows CD-ROM.
Process Viewer is a useful tool for looking at the various processes and associated threads
currently running on your domain controller. To launch Process Viewer on your computer,
choose Start, Programs, Accessories, Command Prompt, then type pviewer. Figure 3.11 shows
an example of Process Viewer.

Figure 3.11: The Process Viewer utility allows you to view the processes and threads running on your
domain controller.

Using this utility, you can view the name of each process, the amount of time each process has
been running, the memory allocated to each process, and the priority of each process. You can
also view each thread that makes up a selected process. For each thread, you can see how long it
has been running, its priority, context switches, and starting memory address.
In addition to the information you see on the main screen, you can display the memory details for
a process. Figure 3.12 illustrates the Memory Details dialog box that is shown when you select a
process and then click Memory Detail.

Figure 3.12: Memory details for each process are displayed by clicking Memory Detail in Process Viewer’s
main window.

When using Process Viewer, you can stop or kill a process that is running on a domain controller by
selecting it and clicking Kill Process. However, be sure you understand the function and impact of
killing a process before doing so—the process might be vital to your domain controller’s functionality.
Worse yet, by killing a process, you can irrecoverably lose or corrupt data.

Using Task Manager to View Processes on a Domain Controller


Earlier, this chapter described Task Manager as the quickest and easiest way to monitor the
performance of the CPU. You can also use Task Manager to see which
processes or threads are running on the Windows domain controller and to view a summary of
overall processor or CPU usage. To launch Task Manager, either right-click the taskbar and
choose Task Manager or press Ctrl+Alt+Del, then click Task Manager. Then click the
Processes tab. A list is displayed of the processes currently running on the domain controller.
Figure 3.13 shows an example of the processes running on Windows and displayed in Task
Manager.

Figure 3.13: Windows’ Task Manager allows you to view and manage processes that are currently running on
the system.

This view provides a list of the processes that are running—their names, their process identifiers
(PIDs), the percentage of CPU processing they’re consuming, the amount of CPU time they’re
using, and the amount of memory they’re using. Notice the System Idle Process, which always
seems to be toward the top of the process list. This process is a special process that runs when the
domain controller isn’t doing anything else. You can use the System Idle Process to determine
how the CPU is loaded because it’s the exact opposite of the CPU Usage value on the
Performance tab. For example, if the CPU Usage value is 5, the System Idle Process value will
be 95. A high value for the System Idle Process means that the domain controller isn’t heavily
loaded, at least at the moment you checked.

Working with the List of Processes


You can sort this list of processes according to one of the column labels mentioned earlier. For
example, if you want to view the processes in order of the amount of memory used, simply click
that column’s label. The display or list will change accordingly.
Task Manager also allows you to customize the columns, thereby receiving additional
information about the processes and being able to assess as many as 23 parameters. To customize
the columns, on the Processes page, choose View, Select Columns. As Figure 3.14 shows, notice
that many additional columns of information can be displayed. These additional columns will
help you monitor and tune each process more completely. For more information about each of
the additional columns, refer to the Help menu in Task Manager.

Figure 3.14: The Select Columns dialog box in Task Manager allows you to monitor additional important
statistics about the processes that are running on your domain controller.

In addition, you can see which of these processes belong to an application. To do so, click the
Applications tab, right-click one of the applications on the Applications page, then click Go To
Process. Doing so will take you to the associated application’s process on the Processes tab. This
feature helps you associate applications with their processes.

Highlighting a process in Task Manager, then clicking End Process, stops that process from running.
This feature is useful because it allows you to stop processes that don’t provide any other means of
being stopped. However, use this method only as a last resort because the process stops
immediately and doesn’t have a chance to clean up its resources. Using this method to stop
processes may leave domain controller resources unusable until you restart. It may also cause data
to be lost or corrupted.

Viewing Information About Processes


If you view the list of processes from either Process Viewer or Task Manager and don’t know
what a process is, you can use the Computer Management utility to view the processes and their
associated file paths or locations. Computer Management also lets you view the version, size,
and date of the application or module for each process.
To view the Computer Management utility, choose Start, Programs, Administrative Tools,
Computer Management. Once the utility has loaded, select System Tools, System Information,
Software Environment, Running Tasks. Figure 3.15 displays processes and their associated
information in the Computer Management utility.

Figure 3.15: The Computer Management utility allows you to view the processes that are running on your
domain controller as well as the path and file name information associated with each process.

Using the Performance Console to View Processes on a Domain Controller


You can use the System Monitor application in the Performance console to view the values of
the performance counters for the processor, processes, and threads. This utility allows you to
graphically display these counters over time. In the next few sections, I’ll discuss some of the
counters and how you can use them.

% Processor Time Counter


The first counter that you should check when monitoring the domain controller is % Processor
Time. This counter gauges the activity of the computer’s CPU. It shows the percentage of time
that all processors in the domain controller are busy executing code other than the System Idle
Process. Acceptable processor activity ranges between 1 percent and 85 percent, although the
actual amount depends on the type of applications loaded on your domain controller.
The % Processor Time counter is a primary indicator of processor activity. This counter is
calculated by measuring the time that the processor spends executing the idle process, then
subtracting that value from 100 percent. It can be viewed as the percentage of useful work that
the processor executes. To view the % Processor Time counter, you use System Monitor. Figure
3.16 shows the % Processor Time counter in System Monitor.

Figure 3.16: The % Processor Time counter gives you the ability to view the amount of time that the processor
is doing real work.

If the % Processor Time counter is consistently high, there may be a bottleneck on the CPU. I
recommend that this counter consistently stay below 85 percent. If it pushes above that value,
you need to find the process that is using a high percentage of the processor. If there is no
obvious CPU “hog,” consider adding another processor to the domain controller or reducing that
domain controller’s workload. Reducing the workload might involve stopping services, moving
databases, removing directory services, and so on.
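The check described above can be sketched in Python as follows. The sketch assumes you have already collected % Processor Time samples and a per-process CPU breakdown. The function names and the 50 percent share used to call a process a "hog" are illustrative assumptions; only the 85 percent threshold and the idle-time calculation come from the text.

```python
def processor_time_from_idle(idle_pct):
    """% Processor Time is 100 percent minus the time spent in the idle process."""
    return 100.0 - idle_pct


def cpu_is_bottlenecked(processor_time_pct, threshold=85.0):
    """True when every sample exceeds the threshold (85 percent in the text)."""
    return bool(processor_time_pct) and all(s > threshold for s in processor_time_pct)


def find_cpu_hog(per_process_pct, hog_share=50.0):
    """Return the busiest process name if it uses at least `hog_share` percent.

    `hog_share` is an assumed cutoff; returns None when no process stands out.
    """
    if not per_process_pct:
        return None
    name, pct = max(per_process_pct.items(), key=lambda item: item[1])
    return name if pct >= hog_share else None
```

If `cpu_is_bottlenecked` returns True and `find_cpu_hog` returns None, you are in the "no obvious hog" case, where adding a processor or reducing the workload is the suggested remedy.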

Interrupts/sec Counter


The Interrupts/sec counter measures the rate of service requests from the domain controller’s I/O
devices. This counter is the average number of hardware interrupts that the processor is receiving
and servicing each second. If this value increases without an associated increase in system
response, there could be hardware problems on one of the I/O devices. For example, a network
interface card (NIC) installed in the domain controller could go bad and cause an excessive
amount of hardware interrupts. To fix the problem, you need to replace the offending network
card’s driver or the physical card.
The Interrupts/sec counter doesn’t include deferred procedure calls; they’re counted separately.
Instead, this counter tracks the activity of hardware devices that generate interrupts, such as the
system clock, mouse, keyboard, disk drivers, NICs, and other peripheral devices. (For example,
the system clock interrupts the CPU every 10 milliseconds.) When an interrupt occurs, it
suspends the normal thread execution until the CPU has serviced the interrupt.
During normal operation of the domain controller, there will be hundreds or thousands of
interrupts per second. System Monitor scales the counter so that it fits on the graph rather than
plotting the raw number. For example, with a scale factor of 0.01, a domain controller that has
560 interrupts in one second is shown as 5.6 on the graph. Figure 3.17 displays the Interrupts/sec
counter using System Monitor.
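The relationship between the raw count and the plotted value is simple arithmetic. The Python sketch below assumes a scale factor of 0.01, which is the factor that turns the 560 interrupts in the example into 5.6. The function names are illustrative, and the factor actually in use on your graph may differ.

```python
def displayed_value(raw_value, scale_factor=0.01):
    """Value plotted on the graph for a raw counter reading."""
    return raw_value * scale_factor


def raw_value(displayed, scale_factor=0.01):
    """Recover the raw counter reading from a value read off the graph."""
    return displayed / scale_factor


# 560 interrupts in one second, plotted with the assumed 0.01 scale factor
print(f"{displayed_value(560):.1f}")
```

Reading a graph value of 12.5 under the same assumption would correspond to roughly 1,250 interrupts per second.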

Figure 3.17: The Interrupts/sec counter allows you to view the impact the hardware I/O devices have on the
performance of the domain controller.

In System Monitor, you can make changes to the graphic display. To do so, right-click anywhere on
the graph, then choose Properties from the shortcut menu. The System Monitor Properties dialog box
appears, containing several tabs that change the display and affect the graph and data. For example, if
you want to change the graph’s scale, click the Graph tab and change the Vertical Scale parameters.
To confirm the change, click Apply, then OK.

Unfortunately, it’s difficult to suggest a definite threshold for this counter because this number
depends on the particular processor type in use and the exact role and use of the domain
controller. I therefore recommend that you establish your own baseline for this counter and use it
as a comparison over time. Doing so will help you know when a hardware problem occurs.
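One way to put such a baseline to work is sketched below in Python. It assumes you have logged Interrupts/sec samples during normal operation. The function names and the three-standard-deviation rule are assumptions chosen for illustration, not a recommendation from the text.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal-operation samples (at least two) as (average, spread)."""
    return mean(samples), stdev(samples)


def deviates_from_baseline(value, baseline, sigmas=3.0):
    """True when a new reading is more than `sigmas` deviations from the average.

    `sigmas` is an assumed sensitivity; tune it to your own environment.
    """
    average, spread = baseline
    return abs(value - average) > sigmas * spread


# Hypothetical readings gathered while the domain controller was healthy
baseline = build_baseline([1000, 1100, 900, 1000, 1050, 950])
```

A later reading that `deviates_from_baseline` flags is a cue to look at the I/O devices, not proof of a hardware fault.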

Processor Queue Length Counter


The Processor Queue Length counter under the System object displays the number of processes
or threads waiting to be executed in the run queue that is shared among all processors on the
server. This counter displays the last observed value only; it’s not an average. If there are too
many threads waiting for the CPU and the CPU cannot keep up, the system is processor-bound
and starts to slow. Figure 3.18 illustrates how System Monitor shows Processor Queue Length.

Figure 3.18: The Processor Queue Length counter indicates how congested the processor is.

I recommend that your domain controller not have a sustained Processor Queue Length of
greater than two threads. If the number of threads goes above two, performance slows, as does
responsiveness to the users. The domain controller shown in the figure could be in trouble,
especially if this type of activity is sustained. There are several ways to alleviate the
slowdown: you can replace the CPU with a faster processor, add more processors, or
reduce the workload. In some situations, the Processor Queue Length counter will increase if the
system is paging heavily; adding memory or RAM could be the solution in this case. To
determine whether you need more RAM, monitor the paging counters.
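Because Processor Queue Length is an instantaneous value, a single high reading says little; what matters is whether it stays high. The Python sketch below shows one way to test for that. The function names and the choice of five consecutive samples as "sustained" are assumptions; only the threshold of two threads comes from the text.

```python
def longest_run_above(samples, threshold):
    """Length of the longest run of consecutive samples above the threshold."""
    longest = current = 0
    for sample in samples:
        current = current + 1 if sample > threshold else 0
        longest = max(longest, current)
    return longest


def queue_is_sustained(queue_lengths, threshold=2, min_consecutive=5):
    """True when the queue stays above `threshold` for `min_consecutive` samples.

    `min_consecutive` is an assumed definition of "sustained".
    """
    return longest_run_above(queue_lengths, threshold) >= min_consecutive
```

How many consecutive samples count as sustained depends on your sampling interval, so adjust `min_consecutive` accordingly.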

Monitoring the Disk


Because the hard drives in the domain controller have moving parts, they’re always the slowest
subsystem in the computer. In fact, the hard drive subsystem will typically be more than 100,000
times slower than the memory subsystem. As a result, the architects of Windows designed the
file-system caching service. Its sole responsibility is to move data off the hard drives and into the
faster memory subsystem. This process minimizes the performance penalty of retrieving data
from the domain controller’s hard drives.
By nature, a hard drive is massive and cheap. The disk subsystem can contain hundreds of
gigabytes that store millions or billions of files. In turn, memory is relatively small and
expensive. Therefore, the architects of Windows designed the virtual memory system to store
pieces of memory on the hard drive, thereby allowing more room for users and applications.
However, as discussed earlier, you pay a performance price for paging.

Using the Performance Console to Monitor the Disk Subsystem


Because system performance depends so heavily on the disk subsystem, it’s important that you
understand how to monitor it. To properly monitor the disk subsystem, you need to monitor disk
usage and response time, which includes the number of actual reads and writes plus the speed
with which the disk accomplishes each request. The primary utility you use to monitor these
attributes is System Monitor in the Performance console. Using System Monitor, you can view
key counters that apply to physical device usage and the logical volumes on the drives.

% Disk Time and % Idle Time Counters


The % Disk Time counter under the PhysicalDisk object allows you to view how busy the
domain controller’s hard drive is. This counter is a percentage of elapsed time that the hard drive
is busy servicing read and write requests. The % Idle Time counter under the PhysicalDisk object
reports the percentage of time the hard drive is sitting idle.
Using these counters, you can monitor the physical activity of the hard drives in each computer.
Figure 3.19 illustrates the % Disk Time and % Idle Time counters in System Monitor.

Figure 3.19: The % Disk Time counter allows you to view how busy a physical disk drive is, and the % Idle
Time counter tracks the percentage of time a drive is idle.

Figure 3.19 shows that, as you might expect, % Disk Time and % Idle Time basically mirror
each other. I recommend that if the value for % Disk Time is consistently above 70 percent, you
consider reorganizing the domain controller to reduce the load. However, if the domain
controller is a database server, the threshold can go as high as 90 percent. The threshold value
depends on the type of server that has been implemented and what has caused the disk I/O. For
example, if VMM is paging heavily, it can drive up the % Disk Time counter. The simplest
solution is to add memory.
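The two thresholds can be captured in a short Python sketch. The function names and the rule that 90 percent of samples must exceed the limit to count as "consistently" high are assumptions; the 70 and 90 percent limits come from the text.

```python
def disk_time_threshold(is_database_server=False):
    """Threshold for % Disk Time: 70 percent, or 90 percent for a database server."""
    return 90.0 if is_database_server else 70.0


def disk_is_overloaded(disk_time_pct, is_database_server=False, min_share=0.9):
    """True when at least `min_share` of the samples exceed the threshold.

    `min_share` is an assumed reading of "consistently above".
    """
    if not disk_time_pct:
        return False
    limit = disk_time_threshold(is_database_server)
    above = sum(1 for sample in disk_time_pct if sample > limit)
    return above / len(disk_time_pct) >= min_share
```

Before acting on a True result, check the paging counters, because heavy paging can drive % Disk Time up on its own.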

Disk Reads/sec and Disk Writes/sec Counters


In addition to the percentage of time the disk is busy, you can also see what the disk is doing.
You can monitor this activity by using two counters under the PhysicalDisk object in System
Monitor:
• Disk Reads/sec counter—Tracks the rate of read operations on the disk
• Disk Writes/sec counter—Tracks the rate of write operations on the disk
Normally, a domain controller will perform at least twice as many read operations as
write operations; it can also service a read request at least twice as fast. The reason is that the
write request has to write the data, then verify that it was written. You can see the Disk
Reads/sec and Disk Writes/sec counters in System Monitor in Figure 3.20.

Figure 3.20: The Disk Reads/sec and the Disk Writes/sec counters show how the domain controller is
handling disk requests.

Using these counters, watch for spikes in the number of disk reads when your domain controller
is busy. If you have the appropriate amount of memory on your domain controller, most read
requests will be serviced from the system cache instead of hitting the disk drive and causing disk
reads. You want at least an 80 percent cache hit rate, which means that only 20 percent of read
requests are forced to the disk. This recommendation is valid unless you have an application that
reads a lot of varying data at the same time—for example, a database server is by nature disk-
intensive and reads varying data. Obtaining a high number of cache hits with a database server
may not be possible.

Current Disk Queue Length Counter


The Current Disk Queue Length counter represents the number of requests outstanding on the
disk at any one time. The disk has a queue, or list, that can hold the read and write requests in
order until they can be serviced by the physical device. This counter shows the number of
requests in service at the time the sample is taken. Most disk devices installed on your domain
controller are single-spindle disk drives. However, disk devices with multiple spindles, such as
some Redundant Array of Independent Disks (RAID) disk systems, can have multiple reads and
writes active at one time. Thus, a multiple-spindle disk drive can handle twice the rate of
requests of a normal device. Figure 3.21 displays System Monitor tracking the length of the disk
queue.

Figure 3.21: The Current Disk Queue Length counter represents the number of outstanding read and write
requests. Using this counter, you can monitor the performance of the queue for the disk drives.

If the disk drive is under a sustained load, this counter will likely be consistently high. In this
case, the read and write requests will experience delays proportional to the length of this queue
divided by the number of spindles on the disks. For decent performance, I recommend that the
value of the counter average less than 2.
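The proportionality described above can be sketched in Python. The function names are illustrative. Applying the limit of 2 to the per-spindle figure, not to the raw counter, is an assumption that coincides with the text's recommendation for a single-spindle drive.

```python
from statistics import mean


def queue_per_spindle(queue_length, spindles=1):
    """Queue length divided by the number of spindles servicing it."""
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    return queue_length / spindles


def queue_ok(queue_lengths, spindles=1, limit=2.0):
    """True when the average per-spindle queue length stays below the limit.

    Assumption: the recommended average of less than 2 is applied per spindle.
    """
    return queue_per_spindle(mean(queue_lengths), spindles) < limit
```

With `spindles=1`, `queue_ok` reduces to the plain rule that the counter should average less than 2.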

Because gathering disk counters can cause a modest increase in disk-access time, Win2K doesn’t
automatically activate all the disk counters when it starts up. By default, the physical disk counters are
on, and the logical disk counters are off. (WS2K3 enables both sets of disk counters on demand.)
The physical disk counters monitor the disk driver and how it
relates to the physical device. The logical disk counters monitor the information for the partitions and
volumes that have been established on the physical disk drives.
To start the domain controller with the logical disk counters on, you use the DISKPERF utility. At the
command prompt, type DISKPERF -YV. This sets the domain controller to gather counters for both
the logical disk devices and the physical devices the next time the system is started. For more
information about using the DISKPERF utility, type DISKPERF /? at the command prompt.

% Free Space Counter


An example of a logical disk counter is the % Free Space counter. This counter is the percentage
of the free space available on the logical disk or volume. Free space is calculated as a ratio of the
total useable space provided on the volume of the logical disk drive. This counter is obviously an
important one to monitor because it allows you to view the amount of disk space that is left for
user and application requests.

This counter allows you to monitor the performance of the disk drives as they start to fill. This
task is important because as a disk drive starts to run out of space, each write request becomes
tougher to perform and slows overall disk performance. The reason is that as the drive fills, each
write takes longer to search for space. The longer it takes the disk to write the data, the less it
does, so performance slows. Thus, as the drive fills, it works harder to service requests; this is
often called thrashing. To minimize thrashing, leave at least 10 percent of the disk free.
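Outside System Monitor, a comparable free-space percentage can be computed with Python's standard library. This sketch does not read the % Free Space performance counter; it calls `shutil.disk_usage` on a path instead. The function names are illustrative, and the 10 percent minimum comes from the text.

```python
import shutil


def percent_free(path):
    """Percentage of the volume containing `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.free / usage.total


def low_on_space(path, minimum_pct=10.0):
    """True when free space on the volume falls below `minimum_pct` percent."""
    return percent_free(path) < minimum_pct


if __name__ == "__main__":
    # "." is only a placeholder; point this at the volume you want to check.
    print(f"{percent_free('.'):.1f}% free, low on space: {low_on_space('.')}")
```

Running a check like this on a schedule gives early warning before a volume drops below the 10 percent mark.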

Monitoring the Network


Each Windows domain controller depends on the network to move information to its users and to
other servers. However, if the network becomes too crowded and traffic exceeds capacity,
performance for all users and domain controllers will suffer. You need to monitor the network
components for each domain controller on your network to help eliminate bottlenecks.
Monitoring the network typically consists of observing usage on network components and
measuring the amount of traffic on the network.

Using Task Manager to Watch Network Traffic


In WS2K3, Task Manager also provides a quick and easy look at network utilization, as Figure
3.22 shows. This view allows you to quickly see current activity in a System Monitor-like view
without opening and configuring System Monitor.

Figure 3.22: Task Manager provides a quick look at network utilization for each installed network adapter.

Using Network Monitor to Watch Network Traffic


Network Monitor (Netmon.exe) allows you to analyze in-depth, low-level network traffic and
enables you to detect and analyze problems on your local networks and WAN connections.
Network Monitor captures and displays the network packets that the WS2K3 domain controller
receives from users and other servers to provide real-time traffic monitoring and analysis. You
can also display the traffic in a post-capture mode to help you analyze it after the fact.
In real-time mode, Network Monitor allows you to monitor and test network traffic for a specific
set of conditions. If the conditions are detected, it displays the events and prompts you for the
appropriate action. In post-capture analysis, network traffic is saved in a proprietary capture file
and can be parsed by protocol to pick out specific network frame types. Network Monitor does
the following:
• Captures network data in real-time or delayed mode
• Provides filtering capabilities when capturing network packets
• Uses parsers for detailed post-capture analysis

Using the Performance Console to Monitor Network Components on a Domain Controller


You can use System Monitor in the Performance console to monitor the domain controller’s
network performance. Specific performance counters allow you to watch the computer’s network
throughput and network interfaces.

Domain Controller Network Throughput


The easiest way to measure domain controller throughput and the bandwidth of each network
component is to determine the rate at which the computer sends and receives network data.
Several performance counters under the Server object in System Monitor can help you measure
the data transmitted through your domain controller’s network components. These counters
represent all the network traffic sent to and received from the domain controller and all the NICs
installed on it:
• Bytes Total/sec—This counter shows the number of bytes the domain controller has sent
to and received from the network each second. This value provides an overall indication of
how busy the domain controller is servicing network requests. It can help you determine
whether any network components are creating bottlenecks for network traffic.
• Bytes Transmitted/sec—This counter shows the number of bytes that the domain
controller has sent on the network. It indicates the network traffic that has been sent.
• Bytes Received/sec—This counter shows the number of bytes that the domain controller
has received from the network. It indicates the network traffic that has been received.

The advantage of the last two counters is that they break out the values for traffic sent and
received. I recommend that once you’ve monitored these counters, you compare the results with
your domain controller’s total network throughput. To do so, establish a baseline of data rates
and averages. Establishing a baseline allows you to know what to expect from the domain
controller. If a potential problem or bottleneck in network throughput occurs, you can recognize
it immediately because you can compare it against the baseline you’ve established.
You can also make some estimates as to where a bottleneck exists if you know the network and
bus speeds of the domain controller. If the data rate through the card is approaching the network
limit, segmenting and adding a card may help. If the aggregate data rate is approaching the bus
speed, it may be time to split the load for the domain controller and add another one or go to
clustering.
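The comparison between an observed data rate and the network limit can be sketched in Python. The counters report bytes per second while link speeds are normally quoted in bits per second, so the sketch converts between them. The function names and the 80 percent warning level are assumptions for illustration.

```python
def link_utilization_pct(bytes_per_sec, link_speed_bps):
    """Observed data rate as a percentage of the link's nominal speed."""
    return 100.0 * (bytes_per_sec * 8) / link_speed_bps


def approaching_limit(bytes_per_sec, link_speed_bps, warn_pct=80.0):
    """True when utilization reaches `warn_pct` (an assumed warning level)."""
    return link_utilization_pct(bytes_per_sec, link_speed_bps) >= warn_pct


# Hypothetical reading: 6,250,000 bytes/sec on a 100 Mbps adapter
utilization = link_utilization_pct(6_250_000, 100_000_000)
```

The same arithmetic applies to the bus if you substitute the aggregate data rate and the bus speed. On a full-duplex link, consider checking the sent and received rates separately.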

Network Interface Throughput


If you want to break down the amount of traffic to each individual network adapter or interface
card, you’ll want to use the Network Interface object in System Monitor. The counters that
display the amount of traffic processed by each NIC are Bytes Total/sec, Bytes Sent/sec, and
Bytes Received/sec. The counters in the previous section have similar names, but they display
the amount of traffic for the entire domain controller, regardless of the actual number of interface
cards installed. Using the counters assigned to each network adapter allows you to drill down and
see how each performs individually:
• Bytes Total/sec—This counter shows the number of bytes the NIC has sent and received
from the network each second. This value measures the rate at which bytes are both sent
and received on the NIC, including all frame and media types. This value also provides
an overall indication of how busy the network adapter is.
• Bytes Sent/sec—This counter shows the rate at which bytes are sent on the network
interface. This value breaks down the amount of traffic being sent.
• Bytes Received/sec—This counter shows the rate at which bytes are received. This value
breaks down the amount of traffic being received.
Figure 3.23 illustrates how you can use the Bytes Total/sec, Bytes Sent/sec, and Bytes
Received/sec counters in System Monitor to monitor the domain controller’s network adapter.

Figure 3.23: The Bytes Total/sec, Bytes Sent/sec, and Bytes Received/sec counters allow you to monitor the
domain controller’s network adapter.

All That Monitoring…So Little Information


All of these great tools—Task Manager, System Monitor, Network Monitor, and so forth—aren’t
going to provide you with everything you need to troubleshoot AD. Certainly, these tools
provide valuable data, but as I’ve already said, data is just numbers. What these tools don’t do a
good job of providing is information. Simply put, information is data that has been put into some
kind of useful context, translated into terms you can understand, and presented in a meaningful
way. Another problem with these tools is that they’re completely separate—there’s no one place
you can go to for all of this data, and because AD is such a complex, highly integrated system,
you’ll find yourself jumping between tools trying to correlate everything you’re looking at.
MOM offers one solution for this problem. MOM does two important things: It aggregates all of
this data into one place so that you can see it on one screen. It also does some health monitoring;
it knows what Microsoft thinks constitutes a healthy server, and it matches your servers’
performance data to those known health thresholds. Doing so places context on the performance
data, making it into information. Rather than looking at a screen telling you that you have four
hard page faults per second, you’re looking at a screen that tells you that your server is
overworked and needs more memory installed. Information, then, is much more useful and
actionable than mere data.

Other companies have taken a somewhat different approach. For example, NetPro offers
DirectoryTroubleshooter. This tool starts by aggregating an entire domain of
performance data into one place and goes one step further by placing some basic health
information on it. DirectoryTroubleshooter also looks beyond real-time performance statistics
and goes into configuration items that you might not otherwise be exposed to. For example,
DirectoryTroubleshooter can let you know if certain domain configuration parameters are set
outside ranges typically used for best performance. The tool can also run a number of “best
practices” reports to highlight areas of your domain controllers that might be performing
acceptably but that could be reconfigured to run better or provide more server capacity.
DirectoryAnalyzer performs a related set of services, helping to filter through the general mass
of performance data and event log entries and create specific alerts that highlight what you need
to focus on to prevent and repair AD problems.
Other tools, such as Winternals’ Insight for Active Directory, come at troubleshooting from the
other direction. Insight provides a way to look at low-level AD activity, almost serving as a
“network monitor” for what’s going on inside of AD. Using Insight, you can see each and every
operation that AD performs in real-time. When troubleshooting complex problems such as
replication, for example, this ability is invaluable.
Make no mistake: You need to be looking at performance information, especially when
something goes wrong with AD. However, looking at performance information, rather than data,
is crucial. Third-party tools can often provide a level of aggregation and intelligence on top of
Windows’ built-in, data-driven tools to help you spot problems more quickly and solve them
more efficiently.

Auditing as a Troubleshooting Tool


Earlier, this chapter mentioned that auditing needs to gain more recognition for its value in
troubleshooting, above and beyond its value as a security tool. Being able to answer the
questions “What changed?” and “Who changed it?” is perhaps the most important first step in
any troubleshooting exercise. Rather than spend hours staring at System Monitor graphs trying to
figure out what the problem is, you can use auditing to instantly tell that a fellow administrator
added 50 new site link bridges in AD—telling you right away why your domain controllers were
slowing.

Planning for Auditing


Auditing is an intensive activity. Windows’ auditing can be configured to a fairly granular level,
allowing you to, for example, configure auditing only for failed logons, successful domain
configuration attempts, and so forth. However, for troubleshooting purposes, you never know
what might cause a problem, so you pretty much need to turn on auditing full-blast on every
domain controller. Doing so is going to create quite a load on those machines; most sources will
recommend that you scale back and only capture those events that are vital to your operations. If
you’re planning to use Windows’ built-in auditing architecture, consider sizing your domain
controllers with the assumption that full-blast auditing will be in use. If this consideration means
buying more domain controllers or specifying bigger servers, so be it; the security implications
of not auditing are severe, and troubleshooting and operational issues are even more important.
With auditing, you can solve problems quickly and efficiently; without it, you’re basically
stumbling in the dark.


Windows’ built-in auditing functionality logs events to the Windows event logs (specifically, the
Security Log, because security is what auditing is designed to address). Thus, all of the earlier-
mentioned caveats about the event logs hold true, particularly the inability to easily pull all of those
events into one place.

Other Auditing Techniques


Windows’ built-in auditing has, as I’ve mentioned, two problems: First, it’s resource-intensive,
meaning you’re less likely to use it. Second, the information is too scattered, meaning you’ll
have to connect to too many event logs in order to find the information you want. Filtering and
searching through those logs can be a nightmare. Products such as Insight for Active Directory
provide a different sort of real-time auditing by tapping into AD’s own event processing queue,
grabbing activity as it occurs. This method is more resource-friendly, but Insight doesn’t provide
long-term logging and analysis capabilities; it’s designed for real-time viewing.
Microsoft’s ACS aggregates log data (security log data, at least) and provides filtering and
searching capability, but it still relies on the Windows event log architecture. Thus, each domain
controller will have to be in full-on auditing mode at all times to be sure you’re capturing
everything.
NetPro ChangeAuditor for Active Directory basically combines the Insight and ACS approaches.
The tool doesn’t rely on the event log architecture, and instead taps into AD’s own programming
interfaces to retrieve auditing information. This setup makes it a more resource-friendly solution.
It also includes long-term logging capabilities as well as built-in, intelligent reports, which can,
for example, quickly filter through the mass of event log entries you’ve acquired and display all
those relating to administrative group administration. ChangeAuditor also captures “before and
after” information for many events as well as “whodunit” information. This information can
allow you to quickly see what’s changed in AD, who changed it, what it was changed from, and
what it was changed to—everything you need to quickly spot the source of a problem and correct
it rather than spending hours or days working on the symptoms.

The whole concept of change auditing—looking at what’s changed in a network’s configuration to
detect and prevent problems—is becoming more and more important to many companies. For more
information about change auditing and configuration management, check out The Definitive Guide to
Enterprise Network Configuration Management (Realtimepublishers.com) available from a link at
http://www.realtimepublishers.com.

Other tools exist to help answer the “Who changed what?” question. One is TripWire, a
configuration management tool that handles some basic AD configuration information. TripWire
is an excellent tool for determining who changed what on a single server; it’s not as capable
when it comes to the highly distributed AD. However, because many domain controller issues
result not from AD-specific changes but rather from per-server changes (someone removing a
memory module, for example), tools such as TripWire can be valuable troubleshooting aids. TripWire
can even alert you when key configuration elements change, helping you spot problems before
they actually occur. Similar tools, such as Configuresoft Enterprise Configuration Manager
(ECM), can also help you better manage the configuration on single servers and can even enforce
a desired configuration state on those machines, helping to ensure that problems don’t occur due
to misconfigurations.


Summary
As a network administrator, a critical part of your job is making sure that each and every domain
controller hosting AD is functioning properly. To accomplish this task, you need to properly
monitor each of these Windows domain controllers, which, in turn, means watching over the
critical OS components and hardware subsystems.
To help you monitor a domain controller and its subsystems, Windows provides several utilities,
and this chapter discussed the most important ones: Task Manager, the Performance console, and
Event Viewer. Using these utilities and third-party tools, you can watch server resources and
subsystems in real time while they work to support the requests by users, applications, and other
servers.


Chapter 4: Monitoring and Auditing Active Directory


Troubleshooting Active Directory (AD) and AD-based networks requires that you become
familiar with AD constructs and monitoring tools and techniques. Monitoring AD allows you to
determine whether problems are occurring in any part of the directory. However, it’s sometimes
difficult to accurately determine the cause of a problem because AD is distributed across domain
controllers and interacts with several external services and protocols, such as:
• Domain Name System (DNS) for name resolution
• Lightweight Directory Access Protocol (LDAP) for directory lookups
• Transmission Control Protocol/Internet Protocol (TCP/IP) for transport
AD also has a complex infrastructure that contains many components. To ensure the health of the
directory as a system, you must monitor all of these components. You also need to understand
AD’s internal processes, such as replication.
As I described in the previous chapter, auditing changes to AD can also be a useful
troubleshooting tool, as auditing event messages can help you more quickly determine what has
recently changed in your AD configuration. Problems are most often the result of a change of
some kind; thus, knowing what changed can point you toward a solution.
This chapter will describe which infrastructure components you need to continually monitor to
ensure AD availability as well as some of the built-in and third-party utilities that are available to
help you do so. It’s always a good idea to have a sound understanding of one’s tools before using
them, so I’ll start by introducing the tools in an effective monitoring tool set.

Using the Monitoring and Auditing Tools


You can use several tools to monitor the individual areas of AD and AD as a service or system.
These tools include built-in Windows utilities and support tools and resource kit utilities as well
as those available from third-party independent software vendors (ISVs). This chapter will give
you an overview of all of these utilities and describe how they can help you monitor the
directory. Rather than provide an exhaustive list of all the utilities built into Windows and
available on the market, this chapter will focus on the most useful of the built-in and third-party
tools.

Third-Party Tools
It’s a shame that Windows doesn’t come with every tool you could possibly need to troubleshoot
AD. However, the gaps in Windows’ built-in tools give third-party software vendors a rich
field to work with, and several vendors have created products designed to make troubleshooting
easier and more effective.


DirectoryAnalyzer
DirectoryAnalyzer from NetPro was one of the first AD monitoring tools on the market, and it
performs real-time monitoring and alerting on all aspects of the AD infrastructure. Instead of
monitoring individual domain controllers, it monitors the directory as a whole. It does so by
monitoring all domain controllers and directory processes at once as a background process. If a
problem occurs at any level in the directory, DirectoryAnalyzer alerts, or notifies, users. If the
problem is critical, the tool’s integrated knowledge base contains descriptions and
troubleshooting methods that will help you solve it.
DirectoryAnalyzer monitors the individual structures and components of AD—replication,
domains, sites, Global Catalogs (GCs), operations master roles, and DNS (inasmuch as it relates
to AD). Each of these components is vital to the operation of AD. DirectoryAnalyzer can
monitor and alert based on specific conditions and problems in each of the individual structures.
The alerts are then recorded at the DirectoryAnalyzer client or console for viewing.
Alerts have two levels of severity—warning and critical. Warning alerts indicate that a
predetermined threshold has been met in one of the directory structures. Warning alerts help you
identify when and where problems may occur. Critical alerts indicate that a predetermined error
condition has been met. Critical alerts are problems that need your immediate attention; if you
ignore them, AD could lose functionality or you could lose the directory altogether.
By clicking Current Alerts under View Status in the left pane, you can display all of the alerts
with their associated type, time, and description. Figure 4.1 shows the Current Alerts screen in
DirectoryAnalyzer. The alerts have been recorded for the AD domain controllers, directory
structures, and directory processes.

Figure 4.1: DirectoryAnalyzer allows you to monitor the entire directory for problems.


You can also send alerts to enterprise management systems using Simple Network Management
Protocol (SNMP). Doing so allows you to integrate DirectoryAnalyzer alerts with management
consoles such as Hewlett-Packard’s HP OpenView and Tivoli. Alerts can also be recorded in the
event logs of the Windows system and viewed using the Event Viewer utility.
DirectoryAnalyzer logs all alert activity to a history database. You can export the database and
analyze alert activity over time using a variety of formats, such as Microsoft Excel, Hypertext
Markup Language (HTML), Dynamic HTML (DHTML), and Rich Text Format (RTF). You can
also identify trends in the data, finding cycles or periods of high and low alert activity.
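
As a rough illustration of the trend analysis just described, the following Python sketch counts
alerts by hour of day to expose daily cycles. The record layout (timestamp and severity pairs) and
the sample data are assumptions made for this example; they do not represent DirectoryAnalyzer’s
actual export format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical exported alert history: (timestamp, severity) pairs.
SAMPLE_ALERTS = [
    ("2004-03-01 02:15", "warning"),
    ("2004-03-01 02:40", "critical"),
    ("2004-03-01 14:05", "warning"),
    ("2004-03-02 02:30", "warning"),
]


def alerts_per_hour(alerts):
    """Count alerts by hour of day (0-23) to expose daily cycles."""
    counts = Counter()
    for stamp, _severity in alerts:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        counts[hour] += 1
    return counts


def busiest_hour(alerts):
    """Return the hour of day with the most alerts, or None if there are none."""
    counts = alerts_per_hour(alerts)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    for hour, count in sorted(alerts_per_hour(SAMPLE_ALERTS).items()):
        print(f"{hour:02d}:00  {count} alert(s)")
```

In this made-up sample, most alerts cluster in the 02:00 hour, which might point toward a
scheduled job or a replication window worth investigating.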

ChangeAuditor for Active Directory


Answering the questions who, what, when, and where, ChangeAuditor from NetPro starts out by
capturing AD’s own internal traffic into a log. Rather than interpreting that log and displaying it,
however, ChangeAuditor analyzes that information and builds a detailed log file of every change
that occurs within AD. You can’t view the AD traffic in real time; instead, you can run a series of
pre-created reports (or you can create your own reports) that display changes to various aspects
of AD. For example, you can run a report that displays changes made to built-in groups, quickly
showing you any membership changes to, say, the Domain Admins group. Figure 4.2 shows
ChangeAuditor in action, and as you can see, most of the time the tool is able to show you not
only what has changed but also what the original value was, who made the change, when the
change was made, and so forth.

Figure 4.2: ChangeAuditor displays recent changes to AD.

ChangeAuditor needs to be installed on each domain controller for maximum effectiveness, but
events are collected into a central database for reporting and analysis.


Auditing for Troubleshooting


The previous chapter made a case for auditing as a troubleshooting tool. ChangeAuditor is quite definitely
an auditing tool: several of its bundled reports, for example, focus on considerations such as HIPAA
compliance, security issues, and so forth. This type of tool represents an important new way of thinking
about both auditing and troubleshooting.
For example, corporate auditing often relies too heavily on snapshot audits. In other words, an auditor
reviews the environment, as it looks today, for compliance with a set of corporate standards. The auditor
doesn’t usually care what the environment looked like yesterday, or what it will look like tomorrow; they’re
only interested in what it looks like right now. That’s shortsighted, because any administrator who knows
the audit is coming can whip their environment into shape in time, then set things back to more
comfortable settings after the auditor is gone. By capturing and storing changes in a database, however,
ChangeAuditor allows auditors to look back in time. Auditors can see what changes were made to critical
groups, security settings, and so forth, and effectively audit the environment as it was.
Of course, for troubleshooting purposes, knowing what changes occurred and when they occurred is
invaluable. By quickly reviewing all changes in the past 24 hours (or however long you need), you can
quickly spot a change that is related to a current problem. By seeing both the “before” and “after” values
of that change (which ChangeAuditor is able to display in many circumstances), you can quickly undo a
change and return your environment to working order.
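
To make the idea concrete, here is a minimal Python sketch of reviewing the last 24 hours of
changes and using the “before” value to decide how to undo one. The ChangeRecord type, its field
names, and the sample entries are hypothetical; they are not ChangeAuditor’s data model or
report format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeRecord:
    """Hypothetical change-audit record; the field names are illustrative."""
    when: datetime
    who: str
    target: str
    attribute: str
    before: str
    after: str


def recent_changes(records, now, window_hours=24):
    """Return the records made within the last window_hours, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [r for r in records if cutoff <= r.when <= now]
    return sorted(recent, key=lambda r: r.when, reverse=True)


def undo_plan(record):
    """Describe the manual step that would restore the 'before' value."""
    return (f"Set {record.attribute} on {record.target} "
            f"back to {record.before!r} (changed by {record.who})")


if __name__ == "__main__":
    now = datetime(2004, 3, 2, 9, 0)
    log = [
        ChangeRecord(datetime(2004, 2, 27, 16, 0), "alice",
                     "Site-Link-A", "cost", "100", "200"),
        ChangeRecord(datetime(2004, 3, 1, 22, 30), "bob",
                     "Domain Admins", "member", "", "CN=temp"),
    ]
    for rec in recent_changes(log, now):
        print(undo_plan(rec))
```

Only the second sample record falls inside the 24-hour window, so it is the only one reported.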

DirectoryTroubleshooter
NetPro’s DirectoryTroubleshooter is a kind of super-performance monitor with built-in
intelligence. It monitors literally hundreds of AD-related configuration settings, performance
values, and other aspects of AD, and reports to you on potential problem areas (see Figure 4.3).
This functionality allows you to quickly focus your troubleshooting efforts on areas with a
known problem rather than shooting blind and spending hours analyzing areas of AD that aren’t
having a problem.


Figure 4.3: DirectoryTroubleshooter displays a great deal of AD configuration information in one window.

DirectoryTroubleshooter can also help fix problems. It includes a set of jobs, which can perform
tasks such as configuring recovery options on a server, starting AD defragmentation,
troubleshooting the File Replication Service (FRS), and so forth. These jobs can be targeted to
run on multiple servers, helping automate troubleshooting and repair.

Insight for Active Directory


Insight for Active Directory from Winternals Software is a sort of low-level event viewer for
AD. Essentially, Insight captures AD’s LDAP communications for an entire domain, displaying
that information in a log, which you can analyze. As Figure 4.4 shows, Insight displays the log in
a color-coded format, helping you visually spot related information. A built-in translator of sorts
helps translate the somewhat-cryptic messages generated by AD into readable English.


Figure 4.4: Insight for Active Directory displays real-time diagnostic information.

Pop-up “tool tips” display further explanations for messages, making AD’s otherwise difficult
LDAP traffic easier to translate. Using Insight, you can see exactly what AD is doing at any
given moment, and often spot problems in operations such as replication by interpreting AD’s
own internal traffic.

AppManager Suite
The AppManager Suite from NetIQ Corporation is a suite of management products that manages
and monitors the performance and availability of Windows. One of these management products
allows you to monitor the performance of AD. For example, AppManager verifies that
replication is occurring and is up-to-date for the directory by monitoring the highest Update
Sequence Number (USN) value for each domain controller. In addition, inbound and outbound
replication statistics are tracked, as are failed synchronization requests for the directory.

The USN is discussed in more detail later in this chapter.


AppManager also allows you to monitor the number of directory authentications per second and
the cache hit rate of name resolution. Using this tool, you can monitor and track errors and
events for trust relationships. You can also log errors and events to enterprise management
systems using SNMP. Thus, SNMP traps are generated and routed to a configured network
manager.
In addition, you can use or run a set of prepackaged management reports that allow you to
further analyze current errors and events. You can also set up this utility to send email and pager
alerts when an event is detected.

Microsoft Operations Manager


Although not strictly a third-party product, Microsoft Operations Manager (MOM) is an
additional purchase from Microsoft (it is not a built-in tool). MOM is designed to provide health
monitoring services for Microsoft server products, including Windows and AD. The idea behind
MOM—much like AppManager—is to collect performance data and compare it with known
thresholds, translating raw performance data into more useful health information. For example,
knowing that your domain controllers’ processor utilization is at 70 percent may be interesting
data, but it’s not useful. Is 70 percent good or bad? MOM is designed to quickly compare that
data to a range of values known to represent good and bad server health conditions, and creates a
graphical view of services and server components that are operating at levels that may represent
a problem. MOM also checks several configuration parameters to help spot problems.
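
The data-versus-information distinction can be sketched in a few lines of Python. The function
below turns a raw counter value into a health state by comparing it with two thresholds. The
function name and the threshold values (80 and 90 percent) are assumptions chosen for this
example; they are not MOM’s rules or recommended limits.

```python
def classify_health(value, warning_at, critical_at):
    """Map a raw counter value to a health state.

    Assumes higher values are worse and that warning_at < critical_at.
    """
    if warning_at >= critical_at:
        raise ValueError("warning_at must be below critical_at")
    if value >= critical_at:
        return "critical"
    if value >= warning_at:
        return "warning"
    return "healthy"


if __name__ == "__main__":
    # Illustrative thresholds only; real values should come from your own baseline.
    for cpu in (35, 70, 95):
        print(cpu, classify_health(cpu, warning_at=80, critical_at=90))
```

With these example thresholds, 70 percent processor utilization is reported as healthy, which
is the kind of answer the raw number alone can’t give you.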

Built-In Tools
In this section, I’ll discuss System Monitor, Event Viewer, and REPADMIN. These tools are
included with Windows and provide basic monitoring for key aspects of the OS and AD.

System Monitor
For the domain controller in AD, one of the main monitoring utilities is System Monitor. This
utility allows you to watch the internal performance counters that relate to the directory on the
domain controller. The directory performance counters are software counters that the developers
of AD have programmed into the system.
Using System Monitor, you can monitor current directory activity for the domain controller.
Once you’ve installed AD on a server, several performance counters—for replication activity,
DNS, address book, LDAP, authentication, and the database itself—measure the performance of
the directory on that computer.
Chapter 3 discussed how to launch and use System Monitor, so there is no need to repeat that
information. Instead, I’ll focus on how to use some of the more important performance counters
that are available for AD. Remember, System Monitor tracks all of its counters in real time. For
this reason, it is a good practice to establish a baseline of normal operation that you can compare
the real-time values against. When adding AD counters to System Monitor, if you don’t
understand the meaning of any counter, highlight it, then click Explain. The Explain Text dialog
box appears and provides a description of the counter.
You can also graph the performance counters and set alerts against them. The alerts will appear
in the Event Viewer.
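
One simple way to compare real-time values against a baseline is sketched below in Python. It
summarizes samples gathered during normal operation as a mean and standard deviation, then
flags readings that fall far outside that range. The sample values, the function names, and
the three-standard-deviation tolerance are assumptions for illustration, not a Microsoft
recommendation.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize samples collected during normal operation as (mean, stdev)."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    return mean(samples), stdev(samples)


def deviates(value, baseline, tolerance=3.0):
    """True if value is more than `tolerance` standard deviations from the mean."""
    avg, spread = baseline
    if spread == 0:
        return value != avg
    return abs(value - avg) > tolerance * spread


if __name__ == "__main__":
    # Hypothetical samples of a directory counter taken during a normal week.
    normal = [120, 130, 125, 118, 135, 128, 122]
    baseline = build_baseline(normal)
    for reading in (126, 400):
        print(reading, "unusual" if deviates(reading, baseline) else "normal")
```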


Event Viewer
To view and analyze the events that have been generated by a Windows domain controller, you
can use the Event Viewer. This utility allows you to monitor the event logs generated by
Windows. By default, there are three event logs: the application log, the system log, and the
security log.

These three logs are described in detail in Chapter 3.

In addition, after you install AD, three more logs are created:
• Directory service log—Contains the events that are generated by AD on the domain
controller. You can use this log to monitor activity or investigate any directory problems.
By default, the directory records all critical error events.
• DNS server log—Contains the events generated by the DNS service installed on your
domain controller. For example, when the DNS service starts or stops, it writes a
corresponding event message to this log. More critical DNS events are also logged—for
example, if the service starts but cannot locate initializing data, such as zones or other
startup information stored in the domain controller’s registry or AD. The DNS log exists
only if the DNS service is running on the server. The DNS service typically runs on only
a few domain controllers in the forest.
• FRS log—Contains events generated by file replication on the domain controller. FRS is
a replication engine used to replicate files among different computers simultaneously. AD
uses this service to replicate Group Policy files among domain controllers.
Depending on how you configure your AD installation, you may have one or all of these logs on
your domain controller. Figure 4.5 shows the Event Viewer startup screen on a domain controller
on which AD with DNS has been installed.


Figure 4.5: The Event Viewer startup screen lists additional event logs that have been created for AD.

Replication Diagnostics
The Replication Diagnostics tool is simply referred to as REPADMIN. It’s a command-line
utility that allows you to monitor and diagnose the replication process and topology in AD. It
also provides several switches that you can use to monitor specific areas of replication. For
example, you can force replication among domain controllers and view the status.
During normal replication, the Knowledge Consistency Checker (KCC) manages and builds the
replication topology for each naming context on the domain controller. The replication topology
is the set of domain controllers that share replication responsibility for the domain. REPADMIN
allows you to view the replication topology as seen by the domain controller. If needed, you can
use REPADMIN to manually create the replication topology, although doing so isn’t usually
beneficial or necessary because the replication topology is generated automatically by the KCC.
You can also view the domain controller’s replication partners, both inbound and outbound, and
some of the internal structures used during replication, such as the metadata and up-to-date
vectors.
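
If you want to collect REPADMIN output from a script, a thin wrapper such as the following
Python sketch is one option. It assumes REPADMIN is installed and on the PATH, and it uses the
/showrepl switch to list replication partners; the Windows 2000 version of the tool names that
switch /showreps, so check repadmin /? on your system. The helper names and the DC1 server
name are placeholders.

```python
import shutil
import subprocess


def repadmin_command(switch, target=None):
    """Build a REPADMIN argument list, for example ['repadmin', '/showrepl', 'DC1']."""
    if not switch.startswith("/"):
        raise ValueError("REPADMIN switches start with '/'")
    args = ["repadmin", switch]
    if target:
        args.append(target)
    return args


def run_repadmin(switch, target=None):
    """Run REPADMIN and return its text output, or None if it is not installed."""
    if shutil.which("repadmin") is None:
        return None
    result = subprocess.run(repadmin_command(switch, target),
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # "DC1" is a placeholder domain controller name.
    output = run_repadmin("/showrepl", "DC1")
    print(output if output is not None else "repadmin not found on PATH")
```

The sketch deliberately returns the raw text rather than parsing it, because the output layout
differs between versions of the tool.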


You can install the REPADMIN.EXE utility from the support tools folder on the Windows
installation CD-ROM. Running the SETUP program launches the Support Tools Setup wizard,
which installs this tool along with many other useful support tools to the Program Files\Support
Tools folder. Figure 4.6 shows the interface for REPADMIN (the Win2K version is shown; the
WS2K3 version works identically).

Figure 4.6: The REPADMIN utility allows you to view the replication process and topology.

The Replication Topology


It’s useful to remember how AD builds its replication topology. AD starts by having each domain controller
replicate with two partner domain controllers so that the domain controllers form a giant “ring,” replicating
changes in both directions around the ring. However, in a large domain, this topology would result in
significant replication latency.
For example, with a domain containing just a dozen domain controllers, a single change made on one
domain controller would have to take six “hops” in order to replicate to every other domain controller. For
that reason, AD is designed to shortcut the ring to prevent replication from taking more than three hops to
reach any given domain controller. These shortcuts create a third replication partner for specific domain
controllers, allowing replication to travel more quickly throughout the domain. Thus, no domain controller
should ever have more than three replication partners, which helps to minimize replication overhead on
any one domain controller.
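
The hop arithmetic for the plain ring can be checked with a short Python sketch. It models only
the simple two-way ring described above, before any shortcut connections are added, and the
function names are my own.

```python
def max_ring_hops(dc_count):
    """Worst-case hops for a change to reach every DC in a two-way ring."""
    if dc_count < 1:
        raise ValueError("dc_count must be at least 1")
    return dc_count // 2


def hops_between(a, b, dc_count):
    """Shortest hop count between ring positions a and b (0-based)."""
    clockwise = (b - a) % dc_count
    return min(clockwise, dc_count - clockwise)


if __name__ == "__main__":
    print(max_ring_hops(12))        # 6, matching the dozen-DC example
    print(hops_between(0, 6, 12))   # 6: the DC directly across the ring
    print(hops_between(0, 11, 12))  # 1: a direct neighbor
```

Because changes travel in both directions, the farthest domain controller is the one directly
across the ring, which is why a dozen domain controllers yield six hops.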


Monitoring the AD Infrastructure


An important aspect of any AD deployment is monitoring the environment and infrastructure.
The infrastructure of AD is the set of processes and data structures that the directory service uses
to function properly. By constantly monitoring the infrastructure, you can detect issues that arise
in the environment and correct them before they affect your users. For example, users will be
affected if there is an intermittent failure of a bridgehead server or if a Flexible Single Master
Operation (FSMO, pronounced “fizmo”) role-holding server goes down.
The first task in troubleshooting AD is to constantly monitor critical areas of the directory
deployment. I recommend that you continuously monitor at least the following directory
structures and components:
• Domain controllers—Servers that are critical to the proper operation of AD. If one
domain controller isn’t functioning properly, the directory and some users will lose
performance and possibly functionality. If the domain controller that is having problems
has also been assigned additional vital roles (such as being a DNS or GC server), the
directory may become unavailable to all users. Thus, it’s critical to monitor and track the
performance of all domain controllers on the network at all times.
• Domain partition—Partition that stores AD objects and attributes that represent users,
computers, printers, and applications. The domain partition is also used to accomplish a
number of management roles, which include administration and replication. You must
monitor the performance and availability of the domain partition so that the services it
supports are constantly available.
• GC partition—Partition stored on specialized domain controllers (GC servers) whose
availability is necessary for clients to be able to log on to the network. Only a few domain
controllers store a copy of the GC partition, and those servers need to be monitored to keep
the GC available. The GC streamlines directory searches because it contains all of the
objects in the forest but only a few of their key attributes.
• Operations masters—FSMO role holders are single-master domain controllers that
perform special roles for AD. It’s important that you monitor and track the performance
of each operations master so that the service it performs is maintained. If any operations
master stops functioning, its functionality is lost in the directory.
• Replication process and topology—Are critical to the operation of AD. If changes have
been made to a directory object on one domain controller, the replication process needs to
propagate the changes to all of the other domain controllers that have replicas of that
object. If replication isn’t functioning, different portions of the directory get out of sync.
This confuses users, and they lose access to directory resources.
For example, if an administrator has changed a Group Policy but the change hasn’t been
synchronized to all copies, users using the older copies may access the wrong
information. In addition, once the synchronization among directory replicas is lost, it’s
very difficult and time-consuming to get back. Thus, it’s critical to constantly monitor the
replication process and topology for problems.


Monitoring the Domain Controllers


Because AD can be distributed across many domain controllers, you need to constantly monitor
individual domain controllers. If one domain controller isn’t functioning properly, the directory
and your users will lose performance and possibly functionality. If multiple domain controllers
aren’t functioning properly, the network can become unusable. For this reason, always check or
monitor that the domain controller’s hardware and subsystems are operating correctly. After
you’re confident that the hardware is performing well, you need to monitor the AD services
running on the domain controllers for errors and other problems.

For details about monitoring the hardware components of the domain controller, refer to Chapter 3.

Using DirectoryAnalyzer
Many third-party tools, such as those I discussed earlier, provide you with an easy way to
monitor all of the domain controllers in your forest from one management console. For example,
in DirectoryAnalyzer, click Browse Directory By Naming Context; the directory hierarchy is
displayed. If you expand the naming contexts, you see all of the associated domain controllers.
To see the alerts for just one domain controller, select a domain controller object, then click
Current Alerts. The alerts that are displayed have exceeded a warning or critical threshold and
show the severity, subject, associated type, time, and description. Figure 4.7 shows an example
of using DirectoryAnalyzer to view all alerts for each domain controller.

Figure 4.7: DirectoryAnalyzer allows you to monitor all the domain controllers in your forest for problems
and see the alerts that have been recorded for each domain controller.


To see the alerts and other information for each domain controller, you can also use the Browse
Directory By Site option. It allows you to browse the directory layout according to sites and their
associated domain controllers. In addition, it permits you to view the status of each site and the site
links.

DirectoryAnalyzer is an extremely useful utility because it monitors all of the domain controllers
in the AD forest as a background process and allows you to periodically view the results. It also
monitors the most critical directory structures and processes—for example, the configuration and
activity for the domain partitions, GC partitions, FSMO roles, sites, DNS, the replication
process, and the replication topology.
In addition to viewing the alerts from the domain controllers, you can click any alert and see a
more detailed description of the problem. If you don’t understand the alert, you can double-click
it; the Alert Details dialog box will appear and provide more description, as Figure 4.8 shows.

Figure 4.8: DirectoryAnalyzer provides more information about an alert in the Alert Details dialog box.

Once you’ve been notified of the alert and viewed more information about it in the Alert Details
dialog box, you can use the integrated knowledge base to help resolve the problem. The
knowledge base provides you with a detailed explanation of the problem, helps you identify
possible causes, then helps you remedy or repair the problem. To access the knowledge base,
click More Info in the Alert Details dialog box or choose Help, Contents in the console. Figure
4.9 shows an example of the information available in the knowledge base.


Figure 4.9: DirectoryAnalyzer’s in-depth knowledge base helps you find solutions to problems in AD.

Domain controllers are the workhorses of AD. They manage and store the domain information
and accept special functions and roles. For example, a domain controller can store a domain
partition, store a GC partition, and be assigned as a FSMO role owner. Domain controllers, in
turn, allow the directory to manage user interaction and authentication and oversee replication to
the other domain controllers in the forest.
In addition to displaying alerts for each domain controller, DirectoryAnalyzer displays detailed
configurations. For example, when you choose Browse Directory By Naming Context, you see
several icons for each domain controller. An icon that includes a globe indicates that the domain
controller stores a GC partition. When an icon displays small triangles, it indicates that the
domain controller is also providing the DNS service. An icon that displays both a globe and
small triangles indicates that the domain controller hosts both a GC partition and the DNS service.


If you select a domain controller and then click the DC Information tab, you can view detailed
information about how the domain controller is operating and handling the directory load. Figure
4.10 shows the DC Information pane in DirectoryAnalyzer.

Figure 4.10: You can view detailed information about a domain controller using the DC Information pane in
DirectoryAnalyzer.

DirectoryAnalyzer provides a high-level summary of how each domain and its associated
domain controllers are functioning. Click Browse Directory By Naming Context to see a high-
level status of all the domain controllers in a domain. To view the status for a particular domain,
select it, then click the DC Summary tab. Figure 4.11 shows the DC Summary pane, which uses
green, yellow, and red icons to indicate the status of each domain controller in a domain.

Figure 4.11: The DC Summary pane in DirectoryAnalyzer provides a high-level status of all domain
controllers in a domain.

You can also quickly view where the domain controller resides, whether it is a GC server, and
who manages the computer. If any of the domain controllers aren’t showing a green (clear) status
icon, there is a problem that you need to investigate and fix.

Using NT Directory Service Performance Counters


NT Directory Service (NTDS) performance counters are internal domain controller counters
used by multiple aspects of AD. Once AD has been installed on the domain controller, these
directory counters are added to the system. These counters allow you to monitor and track the
domain controller’s replication activity, LDAP traffic, and authentication traffic. Table 4.1
describes the more useful NTDS performance counters and how to use them to track AD activity
on a domain controller.

| Counter | Function | Description |
| --- | --- | --- |
| DRA Inbound Bytes Total/sec | Tracks the total number of bytes per second received on the server during replication with other domain controllers. | Indicates the total amount of inbound replication traffic over time. If a small number of bytes are being sent, either the network or the server is slow. Other issues that might limit the number of bytes being sent include few changes being made to the naming contexts hosted by the domain controller, replication topology problems, and connectivity failures. Of course, you need to check this value against a baseline of activity. |
| DRA Inbound Object Updates Remaining in Packets | Tracks the number of object updates received in the AD replication update packet but not applied to the local domain controller. | Indicates that the server is receiving changes but is taking a long time to apply them to the AD database. The value of this counter should be as low as possible. A high value indicates that the network is slow during replication or the domain controller is receiving updates faster than it can apply them. Other issues that can affect speed of update are high domain controller load, insufficient hardware (memory, disk, or CPU), the disk becoming full or fragmented, other applications using too many resources, and so on. |
| DRA Outbound Bytes Total/sec | Tracks the number of bytes that are sent from the server during replication to other domain controllers. | Indicates the total amount of outbound replication traffic over time. If this value remains low, it can indicate a slow server or network or few updates on this domain controller. In the latter case, it can mean that clients are connecting to other domain controllers because this one is slow or that there are topology problems. For best results, test the current value against an established baseline value. |
| DRA Pending Replication Synchronizations | Tracks the number of pending requests from replication partners for this domain controller to synchronize with them. Synchronizations are queued, ready for processing by the domain controller. | Indicates the backlog of directory synchronizations for the selected server. This value should be as low as possible. A high value could indicate a slow server or a problem with the server’s hardware. |
| DS Threads in Use | Tracks the current number of threads that are being used by the directory service running on the domain controller. | Indicates how the directory service on the server is responding to client requests. When a client requests information, AD spawns a thread to handle the request. If the number of threads remains constant, Win2K clients may experience a slow response from the domain controller. |
| Kerberos Authentications/sec | Tracks the current number of authentications per second for the domain controller. | Indicates how the domain controller is responding to client requests for authentications. If this counter doesn’t show activity over time, clients could be having a problem contacting the domain controller. |
| LDAP Bind Time | Tracks the amount of time (in milliseconds) required to process the last LDAP bind request from the client. A bind is described as authenticating the LDAP client. This counter tracks only the last successful bind for an LDAP client. | The value of this counter should be as low as possible to indicate that the domain controller was quick to authenticate the LDAP client. If the value is high, the domain controller was slow to authenticate LDAP. A high value can indicate a server problem, the domain controller is too busy, insufficient hardware (memory or CPU), or other applications using too many resources. |
| LDAP Client Sessions | Tracks the current number of LDAP sessions on the selected domain controller. | If your domain controller has LDAP clients trying to connect, the value of this counter should show activity over time. If the value remains constant, the server or client may have problems, the domain controller may be too busy running other applications, or there is insufficient hardware (memory or CPU). |
| LDAP Searches/sec | Tracks the number of LDAP search operations that were performed on the selected domain controller per second. LDAP clients connecting to the server perform the LDAP search operations. | Indicates how many LDAP search requests the domain controller is servicing per second. You typically view different search rates depending on the domain controller’s hardware, the number of clients connected to the domain controller, and what sorts of things the clients are doing. |
| LDAP Successful Binds/sec | Tracks the number of LDAP binds per second that occur successfully. | Indicates how the domain controller responds to authentications from the clients. This value allows you to view the number of successful binds per second for LDAP clients. Again, if this value remains constant over time, there can be a network, client, or server problem. For example, there is a bad network component, the client is too busy, or the server is too busy. |
| NTLM Authentications | Tracks the total number of Windows NT LAN Manager (NTLM) authentications per second serviced by the domain controller. | Allows you to see whether there are authentications from Windows 98 and NT clients for this domain controller. If you’re supporting Windows 98 and NT and the value remains constant over time, there is a network problem. For example, the network could have a bad or poorly configured component, or the client could be too busy. |

Table 4.1: A few of the NTDS performance counters that allow you to track how a domain controller is
responding to replication traffic, LDAP traffic, and authentication traffic.
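
Several entries in Table 4.1 depend on comparing a counter against a baseline. The following Python sketch shows one way to make that comparison explicit. The function name, the 50 percent tolerance, and the sample values are all hypothetical choices for illustration; they are not part of System Monitor or DirectoryAnalyzer.

```python
from statistics import mean


def compare_to_baseline(samples, baseline, tolerance=0.5):
    """Classify the average of recent counter samples against a baseline.

    tolerance is a hypothetical threshold: 0.5 means the average may drift
    50 percent above or below the baseline before it is flagged.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = mean(samples) / baseline
    if ratio < 1 - tolerance:
        return "low"
    if ratio > 1 + tolerance:
        return "high"
    return "normal"


# Illustrative values only (bytes per second), not real measurements.
recent_inbound_bytes = [1200, 900, 1100, 1000]
print(compare_to_baseline(recent_inbound_bytes, baseline=4000))
```

A "low" result for DRA Inbound Bytes Total/sec would be a prompt to check the causes listed in the table (slow network or server, few changes, topology or connectivity problems) rather than proof of a fault.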

NTDS counters enable you to monitor the performance of AD for the selected domain controller.
You can view these counters under the NTDS object in System Monitor (see Figure 4.12). By
default, System Monitor is started when you choose Start, Administrative Tools, Performance
Console.

Figure 4.12: NTDS performance counters allow you to monitor and track load and performance of the AD
implementation on each domain controller.

Monitoring the Domain Partitions


Domain partitions in AD are often referred to as naming contexts, and they provide a security
and replication boundary. Each domain partition exists in the NTDS.DIT database on the domain
controllers that participate in the domain. The domain partition stores all the users, printers,
servers, computers, and application data. Because users depend on the domain to access other
network resources, it’s important that you constantly monitor the state of the domain partition.

Using DirectoryAnalyzer
DirectoryAnalyzer allows you to monitor the alerts for each domain in AD and the associated
domain controllers. These alerts monitor the domain controllers, replicas, Group Policies, trust
relationships, DNS, and other activity for a domain. If you see any critical alerts, you need to
investigate and fix the problems.
To view the alerts for a domain, click Browse Directory By Naming Context. Select a domain,
then click the Current Alerts tab. The display shows the current alerts for that domain (see Figure
4.13).

Figure 4.13: DirectoryAnalyzer allows you to monitor each domain partition for problems.

In addition to displaying alerts for each domain, DirectoryAnalyzer allows you to view
configuration information. Using the Naming Context Information tab, you can view the current
number of alerts that are active for the following areas: Naming Context (or Domain), Replica,
DNS Server, and DC Server.
The Naming Context Information tab also displays the number of domain controllers for the
domain and whether the domain supports mixed mode. When a domain supports mixed mode, it
allows replication and communication with down-level domain controllers and clients to occur.
In addition, you can see which domain controllers in the domain are performing the FSMO roles
and perform a FSMO consistency check. And finally, you can view all the trust relationships that
exist for the domain. Figure 4.14 shows the Naming Context Information pane in
DirectoryAnalyzer.

Figure 4.14: The Naming Context Information pane in DirectoryAnalyzer allows you to see detailed
information for a domain.

To further monitor the domain, DirectoryAnalyzer provides a high-level summary of each
domain controller. Click Browse Directory By Naming Context, then click the DC Summary tab.
(The DC Summary pane is shown in Figure 4.11 earlier in this chapter.)

Using Domain Database Performance Counters


In AD, the database for the domain has been implemented as an indexed sequential access
method (ISAM) record or table manager. This table manager is often referred to as the
Extensible Storage Engine (ESE) and is implemented by ESENT.DLL on the server. By default,
the associated database file is stored on the Windows server as
<drive>\WINNT\NTDS\NTDS.DIT.

If necessary, you can relocate the NTDS.DIT database on a domain controller using the NTDSUTIL
utility, which is pre-installed.

Using this database engine, AD provides a set of database performance counters that allow you
to monitor the domain in depth. These counters provide information about the performance of
the database cache, database files, and database tables, and they help you monitor and determine
the health of the database for the domain controller. By default, database performance counters
aren’t installed on the domain controllers.

You can view and monitor database counters using the System Monitor utility. Table 4.2 gives
you a general description of the more useful database performance counters and how to use them
to track the activity of the low-level database for each domain.

| Counter | Function | Description |
| --- | --- | --- |
| Cache % Hits | Tracks the percentage of database page requests in memory that were successful. A cache hit is a request that is serviced from memory without causing a file-read operation. | Indicates how database requests are performing. The value for this counter should be at least 90 percent. If it’s lower than 90 percent, the database requests are slow for the domain controller, and you should consider adding physical memory to create a larger cache. |
| Cache Page Faults/sec | Tracks the number of requests (per second) that cannot be serviced because no pages are available in cache. If there are no pages, the database cache manager allocates new pages for the database cache. | Indicates how the database cache is performing. I recommend that the computer have enough memory to always cache the entire database. Thus, the value of this counter should be as low as possible. If the value is high, you need to add more physical memory to the domain controller. |
| File Operations Pending | Tracks the number of pending requests issued by the database cache manager to the database file. The value is the number of read and write requests that are waiting to be serviced by the OS. | Indicates how the OS handles the read/write requests to the AD database. I recommend that the value for this counter be as low as possible. If the value is high, you need to add more memory or processing power to the domain controller. This condition can also occur if the disk subsystem is bottlenecked. |
| File Operations/sec | Tracks the number of requests (per second) issued by the database cache manager to the database file. The value is the number of read and write requests per second that are serviced by the OS. | Indicates how many file operations have occurred for the AD database. I recommend that this value be appropriate for the purpose of the domain controller. If you think that the number of read and write operations is too high, you need to add memory or processing power to the computer. Adding memory for the file system cache on the computer also reduces file operations. |
| Table Open Cache Hits/sec | Tracks the number of database tables opened per second using cached schema information. | Indicates how the AD database is performing. The value for this counter should be as high as possible for good performance. If the value is low, you may need to add more memory. |

Table 4.2: Some of the more useful database performance counters, which allow you to monitor the database
for the domain partition that stores all of the AD objects and attributes.
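
The 90 percent guideline for Cache % Hits in Table 4.2 can be expressed as a small check. This is a minimal Python sketch: the function names and the raw hit and request counts are hypothetical, and in practice System Monitor reports the percentage directly.

```python
MIN_CACHE_HIT_PERCENT = 90.0  # guideline taken from Table 4.2


def cache_hit_percent(hits, requests):
    """Return the percentage of page requests that were served from memory."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return 100.0 * hits / requests


def needs_more_cache(hits, requests):
    """True when the hit rate falls below the 90 percent guideline."""
    return cache_hit_percent(hits, requests) < MIN_CACHE_HIT_PERCENT


# Illustrative numbers only: 850 of 1000 page requests served from cache.
print(cache_hit_percent(850, 1000), needs_more_cache(850, 1000))
```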

Installing the Counters


By default, database performance counters aren’t installed on the domain controller. To install
them, you must use the dynamic-link library (DLL) file called ESENTPRF.DLL. The
instructions for installing the counters are as follows:
1. Copy the %SystemRoot%\System32\ESENTPRF.DLL file to a different directory. For
example, you can create a directory named C:\Perfmon, then copy the file to it.
2. Run the REGEDT32.EXE or REGEDIT.EXE registry editor and create the following
registry subkeys if they don’t already exist:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ESENT
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ESENT\Performance
3. Under the Performance subkey that you added in Step 2, add and initialize the data of the
following registry values (the Library value must point to the directory you used in Step 1):
Open: REG_SZ: OpenPerformanceData
Collect: REG_SZ: CollectPerformanceData
Close: REG_SZ: ClosePerformanceData
Library: REG_SZ: C:\Perfmon\esentprf.dll
4. Change directory to the %SystemRoot%\System32 folder (for example,
C:\Winnt\System32).
5. Load the counter information into the registry by executing the following statement:
LODCTR.EXE ESENTPRF.INI
Once you’ve installed the database performance counters, you can use them to track and monitor
the database on the domain controller. As mentioned earlier, you can view and track each
counter using the System Monitor utility in the Performance Console.
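
If you prefer to keep the registry changes from Steps 2 and 3 in a file that you can review before importing, the following Python sketch generates the text of a registry (.reg) file for them. It assumes the DLL was copied to the example C:\Perfmon directory from Step 1. The function name is hypothetical, and the sketch only builds the text; it doesn't modify the registry.

```python
PERFORMANCE_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ESENT\Performance"
)


def esent_perf_reg_file(dll_path=r"C:\Perfmon\esentprf.dll"):
    """Build .reg file text for the ESENT Performance values in Step 3.

    dll_path is an assumption: it must match wherever ESENTPRF.DLL was
    actually copied in Step 1.
    """
    values = {
        "Open": "OpenPerformanceData",
        "Collect": "CollectPerformanceData",
        "Close": "ClosePerformanceData",
        "Library": dll_path,
    }
    lines = ["REGEDIT4", "", f"[{PERFORMANCE_KEY}]"]
    for name, data in values.items():
        # Backslashes inside string data are doubled in .reg files.
        escaped = data.replace("\\", "\\\\")
        lines.append(f'"{name}"="{escaped}"')
    return "\r\n".join(lines) + "\r\n"


print(esent_perf_reg_file())
```

Saving the output to a file and importing it with the registry editor creates the Performance subkey and its four string values; Steps 4 and 5 still have to be run afterward.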

Monitoring the Global Catalog


As previous chapters have discussed, special servers on Windows networks store a GC partition,
which is replicated in AD. The domain controllers that contain the GC partition are referred to as
GC servers. Because only the first domain controller installed in a forest is made a GC, you need
to determine and specify which subsequent domain controllers will act as GC servers. In
addition, you need to constantly monitor the GC partition to ensure that it remains healthy.
The GC has been designed to support two crucial functions in an AD forest: user logons and
forest-wide queries or searches. It does so by storing all of the objects in the forest and the key
attributes for each. It doesn’t store all the attributes for each object; instead, it stores only the
attributes it needs to perform queries and support the logon process. One of these attributes is the
distinguished name of the object.

Once users query and retrieve the distinguished name from the GC, they can issue a search on
their local domain controller, and LDAP will chase the referrals to the domain controller that
stores the real object information. In addition, universal group membership is stored in the GC.
Because universal groups can deny access to resources, a user’s membership in these groups must
be discovered during logon to build the logon access token. The requests made to the GC are
automatic and not seen by the user.
You can use DirectoryAnalyzer to monitor the GC partition and how it’s performing. It monitors
and tracks the following conditions:
• Domain Controller: Global Catalog Load Too High—Indicates that the domain controller
that stores the GC partition has too much traffic. This traffic is LDAP traffic coming from
workstations and servers.
• Domain Controller: Global Catalog Response Too Slow—Indicates that the domain
controller that stores the GC partition isn’t responding in time to queries and other traffic.
• Replica: GC Replication Latency Too High—Indicates that replication is taking too long
to synchronize the GC stored on the domain controller. If replication latency (the time it
takes to replicate changes to all GCs in the forest) is too high, an alert is generated.
• Site: Too Few Global Catalogs in Site—Indicates that there aren’t enough GC servers in
the site.
Figure 4.15 shows how DirectoryAnalyzer monitors and tracks alerts for the GC.

Figure 4.15: DirectoryAnalyzer allows you to monitor the GC partition that exists on various domain
controllers throughout the forest.

Monitoring Operations Masters


To prevent conflicting updates in WS2K3, AD designates a single master server to perform certain
operations. In a single-master model, only one server is allowed to provide updates for the forest
or domain. When a domain controller takes on the responsibility of the single-master operation,
it’s taking on a role. Thus, this method of updates is called single-master operation roles. When
only one domain controller can take on the role at one time, it’s referred to as a FSMO role.
There are currently five types of operations masters in AD. The directory automatically elects the
FSMO role servers during the creation of each AD forest and domain.

For more detail about these FSMO roles, see Chapter 1.

Two operations masters manage forest-wide operations, so they have forest-specific FSMO
roles:
• Schema master—Responsible for schema extensions and modifications in the forest
• Domain naming master—Responsible for adding and removing domains in the forest
Three operations masters manage domain operations, so they have domain-specific FSMO roles:
• Infrastructure master—Updates group-to-user references in a domain
• RID master—Assigns unique security IDs in a domain
• PDC emulator—Provides PDC support for down-level clients in a domain.

The three domain-specific FSMO roles exist in every domain. Thus, an AD forest with a total of 3
domains would have 11 FSMO roles in all: 9 domain-specific roles and 2 forest-wide roles.
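
The arithmetic in this note generalizes to any forest. The following Python sketch uses the role names from the two lists above; the function name is hypothetical.

```python
FOREST_WIDE_ROLES = ("schema master", "domain naming master")
DOMAIN_ROLES = ("infrastructure master", "RID master", "PDC emulator")


def fsmo_role_count(domain_count):
    """Total FSMO roles in a forest that contains domain_count domains."""
    if domain_count < 1:
        raise ValueError("a forest contains at least one domain")
    return len(FOREST_WIDE_ROLES) + len(DOMAIN_ROLES) * domain_count


print(fsmo_role_count(3))  # 2 forest-wide + 3 x 3 domain-specific = 11
```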

Because there is only one of each of the forest-specific FSMO roles, it’s extremely important that
you constantly monitor and track the activity and health of the operations masters. If any of them
fail, the directory loses functionality until the computer is restarted or another appropriate
domain controller is assigned the role.
To monitor operations masters, you can use DirectoryAnalyzer. It monitors, checks the status of,
and alerts on several types of conditions and situations relating to operations masters, such as
which domain controllers are holding the FSMO roles. Click Browse Directory By Naming
Context, and click the Naming Context Information tab. Under Operations Master Status, you
see which domain controller is holding which FSMO role. Figure 4.16 shows the status of the
FSMO roles in the Naming Context Information pane.

Figure 4.16: DirectoryAnalyzer displays which domain controllers are holding which FSMO roles for the
naming context.

You can also use the Naming Context Information pane (shown in Figure 4.14) to check the
consistency of the FSMO roles across all of the domain controllers on the network.
DirectoryAnalyzer monitors what each domain controller reports for the FSMO assignments. If
not all of the domain controllers report the same values for all of the operations masters, the
word No appears beside Operations Master Consistent.
To investigate the problem, click Details. The Operations Master Consistency dialog box
appears, indicating that operations master information is inconsistent. It displays the names of
the domain controllers and which domain controller holds each role. In Figure 4.17, the domain
controller COMP-DC-04 has inconsistent information about the true owner of the PDC emulator
role because it shows domain controller COMP-DC-01 as the owner when it should be COMP-
DC-03. Thus, the owner of the PDC operations master is inconsistent.
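
The idea behind this consistency check can be sketched in a few lines of Python. The sketch compares what each domain controller reports as the owner of each role and flags any domain controller that disagrees with the most commonly reported owner. Treating the majority view as the expected owner is an assumption of this sketch, not DirectoryAnalyzer’s documented method, and the function name is hypothetical. The domain controller names come from the Figure 4.17 example.

```python
from collections import Counter


def find_inconsistent_reports(reports):
    """Find domain controllers whose view of a FSMO role owner differs.

    reports maps a domain controller name to a dict of
    {role name: owner that this domain controller reports}.
    Returns (domain controller, role, reported owner, majority owner) tuples.
    """
    roles = sorted({role for view in reports.values() for role in view})
    problems = []
    for role in roles:
        tally = Counter(view[role] for view in reports.values() if role in view)
        # Assumption: the owner reported by most domain controllers is correct.
        majority_owner = tally.most_common(1)[0][0]
        for dc in sorted(reports):
            reported = reports[dc].get(role)
            if reported is not None and reported != majority_owner:
                problems.append((dc, role, reported, majority_owner))
    return problems


reports = {
    "COMP-DC-01": {"PDC emulator": "COMP-DC-03"},
    "COMP-DC-03": {"PDC emulator": "COMP-DC-03"},
    "COMP-DC-04": {"PDC emulator": "COMP-DC-01"},
}
for problem in find_inconsistent_reports(reports):
    print(problem)
```

An empty result corresponds to Operations Master Consistent showing Yes; any returned tuple identifies a domain controller whose information needs to be investigated.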

Figure 4.17: DirectoryAnalyzer allows you to monitor and check consistency for each operations master.

In addition to showing the status and consistency checks, DirectoryAnalyzer monitors and
displays alerts for each operations master. The alerts that are monitored and tracked provide
information about the availability of the FSMO roles. To monitor the availability of the FSMO
role holders, you can click Current Alerts in the bar to the side of the main screen. To display the
alerts for a domain or each domain controller, click Browse Directory By Naming Context.
The alerts indicate that the domain controller that holds the operations master role isn’t responding.
This lack of response could mean that the domain controller or AD is down.
It could also mean that the domain controller no longer has network connectivity,
which could indicate DNS or Internet Protocol (IP) addressing problems. Finally, this alert could
simply mean that the domain controller or the directory that is installed is overloaded and
responding too slowly. Figure 4.18 shows how DirectoryAnalyzer monitors and tracks alerts for
each operations master.

Figure 4.18: DirectoryAnalyzer monitors and tracks the availability of each FSMO role holder.

Monitoring Replication
AD is a distributed directory made up of one or more naming contexts, or partitions. Partitions
are used to distribute the directory data on the domain controllers across the network. The
process that keeps partition information up to date is called replication. Monitoring replication is
critical to the proper operation of the directory. Before I discuss how to monitor replication,
however, I need to describe what it is and how it works.
In AD, replication is a background process that propagates directory data among domain
controllers. For example, if an update is made to one domain controller, the replication process is
used to notify all of the other domain controllers that hold copies of that data. In addition, the
directory uses multimaster replication, which means that there is no single source (or master)
that holds all of the directory information. Through multimaster replication, changes to the
directory can occur at any domain controller; the domain controller then notifies the other
servers.
Because AD is partitioned, not every domain controller needs to communicate or replicate with
every other domain controller. Instead, the system uses a set of connections that determines which domain
controllers need to replicate to ensure that the appropriate domain controllers receive the updates.
This approach reduces network traffic and replication latency (the time to replicate a change to
all replicas). The set of connections used by the replication process is the replication topology.
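
To see how the topology affects how far a change has to travel, the following Python sketch counts the number of replication hops an update needs to reach each domain controller. The domain controller names, the ring-shaped topology, and the function name are hypothetical; a real topology is read from the connection objects in AD.

```python
from collections import deque


def replication_hops(connections, origin):
    """Return {domain controller: hops needed for a change made at origin}.

    connections is an iterable of (source, destination) pairs; a change
    flows from source to destination. Unreachable servers are omitted.
    """
    outbound = {}
    for source, destination in connections:
        outbound.setdefault(source, []).append(destination)
    hops = {origin: 0}
    queue = deque([origin])
    while queue:  # breadth-first search finds the shortest hop count
        dc = queue.popleft()
        for partner in outbound.get(dc, []):
            if partner not in hops:
                hops[partner] = hops[dc] + 1
                queue.append(partner)
    return hops


# Hypothetical ring of four domain controllers, replicating in both directions.
ring = [("DC1", "DC2"), ("DC2", "DC3"), ("DC3", "DC4"), ("DC4", "DC1")]
ring += [(destination, source) for source, destination in ring]
print(replication_hops(ring, "DC1"))
```

In this ring, a change made on DC1 reaches DC2 and DC4 in one hop and DC3 in two. A domain controller missing from the result has no replication path from the origin.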

Using Directory Partition Replicas


A directory partition replica can be a full replica or a partial replica. A full replica contains all of
the objects and attributes of a partition and is read- and write-accessible. A partial replica
contains a subset of the objects and attributes and is read-only. Partial replicas are stored only on
a GC server. Each domain controller stores at least three full directory partitions, or naming
contexts, which include the schema partition, configuration partition, and domain partition.

Schema Partition
The schema partition contains the set of rules that defines the objects and attributes in AD. This
set of rules is used during creation and modification of the objects and attributes in the directory.
The schema also defines how the objects and attributes can be manipulated and used in the
directory.
The schema partition is global; thus, every domain controller in the forest has a copy, and these
copies need to be kept consistent. To provide this consistency, the replication process in the
directory passes updated schema information among the domain controllers to the copies of the
schema. For example, if an update is made to the schema on one domain controller, replication
propagates the information to the other domain controllers, or copies of the schema.

Configuration Partition
The configuration partition contains the objects that define the logical and physical structure of
the AD forest. These objects include sites, site links, trust relationships, and domains. Like the
schema partition, the configuration partition exists on every domain controller in the forest and
must be exactly the same on each.
Because the configuration partition exists on every domain controller, each computer has some
knowledge of the physical and logical configuration of the directory. This knowledge allows
each domain controller to efficiently support replication. In addition, if a change or update is
made to a domain controller and its configuration partition, replication is started, which
propagates the change to the other domain controllers in the forest.

Domain Partition
The domain partition contains the objects and attributes of the domain itself. This information
includes users, groups, printers, servers, organizational units (OUs), and other network resources.
The domain partition is copied, or replicated, to all of the domain controllers in the domain. If
one domain controller receives an update, it needs to be able to pass the update to other domain
controllers holding copies of the domain.
A read-only subset of the domain partition is replicated to GC servers in other domains so that
other users can access its resources. This setup allows the GC to know what other objects are
available in the forest.

Using Directory Updates


AD updates are changes made to an object or attribute stored on a domain controller. When an
update occurs, the domain controller that receives it uses replication to notify other domain
controllers holding replicas of the same partition. The domain controller that receives the update
(called the originating domain controller) notifies its replication partners of the change first, then
the partners request the appropriate changes.
A write request from a directory client is called an originating write. When an update that
originates on one domain controller is replicated to another domain controller, the update is
called a replicated write. Using this approach, AD can distinguish update information during
replication.
AD replication doesn’t use date or time stamps to determine which changes need to be
propagated among domain controllers; instead, it uses Update Sequence Numbers (USNs). A
USN is a 64-bit counter that each domain controller maintains. It increments each time a change is
initiated on that domain controller, and the new value is then associated with the change. To view the USNs recorded for an object, use the following
command at a command prompt:
REPADMIN /showmeta <object DN>
In addition to maintaining USNs, AD maintains an up-to-dateness vector, which helps the
domain controllers involved in replication track updates. The up-to-dateness vector is a table,
kept for each naming context, that records the highest originating USN received from each
domain controller. During replication, the requesting domain controller sends the up-to-dateness
vector with its replication request so that the source domain controller sends only those
updates that the requesting domain controller doesn’t already have.
The up-to-dateness vector also helps with the problems of multiple replication paths among
domain controllers. AD allows multiple replication paths to exist so that domain controllers can
use more than one path to send and receive replication traffic. When multiple replication paths
exist, you might expect redundant traffic and endless looping during replication, but the directory
allows domain controllers to detect when replication data has already been replicated. This
method is called propagation dampening.
AD prevents these potential problems by using the up-to-dateness vector and the high-watermark
vector. The up-to-dateness vector contains server-USN pairs and represents the latest originating
update. The high-watermark vector holds the USNs for attributes that have been added or
modified in the directory and that are stored in the replication metadata for that attribute. Using
both vectors, propagation dampening can occur and unnecessary directory updates are avoided.
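
The filtering role of the up-to-dateness vector can be illustrated with a simplified Python model. It isn’t AD’s actual implementation: the class, the function name, and the domain controller names are hypothetical, and the high-watermark vector is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    originating_dc: str   # domain controller where the originating write occurred
    originating_usn: int  # USN assigned to the change on that domain controller
    attribute: str
    value: str


def updates_to_send(source_updates, destination_vector):
    """Return only the updates that the destination hasn't already received.

    destination_vector maps an originating domain controller to the highest
    originating USN the destination has seen from it (server-USN pairs).
    """
    needed = []
    for update in source_updates:
        already_seen = destination_vector.get(update.originating_dc, 0)
        if update.originating_usn > already_seen:
            needed.append(update)
    return needed


# The source holds two changes that originated on DC1 and one from DC3.
held_by_source = [
    Update("DC1", 10, "telephoneNumber", "555-0100"),
    Update("DC1", 11, "title", "Manager"),
    Update("DC3", 7, "description", "Branch office"),
]
# The destination already received DC1's changes over another replication path.
vector = {"DC1": 11, "DC3": 5}
for update in updates_to_send(held_by_source, vector):
    print(update)
```

Only the change that originated on DC3 is sent. The two DC1 changes are suppressed because the destination’s vector shows it already has them, which is the effect that propagation dampening is meant to achieve.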

As I’ve mentioned, the values in the up-to-dateness vector can determine which updates need to
be sent to the destination domain controller. For example, if the destination domain controller
already has an up-to-date value for an object or attribute, the source domain controller doesn’t
have to send the update for it. To view the contents of the up-to-dateness vector for any domain
controller, type the following command at a command prompt:
REPADMIN /showvector <NC name>
To help resolve conflicts during replication, AD attaches a unique stamp to each replicated value.
Each stamp is replicated along with its corresponding value. To ensure that all conflicts can be
resolved during replication, the stamp is compared with the stamp of the current value on the destination
domain controller. If the stamp of the value that was replicated is larger than the stamp of the
current value, the current value (including the stamp) is replaced. If the stamp is smaller, the
current value is left alone.
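
The comparison rule can be sketched in Python. The text above doesn’t spell out what a stamp contains, so this sketch assumes it is an ordered triple of version number, originating time, and originating domain controller identifier, compared field by field in that order. The class, the function name, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    # Assumed layout; tuples compare field by field, left to right.
    version: int
    timestamp: int
    originating_dc: str


def resolve(current, incoming):
    """Return the (value, stamp) pair that survives a replication conflict.

    The incoming pair replaces the current one only when its stamp is larger.
    """
    return incoming if incoming[1] > current[1] else current


current = ("Sales", Stamp(version=2, timestamp=1000, originating_dc="dc-a"))
incoming = ("Marketing", Stamp(version=3, timestamp=900, originating_dc="dc-b"))
print(resolve(current, incoming))
```

Here the incoming value wins because its version number is higher, even though its timestamp is earlier. Under the assumed layout, the timestamp matters only when the versions are equal.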

Using the Replication Topology


As I mentioned earlier, the replication topology is the set of connections used by the domain
controllers in a forest to synchronize the directory partition replicas. The replication topology is
created automatically on the basis of information in AD by the Knowledge Consistency Checker
(KCC), a built-in process that runs on all domain controllers. By default, the KCC runs at 15-
minute intervals and designates the replication routes among domain controllers on the basis of
the most favorable connections available at that time.
The KCC automatically generates replication connections among domain controllers in the same
site. This local replication topology is called an intra-site topology. If you have multiple wide
area network (WAN) locations, you can configure site links among the sites, then the KCC can
automatically create the respective replication connection objects. The replication topology that
is created among remote locations is called an inter-site topology. The sets of domain controllers
that replicate directly with each other are called replication partners. Each time the KCC runs,
these replication partners are automatically added, removed, or modified.

Although you can disable the KCC and create connection objects by hand, I strongly recommend
that you use the KCC to automatically generate the replication topology. The reason is that the KCC
simplifies a complex task and has a flexible architecture, which reacts to changes you make and any
failures that occur.
However, if your organization has more than 100 sites, you may need to manually create the
replication topology; in environments that have more than 100 sites, the KCC doesn’t scale well. In
extremely large organizations, the KCC will often spend too much time and processing power trying
to calculate the replication topology, with the result that the topology will never be properly generated
and replication won’t work properly between sites.


The KCC uses the following components to manage the replication topology:
• Connections—The KCC creates connection objects in AD that enable the domain
controllers to replicate with each other. A connection is defined as a one-way inbound
route from one domain controller to another. The KCC manages the connection objects
and reuses them where it can, deletes unused connections, and creates new connections if
none exist.
• Servers—Each domain controller in AD is represented by a server object. The server has
a child object called NTDS Settings, which stores the inbound connection objects for
the server from each source domain controller. Connection objects are created in two
ways: automatically by the KCC or manually by an administrator.
• Sites—The KCC uses sites to define the replication topology. Sites define the sets of
domain controllers that are well connected in terms of speed and cost. When changes
occur, the domain controllers in a site replicate with each other to keep AD synchronized.
If the domain controllers are local (intra-site topology), replication starts as needed, with
no concern for speed or cost, within 5 minutes of an update occurring. If the two domain
controllers are separated by a low-speed network connection (inter-site topology),
replication is not triggered by updates; it occurs only on the schedule that you configure,
regardless of when updates occur.
• Subnets—Subnets help the KCC identify groups of computers and domain
controllers that are physically close or on the same network.
• Site links—Site links must be established among sites so that replication among sites can
occur. Unless a site link is in place, the KCC cannot automatically create the connections
among sites, and replication cannot take place. Each site link contains the schedule that
determines when replication can occur among the sites that it connects.
• Bridgehead servers—The KCC automatically designates a single server for each naming
context, called the bridgehead server, to communicate across site links. You can also
manually designate bridgehead servers when you establish each site link. Bridgehead
servers perform site-to-site replication; in turn, they replicate to the other domain
controllers in each site. Using this method, you can ensure that inter-site replication
occurs only among designated bridgehead servers. Thus, bridgehead servers are the only
servers that replicate across site links, and the rest of the domain controllers are updated
within the local sites.


Using DirectoryAnalyzer
DirectoryAnalyzer allows you to monitor replication among domain controllers and report any
errors or problems. It allows you to track the following problems and issues:
• Replication Cycle—The time during which the requesting domain controller receives
updates from one of its replication neighbors. You can view the successful replication
cycle as well as any errors that occurred during that time.
• Replication Latency—The elapsed time between an object or attribute being updated
and the change being replicated to all the domain controllers that hold copies. If
replication latency is too high, DirectoryAnalyzer issues an alert.
• Replication Topology—The paths among domain controllers used for replication.
DirectoryAnalyzer evaluates whether the topology is transitively closed, meaning that
no matter on which domain controller an update occurs, the topology provides a path for
that update to be replicated to all other domain controllers.
• Replication Failures—Occur when a domain controller involved in replication doesn’t
respond. Each time there are consecutive failures from the same domain controller, an
alert is issued. Many things can cause failures—for example, a domain controller may be
too busy updating its own directory information from a bulk load.
• Replication Partners—Sets of domain controllers that replicate directly with each other.
DirectoryAnalyzer monitors domain controllers and pings them to make sure that each is
still alive and working. If a replication partner doesn’t respond, an alert is issued.
• Replication Conflict—Occurs when the same object or attribute is created or modified on
two domain controllers before either change has replicated to the other. AD resolves this
conflict automatically, and DirectoryAnalyzer issues an alert so that you’ll know that one
of the updates was ignored by replication.
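
To make the idea of a transitively closed topology concrete, here is a minimal sketch. It assumes that each connection object is represented as a (source, destination) pair, matching the one-way inbound connections described earlier; the function name and the data layout are hypothetical and have nothing to do with how DirectoryAnalyzer is implemented.

```python
from collections import defaultdict, deque


def unreachable_pairs(domain_controllers, connections):
    """Return (origin, target) pairs where an update made on origin
    can never reach target over the given one-way connections."""
    # Map each source DC to the DCs that pull changes from it.
    outbound = defaultdict(set)
    for source, destination in connections:
        outbound[source].add(destination)

    missing = []
    for origin in domain_controllers:
        # Breadth-first walk from the DC where the update originates.
        seen = {origin}
        queue = deque([origin])
        while queue:
            dc = queue.popleft()
            for neighbor in outbound[dc]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        missing.extend((origin, dc) for dc in domain_controllers if dc not in seen)
    return missing
```

An empty result means that an update made on any domain controller can reach every other one; any pair returned points to a gap in the topology.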
DirectoryAnalyzer is a unique utility because it allows you to browse AD for information on, for
example, the replication cycle and replication partners. Figure 4.19 shows the Replication
Information pane, which displays the last successful replication cycle for each domain controller,
replication partners, and any errors that occurred during replication.


Figure 4.19: DirectoryAnalyzer allows you to view the replication cycle and replication partners for each
domain controller.

Using DirectoryAnalyzer, you can monitor and track the replication process for errors. If a
problem occurs, the utility will issue an alert to indicate what type of problem has occurred. You
can double-click the alert to see more detailed information, then use the knowledge base to find
troubleshooting methods to help you solve the problem. The Current Alerts screen displays the
most recent alerts that have been logged for replication (see Figure 4.20).


Figure 4.20: The Current Alerts screen in DirectoryAnalyzer allows you to view the most recent alerts for the
replication process.

You can also view the replication-related alerts that have been stored in the Alert History file in
DirectoryAnalyzer. To display these alerts, on the Current Alerts screen, choose Reports, Alert
History. On the Report page, select one of the report options to specify what alerts you want to
include. Then select Preview to display the report on the screen. You can print the report or
export it to a file. Figure 4.21 illustrates an Alert History report.


Figure 4.21: Using DirectoryAnalyzer, you can produce a report of replication-related alerts.

Monitoring via Auditing


I’ve already mentioned the important role that auditing can play in the troubleshooting process.
Windows includes built-in auditing mechanisms, which log auditing events to the Windows
event logs.

Setting up Auditing
Auditing is controlled on a per-object basis within the domain. To view this feature, you can, for
example, open Active Directory Users and Computers, and right-click a domain. Select
Properties from the context menu, select the Security tab, then click Advanced, and select the
Auditing tab.

If the Security tab isn’t visible, ensure that Advanced Features is selected on the console’s View
menu, then try again.

As Figure 4.22 shows, you can define which actions generate audit messages. To fully enable
auditing, audit all success and failure events for the special Everyone group. Although doing so
will produce the maximum amount of useful information for troubleshooting, it will create a log
entry for practically every event that occurs within AD. This volume of events can quickly
overfill the Security event log and create more data than you can readily utilize—a downside of
AD’s auditing capabilities.


Figure 4.22: Configuring auditing in AD.

Reviewing Auditing Messages


Use the Security event log to review audit messages. Figure 4.23 shows the log and the various
messages that it has accumulated. You can use the log viewer’s normal tools to filter or search
for specific types of messages, although in many cases, finding messages related to a particular
problem can be challenging.


Figure 4.23: Viewing auditing events in the Security event log viewer.

Using ChangeAuditor for Active Directory


There are several problems with AD’s built-in auditing capabilities. First, the audit log can be
erased by an administrator, leading to the possibility of a rogue administrator making changes
and then deleting the evidence. Second, AD’s auditing occurs on a per-domain-controller basis;
thus, to troubleshoot a problem, you must examine the logs on every domain controller in your
environment to search for clues. This process is too inefficient for most administrators to bother
with, which is perhaps why auditing has never become a popular troubleshooting tool. Finally,
not every possible change within AD can be effectively audited using AD’s built-in capabilities,
which were designed for security auditing rather than troubleshooting.
ChangeAuditor collects events in real-time and forwards them to a central database; thus, the
inefficiencies in AD’s built-in auditing capabilities can be bypassed. ChangeAuditor also
organizes events and makes filtering for specific events much easier.
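
To see why per-domain-controller logs are awkward to work with, consider what you would have to do by hand: gather the events from each domain controller and merge them into a single timeline. The sketch below shows that merge step only. It assumes a hypothetical record layout of (time, message) pairs that are already sorted per domain controller; it doesn’t show how the events are collected, and it isn’t how ChangeAuditor works internally.

```python
import heapq
from datetime import datetime


def merge_audit_logs(logs_by_dc):
    """Merge per-DC event lists (each sorted by time) into one timeline
    of (time, domain_controller, message) tuples."""
    tagged = (
        [(when, dc, message) for when, message in events]
        for dc, events in logs_by_dc.items()
    )
    # heapq.merge keeps the combined output ordered by event time.
    return list(heapq.merge(*tagged, key=lambda event: event[0]))
```

With a merged timeline, a change made on one domain controller can be read in sequence with the symptoms that appear later on another.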


For example, suppose you were having a problem with replication to a specific site. There are, of
course, a number of potential causes for this problem. However, if replication used to work, then
you can expect that one or more changes have occurred to create the problem. The
change might simply be a change in network connectivity (such as a WAN link that’s down),
or it could be a configuration change. To begin the troubleshooting process, ChangeAuditor
gives you a good first place to look; as Figure 4.24 shows, you can quickly spot configuration
changes—such as the addition of a new site link—which might be having an impact on your
environment’s AD replication.

Figure 4.24: Spotting configuration changes can lead to a solution more quickly.

ChangeAuditor can also spot changes to the registry and file system—changes that Windows’
built-in auditing functionality can log but that can be incredibly difficult to detect in the
enormous mass of data that the Security log would contain were you to audit events related to
those OS components.


Summary
Before you can accurately troubleshoot AD, you must be able to effectively monitor it for
problems. Thus, you must be able to monitor the directory that has been distributed across
domain controllers on the network. You can do so by using the monitoring tools described in this
chapter. These tools allow you to watch the directory components individually and as they
interact with each other. For example, you can monitor the domain controllers, the domain
partition, the GC partition, the operations masters, and the replication process and topology.
Monitoring these components ensures the health of the directory as a system. In addition, you
can use auditing tools to perform effective and efficient troubleshooting of AD problems—
saving time and energy that can be better spent on other administrative tasks.


Chapter 5: Troubleshooting Active Directory and Infrastructure Problems
Troubleshooting AD means analyzing and identifying problems that occur in your AD network
and subsequently repairing them. Troubleshooting a production AD environment can often be
difficult because it’s dynamic and complex by nature, but there are techniques and tools
available to make the job easier. In this chapter, you’ll learn how to apply these techniques and
tools and develop an AD network-troubleshooting methodology.
The troubleshooting process primarily involves isolating and identifying a problem. Few
problems are difficult to solve once you know exactly what is going wrong and where.
Troubleshooting, in general, is more an art born out of experience than an exact science. Your
approach to solving a problem can depend largely on the specifics of your directory, system, and
network. This chapter outlines some common techniques and approaches that you can use to help
troubleshoot and maintain your AD implementation.

Following a Specific Troubleshooting Methodology


When you troubleshoot AD, follow a specific methodology of diagnosing and troubleshooting
problems in the system. This methodology is a set of steps you can follow to identify situations,
diagnose problems, and repair AD components. The first step of this methodology is a set of
questions that you can use to identify particular situations or problems:
• Is network communication working?
• Is name resolution working?
• Are the domain controllers responding?
• Are the operations masters working?
• Is the replication topology working?
When a problem doesn’t exhibit the characteristics of a typical failure, and when monitoring
tools fail to provide enough information to isolate the problem, the next step is to try to eliminate
pieces of the system until you end up with a small, predictable failure. As mentioned earlier, use
the process of elimination to rule out as many technologies and dependencies as possible. Even if
the problem seems overly complex at first, you can simplify it by eliminating all of the
possibilities—one by one.


Building a Wolf-Proof Fence


Experienced troubleshooters often follow, even unconsciously, an excellent methodology that is a good
practice to emulate. Simply ask yourself “How can I find a wolf in Siberia?” The wolf is the problem that is
occurring in AD, and Siberia—a large, uncharted territory—is the vast array of things that could be wrong.
The answer to the question, of course, is to build a wolf-proof fence. Down the middle of Siberia, to be
specific: You know the wolf is on one side or the other. In AD troubleshooting terms, you implement this
methodology by asking yourself one question that will definitively rule out one class of problems or the
other.
For example, let’s suppose that replication isn’t working on a particular domain controller. Broadly
speaking, two things could be wrong: Network connectivity is messed up, or AD itself is messed up.
Pinging another domain controller by name will eliminate or confirm the entire category of network
connectivity: If the ping works, you can focus on AD itself, having eliminated half of your potential
problems; if the ping doesn’t work, you can focus on network connectivity. The ping becomes your wolf-
proof fence, narrowing the problem to one side of Siberia or the other.
With one side out of the way, you divide and conquer the remaining half. If you’re down to network
connectivity, you might try pinging another domain controller by IP address, thus eliminating or confirming
the realm of name resolution as the problem. You continue breaking the problem into halves until you
come to the right answer. The trick is in knowing one test, in each case, that will eliminate or confirm
roughly half the potential problems.
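
The two fences in this example can be written down as a small decision function. This is only a sketch of the reasoning in the sidebar; the function name and the wording of the results are my own, and a real investigation would keep adding fences below each result.

```python
def narrow_replication_problem(ping_by_name_ok, ping_by_ip_ok=None):
    """Apply the sidebar's two fences and name the remaining problem area.

    ping_by_name_ok: result of pinging another domain controller by name.
    ping_by_ip_ok:   result of pinging it by IP address (None if not yet run).
    """
    # First fence: a ping by name exercises name resolution and the network,
    # so success rules out that whole side of the problem.
    if ping_by_name_ok:
        return "Active Directory"
    if ping_by_ip_ok is None:
        return "network connectivity (next: ping by IP address)"
    # Second fence: if the address responds but the name did not,
    # name resolution is the culprit.
    if ping_by_ip_ok:
        return "name resolution"
    return "network connectivity below name resolution"
```

Each call either names the half that remains or tells you which test to run next, which is all a wolf-proof fence needs to do.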

Troubleshooting Network Connectivity


You can troubleshoot network connectivity in a number of ways. For example, you can:
• Test that the hardware you’re using has network connectivity
• Test that IP addresses are correct by using the IP Configuration (IPCONFIG) utility
• Test that TCP/IP connections are working using the PING utility
• Perform other troubleshooting tests by using DirectoryAnalyzer

Testing for Network Connectivity


The first step toward identifying and diagnosing AD problems is to verify that each domain
controller and user workstation has network connectivity. At a minimum, you need to check that
your domain controller’s hardware is functioning correctly, including the computer’s local area
network (LAN) adapters, drivers, cables, and network hub. For example, if you look in the
Network and Dial-up Connections screen under Control Panel, and the Local Area Connection
icon is marked with a red X, the network cable isn’t connected.
Figure 5.1 shows that the domain controller has a local area connection problem. Because the
domain controller’s cable isn’t connected to the network, there is a simple solution to the
problem: reconnect the cable.


Figure 5.1: A red X on the Local Area Connection icon indicates that the network cable is disconnected from
your domain controller.

Testing the IP Addresses


Another method of checking network connectivity on the LAN is to make sure that the IP
addresses are correct. To perform an IP check, use the IPCONFIG utility. IPCONFIG allows you
to view and modify the domain controller’s IP configuration details on the command line. It also
checks that the default gateway is on the same subnet as the local computer’s IP address. For
Domain Name System (DNS) dynamic updates, you can use IPCONFIG to register the
computer’s entries in the DNS service.
To view a computer’s TCP/IP configuration, type the following command in a Command Prompt
window on the domain controller or workstation:
ipconfig /all
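Without the /all switch, the display shows only the IP address, subnet mask, and default gateway
for each adapter bound to TCP/IP; the /all switch adds details such as the host name, DNS suffixes,
and physical address. Figure 5.2 shows an unsuccessful TCP/IP configuration and network
connection.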


Figure 5.2: An unsuccessful TCP/IP configuration and network connection, shown by using IPCONFIG.

Listing 5.1 shows a well-connected LAN. Notice that the IP addresses are displayed with
appropriate values.
C:\> ipconfig /all
Windows Server 2003 IP Configuration
Host Name . . . . . . . . . . . . : cx266988-S
Primary DNS Suffix. . . . . . . . : company.com
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
Search List . . . . . . . . . . . : company.com

Ethernet adapter Local Area Connection:
Connection-specific DNS Suffix. . : company.com
Description . . . . . . . . . . . : Netelligent 10/100TX PCI Embedded UTP Coax Controller
Physical Address. . . . . . . . . : 00-80-5F-A9-C0-74
IP Address. . . . . . . . . . . . : 10.0.0.10
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . : 10.0.0.1

Listing 5.1: A well-connected LAN, shown by using IPCONFIG.
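
When you read output such as Listing 5.1, one check worth making is that the default gateway falls within the subnet defined by the IP address and subnet mask. The following sketch performs that check with Python’s standard ipaddress module; the function name is hypothetical, and the values are simply those from the listing.

```python
import ipaddress


def gateway_on_local_subnet(ip_address, subnet_mask, default_gateway):
    """Return True if the default gateway lies in the adapter's subnet."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{ip_address}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(default_gateway) in network


# Values taken from Listing 5.1.
listing_ok = gateway_on_local_subnet("10.0.0.10", "255.255.0.0", "10.0.0.1")
```

A gateway that lies outside the local subnet can’t be reached directly by the computer, so traffic to remote subnets fails even though local traffic still works.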

If you want to save the results of running IPCONFIG for further analysis, you can capture the
results in a text file. At the command line, enter the following command:
ipconfig /ALL > <local_drive>:\<text_file.txt>

There are many advanced features and switches available with IPCONFIG. To view the available
switches, enter the following command at the command line:
IPCONFIG /?

If everything looks normal when you run IPCONFIG, go on to test the TCP/IP connection.


Testing the TCP/IP Connection


You can test the TCP/IP connection among connected servers and workstations by using the
PING utility. PING lets you determine whether the LAN adapter and TCP/IP are working and
whether you have network connectivity to the default gateway or the Dynamic Host
Configuration Protocol (DHCP) server. In this case, you can use the PING command to test
TCP/IP connectivity among the domain controllers that support AD or on a workstation that uses
AD. You can start the PING command on one domain controller to test the connectivity to
another. When a domain controller fails to connect to the targeted computer, the PING utility
returns a Request timed out or Destination host unreachable message. By default, PING sends
four echo requests, so the message is repeated four times. In addition, the utility shows statistics
gathered during the test.
You can use the PING utility to perform a series of steps to troubleshoot connectivity problems
among domain controllers. The first test is called the loop-back address test, which verifies
whether TCP/IP is working on the local computer. To perform this test on the local computer,
type the following command in the Command Prompt window. (Instead of using 127.0.0.1, you
can use the keyword localhost.)
PING 127.0.0.1
If the PING command fails on the loop-back address test, check the TCP/IP configuration
settings and restart the local domain controller.
After you verify that TCP/IP is configured properly and the PING loop-back address test
succeeds, you need to test the local TCP/IP address of the local domain controller. To do so, type
the following command:
PING <local_TCP/IP_address>
If the PING test for the local address fails, restart the domain controller and check the routing
tables by using the ROUTE PRINT command at a command prompt on the computer. The
ROUTE PRINT command displays the current IP address assigned to the local computer plus all
of the active and persistent network routes. This command allows you to view and troubleshoot
the network configurations that exist at the time that the command is executed.
After you’ve verified that the local address is working properly, use the PING command to check
the communication to the other domain controllers in the same location or subnet. For example,
you can check connectivity to the other domain controllers on the same subnet as follows:
PING <domain_controller1_address>
PING <domain_controller2_address>
PING <domain_controller3_address>
In the PING statements, the domain controller address is represented as the domain name (for
example, COMPANY.COM) or the IP address of the domain controller (for example, 10.0.0.10). If
communication among the domain controllers on the local subnet fails, you need to check that
each computer is operational and that the network hubs and switches are working properly. If the
domain controllers are separated by a wide area network (WAN) connection, you need to ping
the default gateways that route the TCP/IP traffic among WAN locations.
Start by pinging the IP address of your default gateway. If the PING command fails for the
gateway, you need to verify that the address for the default gateway is correct and that the
gateway (router) is operational.


Next, ping the IP address of the remote domain controllers on the remote subnet as follows:
PING <remote_domain_controller1_address>
PING <remote_domain_controller2_address>
PING <remote_domain_controller3_address>
In the PING statements, the remote domain controller address is represented as the domain name
(for example, REMOTE.COMPANY.COM) or the IP address of the domain controller (for example,
20.0.0.20). If the PING command fails, verify the address of each remote domain controller and
check whether each remote domain controller is operational (generally, you’ll also want to ping
the IP address directly, to ensure that name resolution isn’t causing the ping to fail). In addition,
check the availability of all of the gateways or routers between your domain controller and the
remote one.
In addition to pinging the domain controllers, you need to ping the IP address of the DNS server.
If this command fails, verify that the DNS server is operational and the address is correct.
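
The PING sequence in this section lends itself to a small script that runs the checks in order and stops at the first failure. The sketch below is one way to do it, assuming the Windows PING utility and its -n (count) switch; the function names and the step labels are hypothetical. Treat the exit code as a rough signal only, because PING can report success when a router answers with Destination host unreachable.

```python
import subprocess


def ping_once(target):
    """Send a single echo request with the Windows PING utility."""
    completed = subprocess.run(["ping", "-n", "1", target], capture_output=True)
    return completed.returncode == 0


def first_failing_step(steps, probe=ping_once):
    """Run the checks in order; return the first (label, target) that fails,
    or None if every target responds."""
    for label, target in steps:
        if not probe(target):
            return (label, target)
    return None


# Example order, following the section: loop-back, local address,
# a domain controller on the local subnet, then the default gateway.
EXAMPLE_STEPS = [
    ("loop-back", "127.0.0.1"),
    ("local address", "10.0.0.10"),
    ("local domain controller", "10.0.0.11"),
    ("default gateway", "10.0.0.1"),
]
```

Calling first_failing_step(EXAMPLE_STEPS) returns the first layer that needs attention. The addresses are placeholders; substitute your own, and append the remote domain controllers and the DNS server to complete the sequence.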

Performing Other Troubleshooting Tests Using DirectoryAnalyzer


You can perform several other tests of network connectivity in AD using DirectoryAnalyzer
from NetPro. You can test the network connection, view the current status and name, query IP
addresses, and perform server lookups. You can use DirectoryAnalyzer to perform the following
troubleshooting tests:
• Domain controller connectivity test
• Domain connectivity test
• Site connectivity test
Each performs a different type of connectivity test among the domain controllers in specific AD
domains and sites.

Domain Controller Connectivity Test


The domain controller connectivity test allows you to test the connectivity between a selected
domain controller in the forest and one or more target domain controllers. This test is useful for
testing communications among any domain controllers in the forest. To perform this test, choose
Troubleshoot, DC Connectivity. The Test Domain Controller Connectivity dialog box appears, in
which you can select the domain controllers involved in the test.
First, select the source domain controller from the Source list. Next, select the destination
domain controller(s) that the source will communicate with during the test by selecting the check
box to the left of each domain controller in the Destination list. Then click Start Test. Figure 5.3
shows the results of running the domain controller connectivity test.


Figure 5.3: Running the domain controller connectivity test to troubleshoot the communication path among
domain controllers in the forest.

After the test is completed, the results are displayed at the bottom of the dialog box:
• Destination—Shows the name of each destination domain controller you selected.
• Test—Shows the type of test that was performed. The type of test varies according to the
services that have been assigned to the domain controller.
• Time—Shows the amount of time (in milliseconds) it took to perform each test. If a test
is performed in less than 10 milliseconds, it’s displayed as < 10 ms; otherwise, the actual
time is displayed.
• Result—Shows whether a test was successful. If the test failed, this column displays a
brief description of why.

Domain Connectivity Test


The domain connectivity test allows you to test the connectivity of a domain controller in a
selected domain against domain controllers in the destination domain(s). To perform this test,
choose Troubleshoot, Domain Connectivity. The Test Domain Connectivity dialog box appears,
in which you can select the domains involved in the test.
First, select the source domain/domain controller from the Source list. Next, select the
destination domain(s) that the source domain/domain controller will communicate with during
the test by selecting the check box to the left of each domain in the Destination list. Then click
Start Test. Figure 5.4 shows the results of running the domain connectivity test.


Figure 5.4: Running the domain connectivity test to troubleshoot the communication between the domain
controller in the source domain and the domain controllers in the destination domain.

After the test is completed, the results are displayed at the bottom of the dialog box:
• Destination—Shows the name of each destination domain/domain controller you
selected.
• Test—Shows the type of test that was performed. The type of test varies according to the
services that have been assigned to the domain controller.
• Time—Shows the amount of time (in milliseconds) it took to perform each test. If a test
is performed in less than 10 milliseconds, it’s displayed as < 10 ms; otherwise, the actual
time is displayed.
• Result—Shows whether a test was successful. If the test failed, this column displays a
brief description of why.

Site Connectivity Test


The site connectivity test allows you to test the connectivity of a domain controller in a selected
site against domain controllers in the destination site. To perform this test, choose Troubleshoot,
Site Connectivity. The Test Site Connectivity dialog box appears, in which you can select the
site and domain controllers involved in the test.
First, select the source site/domain controller from the Source list. Next, select the destination
site(s) that the source site/domain controller will communicate with during the test by
selecting the check box to the left of each name in the Destination list. Then click Start Test.
Figure 5.5 shows the results of running the site connectivity test.


Figure 5.5: Running the site connectivity test to troubleshoot the communication between a site/domain
controller and the domain controllers in the destination site.

After the test is completed, the results are displayed at the bottom of the dialog box:
• Destination—Shows the name of each destination site/domain controller you selected.
• Test—Shows the type of test that was performed. The type of test varies according to the
services that have been assigned to the domain controller.
• Time—Shows the amount of time (in milliseconds) it took to perform each test. If a test
is performed in less than 10 milliseconds, it’s displayed as < 10 ms; otherwise, the actual
time is displayed.
• Result—Shows whether a test was successful. If the test failed, this column displays a
brief description of why.


Troubleshooting Name Resolution


DNS is the name resolution system used to locate computers and domain controllers in AD. For
example, a workstation or member server finds a domain controller by querying DNS. If you
have problems connecting to AD and you’ve successfully tested network connectivity, a name-
resolution problem may exist. For example, if you cannot find domain controllers or network
resources when you perform queries, it might mean that DNS domain names aren’t being
resolved to IP addresses.

Understanding Name Resolution


The first step in identifying and diagnosing AD name-resolution problems is to review how the
Windows computer registers names and locates domain controllers. For example, whenever you
start a domain controller, it can register two types of names:
• A DNS domain name with the DNS service
• If the computer has Network Basic Input/Output System (NetBIOS) enabled, a NetBIOS
name with Windows Internet Name Service (WINS) or with another transport-specific
service
The DNS resource records (RRs) registered by the domain controllers in AD include multiple
service (SRV) records, address (A) records, and CNAME (canonical name) records, all of which
identify the domain controllers’ location in a domain, site, and forest. When a domain controller
is started, the Netlogon service registers these records. It also sends DNS dynamic-update
queries for the SRV records, A records, and CNAME records every hour to ensure that the DNS
server always has the proper records.
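
If you want to check some of these SRV records by hand, it helps to know the names to query. The sketch below builds two of the LDAP locator names that Netlogon registers, one for the domain and one for a site; it covers only a subset of the records, and the function name and the example site name are hypothetical.

```python
def dc_locator_srv_names(dns_domain, site=None):
    """Build SRV record names used to locate domain controllers via LDAP."""
    # Domain-wide record for the domain controllers of the domain.
    names = [f"_ldap._tcp.dc._msdcs.{dns_domain}"]
    if site:
        # Site-specific record for the domain controllers in one site.
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{dns_domain}")
    return names
```

You can then look up each name with NSLOOKUP after entering set type=SRV at its prompt.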
When you use AD-integrated zones, the DNS server stores all of the records in the zone in AD.
To run AD-integrated zones, the DNS service must be running on the domain controller. It’s
possible that a record is updated in AD but hasn’t replicated to all DNS servers hosting the zone.
This occurrence might cause consistency problems. By default, all DNS servers that load zones
from AD poll the directory at set intervals (every 5 minutes, but you can change this setting) to
update the directory’s representation of the zones.

In WS2K3, not every domain controller contains your AD-integrated DNS zone. DNS can be
configured to replicate this information only to those domain controllers that are actually acting as
DNS servers. Doing so reduces the amount of replication required to keep DNS current on every
domain controller.

Checking that DNS Records Are Registered


If DNS records for a domain controller aren’t registered on the DNS server, no other domain
controller or workstation can locate the domain controller. There are a few ways that you can
check for this.


Using Event Viewer


If DNS records aren’t registered in DNS—for example, if the DNS client has problems
dynamically updating DNS records—errors are recorded in the System Log in Event Viewer.
Figure 5.6 shows how the System Log tracks DNS errors in Event Viewer.

Figure 5.6: Using Event Viewer to track DNS errors that occur on the selected domain controller.

If the domain controller is a DNS server, an additional log tracks all of the DNS basic events and
errors for the DNS service on the server. For example, the DNS Server log monitors and tracks
the starts and stops for the DNS server. It also logs critical events, such as when the server starts
but cannot locate initializing data—for example, zones or boot information stored in the registry
or (in some cases) AD. Figure 5.7 shows how you can access the DNS Server log in Event
Viewer.


Figure 5.7: Using the DNS Server log in Event Viewer to track the errors for all DNS events that occur on a
domain controller that supports a DNS server.

Using PING
Another simple method for checking whether DNS records have been registered is to determine
whether you can look up the names and addresses of network resources by using the PING
utility. For example, you can check the names using PING as follows:
PING COMPANY.COM
If this command works, the DNS server can be contacted by using this basic network test. Even
if the command doesn’t work, the PING utility will show you the results of the name resolution
process. For example, typing
PING SERVER2
might produce the output
Pinging server2 [192.168.0.103] with 32 bytes of data:
which shows that the name Server2 was resolved to an IP address; you can then check that the IP
address is correct.
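If you want to automate that comparison, a minimal Python sketch along the following lines could parse the first line of PING output and report the name and address it resolved. The function name and the sample text are illustrative assumptions; they are not part of any Windows tool.

```python
import re

# Matches the header line that PING prints once a name has been resolved,
# for example: "Pinging server2 [192.168.0.103] with 32 bytes of data:"
PING_HEADER = re.compile(r"Pinging\s+(\S+)\s+\[([0-9.]+)\]")

def resolved_address(ping_output):
    """Return (name, ip) from PING output, or None if no name was resolved."""
    match = PING_HEADER.search(ping_output)
    if match is None:
        return None
    return match.group(1), match.group(2)

sample = "Pinging server2 [192.168.0.103] with 32 bytes of data:"
print(resolved_address(sample))
```

The caller can then compare the returned address with the address it expects the server to have.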

Using NSLOOKUP
Next, you need to verify that the DNS server is able to listen to and respond to basic client
requests. You can do so by using NSLOOKUP, a standard command-line utility provided in
most DNS-service implementations, including Windows. NSLOOKUP allows you to perform
query testing of DNS servers and provides detailed responses as its output. This information is
useful when you troubleshoot name-resolution problems, verify that RRs are added or updated
correctly in a zone, and debug other server-related problems.


To test whether the DNS server can respond to DNS clients, use NSLOOKUP as follows:
NSLOOKUP
Once the NSLOOKUP utility loads, you can perform a test at its command prompt to check
whether the host name appears in DNS. Listing 5.2 shows the output you can receive.
> company.com
Server: ns1.company.com
Address: 250.45.87.13

Name: company.com
Address: 250.65.123.65
Listing 5.2: A sample command and output received by using NSLOOKUP.

The output of this command means that DNS contains the A record and the server is responding
with an answer: 250.65.123.65. Next, verify whether this address is the actual IP address for your
computer. You can also use NSLOOKUP to perform other DNS queries and to examine the contents of
zone files on the local and remote DNS servers. If the record for
the requested server isn’t found in DNS, you receive the following message:
The computer or DNS domain name does not exist
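As a rough illustration of how the output in Listing 5.2 is organized, the following Python sketch separates the responding server from the answer. It assumes the two blocks are divided by a blank line, as in the listing; the function name is hypothetical, and real NSLOOKUP output can contain extra lines that this sketch simply stores or skips.

```python
def parse_nslookup(output):
    """Split NSLOOKUP output into the responding server and the answer."""
    result = {"server": {}, "answer": {}}
    section = "server"
    for line in output.splitlines():
        if not line.strip():
            # A blank line separates the server block from the answer block.
            if result["server"]:
                section = "answer"
            continue
        key, sep, value = line.partition(":")
        if sep:
            result[section][key.strip().lower()] = value.strip()
    return result

listing = """> company.com
Server: ns1.company.com
Address: 250.45.87.13

Name: company.com
Address: 250.65.123.65
"""
parsed = parse_nslookup(listing)
print(parsed["server"]["server"], "answered with", parsed["answer"]["address"])
```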

Checking the Consistency and Properties of the DNS Server


You can check the consistency and view the properties of DNS servers, zones, and RRs by using
another command-line utility called DNSCMD. WS2K3 provides DNSCMD as a command-line
interface for managing DNS servers. You can use this tool to script batch files, help automate the
management and updating of existing DNS server configurations, and set up and configure new
DNS servers on your network. DNSCMD also allows you to manually modify DNS server
properties, create zones and RRs, and force replication between a DNS server’s physical memory
and the DNS database and data files. You can use DNSCMD for most tasks that you can perform
from the DNS console, such as:
• Creating, deleting, and viewing zones and records
• Resetting server and zone properties
• Performing routine administrative operations, such as updating, reloading, and refreshing
the zone
• Writing the zone back to a file or to AD
• Pausing and resuming the zone
• Clearing the cache
• Stopping and starting the DNS service
• Viewing statistics
You can install DNSCMD by copying it from the \Support\Tools folder located on the OS
CD-ROM. For help in using the command, enter the following at a command prompt:
DNSCMD /?


When the DNS Server Doesn’t Resolve Names Correctly


Windows includes a caching DNS-resolver service, which is enabled by default. For
troubleshooting purposes, this service can be viewed, stopped, and started like any other
Windows service. The caching resolver reduces DNS network traffic and speeds name resolution
by providing a local cache for DNS queries.

How the Caching DNS-Resolver Service Works


When a name is submitted to DNS, if the resolver is caching names, it first checks the cache. If
the name is in the cache, the data is returned to the user. If the name isn’t in the cache, the
resolver queries the other DNS servers that are listed in the TCP/IP properties for each adapter. It
does this in the following order:
1. The resolver checks the local hosts file (located by default in
C:\Windows\System32\drivers\etc) to see whether the required name is listed. The
“localhost” address, for example, resolves to 127.0.0.1 through use of the hosts file, not a
DNS server.
2. If the name isn’t in the hosts file, the resolver sends the query to the first server on the
preferred adapter’s list of DNS servers and waits one second for a response.
3. If the resolver doesn’t receive a response from the first server within one second, it sends
the query to the first DNS servers on all adapters that are still under consideration and
waits 2 seconds for a response.
4. If the resolver doesn’t receive a response from any server within 2 seconds, it sends the
query to all DNS servers on all adapters that are still under consideration and waits
another 2 seconds for a response.
5. If it still doesn’t receive a response from any server, it sends the query to all DNS servers
on all adapters that are still under consideration and waits 4 seconds for a response.
6. If it still doesn’t receive a response from any server, the resolver sends the query to all
DNS servers on all adapters that are still under consideration and waits 8 seconds for a
response.
7. If the resolver receives a positive response, it stops querying for the name, adds the
response to the cache, and returns the response to the client. If it doesn’t receive a
response from any server by the end of the 8 seconds, it responds with a time-out. Also, if
it doesn’t receive a response from any server on a specified adapter, it responds for the
next 30 seconds to all queries destined for servers on that adapter with a time-out and
doesn’t query those servers.
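The query attempts in the steps above can be sketched in Python as a simple schedule. This is only a model under stated assumptions: the function name and the sample addresses are hypothetical, the hosts-file check is left out, and the sketch does not drop adapters from consideration or apply the 30-second penalty.

```python
def query_schedule(adapters):
    """Return the (servers, wait_seconds) attempts described in the steps above.

    `adapters` is a list of DNS-server lists, one per adapter, with the
    preferred adapter first.
    """
    first_servers = [servers[0] for servers in adapters if servers]
    all_servers = [server for servers in adapters for server in servers]
    # First the preferred adapter's first server, then every adapter's first server.
    schedule = [(first_servers[:1], 1), (first_servers, 2)]
    # The last three attempts go to every server, with growing waits.
    schedule += [(all_servers, wait) for wait in (2, 4, 8)]
    return schedule

adapters = [["10.0.0.1", "10.0.0.2"], ["192.168.0.1"]]
for servers, wait in query_schedule(adapters):
    print(f"wait {wait}s for {', '.join(servers)}")
print("worst case:", sum(wait for _, wait in query_schedule(adapters)), "seconds")
```

Adding up the waits shows why a name that no server answers can take 17 seconds (1 + 2 + 2 + 4 + 8) to time out.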
The resolver also keeps track of which servers answer queries more quickly, and it might move
servers up or down on the search list based on how quickly they respond. In addition, the
resolver caches negative responses. If the resolver successfully reaches a DNS server, but that
server is unable to resolve the requested name to an IP address, the result is a negative
response. So long as that negative response remains in the cache, the resolver will not try to
resolve the name again. You can clear the cache by running the following from a command
line:
IPCONFIG /FLUSHDNS
Doing so forces the resolver to start over the next time any name needs to be resolved.
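To see why a cached negative response can hide a record that has since been registered, consider the following toy model in Python. It is a sketch, not the Windows implementation: the class and method names are hypothetical, and cache entries never expire here, whereas the real resolver ages them out.

```python
class ResolverCache:
    """Toy resolver cache that stores positive and negative answers."""

    def __init__(self):
        self._entries = {}

    def lookup(self, name, query_server):
        """Return the cached answer, or ask `query_server` and cache the result."""
        key = name.lower()
        if key not in self._entries:
            # None stands for a negative response ("name not found").
            self._entries[key] = query_server(key)
        return self._entries[key]

    def flush(self):
        """Discard every entry, as IPCONFIG /FLUSHDNS does for the real cache."""
        self._entries.clear()

records = {}                      # stands in for the DNS server's zone data
cache = ResolverCache()
print(cache.lookup("server2", records.get))   # negative response is cached
records["server2"] = "192.168.0.103"          # the record is registered later
print(cache.lookup("server2", records.get))   # still the cached negative answer
cache.flush()
print(cache.lookup("server2", records.get))   # the server is queried again
```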


Using Other Techniques


A typical problem occurs when a DNS server doesn’t resolve names correctly and provides
incorrect data for queries. For example, if an administrator changed the IP address on a domain
controller but DNS wasn’t properly updated, DNS would supply the incorrect IP address to
clients.
When working with Windows and DNS entry changes, you may notice that the DNS server has
stale RRs because they haven’t been updated recently. Thus, if there have been previous lookups
or name-resolution activity, the DNS server doesn’t see the changes to the RRs. (The server
caches DNS information from previous lookups so that subsequent lookups are fast.) The typical
method of fixing this problem is to restart the server.
You can also address this problem by using the IPCONFIG command. Entering the following
command allows you to view the current list of DNS entries in the computer’s resolver cache:
IPCONFIG /displayDNS
Entering the following command allows you to refresh all DHCP leases and re-register DNS
names. (Wait 5 minutes for the DNS entries in the cache to be reset and updated with the RRs in
the server’s database.)
IPCONFIG /registerDNS
You can also use the IPCONFIG command to purge all of the DNS cache entries.
IPCONFIG /flushDNS
It’s worth noting that the DNS server should eventually refresh the cache because each entry has
a Time-To-Live (TTL) associated with it. TTL indicates a length of time used by other DNS
servers to determine how long to cache information for a record before discarding it. For
example, most RRs created by the DNS server service inherit the minimum (default) TTL of 1
hour from the start-of-authority (SOA) RR; this prevents overly long caching by other DNS
servers. The TTL of a cached record is automatically decremented; when it expires, the record is
removed or flushed from the cache.
For an individual RR, you can specify a record-specific TTL that overrides the minimum
(default) TTL inherited from the SOA RR. You can also use TTL values of zero (0) for RRs that
contain volatile data not to be cached for later use after the current DNS query is completed.
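The TTL rules in the preceding two paragraphs can be expressed as a short Python sketch. The function names are hypothetical, and the 3600-second value reflects the 1-hour default mentioned above.

```python
def effective_ttl(soa_minimum_ttl, record_ttl=None):
    """Return the TTL in seconds that a caching server applies to a record."""
    # A record-specific TTL, including zero, overrides the SOA minimum.
    return soa_minimum_ttl if record_ttl is None else record_ttl

def remaining_ttl(ttl, seconds_in_cache):
    """Return the seconds left before the cached record expires (never negative)."""
    return max(ttl - seconds_in_cache, 0)

print(effective_ttl(3600))                      # inherits the SOA minimum
print(effective_ttl(3600, record_ttl=0))        # volatile record: not cached
print(remaining_ttl(effective_ttl(3600), 600))  # time left after 10 minutes
```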
Another problem that may occur is that the DNS server doesn’t resolve names for computers or
services outside your immediate network. For example, the DNS server may not resolve names
for computers located on an external network or the Internet. If a DNS server fails to resolve a
name for which it’s not authoritative, the cause is usually a failed recursive query. Recursion is
used in most DNS configurations to resolve names that aren’t located in the configured DNS
domain.
For recursion to work correctly, all DNS servers used in the path of the recursive query must be
able to respond to and forward correct data. If the DNS server fails a recursive query, you need
to review the server’s configuration. By default, all Win2K DNS servers have recursion enabled.
You can disable recursion using the DNS console to modify advanced server options. In
addition, recursion might be disabled if the DNS server is configured to use forwarders.


Troubleshooting the Domain Controllers


You have several options for troubleshooting domain controllers. Before I discuss them, though,
it’s important to review the AD database and its associated files.

Understanding the AD Database and Its Associated Files


AD is stored on each domain controller in a local database. The database exists as a domain
database and, married with the directory services, performs authentication services to users and
applications. The domain controllers replicate their data with each other to ensure that copies of
the domain database on other domain controllers are current and accurate.
The AD database is implemented on an indexed sequential access method (ISAM) table manager
that has been referred to as “Jet.” The table manager is called the Extensible Storage Engine
(ESE). The ESE database is managed on each domain controller by the ESE.DLL file. The
database is a discrete transaction system that uses log files to ensure integrity; it supports
rollback to ensure that only complete transactions are committed to the database.
The following files are associated with AD:
• NTDS.DIT—The main database file, ntds.dit grows as the database fills with objects and
attributes. However, the log files have a fixed size of 10 megabytes (MB). Any changes
made to the database are also made to the current log file and to the DIT file in the cache.
Eventually the cache is flushed. If a computer failure occurs before the cache is flushed,
ESE uses the log file to complete the update to the DIT file.
By default, the AD database is stored in <DRIVE>\WINNT\NTDS\NTDS.DIT. The log files for
the directory database are stored in the same directory by default. Their purpose is to track the
changes in the directory database, and they can grow to be quite large. Give all the room you can
to the log files; for example, you can place the log files on different disk drives than the database
file to reduce disk contention on a single drive.
• EDB.LOG and EDBXXXXX.LOG—EDB.LOG is the current log file for AD. When a
change is made to the database, it’s written to this file. When EDB.LOG becomes full of
database transactions, it’s renamed to EDBXXXXX.LOG, where XXXXX starts at 00001
and continues to increment using hexadecimal notation. AD uses circular logging, which
constantly deletes old log files. If you view the directory files at any time, you’ll notice
the EDB.LOG file and one or more EDBXXXXX.LOG files.
• EDB.CHK—Stores the database checkpoint, which identifies the point at which the
database engine needs to replay the logs. This file is typically used during recovery and
initialization.
• RES1.LOG and RES2.LOG—Placeholders designed to reserve the last 20MB of disk
space on the disk drive. Saving disk space gives the log files sufficient room to shut down
gracefully if other disk space is consumed.
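The hexadecimal numbering of the renamed log files is easy to misread when you sort a directory listing, so here is a small Python sketch of the naming pattern described above. The function name is hypothetical, and the use of uppercase hexadecimal digits is an assumption.

```python
def archived_log_name(sequence):
    """Return the EDBXXXXX.LOG name for a 1-based sequence number."""
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    # Five hexadecimal digits, zero padded: 1 -> 00001, 10 -> 0000A.
    return f"EDB{sequence:05X}.LOG"

print([archived_log_name(n) for n in (1, 9, 10, 16, 255)])
```

Note that the tenth file is EDB0000A.LOG, not EDB00010.LOG, which is the sixteenth.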


To manage the database, Windows provides a garbage-collection process designed to free space
in the AD database. This process runs on every domain controller in the enterprise at a default
interval of 12 hours. The garbage-collection process first removes “tombstones” from
the database. Tombstones are the remains of objects that have been deleted. (When an object is
deleted, it’s not actually removed from the AD database. Instead, it’s marked for deletion at a
later date. This information is then replicated to other domain controllers. When the time expires
for the object, the object is removed.) Next, the garbage-collection process deletes any
unnecessary log files. Finally, it launches a defragmentation thread to reclaim additional free
space.
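The tombstone step can be modeled with a few lines of Python. This is a simplified sketch: the data layout and function name are hypothetical, and the 60-day lifetime used in the example is an assumed value rather than one taken from this chapter.

```python
from datetime import datetime, timedelta

def collect_garbage(objects, now, tombstone_lifetime):
    """Return the objects that survive one garbage-collection pass.

    Each object is a dict; a deleted object (tombstone) carries a
    "deleted_at" datetime, and a live object has no such key.
    """
    survivors = []
    for obj in objects:
        deleted_at = obj.get("deleted_at")
        if deleted_at is not None and now - deleted_at >= tombstone_lifetime:
            continue  # tombstone has expired, so it is physically removed
        survivors.append(obj)
    return survivors

now = datetime(2005, 6, 1)
objects = [
    {"name": "live user"},
    {"name": "recent tombstone", "deleted_at": now - timedelta(days=5)},
    {"name": "expired tombstone", "deleted_at": now - timedelta(days=90)},
]
lifetime = timedelta(days=60)  # assumed tombstone lifetime for this example
print([obj["name"] for obj in collect_garbage(objects, now, lifetime)])
```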
Above the directory database is a database layer that provides an object view of the database
information by applying the schema to the database records. The database layer isolates the
upper logical layers of the directory from the underlying database system. All access to the
database occurs through this layer instead of allowing direct access to the database files. The
database layer is responsible for creating, retrieving, and deleting the individual database records
or objects and associated attributes and values.
In addition to the database layer, AD provides a directory service agent (DSA), an internal
process in Windows that manages the interaction with the database layer for the directory. AD
provides access using the following protocols:
• Lightweight Directory Access Protocol (LDAP) clients connect to the DSA using LDAP
• Messaging Application Programming Interface (MAPI) clients connect to the directory
through the DSA using the MAPI remote procedure call (RPC) interface
• Windows clients that use NT 4.0 or earlier connect to the DSA using the Security
Account Manager (SAM) interface
• AD domain controllers connect to each other during replication using the DSA and a
proprietary RPC implementation

Comparing Directory Information


When you want to compare directory information on domain controllers or directory partitions,
you can use the DSASTAT utility. DSASTAT detects and examines the differences among a
user-defined scope of objects on two domain controllers. It retrieves capacity statistics such as
megabytes per server, objects per server, and megabytes per object class. For example, you can
use DSASTAT to compare all users in the SALES Organizational Unit (OU) in the
COMPANY.COM domain with those in another directory partition by specifying the following:
DSASTAT -S:Company1;Company2 -B:OU=SALES,DC=COMPANY,DC=COM -GCATTRS:ALL -SORT:TRUE -T:FALSE -P:16 -FILTER:"(&(OBJECTCLASS=USER)(!OBJECTCLASS=COMPUTER))"
In this example, you can determine whether both domain controllers agree on the contents of the
OU=SALES,DC=COMPANY,DC=COM subtree. DSASTAT detects objects that exist on one domain controller
but not the other (for example, if a creation or deletion hasn’t replicated) as well as differences in the
values of objects that exist in both. This example specifies a base search path at a subtree of the
domain. In this case, the OU name is SALES. The filter specifies that the comparison is
concerned only with user objects, not computer objects. Because computer objects are derived
from user objects in the class hierarchy, a search filter specifying OBJECTCLASS = USER
returns both user and computer objects.
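Conceptually, the comparison works like the following Python sketch, which takes two snapshots of directory objects and reports what differs. It is an illustration of the idea only; the function names, the snapshot format, and the sample objects are assumptions and do not reflect how DSASTAT is implemented.

```python
def compare_replicas(replica_a, replica_b):
    """Compare two {dn: attributes} snapshots taken from two domain controllers."""
    only_a = sorted(set(replica_a) - set(replica_b))
    only_b = sorted(set(replica_b) - set(replica_a))
    differing = sorted(
        dn for dn in set(replica_a) & set(replica_b)
        if replica_a[dn] != replica_b[dn]
    )
    return {"only_on_first": only_a, "only_on_second": only_b, "different": differing}

def is_user_not_computer(attributes):
    """Mirror the filter in the example: user objects that aren't computers."""
    classes = {c.lower() for c in attributes.get("objectClass", [])}
    return "user" in classes and "computer" not in classes

base = "OU=SALES,DC=COMPANY,DC=COM"
company1 = {
    f"CN=Ann,{base}": {"objectClass": ["user"], "title": "Rep"},
    f"CN=Bob,{base}": {"objectClass": ["user"], "title": "Rep"},
}
company2 = {
    f"CN=Ann,{base}": {"objectClass": ["user"], "title": "Manager"},
    f"CN=Cy,{base}": {"objectClass": ["user"], "title": "Rep"},
}
print(compare_replicas(company1, company2))
```

An object that appears in only one snapshot usually points to a creation or deletion that hasn't replicated yet, and an object listed as different points to an attribute change that hasn't replicated.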


DSASTAT also allows you to specify the target domain controllers and additional operational
parameters by using the command line or an initialization file. DSASTAT determines whether
domain controllers in a domain have a consistent and accurate image of their own domain.
In addition, DSASTAT compares the attributes of replicated objects. You can use it to compare
two directory trees across replicas in the same domain or, in the case of a Global Catalog (GC),
across different domains. You can also use it to monitor replication status at a much higher level
than monitoring detailed transactions. In the case of GCs, DSASTAT checks whether the GC
server has an image that is consistent with the domain controllers in other domains. DSASTAT
complements the other replication-monitoring tools, REPADMIN and REPLMON, by ensuring
that domain controllers are up to date with one another.

First: What Changed?


The first thing you probably ask yourself when a problem occurs is “what changed?”
After all, left to itself, AD will almost always continue working just fine. Change is required to
cause a problem. That change might be an incorrect configuration implemented by another
administrator, or it might be a change caused by corrupted data within the database (corruption
that might result from dropped network information or even corrupted disk files). Although
errors such as corruption are difficult to find, most problems are caused by configuration
changes, which is where auditing of some kind comes into play.
By configuring your Windows domain controllers to perform auditing—both success and
failure—of as many actions as possible, you’ll have a list of everything that has changed on each
domain controller. When a problem occurs, you can start by reviewing what has changed and
evaluating how each change might have resulted in the problem you’re troubleshooting. The
problem with Windows auditing, as I’ve mentioned before, is that it produces a torrent of
information that can be difficult to sift through.
Tools such as Microsoft Operations Manager (MOM) can help sift through the event logs and
call your attention to entries that have relevance to a current problem. Third-party tools such as
NetPro ChangeAuditor for Active Directory can help, too, by pinpointing certain types of
changes—like changes to AD’s configuration—and showing you both the “before” and “after”
values of a configuration. This “before and after” view can be critical for determining whether a
given change had something to do with a particular problem.

Analyzing the State of the Domain Controllers


The next step in troubleshooting and repairing problems with AD on the domain controller is to
verify that the directory portion is running without errors. The Domain Controller Diagnostic
(DCDIAG) utility allows you to analyze the current state of the domain controllers in a domain
or forest. It automatically performs the analysis and reports any problems with a domain
controller. DCDIAG requires a separate installation of the Support Tools from the Windows
CD-ROM; by default, it’s installed in \Program Files\Support Tools.

DCDIAG is intended to perform a fully automatic analysis with little user intervention. Thus, you
usually don’t need to provide too many parameters to it on the command line. DCDIAG doesn’t work
when run against a Windows workstation or server—it’s limited to working only with domain
controllers.


DCDIAG consists of a set of tests that you can use to verify and report on the functional
components of AD on the computer. You can use this tool on a single domain controller, a group
of domain controllers holding a domain partition, or across a site. When using DCDIAG, you can
collect either a minimal amount of information (confirmation of successful tests) or data for
every test you execute. Unless you’re diagnosing a specific problem on only one domain
controller, I recommend that you collect only the severe errors for each one.
DCDIAG allows you to run the following tests to diagnose the status of a domain controller:
• Connectivity test—Verifies that DNS names for the domain controller are registered. It
also verifies that the domain controller can be reached by using TCP/IP and the domain
controller’s IP address. DCDIAG checks the connectivity to the domain controller by
using LDAP and checks that communications can occur by using an RPC.
• Replication test—Checks the replication consistency for each of the target domain
controllers. For example, this test checks whether replication is disabled and whether
replication is taking too long. If so, the utility reports these replication errors and
generates errors when there are problems with incoming replica links.
• Topology integrity test—Verifies that all domain controllers holding a specific partition
are connected by the replication topology.
• Directory partition head permissions test—Checks the security descriptors for proper
permissions on the directory partition heads, such as the schema, domain, and
configuration directory partitions.
• Locator functionality test—Verifies that the appropriate SRV RRs are published in DNS.
This test also verifies that the domain controller can recognize and communicate with
operations masters. For example, DCDIAG checks whether the locator can find a primary
domain controller (PDC) and GC server.
• Inter-site health test—Identifies and ensures the consistency of domain controllers among
sites. To do so, DCDIAG performs several tests, one of which identifies the inter-site
topology generator and identifies the bridgeheads for each site. This test determines
whether a bridgehead server is functioning; if not, the utility identifies and locates
additional backup bridgeheads. In addition, this test identifies when sites aren’t
communicating with other sites on the network.
• Trust verification test—Checks explicit trust relationships—that is, trusts between two
domains in the forest. DCDIAG cannot check transitive trusts (Kerberos V5
trust relationships). To check transitive trusts, you can use the NETDOM utility.

For more information about the NETDOM utility, refer to the resource kit documentation or The
Definitive Guide to Windows 2000 and Exchange 2000 Migration (Realtimepublishers), a link to which
can be found at http://www.realtimepublishers.com. You can download the WS2K3 resource kit from
http://www.microsoft.com/downloads/details.aspx?FamilyID=9d467a69-57ff-4ae7-96ee-b18c4790cffd&displaylang=en.


• Diagnose replication latencies test—Analyzes incoming replications and watches for
delays or preemption by a higher-priority job. If the replication process is delayed or
preempted, latencies have occurred that slow the process. This problem typically occurs
because a higher-priority task hasn’t relinquished the computer’s processor or because a
large number of replication requests or tasks are pending. New replication tasks are
delayed because the domain controller is overloaded with replication requests.
• Replication of trust objects test—Checks whether the computer account object has been
replicated to all additional domain controllers in the domain. It also checks whether the
DSA object has been replicated to all replicas of the configuration directory partition.
• File Replication Service (FRS) test—Verifies that FRS has started successfully on all
domain controllers. If it hasn’t, this test delays the NETLOGON service from advertising
that domain controller.
• Critical services check test—Verifies that these key services are running: FRS, Inter-site
Messaging Service, Kerberos Key Distribution Center Service, Server Service,
Workstation Service, Remote Procedure Call Locator Service, Windows Time Service,
Distributed Link Tracking Client Service, Distributed Link Tracking Server Service, and
NETLOGON service. You can also use DCDIAG with the /repairmachineaccount
command-line switch, which re-creates the domain controller’s machine account if it has
been accidentally deleted.
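When you save DCDIAG output from several domain controllers, a short script can pull out just the failures. The Python sketch below assumes that each test result is reported on a line of the form "......... SERVER passed test Name" or "......... SERVER failed test Name"; the sample text is illustrative rather than captured output, and the function name is hypothetical.

```python
import re

# Summary lines look like "......... SERVER passed test Connectivity".
RESULT_LINE = re.compile(r"\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")

def failed_tests(dcdiag_output):
    """Return (server, test) pairs for every test reported as failed."""
    return [
        (server, test)
        for server, status, test in RESULT_LINE.findall(dcdiag_output)
        if status == "failed"
    ]

sample = """
   ......................... SERVER2 passed test Connectivity
   ......................... SERVER2 failed test Replications
"""
print(failed_tests(sample))
```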

Using NTDSUTIL
The Directory Services Management utility (NTDSUTIL.EXE) is a command-line utility included
in Windows that you can use to troubleshoot and repair AD. Although Microsoft designed the
utility to be used interactively via a command-prompt session (launched simply by typing
NTDSUTIL at any command prompt), you can also run it by using scripting and automation.
NTDSUTIL allows you to troubleshoot and maintain various internal components of AD. For
example, you can manage the directory store or database and clean up orphaned data objects that
were improperly removed.
You can also maintain the directory service database, prepare for new domain creations, manage
the control of the FSMOs, purge meta data left behind by abandoned domain controllers (those
removed from the forest without being uninstalled), and clean up objects and attributes of
decommissioned or demoted servers. At each NTDSUTIL menu, you can type help for more
information about the available options (see Figure 5.8).


Figure 5.8: Viewing a list of available commands in the utility and a brief description of each.

Locating the Directory Database Files


Before you use the NTDSUTIL utility to carry out troubleshooting and integrity checking on the
AD database, you can use its Info command to determine the location and size of the directory
database files. The Info command:
• Reports the free space for all disks installed on the domain controller
• Reads the registry keys and associated location of the AD database files
• Reports the size of each of the database files, log files, and other associated files
Before you perform this check, you must either run NTDSUTIL after having booted the domain
controller via the special Directory Service Restore mode Safe Boot option or set the
environment variable SAFEBOOT_OPTION to a value of DSREPAIR under a normal boot of
Windows (for example, via the command SET SAFEBOOT_OPTION=DSREPAIR).
To execute the Info command, select Start, Programs, Accessories, Command Prompt. In the
Command Prompt window, type
NTDSUTIL
then press Enter.


At the ntdsutil prompt, enter the word
files
The utility responds by displaying a file maintenance prompt. The following commands have
been entered and displayed to this point:
C:\>SET SAFEBOOT_OPTION=DSREPAIR
C:\>NTDSUTIL
ntdsutil: files
file maintenance:
At the file maintenance prompt, enter the word
info
to display the location and sizes of AD database files, log files, and other associated files. Figure
5.9 shows the output of this command on a domain controller.
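Because NTDSUTIL can also be driven non-interactively, the same sequence can be scripted. The Python sketch below assumes that NTDSUTIL accepts its menu commands as command-line arguments and that two quit commands are needed, one to leave the menu and one to leave the utility; the helper names are hypothetical, and the run function is meant to be used only on a domain controller.

```python
import subprocess

def ntdsutil_argv(menu, *commands):
    """Build an argument list that runs `commands` inside one NTDSUTIL menu."""
    # One "quit" leaves the menu and the second leaves NTDSUTIL itself.
    return ["ntdsutil", menu, *commands, "quit", "quit"]

def run_ntdsutil(menu, *commands):
    """Run NTDSUTIL non-interactively and return its text output."""
    completed = subprocess.run(
        ntdsutil_argv(menu, *commands), capture_output=True, text=True, check=False
    )
    return completed.stdout

# Only build the command here; call run_ntdsutil("files", "info") on a real DC.
print(ntdsutil_argv("files", "info"))
```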

This command works identically on Win2K and WS2K3, and can be used in mixed-version
environments with no problems.

Figure 5.9: Using the info command in NTDSUTIL to display the location and size of AD database files.

Using NTDSUTIL, you can relocate or move AD database files from one location to another on the
disk or move the database files from one disk drive to another in the same domain controller. You can
also move just the log files from one disk to another to free space for the data files (see “Moving the
AD Database or Log Files” later in this chapter).


Checking for Low-Level Database Corruption


One of the first items you need to check when troubleshooting a domain controller in AD is that
the underlying database is functioning properly. To do so, you can use NTDSUTIL’s Integrity
option to detect any low-level database corruption of the directory files. The Integrity option
checks that the headers for the database are correct and that all of the internal database tables are
functioning and consistent with each other.
Before you perform a low-level database-integrity check, you need to start the domain controller
in Directory Service Restore mode. To do so, restart the domain controller. When you’re
prompted, press F8 to display the Advanced Options menu. Select Directory Service Restore
mode and press Enter, then log on using the Administrator account and password that you
assigned during the DCPROMO process.
To run the NTDSUTIL Integrity option, select Start, Programs, Accessories, Command Prompt.
In the Command Prompt window, type
NTDSUTIL
then press Enter.
At the ntdsutil prompt, enter the word
files
The utility responds by showing you the file maintenance category. The commands to this point
appear in the Command Prompt window as follows:
I:\>NTDSUTIL
ntdsutil: files
file maintenance:
At the file maintenance prompt, enter the word
integrity
to start the low-level database check on the domain controller. (The Integrity command reads
every byte of the directory data file and displays the percentage of completion as a graph.
Depending on the size of your database and the type of hardware you’re using for the domain
controller, this process can take a considerable amount of time.) Figure 5.10 shows the results of
examining the low-level database structures in AD.


Figure 5.10: Using the Integrity option in NTDSUTIL to examine the AD database on a domain controller.

To troubleshoot and repair the AD database, you can use the Integrity option only while the domain
controller is in Directory Service Restore mode.

Checking for Inconsistencies in the Database Contents


In addition to using NTDSUTIL to verify that the AD database is functioning properly, you can
use it to help you check the consistency of the contents of the AD database. The option in
NTDSUTIL that performs a contents check is the Semantic Checker. The Semantic Checker
option differs from the Integrity option in that the Semantic Checker addresses the contents
(objects and attributes) of the directory database, not just its low-level structures.


When you run the Semantic Checker, it performs the following checks:
• Reference Count Check—Counts the number of references in the database tables and
matches the results with the values that are stored in the data file. This operation also
ensures that each object has a globally unique identifier (GUID) and distinguished name
(DN). For a previously deleted object, this operation ensures that the object has a deleted
time and date but doesn’t have a GUID or DN.
• Deleted Object Check—Ensures that the object has a time and date as well as a special
relative distinguished name (RDN), given when the object was originally deleted.
• Ancestor Check—Ensures that the DN tag is equal to the ancestor list of the parent—
could also be stated as a check that the DN of the object minus its RDN is equal to its
parent’s DN.
• Security Descriptor Check—Ensures that there is a valid descriptor and that the
discretionary access control list (DACL) isn’t empty.
• Replication Check—Verifies that there is an up-to-dateness vector in the directory
partition and checks to see that every object has meta data.
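The Ancestor Check is the easiest of these to picture. The following Python sketch applies the second formulation (the object's DN minus its RDN equals the parent's DN) to DN strings. It is an illustration only: the Semantic Checker works on the database's internal structures, not on strings, and the function names are hypothetical.

```python
def split_dn(dn):
    """Split a DN into (rdn, parent_dn) at the first unescaped comma."""
    i = 0
    while i < len(dn):
        if dn[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if dn[i] == ",":
            return dn[:i], dn[i + 1:]
        i += 1
    return dn, ""

def ancestor_check(object_dn, parent_dn):
    """Return True when the object's DN minus its RDN equals the parent's DN."""
    _, remainder = split_dn(object_dn)
    return remainder.lower() == parent_dn.lower()

print(ancestor_check("CN=Jim,OU=SALES,DC=COMPANY,DC=COM", "OU=SALES,DC=COMPANY,DC=COM"))
print(ancestor_check("CN=Jim,OU=SALES,DC=COMPANY,DC=COM", "OU=HR,DC=COMPANY,DC=COM"))
```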
Like the Integrity option described earlier, you can run the Semantic Checker option only when
the domain controller is in Directory Service Restore mode. To run in this mode, restart the
domain controller. When you’re prompted, press F8 to display the Advanced Options menu.
Select Directory Service Restore mode and press Enter, then log on using the administrator
account and password that you assigned during the DCPROMO process.
To run the Semantic Checker option, select Start, Programs, Accessories, Command Prompt. In
the Command Prompt window, type
NTDSUTIL
then press Enter. At the ntdsutil prompt, type
semantic database analysis
then press Enter. Next, type
verbose on
This command enables verbose output for the Semantic Checker. To start the Semantic Checker without having it
repair any errors, type
go
To start it and have it repair any errors that it encounters in the database, enter
go fixup
The commands to this point appear in the Command Prompt window as follows:
I:\>NTDSUTIL
ntdsutil: semantic database analysis
semantic checker: verbose on
Verbose mode enabled.
semantic checker: go
Figure 5.11 shows the results of using the NTDSUTIL Semantic Checker.


Figure 5.11: Using the NTDSUTIL Semantic Checker option to check the consistency of the contents of the
directory database.

Again, the output of this command will be identical on Win2K and WS2K3, making it easier for
administrators working in mixed-version environments.

Cleaning Up the Meta Data


The NTDSUTIL program allows you to clean up the meta data that is left behind after a domain
controller is demoted. You demote a domain controller with the DCPROMO utility
(DCPROMO.EXE), which both promotes a server to a domain controller and demotes a domain
controller to a member server.
As part of the demotion process, DCPROMO removes the configuration data for the domain
controller from AD. This data takes the form of an NTDS Settings object, which exists as a child
to the server object in the Active Directory Sites and Services Manager and is located in AD as
the following object:
CN=NTDS Settings,CN=<server_name>,CN=Servers,CN=<site_name>,CN=Sites,CN=Configuration,DC=<domain>...
The attributes of the NTDS Settings object contain values about the domain controller’s
replication partners, naming contexts, whether the domain controller is a GC server, and the
default query policy. The NTDS Settings object is also a container that may have child objects
that represent the replication partners. This data is required for the domain controller to
synchronize quickly but is retired upon demotion. If the NTDS Settings object isn’t properly
removed when the domain controller is demoted, you can use the NTDSUTIL utility to manually
remove the NTDS Settings object.
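If you need to locate the NTDS Settings object for a particular server (for example, to confirm that it still exists before cleaning it up), you can assemble its DN from the server name, the site name, and the DNS name of the forest root domain. The following Python helper is a minimal sketch. The function name and the sample values are hypothetical, and the helper does not escape special characters in names.

```python
def ntds_settings_dn(server_name: str, site_name: str, forest_root_dns: str) -> str:
    """Build the DN of a server's NTDS Settings object.

    forest_root_dns is the DNS name of the forest root domain
    (for example 'example.com'), which becomes DC=example,DC=com.
    """
    domain_components = ",".join(f"DC={label}" for label in forest_root_dns.split("."))
    return (
        f"CN=NTDS Settings,CN={server_name},CN=Servers,"
        f"CN={site_name},CN=Sites,CN=Configuration,{domain_components}"
    )
```

Calling ntds_settings_dn("DC01", "Default-First-Site-Name", "example.com") yields a DN that ends in CN=Configuration,DC=example,DC=com.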


Before you manually remove the NTDS Settings object for any server, check that replication has
occurred after the domain controller has been demoted. Using the NTDSUTIL utility improperly can
result in partial or complete loss of AD functionality. (For a description of how to check whether
replication has occurred, see Chapter 4.)

To clean up the meta data, select Start, Programs, Accessories, Command Prompt. At the
command prompt, type
NTDSUTIL
then press Enter. At the ntdsutil prompt, type
metadata cleanup
then press Enter. Based on the options returned to the screen, you can use additional
configuration parameters to ensure that the removal occurs correctly.
Before you clean up the metadata, you must select the server on which you want to make the
changes. To connect to a target server, type
connections
then press Enter. If the user who is currently logged on to the computer running NTDSUTIL
doesn’t have administrative permissions on the target server, alternative credentials need to be
supplied before making the connection. To supply alternative credentials, type the following
command, then press Enter:
set creds <domain_name> <user_name> <password>
Next, type
connect to server <server_name>
then press Enter. You should receive confirmation that the connection has been successfully
established. If an error occurs, verify that the domain controller you specified is available and
that the credentials you supplied have administrative permissions on the server. When a
connection has been established and you’ve provided the right credentials, type
quit
then press Enter, to exit the Connections menu in NTDSUTIL. When the Meta Data Cleanup
menu is displayed, type
select operation target
and press Enter. Type
list domains
then press Enter. A list of domains in the forest is displayed, each with an associated number. To
select the appropriate domain, type
select domain <number>
and press Enter (where <number> is the number associated with the domain of which the
domain controller you’re removing is a member). The domain you select determines whether the
server being removed is the last domain controller of that domain.


Next, type
list sites
then press Enter. A list of sites, each with an associated number, is displayed. Type
select site <number>
and press Enter (where <number> is the number associated with the site of which the server
you’re removing is a member). You should receive a confirmation, listing the site and domain
you chose. Once you receive a confirmation, type
list servers in site
and press Enter. A list of servers in the site, each with an associated number, is displayed. Type
select server <number>
and press Enter (where <number> is the number associated with the server you want to remove).
You receive a confirmation, listing the selected server, its DNS host name, and the location of
the server’s computer account that you want to remove.
After you’ve selected the proper domain and server, type
quit
to exit the current NTDSUTIL submenu. When the Meta Data Cleanup menu is displayed, type
remove selected server
and press Enter. You should receive confirmation that the server was removed successfully. If
the NTDS Settings object has already been removed, you may receive the following error
message:
Error 8419 (0x20E3)
The DSA object couldn’t be found
Type
quit
at each menu to quit the NTDSUTIL utility. You should receive confirmation that the connection
to the server was closed successfully.
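Because the cleanup procedure is long and the order of the steps matters, it can help to generate the command sequence as a checklist before you sit down at the console. The following Python sketch only builds the list of commands described above; it does not run NTDSUTIL. The function name and parameters are hypothetical, and the numbers are the ones NTDSUTIL displays in its list output.

```python
from typing import List, Optional, Tuple


def metadata_cleanup_commands(
    target_server: str,
    domain_number: int,
    site_number: int,
    server_number: int,
    creds: Optional[Tuple[str, str, str]] = None,
) -> List[str]:
    """Return the NTDSUTIL commands, in order, for removing one server's meta data."""
    for number in (domain_number, site_number, server_number):
        if number < 0:
            raise ValueError("list numbers must not be negative")

    commands = ["metadata cleanup", "connections"]
    if creds is not None:
        domain, user, password = creds
        # Alternative credentials are supplied before the connection is made.
        commands.append(f"set creds {domain} {user} {password}")
    commands += [
        f"connect to server {target_server}",
        "quit",  # leave the Connections menu
        "select operation target",
        "list domains",
        f"select domain {domain_number}",
        "list sites",
        f"select site {site_number}",
        "list servers in site",
        f"select server {server_number}",
        "quit",  # leave the Select Operation Target menu
        "remove selected server",
        "quit",  # leave the Meta Data Cleanup menu
        "quit",  # leave NTDSUTIL
    ]
    return commands
```

Printing the result of metadata_cleanup_commands("DC02", 0, 1, 3) one command per line gives you a script to follow, with the selection numbers filled in from what the list commands actually displayed.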


Moving the AD Database or Log Files


There are several common problems that occur with AD that all stem from the same source: low
disk space. These problems may surface as any of a number of error messages in the Windows
event logs. The following list highlights the most common of these errors along with their
associated symptoms and solutions.
• The following error message may occur when you start AD on a domain controller:
Lsass.exe - System Error
Directory Services could not start because of the following
error: There is not enough space on the disk. Error Status:
0xc000007f. Please click OK to shutdown this system and reboot
into Directory Service Restore Mode, check the event logs for
more detailed information.
When this error occurs, the following events are recorded in the event logs for the directory
service on the domain controller and can be viewed by using Event Viewer:
Event ID: 1393
Attempts to update the Directory Service database are failing
with error 112. Since Windows will be unable to log on users
while this condition persists, the Netlogon service is being
paused. Check to make sure that adequate free disk space is
available on the drives where the directory database and log
files reside.
Event ID: 428
NTDS (272) The database engine is rejecting update operations due
to low free disk space on the log disk.
• The following warning message is recorded in the System Log of the domain controller
and can be viewed by using Event Viewer:
Event ID 2013:
The D: disk is nearing Capacity. You may need to delete some
files.
If the disk drive runs out of disk space, AD won’t start up. Windows attempts to avoid this
situation, but it can occur if you ignore warnings about low disk space in the System Log or if
you run large scripts against AD for mass directory imports. To resolve the problem of having no
disk space, you can either make space available on the same disk drive or move AD to a separate
drive. The first method requires you to simply reduce the number of files or folders on the same
disk drive as the directory database.
If you want to move the AD database to another drive on the domain controller, you can use the
NTDSUTIL utility to move either the database file or the database log files. This method is ideal
when you cannot move data to another drive to free space. If all drives are at capacity, you might
need to install an additional hard disk in the domain controller.
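Because every problem in this section stems from low disk space, a simple scheduled check on the volumes that hold the database and log files can warn you before AD stops. The following Python sketch uses the standard library's shutil.disk_usage function. The 10 percent threshold and the function names are assumptions chosen for illustration, not values taken from Windows.

```python
import shutil


def is_low_on_space(free_bytes: int, total_bytes: int, min_free_fraction: float = 0.10) -> bool:
    """True when the free fraction of a volume is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_volume(path: str, min_free_fraction: float = 0.10) -> bool:
    """Check the volume that holds `path` (for example, the NTDS folder)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_low_on_space(usage.free, usage.total, min_free_fraction)
```

You would call check_volume once for the folder that holds the database file and once for the folder that holds the log files, because the two can reside on different drives.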
Before you move the directory database file or log files, you need to start the domain controller
in Directory Service Restore mode. To do so, restart the domain controller. When you’re
prompted, press F8 to display the Advanced Options menu. Select Directory Service Restore
Mode and press Enter, then log on using the administrator account and password that you
assigned during the DCPROMO process.


To move the directory database file or log files, locate the drive containing the directory and log
files. The directory database (NTDS.DIT) and log files are located in the NTDS folder under the
Windows system root (%SystemRoot%\NTDS) by default. (However, the administrator may have changed their locations during the
DCPROMO process.) Next, select Start, Programs, Accessories, Command Prompt. In the
Command Prompt window, type
NTDSUTIL
then press Enter. At the ntdsutil prompt, enter the word
files
The utility displays the file maintenance category. The commands to this point should appear as
follows:
I:>NTDSUTIL
ntdsutil: files
file maintenance:
At the file maintenance prompt, enter the word
info
to display the location of the AD database files, log files, and other associated files. Note the
location of the database and log files.
To move the database file to a target disk drive, type the following command at the file
maintenance prompt:
MOVE DB TO %s (where %s is the target folder on another drive)
To move the log files to a target disk drive, type the following command at the file maintenance
prompt:
MOVE LOGS TO %s (where %s is the target folder on another drive)
In both commands, the %s parameter specifies the target directory. The Move command moves
the files and updates the registry keys on the domain controller so that AD restarts using the new
location.
To quit NTDSUTIL, type
quit
twice to return to the command prompt, then restart the domain controller normally.

Completely back up AD on the domain controller before you execute the Move command. In addition,
back up AD after you move the directory database file and log files; restoring the directory database
will then retain the new file location.

Repairing the AD Database


You can use the NTDSUTIL Repair feature to repair the AD database file. However, you should
use it only as a last resort for recovering the database—if a valid backup is available, always use
it first to restore the data. The reason is that repairing the directory database doesn’t always work
correctly. For example, if a database file is corrupt, using the NTDSUTIL Repair feature may not
restore all objects and attributes. In fact, in some cases, there is a risk that using the Repair
feature will cause further data to be lost.


To repair the AD database file, select Start, Programs, Accessories, Command Prompt. In the
Command Prompt window, type
NTDSUTIL
then press Enter. At the ntdsutil prompt, enter the word
files
The utility displays the file maintenance category. At the file maintenance prompt, enter the
word
repair
The commands to this point should appear as follows:
I:>NTDSUTIL
ntdsutil: files
file maintenance: repair
As soon as the repair operation has completed, run the NTDSUTIL Semantic Checker on the
database. Figure 5.12 shows the results of using the NTDSUTIL Repair option.

Figure 5.12: Using NTDSUTIL as a last resort to repair the directory database files.


ADcheck: A Free AD and Windows Network Diagnostic Tool


Although Windows and its resource kit provide some basic tools for performing troubleshooting tasks,
they aren’t especially easy to use. NetIQ provides an excellent—and free—utility for performing a host of
AD diagnostic and troubleshooting tasks. ADcheck provides five essential categories of Windows
diagnostics:
● Test Domain Controller—Checks the availability of the domain controller, validates DNS records (for
example, SRV RRs), and binds to the domain controller to verify AD status
● List Domain Controllers—Lists each domain controller along with its name, availability, Active
Directory Service Interfaces (ADSI) scripting location, and site location
● List Operations Masters—Lists FSMO role holders, compares them with an internal best practices list,
and recommends changes when necessary
● Test Replication—Checks domain replication topology and displays diagnostic information about
replication partners
● Show Domain Controller Status—Provides summaries of the status of domain controllers, including
replication errors and partners, AD site analysis, and charts that show recommended changes to the
placement of domain controllers.
ADcheck is also capable of generating some very detailed reports, each of which shows potential causes
for problems as well as the problems themselves. For some reports, it also compares the current
configuration with an internal best practices guideline and may recommend changes. Given that it’s
completely free, this tool is something that no Windows network administrator should be without. You can
download ADcheck from http://www.netiq.com/adcheck/download.asp.

Troubleshooting Secure Channels and Trust Relationships


When a Windows system joins a domain, a computer account is created. Whenever the system
starts after that, it uses the password for that account to create a secure channel with the domain
controller for its domain. Requests sent on the secure channel are authenticated, and sensitive
information (such as passwords) is encrypted, but the channel isn’t integrity-checked, and not all
information is encrypted.
There are many reasons why a domain controller cannot communicate on a secure channel. For
example, the user or domain controller may not have the appropriate access permissions or trust
relationships. You can test the status of secure channels and trust-relationship links using the
Resource Kit’s NLTEST command-line utility.
To validate access to resources in a trusting domain, the trusting domain controller establishes a
secure channel with a domain controller in the trusted domain. Pass-through authentication then
occurs over this secure channel. However, in WAN environments, the trusted domain’s domain
controllers may be dispersed over a wide variety of fast and slow links. If a fast link is
unavailable when the trusting domain controller wants to establish a secure channel, the secure
channel may be established with a domain controller over a slow link. Even when the fast link is
reestablished, pass-through authentication may occur over the slow link to the trusted domain’s
domain controller.
The mechanism for establishing a secure channel is very similar to the normal user-logon
process. That is, the trusting domain controllers send out logon requests to all known domain
controllers in the trusted domain. The trusting domain controllers then set up a secure channel
with the first trusted domain controller that responds to this request.
Normally, this method is preferred because the first domain controller to respond to a logon
request is typically the controller that is located across the fastest communication link. However,
if that link is down or the “fast” domain controller is unavailable, a domain controller over a
slower link may respond first, and all pass-through authentications occur over the slow link.
There is a built-in mechanism in Windows that tracks how long authentication takes over the
existing secure channel. If pass-through authentication takes longer than 45 seconds, that fact is
noted. If two such authentications exceed that limit, a rediscovery process begins, the current
secure channel is broken, and the trusting domain’s PDC once again sends out logon requests to
all known trusted domain controllers. However, because this mechanism tracks only those
communications that last longer than 45 seconds, users may see a 40-second delay every time
they attempt to use a resource without a secure-channel reset taking place.
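The following Python sketch models the rediscovery rule just described, to show why a steady 40-second delay never triggers a reset. It is a toy model, not Windows code. The class and method names are hypothetical, and it assumes that slow authentications are simply counted until the second one occurs.

```python
class SecureChannelMonitor:
    """Toy model of the rediscovery rule described in the text."""

    def __init__(self, limit_seconds: float = 45.0, slow_count_to_reset: int = 2) -> None:
        self.limit_seconds = limit_seconds
        self.slow_count_to_reset = slow_count_to_reset
        self.slow_count = 0
        self.rediscoveries = 0

    def record_authentication(self, duration_seconds: float) -> bool:
        """Record one pass-through authentication; return True if rediscovery starts."""
        if duration_seconds > self.limit_seconds:
            self.slow_count += 1
        if self.slow_count >= self.slow_count_to_reset:
            # Break the channel and start discovery again.
            self.slow_count = 0
            self.rediscoveries += 1
            return True
        return False
```

Feeding the monitor any number of 40-second authentications leaves rediscoveries at zero, whereas two authentications of 46 seconds or more start a rediscovery on the second one.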
You can run the NLTEST utility on the trusting domain controller to break and re-initialize a
secure channel and to obtain information about an existing trust relationship (for example, when
the secure-channel password was last changed). You can also use NLTEST to restart the
discovery process for a new trusted domain controller. The syntax of NLTEST is:
NLTEST /sc_query:<account_domain>
where <account_domain> is the name of the trusted domain. This command returns the name of
the trusted domain controller with which the trusting domain controller has a secure channel. If
that domain controller is unacceptable, use the following syntax:
NLTEST /sc_reset:<account_domain>

Troubleshooting the Operations Masters


The operations masters in AD perform single-master operations for the forest and domains and
are officially called Flexible Single Master Operations (FSMOs). Several operations in the
directory are single-master operations, such as updating the schema, creating new
domains in a forest, issuing new blocks of relative IDs (RIDs), and supporting domains and
clients that are running NT 4.0 and earlier.
The forest has two operations masters that manage certain forest-wide single-operation activities,
and each domain has three operations masters that manage certain domain-wide activities. For
example, a forest with two domains would have eight operations masters: two for the forest and
three domain-specific operations master roles in each domain. The five FSMO roles are:
• Schema master—Forest-wide and one per forest
• Domain naming master—Forest-wide and one per forest
• Relative ID (RID) master—Domain-specific and one for each domain
• Primary domain controller (PDC) emulator—Domain-specific and one for each domain
• Infrastructure master—Domain-specific and one for each domain.
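The count of operations masters follows directly from the list above: two forest-wide roles plus three roles for every domain. A short Python sketch of that arithmetic (the function and constant names are illustrative):

```python
FOREST_WIDE_ROLES = ("schema master", "domain naming master")
DOMAIN_WIDE_ROLES = ("RID master", "PDC emulator", "infrastructure master")


def count_operations_masters(domain_count: int) -> int:
    """Total FSMO role instances in a forest with `domain_count` domains."""
    if domain_count < 1:
        raise ValueError("a forest contains at least one domain")
    return len(FOREST_WIDE_ROLES) + len(DOMAIN_WIDE_ROLES) * domain_count
```

For the two-domain forest in the example, count_operations_masters(2) gives 2 + 3 × 2 = 8.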
Because the operations masters are assigned to specific domain controllers in the forest and
domains and are critical to the operation of AD, your first step to troubleshoot each operations
master is to use the domain-controller troubleshooting techniques described in “Troubleshooting
the Domain Controllers” earlier in this chapter. Once you’re assured that the domain controller
itself is operating properly, you can turn your attention to the operations masters.


When Operations Masters Fail


If a domain controller holding an FSMO (operations master) role fails, major network problems
are almost guaranteed to ensue. The following sections explore the various operations
master roles, their functions, and the effects of losing them.

Schema Master
If the domain controller holding the forest-wide schema master role fails, you or your directory
administrators won’t be able to modify or extend the AD schema. Schema modifications
typically occur when you install directory-enabled applications such as management utilities that
rely on the directory for information. These applications try to modify or extend the current
schema with new object classes, objects, and attributes. If the applications being installed cannot
communicate with the domain controller that has been designated as the schema master,
installation will fail.
The schema master solely controls the management of the directory schema and propagates
updates to the schema to the other domain controllers as modifications occur. Because only
directory administrators are allowed to make changes, the schema operations master isn’t visible
to directory users and doesn’t affect them.

Domain Naming Master


If the domain naming master role holder in a forest fails, you lose the ability to add and remove
domains in the forest. When a new domain is created or deleted from the forest structure, the
domain controller that has been designated as the domain naming master is contacted and
verifies that the change operation can be completed.
The domain naming master is the only domain controller that controls the creation and deletion
of domains, and it propagates the changes to the other domain controllers as necessary. Because
only directory administrators are allowed to make structural domain changes to the forest, the
domain naming operations master isn’t visible to directory users and doesn’t affect them.

RID Master
If the domain controller that stores the RID master role fails or stops communicating, domain
controllers in the same domain cannot obtain the RIDs they need. (A RID is the part of a security
ID, or SID, that identifies an account within its domain.) Domain controllers use RIDs when they
create security principals such as users, groups, and computers; each of these objects is
assigned a RID. The RID master role allocates blocks of RIDs to other domain controllers in its
domain. As I mentioned at the beginning of this section, there is only one RID master role per
domain.

If a domain controller has remaining (unassigned) RIDs in its allocated block, the RID master role
doesn’t need to be available when new object accounts are created.
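The following Python sketch illustrates why a domain controller can keep creating accounts for a while after the RID master goes offline. It is a toy model with hypothetical names and an arbitrary block size. A real domain controller does not necessarily wait until its block is empty before asking for another, and the model leaves that out.

```python
class RidPool:
    """Toy model of the block of RIDs a domain controller holds locally."""

    def __init__(self, first_rid: int, block_size: int) -> None:
        self.next_rid = first_rid
        self.last_rid = first_rid + block_size - 1

    def remaining(self) -> int:
        """Number of unassigned RIDs left in the local block."""
        return self.last_rid - self.next_rid + 1

    def allocate(self) -> int:
        """Hand out the next RID, or fail if the local block is used up."""
        if self.remaining() == 0:
            raise RuntimeError("local RID block exhausted; the RID master must be contacted")
        rid = self.next_rid
        self.next_rid += 1
        return rid
```

With RidPool(first_rid=1100, block_size=3), three calls to allocate succeed without any contact with the RID master; the fourth raises an error, which corresponds to the point at which the outage becomes visible.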


Infrastructure Master
If the domain controller that stores the infrastructure master role fails, a portion of AD won’t
function properly. The infrastructure master role controls and manages the updates to all cross-
domain references, such as group references and security identifier (SID) entries in access
control lists (ACLs). For example, when you add, delete, or rename a user who is a member of a
group, the infrastructure master controls the reference updates. There is always only one
infrastructure master role in each domain in a forest.
Because only one domain controller is assigned to perform this role, it’s important that it doesn’t
fail. However, if it does, the failure is not visible to network users. In fact, it’s visible to only
directory administrators when they’ve recently moved or renamed a large number of object
accounts. In addition, having one domain controller assigned to this role can be a big security
problem.

If you force a transfer of the infrastructure master role from its original domain controller to another
domain controller in the same domain, you can transfer the role back to the original domain controller
after you’ve returned it to production.

It is strongly recommended that you not put the infrastructure master role on any domain controller
that is also acting as a GC server, unless you have only one domain in your forest. For more
information about FSMO placement rules and best practices, see Microsoft Product Support Services
article 223346, at http://support.microsoft.com.

PDC Emulator
If the PDC emulator fails or no longer communicates, users who depend on its service are
affected. These are down-level users from NT 4.0, Windows 98, and Windows 95. The PDC
emulator is responsible for changes to the SAM database, password management, account
lockout for down-level workstations, and communications with the domain controllers.

If you force a transfer of the PDC emulator role from its original domain controller to another domain
controller in the same domain, you can transfer the role back to the original domain controller after
you’ve returned it to production.

Determining the Operations Master Role Holders Locations


An important step in the process of troubleshooting problems with operations master role holders
is identifying which domain controllers hold the various forest- and domain-wide roles. There
are actually several methods of determining the location of FSMO role holders in Windows.


Using the DSA and Schema MMC Snap-Ins


To determine the RID master, PDC emulator, and infrastructure master FSMO role holders of a
selected domain by using Windows’ built-in tools, select Start, Run, type
dsa.msc
and press Enter or click OK. Right-click the selected Domain Object in the top left pane, then
click Operations Masters. Select the PDC tab to view the server holding the PDC master role,
then select the Infrastructure tab to view the server holding the infrastructure master role, and
select the RID Pool tab to view the server holding the RID master role.
Determining the forest schema master role holder is a bit trickier. To do so, select Start, click
Run, type
mmc
then click OK. On the Console menu, click Add/Remove Snap-in, click Add, double-click
Active Directory Schema, click Close, and then click OK. Right-click Active Directory Schema
in the top left pane, then click Operations Masters to view the server holding the schema master
role.

For the Active Directory Schema snap-in to be listed as an available snap-in, you must first have
registered the Schmmgmt.dll file. If it doesn’t appear as an option, follow these steps to register it:
select Start, Run, type
regsvr32 schmmgmt.dll
in the Open box, and click OK. A message will be displayed confirming that the registration was
successful.

Determining the forest’s domain naming master role holder requires you to select Start, Run,
type
mmc
then click OK. On the Console menu, click Add/Remove Snap-in, click Add, double-click
Active Directory Domains and Trusts, click Close, and then click OK. In the left pane, click
Active Directory Domains and Trusts. Right-click Active Directory Domains and Trusts, and
click Operations Master to view the server holding the domain naming master role in the forest.
Although these methods certainly work, they aren’t necessarily the easiest. The following
sections describe some additional methods for determining FSMO role holders on your network.


Using NTDSUTIL
NTDSUTIL is a tool included with all editions of Windows Server. Of the built-in tools, it is the
only one that shows you all the FSMO role owners. To view the role holders, select Start, click Run, type
cmd
in the Open box, then press Enter. Type
ntdsutil
and then press Enter. Type
domain management
and then press Enter. Type
connections
and then press Enter. Type
connect to server <server_name>
where <server_name> is the name of the domain controller you want to view, then press Enter.
Type
quit
and then press Enter. Type
select operation target
and then press Enter. Type
list roles for connected server
and then press Enter.

Using the Resource Kit’s Dumpfsmos.cmd


The resource kit contains a batch file named Dumpfsmos.cmd that you can use to quickly list
FSMO role owners for your current domain and forest. The .cmd file uses NTDSUTIL to
enumerate the role owners. Dumpfsmos.cmd takes a single argument, the name of the domain
controller to which it should connect when querying for FSMO locations. The usage of the
command is:
Dumpfsmos <server_name>
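If you prefer to drive the query from a script of your own, the same idea can be sketched in Python. The helper below only builds an argument list from the commands shown in the “Using NTDSUTIL” section. It assumes that NTDSUTIL accepts its menu commands as command-line arguments, which is how a batch file such as Dumpfsmos.cmd can drive it. The function name is hypothetical, and the exact contents of Dumpfsmos.cmd may differ.

```python
from typing import List


def list_roles_command(server_name: str) -> List[str]:
    """Argument vector that asks NTDSUTIL to list the FSMO role owners."""
    if not server_name.strip():
        raise ValueError("server_name must not be empty")
    return [
        "ntdsutil",
        "domain management",
        "connections",
        f"connect to server {server_name}",
        "quit",  # back to the domain management menu
        "select operation target",
        "list roles for connected server",
        "quit",  # leave select operation target
        "quit",  # leave domain management
        "quit",  # leave NTDSUTIL
    ]
```

On a computer where NTDSUTIL is installed, you could pass the list to subprocess.run, which quotes the multiword arguments for you.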

Using DCDIAG
Another method involves the use of the DCDIAG command. On a domain controller, run the
following command
dcdiag /test:knowsofroleholders /v
Note that the /v switch is required. This operation lists the owners of all FSMO roles in the
enterprise known by that domain controller.


Using AD Replication Monitor


Another method to view the FSMO role holders is to use the AD Replication Monitor
(Replmon.exe) utility. Before you can use Replmon.exe, you’ll need to install it. The AD
Replication Monitor utility is part of the Support Tools, which are located on the Windows CD-
ROM in the \Support\Tools folder. Run the Setup.exe file in this folder to install the tools. Once
installed, you can start the AD Replication Monitor utility by selecting Start, Programs, Support
Tools, Tools, and selecting AD Replication Monitor. Once the utility is running, you can
determine the operations master role holders for the forest and domain by right-clicking
Monitored Servers, then adding one or more servers using the wizard. Next, right-click the
servers, then click Properties, and finally, select the FSMO Roles tab. The domain controllers
that hold the operations master roles are displayed under the Owner column. To test the
connectivity to each of the operations master role holders, click Query to the right of each role.

Using Third-Party Utilities


Certain third-party utilities, such as NetPro’s DirectoryAnalyzer and DirectoryTroubleshooter
and NetIQ’s ADcheck utility, provide features to determine the domain controllers acting as
FSMO role holder servers. Figure 5.13 shows an example of viewing the schema master using
DirectoryAnalyzer.

Figure 5.13: Using a third-party utility to determine which domain controller in your forest holds a particular
FSMO role.


Seizing an Operations Master Role


If a domain controller holding one or more operations master roles is down during a critical time,
will be unavailable for a long time, or is permanently out of service, you’ll need to take steps to
force the transfer of the role(s) to another domain controller. You can accomplish this feat by
using NTDSUTIL. Forcing this type of operations master role transfer is also referred to as
seizing the role on a domain controller. However, before you decide whether to seize the role of
an operations master, be aware that doing so is a major step that should be taken only if the
affected domain controller will never be brought back online. Forcing a transfer in this case
should be a permanent action.
Once you’ve determined where the current operations master role holders in a domain or forest
are (using the information in the previous section), you can use the NTDSUTIL program to
transfer the operations master role from one domain controller in the forest or domain to another
in the same forest or domain. To seize an operations master role on a selected domain controller,
on the target domain controller (the domain controller that will be taking over the forest- or
domain-wide operation master role), select Start, Run. In the Open dialog box, type
NTDSUTIL
then click OK. At the ntdsutil prompt, enter the word
roles
to display the FSMO Maintenance prompt. Next, connect to the domain controller that will take
over the role. Type
connections
then press Enter. If you need to supply alternative credentials, do so before you connect. Type
set creds <domain_name> <user_name> <password>
and press Enter. Then type
connect to server <server_name>
where <server_name> is the name of the server you want to use, then press Enter. At the Server
Connections prompt, type
quit
then press Enter to return to the FSMO Maintenance prompt.
To seize the role on the currently connected domain controller, enter
seize <role_type>
where <role_type> is one of the following: schema master, domain naming master, rid master,
infrastructure master, or pdc. (For a list of roles that you can seize, enter
?
at the FSMO Maintenance prompt or see the list of roles at the beginning of this section.)
After you seize the roles, type
quit
then press Enter to return to the previous menu in the NTDSUTIL interface. Repeat this step
until you’ve exited the utility. Reboot the domain controller that seized the operations master
role to complete the role change operation.


If the current operations master role holder domain controller is online and accessible or can be
repaired and brought back online, it’s recommended that you transfer the role using NTDSUTIL’s
transfer command rather than the seize command. For more information about seizing and
transferring FSMO roles, see Microsoft Product Support Services articles 255504 and 223787 at
http://support.microsoft.com.
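The choice between transferring and seizing can be reduced to a single question: can the current role holder still be reached or repaired? The following Python sketch encodes that rule using the role names listed for the seize command. The function name is hypothetical, and the sketch only returns the command text for you to enter at the FSMO Maintenance prompt.

```python
ROLE_TYPES = (
    "schema master",
    "domain naming master",
    "rid master",
    "infrastructure master",
    "pdc",
)


def role_change_command(role_type: str, holder_recoverable: bool) -> str:
    """Pick 'transfer' when the current holder can still be reached or repaired."""
    role = role_type.strip().lower()
    if role not in ROLE_TYPES:
        raise ValueError(f"unknown role type: {role_type!r}")
    verb = "transfer" if holder_recoverable else "seize"
    return f"{verb} {role}"
```

For example, role_change_command("pdc", holder_recoverable=True) returns "transfer pdc", whereas passing False for a domain controller that is permanently out of service returns "seize pdc".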

Checking for Inconsistencies Among Domain-Wide Operations Masters


Another way to troubleshoot problems on operations masters is to check for inconsistencies
among the domain controllers in a domain. If the domain controllers don’t report operations
masters consistently, long-term problems, such as replication problems, can arise. Several third-
party utilities are capable of detecting domain controller inconsistencies. For example, NetPro’s
DirectoryAnalyzer can inspect exactly what each domain controller believes are the domain-wide
master role assignments. If the domain controllers don’t all report the same values for the
operations masters, there is a problem, which the tool will report.
Figure 5.14 shows an example of using a third-party utility (NetPro’s DirectoryAnalyzer) to
check for operations master role holder inconsistencies. As the figure shows, the domain
controller COMP-DC-04 lists COMP-DC-01 as the owner of the PDC operations master, while
domain controller COMP-DC-03 is the actual owner. Thus, the owner of the PDC operations
master is inconsistent across the domain controllers.
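If you collect each domain controller's view of the role owners yourself (for example, by querying every domain controller in turn), the comparison is easy to automate. The following Python sketch is an illustration of the idea only, not a description of how DirectoryAnalyzer works. The function name, the role labels, and the server names DC-A, DC-B, and DC-C are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def find_role_inconsistencies(
    reports: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Map each disputed role to {claimed_owner: {domain controllers reporting it}}."""
    by_role: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for reporter, roles in reports.items():
        for role, owner in roles.items():
            by_role[role][owner].add(reporter)
    # A role is consistent only when every reporter names the same owner.
    return {
        role: dict(owners)
        for role, owners in by_role.items()
        if len(owners) > 1
    }
```

If DC-B and DC-C both report DC-B as the PDC emulator while DC-A still names DC-C, the result contains a single "pdc" entry that shows which domain controllers hold each opinion. An empty result means that all reports agree.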

Figure 5.14: Checking for consistency of the operations masters on domain controllers.


Troubleshooting the Replication Topology


When you troubleshoot replication problems and errors, it’s important to know who the
replication partners of a specific domain controller are and the status of replication with each
one. The following sections explore methods to gather this information.

Viewing the Replication Partners for a Domain Controller


You can view the replication partners for a specific domain controller by using two tools,
DirectoryAnalyzer and REPADMIN. When you use DirectoryAnalyzer to see replication
partners, you’re viewing the replication topology for the selected domain controller in a forest,
and you can check replication consistency among replication partners. In addition,
DirectoryAnalyzer constantly checks the replication topology to ensure that it’s transitively
closed. If it isn’t, DirectoryAnalyzer generates an alert.
Figure 5.15 shows the Replication Information tab in the Browse Directory By Site view. This
tab allows you to view the replication topology and the last successful replication cycle for each
replication partner. It also shows the replication partners and any errors that occurred during
replication.

Figure 5.15: Using DirectoryAnalyzer to view the replication partners for each domain controller.
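
To make the idea of a transitively closed topology concrete, the following Python sketch treats replication links as a directed graph and reports every pair of domain controllers for which changes made on the first can never reach the second. It is a simplified model under stated assumptions: the function name and the sample links are hypothetical, and it is not the algorithm DirectoryAnalyzer uses.

```python
def unreachable_pairs(links):
    """links maps a source DC to the set of DCs it replicates changes to.
    Returns sorted (origin, target) pairs where no replication path exists."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    missing = []
    for origin in sorted(nodes):
        # Depth-first search over outbound replication links.
        seen = {origin}
        stack = [origin]
        while stack:
            current = stack.pop()
            for neighbor in links.get(current, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        missing.extend((origin, target) for target in sorted(nodes - seen))
    return missing

# Hypothetical topology: DC3 sends changes to DC1 but receives from no one.
links = {"DC1": {"DC2"}, "DC2": {"DC1"}, "DC3": {"DC1"}}
print(unreachable_pairs(links))
```

An empty result means that a change made on any domain controller can eventually reach every other one.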


You can also use the Replication Administration (REPADMIN) utility to monitor the current
links to other replication partners for a specific domain controller, including the domain
controllers that are replicating to and from the selected domain controller. Viewing these links
shows you the replication topology as it exists for the current domain controller. By viewing the
replication topology, you can check replication consistency among replication partners, monitor
replication status, and display replication metadata. To use REPADMIN to view the replication
partners for a domain controller, enter the command
REPADMIN /SHOWREPS

Forcing Domain Controllers to Contact Replication Partners


If you detect errors when viewing the replication partners using either the DirectoryAnalyzer
utility or the REPADMIN tool, you can manually force the domain controller to recalculate its
replication topology, contact its replication partners, and authenticate with them. Doing so is
necessary to re-create the replication links. You can use the following command to force contact:
REPADMIN /KCC

During normal operation, the Knowledge Consistency Checker (KCC) automatically generates the
replication topology for each directory partition on the domain controllers. You don’t need to
manually manage the replication topology for normal operation.

Tracking Replicated Changes


After the replication links have been re-created, future replication processes should occur
automatically at the normal scheduled time. You can check whether replication is occurring
normally among replication partners by tracking a particular replicated change. Doing so allows
you to ensure that the target domain controller is receiving the change. To perform this check,
enter the following for a specific object in AD:
REPADMIN /SHOWMETA CN=CJOHNSON,OU=ENGINEERING,DC=COMPANY,DC=COM <domain_controller>
In this command, <domain_controller> is the host name of the target domain controller for
which you’re tracking replicated changes for CJOHNSON in the ENGINEERING OU in the
COMPANY.COM domain. The output from this command shows the Update Sequence Number
(USN), originating DSA, date and time, version number, and replicated attribute.

In addition to tracking replicated changes, many third-party utilities constantly evaluate replication
latency across all domain controllers. If the latency exceeds the specified threshold, the utility will
generate an administrative alert and/or generate a log entry reporting the condition.
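
The latency check mentioned in this note can be sketched as follows. The example assumes you already know when a change originated and when (or whether) each domain controller received it. The function name, the sample timestamps, and the one-hour threshold are hypothetical, and no particular third-party product is being described.

```python
from datetime import datetime, timedelta

def latency_alerts(originated, arrivals, threshold):
    """Return (dc, latency) for DCs that exceeded the threshold.
    A latency of None means the change has not arrived at all."""
    alerts = []
    for dc in sorted(arrivals):
        arrived = arrivals[dc]
        if arrived is None:
            alerts.append((dc, None))
        elif arrived - originated > threshold:
            alerts.append((dc, arrived - originated))
    return alerts

# Hypothetical sample: one change and the time each DC received it.
originated = datetime(2004, 7, 31, 9, 0)
arrivals = {
    "DC1": datetime(2004, 7, 31, 9, 5),
    "DC2": datetime(2004, 7, 31, 10, 30),
    "DC3": None,
}

for dc, latency in latency_alerts(originated, arrivals, timedelta(hours=1)):
    print(dc, "not yet received" if latency is None else latency)
```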

Forcing Replication Among Replication Partners


There are several methods that you can use to initiate replication among direct replication
partners in a common naming context. For each of the following methods, the source domain
controller is the domain controller that replicates changes to a replication partner. The
destination domain controller receives the changes.


To force replication among replication partners, you can use REPADMIN to issue a command to
synchronize the source domain controller with the destination domain controller by using the
object GUID of the source domain controller. To accomplish the task of forcing replication, you
need to find the GUID of the source server. Enter the following command to determine the
GUID of the source domain controller:
REPADMIN /SHOWREPS <destination_server_name>
You can find the GUID for the source domain controller under the Inbound Neighbors section of
the output. First, find the directory partition that needs synchronization and locate the source
server with which the destination is to be synchronized. Then note the GUID value of the source
domain controller. Once you know the GUID, you can initiate or force replication by entering
the following command:
REPADMIN /SYNC <directory_partition_DN> <destination_server_name> <source_server_objectGUID>
The following example shows how to run this command to initiate replication of the domain
partition called COMPANY.COM. Following the syntax above, DC1 is the destination domain
controller, and the GUID identifies the source domain controller (DC2 in this example). To
perform the replication, use the following command:
REPADMIN /SYNC DC=COMPANY,DC=COM DC1 d2e3ffdd-b98c-11d2-712c-0000f87a546b
If the command is successful, the REPADMIN utility displays the following message:
REPLICASYNC() FROM SOURCE: d2e3ffdd-b98c-11d2-712c-0000f87a546b,
TO DEST: DC1 IS SUCCESSFUL.
Optionally, you can use the following switches at the command prompt:
• /FORCE—Overrides the normal replication schedule
• /ASYNC—Starts the replication event without waiting for the normal replication to finish
You’ll typically force replication only when you know that the destination domain controller has
been down or offline for a long time. It also makes sense to force replication to a destination
domain controller if network connections haven’t been working for a while.
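
If you force replication regularly, it can help to assemble the command from its parts so that a malformed GUID is caught before the command runs. The following Python sketch does only that. The helper name is hypothetical, the argument order follows the syntax shown earlier in this section, and appending the optional switches at the end of the command is an assumption. The sketch builds the argument list but does not run REPADMIN.

```python
import uuid

def build_sync_command(partition_dn, destination, source_guid,
                       force=False, asynchronous=False):
    """Build the REPADMIN /SYNC argument list.
    Raises ValueError if source_guid is not a well-formed GUID."""
    guid = str(uuid.UUID(source_guid))  # validates and normalizes the GUID
    args = ["REPADMIN", "/SYNC", partition_dn, destination, guid]
    if force:
        args.append("/FORCE")   # override the normal replication schedule
    if asynchronous:
        args.append("/ASYNC")   # don't wait for replication to finish
    return args

command = build_sync_command(
    "DC=COMPANY,DC=COM", "DC1",
    "d2e3ffdd-b98c-11d2-712c-0000f87a546b", force=True)
print(" ".join(command))
```

The resulting list could be passed to a process launcher on a computer where REPADMIN is installed.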

Viewing Low-Level AD Replication Status


You can troubleshoot the replication topology another way by viewing the low-level status of
AD replication. The REPLMON utility allows you to do so. Because this tool is graphically
based, you can view the replication topology in graphical form and monitor the status and
performance of replication among domain controllers.


REPLMON provides a view only from the domain controller perspective. As with REPADMIN,
you can install it from the \Support\Tools folder on the Windows CD-ROM. REPLMON has two
options that you’ll find helpful when monitoring AD:
• Generate Status Report—Generates a status report for the domain controller. The report
includes a list of directory partitions for the server, the status of the replication partners
for each directory partition, and the status of any Group Policy Objects (GPOs). It also
includes the status of the domain controllers that hold the operations master roles, a
snapshot of performance counters, and the registry configuration of the server.
• Show Replication Topologies—Displays a graphical view of the replication topology.
This option can also display the properties of the domain controller and any intra-site or
inter-site connections that exist for the domain controllers.

Checking for KCC Replication Errors


Another method of troubleshooting replication problems is to check the entries that appear in the
Directory Service log in Event Viewer. Event Viewer lists errors that pertain to replication, such
as KCC errors. For example, you might see the entry (ID 1311 from event source NTDS KCC) in
the Directory Service log; it means that the Directory Service consistency checker has
determined that for changes to propagate across all sites, replication cannot be performed with
one or more critical domain controllers.
This error could also indicate that there isn’t enough physical connectivity to create a spanning
tree connecting all of the sites. In this context, a spanning tree is the loop-free set of
replication paths that the KCC builds from the site links so that changes can reach every site;
it is unrelated to the Spanning Tree Protocol that network switches use. This behavior can occur
if the KCC has determined that a site has been orphaned from the replication topology.
One domain controller in each site owns the role of creating inbound replication connection
objects among bridgehead servers from other sites. This domain controller is known as the
Inter-Site Topology Generator. While analyzing the site link and site link bridge structure to
determine the most cost-effective route to synchronize a naming context between two points, it
might determine that a site doesn’t have membership in any site link and therefore has no means
of creating a connection object to a bridgehead server in that site.
The first site in AD (named Default-First-Site-Name) is created automatically. It’s a member of
the default site link (DEFAULTIPSITELINK), which is also created automatically and used for
RPC communication over TCP/IP among sites. If you create two additional sites (for instance,
Site1 and Site2), you need to define a site link that each site is going to be a member of before
these sites can be written to AD. However, you can also edit the properties of a site link and
modify which sites reside in it. If you remove a site from all site links, the KCC logs the
error message listed earlier to indicate that a correction needs to be made to the configuration.

When the KCC generates this error message, it’s in a mode in which it doesn’t remove any
connections. Normally, the KCC cleans up old connections from previous configurations or redundant
connections. Thus, you might find that there are extra connections during this time. The solution is to
correct the topology problem so that the spanning tree can form.
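
The orphaned-site condition can be illustrated with a short Python sketch. It models site links simply as named sets of member sites and assumes that all site links are bridged (transitive). The function names and sample data are hypothetical, and the sketch is a simplified model rather than the KCC’s actual algorithm.

```python
def orphaned_sites(sites, site_links):
    """Return the sites that are not a member of any site link."""
    linked = set()
    for members in site_links.values():
        linked.update(members)
    return sorted(set(sites) - linked)

def sites_connected(sites, site_links):
    """Return True if the site links join all sites into one connected group,
    which is the precondition for a spanning tree to exist."""
    sites = set(sites)
    if not sites:
        return True
    start = next(iter(sites))
    seen = {start}
    frontier = [start]
    while frontier:
        site = frontier.pop()
        for members in site_links.values():
            if site in members:
                for other in members:
                    if other in sites and other not in seen:
                        seen.add(other)
                        frontier.append(other)
    return seen == sites

# Hypothetical configuration: Site2 has been removed from every site link.
sites = ["Default-First-Site-Name", "Site1", "Site2"]
site_links = {"DEFAULTIPSITELINK": {"Default-First-Site-Name", "Site1"}}

print(orphaned_sites(sites, site_links))
print(sites_connected(sites, site_links))
```

Adding Site2 back to any site link that also contains a connected site makes the second check succeed.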


Troubleshooting by Using Change Management


No discussion of troubleshooting AD would be complete without mentioning one of the most
important techniques available to the network administrator: change management. In any
hardware- or software-troubleshooting endeavor, one of the most important questions an
administrator can ask is: What changed to cause this problem? As you’ve probably learned from
your own experience, most problems don’t occur in a vacuum. Rather, they develop as a result of
some change that is made or that occurs in the system. Thus, the ability to know what was
changed on the network—and when—is an invaluable troubleshooting tool.
Obtaining this type of change-management data for AD has been difficult or impossible.
Windows doesn’t record such changes automatically in its event logs, and logging AD
infrastructure changes manually is cumbersome and prone to error. Even third-party AD-
management tools have been challenged in this area. Although several tools are available to
diagnose problems with and monitor real-time events related to AD infrastructure components,
the ability to analyze the progression of changes to these components over time has remained
elusive.
This situation changed recently when NetPro released the ChangeAuditor for Active Directory
tool. ChangeAuditor is unique in that it gives administrators a wide array of information about
changes to an AD network that occur over time:
• Records of all AD infrastructure and configuration changes across the enterprise
• A historical record of AD changes, including modifications to such key infrastructure
elements as directory structure, replication, security, and schema
• An enterprise-level view of changes to the AD object population that occur over time
• Information detailing who made changes
• “Before and after” views for many changes, showing you not only that a change occurred
but also the old and new values
With this kind of information in hand, it becomes much easier to investigate the origin of a
problem and resolve it as well as to keep yourself generally “in the loop” about important
changes that are being made to your AD infrastructure. You can view at a glance events such as
additions or deletions to AD elements such as domains, OUs, domain controllers, groups and
Group Policy, and the AD schema, and you can view the changes to the population of these
elements over time (or at a particular time).
For example, when users at a particular site report suddenly slow network logons that began that
morning, you might analyze your change log and determine that the only domain controller
serving the domain on that site was removed by a site administrator the previous evening.
ChangeAuditor consolidates all of the collected data into a single, centralized database, and
provides convenient access to it by using a helpful reporting interface.
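
The question “What changed to cause this problem?” amounts to filtering a change log by time. The following Python sketch shows that filter under stated assumptions: the record layout, names, and timestamps are hypothetical, and the sketch does not represent ChangeAuditor’s database or reporting interface.

```python
from datetime import datetime, timedelta

def changes_before(change_log, problem_time, window):
    """Return changes made within `window` before `problem_time`, newest first."""
    start = problem_time - window
    recent = [c for c in change_log if start <= c["when"] <= problem_time]
    return sorted(recent, key=lambda c: c["when"], reverse=True)

# Hypothetical change records; a real tool would supply these from its database.
change_log = [
    {"when": datetime(2004, 7, 28, 14, 0), "who": "jsmith",
     "what": "Modified site link cost"},
    {"when": datetime(2004, 7, 30, 19, 30), "who": "siteadmin",
     "what": "Removed domain controller"},
    {"when": datetime(2004, 7, 31, 10, 0), "who": "jsmith",
     "what": "Created OU"},
]

# Users reported slow logons starting at 8:00 on July 31.
problem_time = datetime(2004, 7, 31, 8, 0)
for change in changes_before(change_log, problem_time, timedelta(hours=24)):
    print(change["when"], change["who"], change["what"])
```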

It’s a good idea to use change-management information proactively in addition to using it reactively.
For example, you might use the object-population information to analyze and plan network capacity
and to predict future trends and infrastructure needs. This information is invaluable for management
reports and IT budget planning.


Summary
Troubleshooting AD means identifying and analyzing problems that occur and repairing them in
the various systems. The troubleshooting process is mostly about isolating and identifying a
problem. To troubleshoot AD, you first check to see whether the domain controllers in the forest
can communicate with each other. Next, you need to ensure that AD has access to DNS and that
DNS is working properly. After you verify that DNS is working, you need to check that the
individual domain controllers and operations masters are working properly and supporting the
directory functions. Last, you need to verify that replication is working and that no consistent
errors are being generated. The ability to quickly assess what is causing a problem and
effectively develop a solution will help ensure smooth IT performance that successfully supports
the business.


Chapter 6: Creating an Active Directory Design that You Can Audit and Troubleshoot
When you’re creating a new AD environment, you’re often focused on business questions such as:
How will users be organized for administration? What OUs will you need for proper application
of GPOs? Technical goals are usually another big consideration: How many GC servers will you
need? Will every office require a domain controller? Troubleshooting is rarely a consideration
during the design phase because troubleshooting isn’t necessary when nothing is broken. Many
organizations don’t consider auditing at the initial design stage, either, because auditing often
isn’t identified as a business or technical need until the environment is up and running. Both
troubleshooting and auditing, however, should be given serious consideration during the design
phase because a properly designed AD environment can make both troubleshooting and auditing
easier and more efficient.
What if you’ve already designed your AD environment and have been living with it for a while?
Fortunately, few design decisions in AD are one-time or irreversible, meaning a minor
redesign—this time with troubleshooting and auditing more firmly in the front of your mind—
can provide significant operational advantages.

Design Goals
Every AD design has goals such as easy user and group management, proper application of
GPOs, user response time, and so forth. To these goals, add troubleshooting and auditing.
Auditing is itself an important troubleshooting tool: it can often tell you what has changed
recently, which is a good place to start troubleshooting. Creating an auditable design is
therefore a big part of creating a design that lends itself to troubleshooting. Some specific
design goals to consider include:
• Performance—You can overload a domain controller by placing too much of an auditing
burden on it, so your overall design needs to accommodate the level of auditing you plan
to do. If you’ll be auditing a lot, you might need more domain controllers so that domain
controllers can handle the auditing load as well as their normal duties. Monitoring is also
a concern: If you plan to monitor your domain controllers on a regular basis, expect that
monitoring to place some additional (albeit marginal) overhead on the domain
controllers, and design accordingly.
• Access to information—You will need to decide who will be performing auditing and
troubleshooting and ensure that information is easily accessible to those individuals. For
example, planning to use event log consolidation tools might help bring critical
information in front of the right people more quickly, making troubleshooting and
auditing more efficient and effective.
• Tools—You will undoubtedly turn to tools outside of Windows for many of your
auditing and troubleshooting needs because Windows isn’t well-equipped for either task
on a large scale. Tools often bring their own requirements and overhead, and your overall
AD design should acknowledge and accommodate those needs. In other words, make
your troubleshooting and auditing tools a part of your design so that they will work more
efficiently and effectively.


Performance Considerations
Performance often takes the biggest hit when you begin to implement auditing. Auditing a single
event doesn’t consume much computing power, but a single domain controller can easily generate
hundreds or even thousands of events per minute, especially during busy periods like the morning
rush-hour login. Carefully planning your auditing can help minimize the performance impact, and
implementing additional resources (such as domain controllers) can help minimize the impact on
end user response times.
For example, if you have 10,000 users and 10 domain controllers, everything might be working
great. Turn on a high level of auditing, however, and all 10 domain controllers might become
just a bit slower to respond to user requests than you would prefer. Adding another couple of
domain controllers can help pick up the slack. Each domain controller will have fewer users to
handle and will generate commensurately fewer auditing events during any given period.
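
The arithmetic behind this example is simple enough to check directly. The sketch below assumes that users, and therefore audit events, are spread evenly across domain controllers, which is a simplification. The function name is hypothetical.

```python
def users_per_dc(total_users, domain_controllers):
    """Average number of users each domain controller handles,
    assuming the load is evenly distributed."""
    return total_users / domain_controllers

before = users_per_dc(10_000, 10)   # ten domain controllers
after = users_per_dc(10_000, 12)    # after adding two more
reduction = 1 - after / before      # fraction of per-DC load removed

print(f"Before: {before:.0f} users per domain controller")
print(f"After:  {after:.0f} users per domain controller")
print(f"Per-DC load reduced by {reduction:.1%}")
```

Under that assumption, going from 10 to 12 domain controllers cuts the load on each one by about one-sixth, and the audit events each one generates should fall by roughly the same proportion.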
Why monitor performance at all? It can be a great way to spot problems before they become
severe. For example, a server with steadily declining performance can be noticed before
performance declines to the point at which the server is useless. Performance values can also
provide obvious input into troubleshooting activities, especially where AD considerations such
as replication and database utilization are concerned.

Overauditing
Overauditing is perhaps the most common mistake administrators make when implementing
auditing. You must carefully consider exactly what you need to audit, and audit only that—and
nothing more. For example, some organizations audit login or access failures because a
significant number of failures can be a clear indicator that your systems are under attack.
However, if you’re not going to do anything more than a cursory, manual review of the event
logs from time to time, you’re not really achieving your stated goal of detecting attacks by
auditing for these types of events. Thus, the computing power going toward logging those
failures is essentially wasted. Consider whether there is a better way to obtain the information
that auditing might provide. For example, Figure 6.1 shows the Active Directory Sites and
Services console with the root container configured to audit all success and failure events for all
users, for all possible actions.


Figure 6.1: Auditing everything, for all users, by using the Active Directory Sites and Services console.

This method is a good practice from a troubleshooting perspective because you’re going to get
very detailed information about everything that happens in the console. If someone adds a site
link, changes a site link bridge, and so on, an event will report that such an action took place. If a
problem occurs, you can jump right into the Event Viewer to see what changes have recently
been made. Figure 6.2 shows just such an event—someone has accessed a site configuration
object.


Figure 6.2: Events showing configuration access and changes.

However, enabling this level of auditing can result in a lot of events. Although useful for
troubleshooting, the overhead created might not be worth the benefit of having these events
simply for diagnostics. Third-party tools that provide the same information can come in handy in
such situations. For example, NetPro ChangeAuditor for Active Directory collects similar
information with a bit more detail, and logs it to a separate database. The Windows event logging
system isn’t involved; thus, although there is overhead involved, it is less than that required by
the Windows event logging system and it generates useful diagnostic information in the event
that you need to troubleshoot a problem. In this fashion, third-party tools can help collect useful
auditing and troubleshooting information without the need to go overboard with Windows’ built-
in auditing capabilities.

Overmonitoring
Many administrators don’t realize how much overhead monitoring can place on a server. For example, running
System Monitor against a remote system can add a measurable amount of overhead; running it
all day, every day can reduce response times if you don’t specifically plan for the monitoring in
your design and compensate for that overhead. For example, using System Monitor configured
as Figure 6.3 shows produced about a 2 percent overhead on the system being monitored. Not a
huge amount, but definitely something you want to be aware of and plan for.


Figure 6.3: Monitoring isn’t free. This chart produced about 2 percent processor overhead on the monitored
system.

This area is another in which tools other than those bundled with Windows may do a better job.
For example, Microsoft Operations Manager (MOM) provides round-the-clock monitoring, but it
does so by sampling performance counters at regular intervals rather than continuously
monitoring them and drawing graphical charts based on their values. In fact, most
enterprise-class monitoring tools from companies such as NetPro, Microsoft, Argent Software,
and NetIQ are capable of gathering more monitoring information while producing less overhead
than Windows’ built-in performance monitoring tools.
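
The difference between continuous monitoring and interval sampling can be sketched in Python. The example below is a generic illustration, not how MOM or any other product is implemented. The function names are hypothetical, and the counter is read through a caller-supplied function so that the sketch stays independent of any particular performance API.

```python
import time

def sample_counter(read_value, interval_seconds, samples, sleep=time.sleep):
    """Read a counter `samples` times, pausing between reads.
    The monitored system is touched only once per interval."""
    readings = []
    for index in range(samples):
        readings.append(read_value())
        if index < samples - 1:
            sleep(interval_seconds)
    return readings

def average(readings):
    return sum(readings) / len(readings)

# Demonstration with canned values and a no-op sleep so it runs instantly.
fake_values = iter([12.0, 18.0, 15.0])
readings = sample_counter(lambda: next(fake_values), interval_seconds=60,
                          samples=3, sleep=lambda seconds: None)
print(average(readings))
```

Lengthening the interval reduces the overhead on the monitored server at the cost of coarser data.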


Design Considerations
Once you’ve addressed your performance concerns in your design, you can begin thinking about
how your AD design will support troubleshooting and auditing needs. Specific questions—such
as Who will use auditing information and how?—will drive specific design decisions that affect
your overall AD design.

Who Will Troubleshoot?


One of the biggest questions you’ll need to answer is who will be troubleshooting your AD
environment? More accurately, you’ll need to determine who will need access to AD
troubleshooting information.
Many organizations—particularly those subject to regulations such as the Health Insurance
Portability and Accountability Act (HIPAA), the Sarbanes-Oxley Act, Gramm-Leach-Bliley Act,
and so forth—need to be sensitive about who has access to information. In some cases, AD
troubleshooting information might contain information that is considered confidential. Simply
making troubleshooting data available to all administrators, for example, might unnecessarily
expose this confidential information, or at least present a risk for unnecessary exposure. This
consideration can create difficulty in your design because you ideally want to make
troubleshooting information available to all technical staff members but might be constrained
from doing so as a result of these types of regulatory concerns.
Physical location can also affect your design. For example, if the individuals who perform most
of your troubleshooting are located in one office and troubleshooting information is collected
from a variety of offices, you might want a design that allows for information to be consolidated
into a location that is convenient for the centralized technical staff. If, however, your technical
staff is distributed and tends to work only with their local resources, you might want the
troubleshooting information to be distributed so that it is more conveniently accessible to the
people who will be using it.
What type of troubleshooting information do you need to be concerned with? Potentially, any
data that might contribute to faster identification of a problem and its resolution:
• Auditing information can help identify recent changes that might have been responsible
for a problem.
• Basic documentation can provide reference and background information for narrowing a
problem or solution.
• Performance information can help identify additional symptoms or root causes for a
problem.


Windows’ built-in tools for AD auditing and performance monitoring aren’t highly centralized.
Audit events, for example, are scattered across the event logs of every domain controller;
performance information can’t be readily centralized without significant overhead because
System Monitor is the only real built-in way to collect this information. Windows doesn’t
provide any built-in means of centralizing network-related documentation, particularly as it
relates to changes. Third-party tools often provide much more flexibility, allowing information to
be consolidated in central databases, made accessible through Web interfaces to distributed staff
members, and so forth. For these reasons, you’ll often find that third-party solutions will become
an indispensable part of your overall AD design.
For example, while not strictly falling into the category of “third-party,” Microsoft has a tool
called Microsoft Audit Collection Service (ACS, which has not been released by Microsoft as of
this writing). The tool is an agent-based service that collects security events from multiple
servers and funnels them into a central SQL Server database for filtering, reporting, searching,
and so forth. Many third-party manufacturers offer solutions that perform a similar function;
NetIQ, for example, has several products (notably the AppManager suite) designed to collect
event log entries into a single database; MOM performs a similar task as part of its feature set.

How Will Auditing Be Utilized?


A major part of your design—which AD events will be audited—will be determined by how
those events will be utilized. Will auditing primarily serve a troubleshooting function or is
auditing also going to be used to perform security audits, compliance audits, and so forth? If
auditing will simply serve as a troubleshooting tool, for example, simply auditing successful
events—meaning events related to changes that were implemented—might be sufficient for
troubleshooting purposes. If auditing will also be used to perform security, compliance, and other
audits, you might need to collect a wider variety of events and make those events accessible to
individuals with different goals.
For example, a security auditor might not be focused on events related to changes in the AD site
structure but wants to see how often administrative accounts are used. A troubleshooter, in
contrast, might be best served by exactly the opposite information. In cases in which you will be
collecting events that at least one person or group doesn’t need to see, you’ll need to provide
some collection and filtering capabilities to ensure that the correct events reach the right people.
Again, Windows’ built-in auditing capabilities fall short in this arena, and third-party tools can
be more effective. For example, Figure 6.4 shows NetPro’s ChangeAuditor, which can
effectively filter events by type so that both troubleshooters and auditors can access the
information that is relevant to them.


Figure 6.4: Filtered events allow everyone who uses auditing information to focus on their specific job tasks
without being distracted by extraneous events.
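
Filtering events by audience can be sketched as a simple lookup. The category names, audiences, and sample events below are hypothetical and are not taken from ChangeAuditor or from the Windows event logs; the point is only that each audience sees the subset of events that matches its job.

```python
# Hypothetical event categories that each audience cares about.
AUDIENCE_CATEGORIES = {
    "troubleshooter": {"site", "site_link", "replication", "schema"},
    "security_auditor": {"admin_logon", "group_membership"},
}

def events_for(audience, events):
    """Return only the events whose category is relevant to the audience."""
    wanted = AUDIENCE_CATEGORIES[audience]
    return [event for event in events if event["category"] in wanted]

events = [
    {"category": "site_link", "detail": "Site link cost changed"},
    {"category": "admin_logon", "detail": "Administrator logged on"},
    {"category": "replication", "detail": "Connection object added"},
]

for event in events_for("troubleshooter", events):
    print(event["detail"])
```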

Because many tools store events in their own databases (either collecting events through their
own interfaces, as ChangeAuditor does, or consolidating Windows events, as ACS does), your
options for providing this information to the people who need it are broader. You can, for
example, make auditing information available to individuals who aren’t administrators and might
not normally have access to, say, the security log. Doing so allows you to define the security
in your environment more granularly.


For How Long Will Data Be Maintained?


Troubleshooters won’t often need to look at data that is more than a month or so old; changes
made that long ago will generally have already caused whatever problems they are going to
cause. However, many organizations might need to maintain auditing information for a longer
period of time to satisfy regulatory or business policy requirements. Windows’ auditing tool—
Event Viewer—simply allows logs to be manually saved to a file, which isn’t an efficient means
of archiving events from multiple servers. Scripts can be used to automate this process and target
several computers; Listing 6.1, for example, is a VBScript-based script that archives the security
logs from several computers listed in a text file.

This script is designed to be saved as a file with a .WSF filename extension. You can run it from a
command line and use the /? argument to see its syntax and usage instructions. You can download
this script from the ScriptVault at http://www.ScriptingAnswers.com.

<?xml version="1.0" ?>


<package>
<comment>
PrimalCode wizard generated file.
</comment>
<job id="loghunter" prompt="no">
<?job error="false" debug="false" ?>
<runtime>
<description>
Collects security log files from a list of computers (listed in a text
file, one computer name per line) and archives those logs to a
specified folder. Optionally, collected logs can be cleared.
</description>
<named helpstring="Filename listing one computer name per line" name="file" required="true" type="string"/>
<named helpstring="Folder name - must exist - where logs should be archived" name="dest" required="true" type="string"/>
<named helpstring="Yes or No - clear log after archiving" name="clear" required="true" type="string"/>
<usage>
Clear log files after collection:

Loghunter /file:c:\computers.txt /dest:c:\archive /clear:yes


</usage>
</runtime>
<script id="Loghunter" language="VBScript">
<![CDATA[
'======================================================================
'
' LogHunter
'
' VBScript by Don Jones. Copyright (c)2004 BrainCore Nevada, Inc.
' All Rights Reserved. Provided without warranty of any kind; please
' test thoroughly before using in a production environment. The author
' of this script is not responsible for any damages arising from its
' use, including damages which the author has been advised of.
'

' www.ScriptingAnswers.com - Where Windows Administrators Go To Automate
'
' AUTHOR: Don Jones
' DATE : 7/31/2004
'
' COMMENT:
'
'======================================================================

' Show usage and exit unless all three named arguments were supplied
If WScript.Arguments.Count < 3 Then
    WScript.Arguments.ShowUsage
    WScript.Quit
End If

Dim sFile, sFolder, bClear
sFile = WScript.Arguments.Named("file")
' The destination folder is passed as /dest (see the usage text above)
sFolder = WScript.Arguments.Named("dest")
If LCase(WScript.Arguments.Named("clear")) = "yes" Then
    bClear = True
Else
    bClear = False
End If

' Verify that the destination folder and the input file both exist
Dim oFSO, oTS
Set oFSO = CreateObject("Scripting.FileSystemObject")
If Not oFSO.FolderExists(sFolder) Then
    WScript.Echo "Destination folder does not exist"
    WScript.Quit
End If
If Not oFSO.FileExists(sFile) Then
    WScript.Echo "Input file does not exist"
    WScript.Quit
End If

Dim sClient, oWMIService, cLogFiles, oLogfile
Dim errBackupLog, sOutfile

' Process each computer named in the input file, skipping blank lines
Set oTS = oFSO.OpenTextFile(sFile)
Do Until oTS.AtEndOfStream
    sClient = Trim(oTS.ReadLine)
    If sClient <> "" Then
        ' Connect to WMI on the target computer. The Backup privilege is
        ' needed to back up a log; the Security privilege is needed to
        ' access the Security log.
        Set oWMIService = GetObject("winmgmts:" _
            & "{impersonationLevel=impersonate,(Backup,Security)}!\\" & _
            sClient & "\root\cimv2")

        Set cLogFiles = oWMIService.ExecQuery _
            ("Select * from Win32_NTEventLogFile where " & _
            "LogFileName='Security'")

        For Each oLogfile In cLogFiles
            ' Archive name: <computer>=<day of year>-<year>.evt
            sOutfile = oFSO.BuildPath(sFolder, sClient & "=" & _
                DatePart("y", Date) & "-" & DatePart("yyyy", Date) & ".evt")
            errBackupLog = oLogfile.BackupEventLog(sOutfile)

            If errBackupLog <> 0 Then
                WScript.Echo "Couldn't get log from " & sClient
            Else
                WScript.Echo "Got " & sClient
                ' Clear the log only if /clear:yes was specified
                If bClear Then oLogfile.ClearEventLog()
            End If
        Next
    End If
Loop
oTS.Close
WScript.Echo "Complete"

]]>
</script>
</job>
</package>

Listing 6.1: A VBScript-based script that archives the security logs from several computers listed in a text
file.

Third-party tools, which rely on more flexible databases and their own storage formats, can
generally provide more flexible long-term storage options, if needed. For example, ACS is
designed to store security events for years in a SQL Server database. If long-term event storage
is a part of your organization’s needs, make sure these needs are accounted for in your AD
design.
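
The archives that Listing 6.1 writes are named from the computer name, the day of the year, and the year (for example, DC01=213-2004.evt). As a minimal sketch, assuming that naming pattern, the following Python functions pick out the archives that are older than a retention period. The function names, the 365-day default, and the sample file names are illustrative assumptions and are not part of LogHunter.

```python
from datetime import date, timedelta


def archive_date(file_name):
    """Return the date encoded in a name such as 'DC01=213-2004.evt'."""
    stem = file_name.rsplit(".", 1)[0]           # drop the .evt extension
    _, stamp = stem.rsplit("=", 1)               # text after the last '='
    day_of_year, year = stamp.split("-")
    # Day 1 is January 1, so add (day_of_year - 1) days to that date
    return date(int(year), 1, 1) + timedelta(days=int(day_of_year) - 1)


def expired_archives(file_names, today, retention_days=365):
    """List the archives whose encoded date is older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return [name for name in file_names if archive_date(name) < cutoff]


if __name__ == "__main__":
    # Hypothetical archive names following the Listing 6.1 pattern
    names = ["DC01=213-2004.evt", "DC02=40-2003.evt"]
    print(expired_archives(names, today=date(2004, 8, 31)))
```

A list like this could feed whatever archival or deletion step your retention policy calls for; the sketch deliberately only reports names and leaves the files untouched.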

Design Guidelines
There are specific design considerations and suggested best practices for creating an AD
environment that lends itself to troubleshooting and auditing. The following sections explore
these considerations.

Selecting Appropriate Tools


By now you should already have answered key questions that address who will need to have
access to troubleshooting information and so forth; these questions will help drive your selection
of troubleshooting tools. For example, you might decide to use tools such as MOM or an AD-
specific tool such as NetPro ChangeAuditor to collect troubleshooting information rather than
rely entirely on Windows’ built-in auditing capabilities and Event Viewer console. Regardless of
which tools you choose, the AD design phase is the appropriate time to consider what those tools
will require from your environment. Most higher-end tools will work best with an agent installed
on each domain controller so that events can be quickly collected and funneled into the tool’s
central database. You will need to test to determine how much overhead these agents require so
that you can plan your domain controller workload appropriately. If you’re going to stick with
Windows’ Event Viewer for collecting information, do some testing to see how much overhead
is created.


In addition, make tool selections based on your troubleshooting needs. For example, tools such
as ChangeAuditor are essentially reactive tools, providing you with information to diagnose a
problem that already exists. Other tools, such as NetPro’s SecurityManager, are somewhat more
proactive in nature because they continually monitor your environment for specific types of
changes, alert you to those changes, and, in some cases, restore the environment to a
preconfigured state, undoing the change. Although primarily designed to help maintain a secure
environment, these types of tools can have a valuable troubleshooting function by catching
potentially damaging changes and calling your attention to them immediately. All of these tools
have system requirements that need to be considered in your overall AD design.
At a minimum, decide on a tool set that offers the following capabilities:
• Collects information regarding changes to the environment as well as any other auditing
information necessary to support business policies and regulatory compliance. Tools that
offer this functionality include the Windows Event Viewer, NetPro ChangeAuditor,
MOM, and NetIQ AppManager. You might employ a collection of tools to meet this
need: ChangeAuditor, for example, is very specific to AD, while MOM can be extended
through management packs to handle specific Microsoft server products such as
Microsoft Exchange Server and Microsoft SQL Server.
• Makes the relevant information available to the correct people at the right time. Generally
speaking, this capability will involve centralizing information in a single database and
providing interfaces for various users to filter and view the information. Microsoft ACS
offers security event consolidation and reporting; MOM also helps centralize and manage
Windows events. Many third-party tools, such as ChangeAuditor, utilize their own events
and database rather than relying on Windows-generated event logs. A key purchasing
decision for any tool should be the ability to give non-administrative users access to
events (or reports of events) as needed to support their job tasks.
• Collects not only audit-style events (such as the events generated by Windows for its
event logs) but also performance data. Windows lacks a robust built-in means of
collecting, consolidating, and working with performance data. Tools such as NetPro
DirectoryTroubleshooter (which is specific to AD) and NetIQ AppManager are more
adept at collecting performance-related information and making it readily accessible to
troubleshooters. When it comes to AD troubleshooting, ensure that the tool you select
presents information in a way that enhances the troubleshooting process; a tool built
specifically for AD is often able to do this better than a tool that covers Windows in a
broader sense or covers multiple server products.


Configuring the Environment


Configuring the environment to support troubleshooting and auditing involves installing and
configuring whatever tools you’ve selected to support your troubleshooting and auditing tasks.
Properly configuring Windows is also crucial. For example, your auditing plan will probably call
for specific auditing configurations on specific areas of AD.
AD auditing is configured in three consoles: Active Directory Users and Computers, Active
Directory Domains and Trusts, and Active Directory Sites and Services. Each console allows
you to configure permissions—and auditing—at various levels; you’ll often want to configure
auditing at the highest possible level so that auditing information is generated consistently
throughout the hierarchy. Generally, you will configure this by right-clicking the top-most object
in the console’s tree view, selecting Properties, then switching to the Security tab. On the
Security tab, click Advanced, then select the Auditing tab to display the auditing configuration,
as Figure 6.5 illustrates.

If a Security tab isn’t visible, especially in Active Directory Users and Computers, select Advanced
Features from the console’s View menu and try again.

Figure 6.5: Configuring auditing on the domain in Active Directory Users and Computers.


Windows doesn’t require much in the way of specific configurations to enhance AD
troubleshooting; simply collecting the right events and performance data and centralizing that
information into an easily accessible database will provide all the support most troubleshooting
tasks require.

Maintaining the Proper Configuration


Ensuring that your environment remains configured for optimal troubleshooting can be difficult.
After all, Windows has tens of thousands of settings, and no single methodology or technology
provides access to all of them. Third-party products generally maintain their settings
through a completely different set of techniques, such as private databases, as well as through
Windows-based techniques such as the registry.
There are tools that can help ensure that your environment remains properly configured. For
example, NetPro SecurityManager can help alert you to security-sensitive changes, such as
changes in your auditing policy. Tools such as Configuresoft Enterprise Configuration Manager
(ECM) can detect a much broader variety of configuration changes across Windows and alert
you to those changes immediately, allowing you to take corrective action. In some cases, ECM
can even undo the changes and restore your desired configuration based on a flexible system of
configuration templates and responsive actions. A regular, manual audit of your configuration
settings can also help ensure that they remain properly configured.

Regardless of the tools you use, make configuration maintenance easier by documenting your
desired configuration. Once you’ve decided how your tools should be configured, which events you’ll
audit, and so forth, having this documentation will allow a junior administrator or auditor to periodically
confirm that the environment is properly configured. In addition, this documentation will enable you to
more easily reconfigure a misconfigured environment.
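
A documented baseline also lends itself to a simple automated comparison. The following Python sketch assumes that the documented settings and the observed settings have each been gathered into a dictionary; how the observed values are collected is left open. The function name, setting names, and values shown are hypothetical examples.

```python
def find_drift(desired, actual):
    """Return {setting: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for setting, wanted in desired.items():
        found = actual.get(setting)      # None when the setting is missing
        if found != wanted:
            drift[setting] = (wanted, found)
    return drift


if __name__ == "__main__":
    # Hypothetical documented baseline and hypothetical observed state
    desired = {
        "Audit directory service access": "Success, Failure",
        "Audit account management": "Success, Failure",
        "Security log maximum size (KB)": 131072,
    }
    actual = {
        "Audit directory service access": "Success, Failure",
        "Audit account management": "No auditing",
        "Security log maximum size (KB)": 131072,
    }
    for setting, (wanted, found) in find_drift(desired, actual).items():
        print(f"{setting}: expected {wanted!r}, found {found!r}")
```

An empty result means the observed settings match the documentation; anything else is a list of items for the administrator or auditor to investigate.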

Monitoring Core Areas


Your organization’s security and regulatory compliance needs will tend to drive what you audit
in terms of security events; the information you collect to support troubleshooting activities will
overlap those security needs in many respects, but you’ll find yourself collecting a good deal of
additional data to support troubleshooting. For example, you’ll want to collect (and centralize
into a database, ideally) a comprehensive set of AD performance data because this data can aid
in diagnosing performance issues, replication problems, object management problems, and so
forth. NetPro DirectoryTroubleshooter has a fairly comprehensive list of AD-specific
information that it can monitor, as Figure 6.6 shows.


Figure 6.6: Examining performance information related to AD.

Basically, any performance information related to AD is worth collecting, if your tools can do so
without creating an unnecessary or undesirable performance burden on your domain controllers.

Preventing Trouble
Troubleshooting, of course, begins the moment you learn of a problem. Auditing can provide
useful troubleshooting information by helping you quickly determine what has changed in your
environment because changes are a major source of problems. However, an even more effective
method is to avoid problems as much as possible.
One way to avoid problems is to avoid change. Changes cause problems simply because AD
environments are so complex: Making a change without fully considering the ramifications can
often break things or at least cause them to work less than optimally. Change, however, is the
one constant you’ll always have in the IT industry, so although it is a great idea to avoid
unnecessary changes, you will never be able to avoid change entirely.
The trick is to manage change so that it is never unexpected, never ad hoc, and never
made without being thoroughly thought through. The Information Technology Infrastructure
Library (ITIL; read more at http://www.ogc.gov.uk/index.asp?id=2261) is a library of best
practices for IT management, including a set of practices for managing change that helps
avoid the problems that often accompany change.


A Process for Change


Figure 6.7 illustrates a sample business process based upon ITIL best practices.

Figure 6.7: An ITIL-based business process for change management.


The way this process works is that all changes begin with a change request being submitted and
categorized. Immediate changes are sent for immediate development rather than being reviewed,
but other changes go through a review process to consider the risks of the change, the business
benefits, and so forth. Lower-priority changes are sent through a Change Advisory Board (CAB),
which will often package approved changes for scheduled implementation (perhaps once each
month) along with a group of other approved changes. The CAB considers past change
documentation when making its analysis and tries to group changes for release so that high-risk
changes aren’t bundled together—thereby presenting fewer risks at the same time. Changes are
also reviewed by an Executive Action Board (EAB), which considers areas such as business
impact. Urgent changes might bypass the CAB and go directly to the EAB, which can approve them
for implementation more quickly or, if it considers a change to be lower priority, queue the
change for the CAB’s next review meeting.
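
The routing described above can be sketched in a few lines of Python. This is one possible reading of the process, not a definitive implementation: the category names, function names, and return values are assumptions chosen for illustration.

```python
def route_request(category):
    """Return the first stop for a newly categorized change request."""
    if category == "immediate":
        return "development"        # sent for development without review
    if category == "urgent":
        return "EAB"                # bypasses the CAB
    return "CAB"                    # lower-priority changes


def eab_decision(considered_urgent):
    """Return what happens to an urgent request after the EAB looks at it."""
    # The EAB either approves the change or queues it for the next CAB meeting
    return "approved" if considered_urgent else "CAB queue"


if __name__ == "__main__":
    for category in ("immediate", "urgent", "standard"):
        print(category, "->", route_request(category))
```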
Once approved, the change is developed by a technical staff member, then reviewed for potential
problems by a peer. Once the change is approved, it is deployed into a test environment to test
for additional potential problems. If all goes well with the test, the change is scheduled for
deployment. Prior to deployment, affected systems are backed up in case of a problem, then the
change is actually deployed to the production environment. The change is immediately reviewed
for accuracy, effectiveness, and problems. If necessary, the change is rolled back and the
problem analysis documented for future review. At that point, the change goes back to the EAB
or CAB for further consideration. Changes that don’t result in a production problem are retained,
and the environment’s documentation is updated to reflect the change as a part of the baseline
environment.
The purpose of this process is threefold:
• To manage risk up-front by considering changes from a technical and business
perspective, and categorizing changes so that higher-priority ones receive precedence.
• To manage risk by technically reviewing changes. The idea is that more eyes on the
problem will be more likely to spot potential problems before they occur.
• To maintain a well-documented environment by ensuring that changes eventually become
a part of the environment’s overall documentation. This documentation helps to drive
decisions regarding future changes.
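
One way to keep a change from skipping a step is to model the post-approval stages as a small state machine. The following Python sketch assumes the stages described above; the stage names and the advance function are hypothetical and would need to be adapted to your own process.

```python
# Allowed transitions for an approved change; the stage names are assumptions
TRANSITIONS = {
    "approved": ["developed"],
    "developed": ["peer_reviewed"],
    "peer_reviewed": ["tested"],
    "tested": ["scheduled"],
    "scheduled": ["backed_up"],
    "backed_up": ["deployed"],
    "deployed": ["retained", "rolled_back"],
    "rolled_back": ["resubmitted"],   # goes back to the EAB or CAB
    "retained": ["documented"],       # baseline documentation is updated
}


def advance(current, target):
    """Move a change to the next stage, refusing any step that skips ahead."""
    if target not in TRANSITIONS.get(current, []):
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target


if __name__ == "__main__":
    stage = "approved"
    for step in ("developed", "peer_reviewed", "tested", "scheduled",
                 "backed_up", "deployed", "retained", "documented"):
        stage = advance(stage, step)
    print("Final stage:", stage)
```

Because every transition is listed explicitly, an attempt to deploy a change that has not been tested and backed up raises an error rather than silently succeeding.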

Tools to Manage Change


Processes are useful, but without tools to help implement them, processes can quickly be put
aside in the face of day-to-day crises and business pressures. Fortunately, several companies
offer tools to help make change management more effective. Such tools often provide a way to
automate and enforce the workflow defined in your change management process and help make
mundane tasks such as recordkeeping easier and more effective. IntaChange is a Web-based
change management tool that provides Microsoft Project integration, workflow management, and
other process-based tools to make change management more automated and efficient. As Figure
6.8 shows, IntaChange offers a calendar view of scheduled changes, making it easier for
managers and technical staff to quickly see what’s coming up.


Figure 6.8: IntaChange’s calendar view shows upcoming changes and their details.

Another tool, Elite Change Management System, provides similar functionality, including
features such as file attachments (allowing you to, for example, attach new network diagrams to
a change, indicating how the network configuration will be modified by the change).
On an AD-specific front, NetPro offers ChangeManager. This tool not only helps track and
manage the change management process but also helps to automate the actual implementation of
changes. Changes can be tracked through each phase of the process, and you can quickly see
which changes are pending approval, have been approved but not implemented, have been
implemented, and so forth. ChangeManager incorporates a review process to help prevent
changes that might cause problems and, most importantly, allows you to quickly identify recently
made changes. If a problem occurs, you’ll know exactly what changed recently, allowing you to
focus your troubleshooting efforts on the most likely cause of the problem right away.
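
Whatever tool records your changes, the idea of checking recent changes first is easy to illustrate. The following Python sketch assumes a change log exported as a list of records with an implementation timestamp; the record layout, the 72-hour window, and the sample entries are hypothetical and do not represent ChangeManager's data format or interface.

```python
from datetime import datetime, timedelta


def recent_changes(changes, now, window_hours=72):
    """Return changes implemented within the last window_hours, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [c for c in changes if c["implemented"] >= cutoff]
    return sorted(recent, key=lambda c: c["implemented"], reverse=True)


if __name__ == "__main__":
    now = datetime(2004, 8, 2, 9, 0)
    # Hypothetical change log entries
    log = [
        {"id": "CHG-1", "summary": "Modify site link schedule",
         "implemented": datetime(2004, 8, 1, 22, 0)},
        {"id": "CHG-2", "summary": "Add domain controller",
         "implemented": datetime(2004, 7, 1, 8, 0)},
    ]
    for change in recent_changes(log, now):
        print(change["id"], change["summary"])
```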
Perhaps the most interesting feature of ChangeManager—and a real benefit of having a change
management tool that is AD-specific—is the ability for lower-level administrators to have
ChangeManager invoke approved changes. This functionality allows a complex AD change to be
proposed, reviewed, and approved, all by senior administrators, while allowing a lower-level
administrator to actually have the tool implement the change at an appropriate time. The benefit
of this workflow is that senior administrators—the ones you trust most to design accurate
changes for your environment—can focus on design and architecture; lower-level administrators
can be trusted to implement even complex changes because ChangeManager performs the actual
implementation for them, according to the senior administrators’ designs.


Summary
This chapter explains some of the design (or redesign) goals you should keep in mind for AD to
facilitate both troubleshooting and auditing. We’ve explored performance and recordkeeping
concerns that will be an issue for regulated companies and organizations. This chapter also
introduced change management, a process that seeks to avoid problems by carefully managing
the changes that are introduced into the environment. To aid in this process, you can employ
tools that can help automate an effective change management process and make it less of a
cumbersome business process and more of a practical tool to avoid the need to troubleshoot in
the first place. Ultimately, this scenario is your goal—keep AD from having problems, and if
problems do occur, solve them as quickly as possible.
Enterprises are relying more and more on the smooth operation of AD. Of course, even the best
designed and maintained AD environment can run into problems. This guide has shown you how
to monitor AD to spot problems early as well as how to test various aspects of AD to locate
problems when they occur. As our industry matures, we’re finding new and creative ways to
recognize problems, prevent them from happening, and fix them when they do occur. Some of
these new techniques include auditing, which isn’t immediately obvious as a troubleshooting
tool. However, the primary function of auditing is to keep track of what has changed, which is
the first step you will take in almost any troubleshooting scenario.
Other new techniques for preventing problems and reducing troubleshooting time include careful
change management and control so that only planned and tested changes are introduced into your
environment. As changes are usually the culprit, you can prevent problems by preventing
problem-causing changes.
The number of tools available to help with AD troubleshooting, change management, change
auditing, and other tasks is constantly growing. Now that AD is in its second generation, it’s a
more stable and mature product, and third-party manufacturers are producing robust, mature
tools to help keep AD humming along smoothly. Getting serious about troubleshooting means
putting the right management procedures in place to manage change, the right tools in place to
help, and the right know-how, which you now have, to quickly address any problems that
arise.
