Previously, I described the basic architecture of the Active Directory database (NTDS.DIT), with details on how and why to perform an offline defrag. I now want to talk about how to handle database errors by looking at some common events that indicate database corruption and how to fix them.
Before getting into the nitty gritty of database repair, let me just say that in my 8+ years of working with Active Directory, I've never had to manually rebuild the database with the ESEUtil.exe tool, as I often did with Exchange Server. While there are Active Directory database errors that pop up occasionally, they are usually pretty easy to resolve.
As mentioned in the previous article, Active Directory uses a Jet database, which is a transactional database. When a change is made to the database, LSASS.exe writes the change to a page in the memory buffer, then writes it to a log file. The default log file is %SystemRoot%\NTDS\Edb.log. The Extensible Storage Engine (ESE) creates a new log file when the current log fills up. The logged transactions are then committed to the database (NTDS.DIT).
If Active Directory stops ungracefully, the transactions in the log files that were not yet committed are replayed against the database at the next startup, which brings the database back to a consistent state. Note that circular logging is enabled by default, which allows data to be overwritten in the existing logs. In the %SystemRoot%\NTDS directory, you will see the following files:
- Edbxxxxx.log (e.g. Edb00009.log) -- These log files contain transactions that no longer fit in EDB.log. They are created sequentially.
- EDB.log -- Contains the newest transactions or database changes that have not been committed to the database.
- EDB.chk -- Holds the database checkpoint, which records which transactions have and have not been committed. When a recovery is needed, the checkpoint determines which logged transactions must be replayed.
- Res1.log and Res2.log -- Placeholders, 10 MB each, that reserve disk space so there is still room to write log data if the disk fills up. If circular logging is enabled, there is no danger of this.
- NTDS.DIT -- The Active Directory database, stored independently on each domain controller.
All of the database logs are 10 MB in size, while the size of NTDS.DIT depends on the number of objects stored in Active Directory. Note that there is no physical limit on the number of objects in the database. A colleague of mine once built an Active Directory domain with 100,000 objects (mostly users) and the performance remained pretty flat. If there is a practical limit, it is not known, which testifies to the scalability of Active Directory.
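To make the logging and recovery flow described above more concrete, here is a minimal Python sketch of a write-ahead-logged store. It is only a toy model and not how ESE is actually implemented: the class name, its methods, and the sample keys are all hypothetical, and the in-memory list, counter, and dictionary merely stand in for EDB.log, EDB.chk, and NTDS.DIT.

```python
class TinyTransactionalStore:
    """Toy model of a write-ahead-logged store (illustrative only, not real ESE behavior)."""

    def __init__(self):
        self.log = []         # stands in for EDB.log: every change, in order
        self.checkpoint = 0   # stands in for EDB.chk: count of log entries already committed
        self.database = {}    # stands in for NTDS.DIT: the committed state on disk
        self.buffer = {}      # in-memory pages, lost on an ungraceful stop

    def write(self, key, value):
        # The change goes to the memory buffer first, then to the log.
        self.buffer[key] = value
        self.log.append((key, value))

    def commit(self):
        # Apply every logged change past the checkpoint to the database.
        for key, value in self.log[self.checkpoint:]:
            self.database[key] = value
        self.checkpoint = len(self.log)

    def crash(self):
        # An ungraceful stop loses the buffer; log, checkpoint, and database survive.
        self.buffer = {}

    def recover(self):
        # Replay only the entries the checkpoint says were never committed.
        replayed = self.log[self.checkpoint:]
        self.commit()
        self.buffer = dict(self.database)
        return replayed


store = TinyTransactionalStore()
store.write("cn=alice", "title=Engineer")
store.commit()
store.write("cn=bob", "title=Analyst")  # logged, but not yet committed
store.crash()
replayed = store.recover()
print(replayed)        # [('cn=bob', 'title=Analyst')]
print(store.database)  # both entries are present after recovery
```

The point of the sketch is that the change to "cn=bob" survives the crash because it reached the log before the stop, and the checkpoint tells recovery exactly where replay has to start.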
As previously noted, an ungraceful shutdown of a domain controller will require the database to be recovered by LSASS.exe when the DC is rebooted. You may see a message prior to or during logon that says something like "Active Directory is rebuilding indices." This is a notice of database recovery. While it is technically possible to use Esentutl.exe to verify the database and commit the pending transactions in the logs, I have never needed to do that, and having spent the past eight years troubleshooting Active Directory problems for clients, I've neither seen nor heard of anyone else having to do it either. Active Directory is quite self-healing.
There are occasions when NTDS.DIT will become corrupt. Usually "corrupt" is a word you use when you can't figure out what the problem is, but occasionally Active Directory will become genuinely corrupted. In my previous article, I discussed several NTDSUtil.exe operations that are available when the DC is booted into Directory Services Restore Mode (DSRM). These commands (Integrity, Recover, and Compact To among them) are in the File Maintenance menu.
Semantic database analysis checker
This function is located in NTDSUtil.exe in the main menu, and it must be used offline. Unlike other database operations, I've used this one a lot. Basically any time you see evidence of or suspect database corruption, you can run this tool to fix the problem.
There are a number of events that will indicate database corruption. Some are obvious; some are not. Database corruption could be the cause of Event 1265 (Source: NTDS KCC) or Event 1645 (Source: NTDS Replication), either of which results in replication failure. Of course, these could also be caused by SPN problems, DNS errors, etc., but running the Semantic database analysis is worth a try if you can't find the problem. There is a good description of how to use this option in this Microsoft KB article.
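The reasoning for these ambiguous replication events can be written down as a small decision helper. This is only a sketch: the function name, its parameters, and the advice strings are hypothetical, and the two boolean inputs assume you have already checked SPNs and DNS by whatever means you normally use. Only the event IDs and sources come from the discussion above.

```python
# Events that may or may not point to database corruption, keyed by event ID.
AMBIGUOUS_EVENTS = {
    1265: "NTDS KCC",
    1645: "NTDS Replication",
}


def next_step(event_id, spn_ok, dns_ok):
    """Suggest what to look at next for a replication-failure event (hypothetical helper)."""
    if event_id not in AMBIGUOUS_EVENTS:
        return "not covered by this sketch"
    # Rule out the more common causes before blaming the database.
    if not spn_ok:
        return "fix SPN problems first"
    if not dns_ok:
        return "fix DNS errors first"
    # Nothing else explains it, so the offline check is worth a try.
    return "run Semantic database analysis in DSRM"


print(next_step(1645, spn_ok=True, dns_ok=False))  # fix DNS errors first
print(next_step(1265, spn_ok=True, dns_ok=True))   # run Semantic database analysis in DSRM
```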
A more obvious error is Event ID 467, Category: Database Corruption. Here are the steps you can take to resolve this and similar events indicating corruption:
- Boot into DSRM, start NTDSUtil, and go to the File Maintenance menu.
- Run the Integrity command (it will probably confirm database corruption).
- Run the Recover command.
- Run Semantic database analysis with the Go Fixup option.
- In the case of Event 467, you may see a description that the index is corrupted. In order to repair this, try defragging the database (with the Compact To option in File Maintenance).
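The steps above can also be laid out as one-line NTDSUtil invocations. The Python sketch below only builds and prints the command lines in order; it does not run anything. The function names and the compact target path are hypothetical, and the exact syntax should be checked against your Windows version before use (newer versions, for example, may expect an "activate instance ntds" command before the others). All of these must be run from DSRM.

```python
def repair_plan(compact_target=r"C:\Temp\ntds-compact"):
    """Return the NTDSUtil invocations for the repair steps, in order.

    compact_target is a hypothetical folder for the defragmented copy.
    """
    return [
        ["ntdsutil", "files", "integrity", "quit", "quit"],
        ["ntdsutil", "files", "recover", "quit", "quit"],
        ["ntdsutil", "semantic database analysis", "go fixup", "quit", "quit"],
        ["ntdsutil", "files", f"compact to {compact_target}", "quit", "quit"],
    ]


def format_command(args):
    """Join arguments into one command line, quoting any that contain spaces."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)


plan = repair_plan()
for step, command in enumerate(plan, start=1):
    print(f"{step}. {format_command(command)}")
```

Keeping each step as a separate invocation makes it easier to stop and read the output of one command, such as the integrity check, before moving on to the next.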
If these options fail, it would be advisable to simply demote and re-promote the domain controller. This destroys the old database and gets a fresh copy from another DC. Note that if replication is broken, you will have to forcibly demote the DC using the DCPromo /ForceRemoval command.
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Gary is a Microsoft MVP for Directory Services and formerly for Windows File Systems.