Microsoft Exchange Server 2003 troubleshooting and disaster recovery

A couple of nights back, I attended one of the Microsoft TechNet UK events. Not sure whether to attend John Howard’s session on automating Windows Server Administration or Eileen Brown’s Microsoft Exchange Server 2003 Troubleshooting and Disaster Recovery session, I decided to return to my Microsoft Exchange roots (which go back to the Exchange Server 4.0 launch events in April 1996). In addition, Eileen has posted some useful links relating to the session content on her blog.

Configuring recovery options and general troubleshooting tools

A common issue for an e-mail administrator is recovering e-mail that a user has accidentally deleted and, less commonly, recovering a deleted mailbox. Fortunately, Exchange has a number of options which can assist with this. There is a trade-off between giving users the opportunity to recover data and the additional storage this consumes on the e-mail server but, because of single instance storage, in reality this is not as big an issue as it might seem.

For a mailbox store, there are three main configuration items of interest:

  • Keep deleted items for (days) defines the time for which a deleted item is still recoverable from within Outlook, even if the deleted items folder has been emptied (for options in the use of this feature, see KC Lemson’s blog).
  • Keep deleted mailboxes for (days) is similar, but defines the number of days that a mailbox will remain (orphaned and available for reconnection to an Active Directory user account), until it is finally removed from the store.
  • Do not permanently delete mailboxes and items until the store has been backed up ensures that regardless of the deleted item retention and mailbox retention intervals, nothing is finally removed until a full backup of the store has successfully taken place.
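
The way these settings interact can be sketched in a few lines of Python. This is only a simplified model of the behaviour described above, not Exchange's actual logic: the function name and parameters are hypothetical, and I am assuming that "backed up" means a full backup completed after the item was deleted.

```python
from datetime import datetime, timedelta
from typing import Optional


def can_purge(deleted_at: datetime,
              now: datetime,
              retention_days: int,
              hold_until_backup: bool,
              last_full_backup: Optional[datetime]) -> bool:
    """Return True if a deleted item (or mailbox) may be permanently removed."""
    # The retention interval must have elapsed first.
    if now - deleted_at < timedelta(days=retention_days):
        return False
    # With the "until the store has been backed up" option set, a full
    # backup taken after the deletion is also required (an assumption
    # of this sketch).
    if hold_until_backup:
        return last_full_backup is not None and last_full_backup >= deleted_at
    return True
```

With a seven-day retention and the backup option ticked, an item deleted ten days ago stays recoverable until a full backup has completed, however long that takes.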

For a public folder store, keep deleted items for (days) and do not permanently delete items until the store has been backed up are the equivalent options.

All of these options are found on the limits page of the store properties.

There are a number of troubleshooting tools available to the Exchange administrator:

  • Disaster recovery setup mode can be used to reconnect mailboxes with a store (note that this doesn’t work on a cluster).
  • The application log in Event Viewer is useful. In particular, watch out for 1012, 1018, 1019 and 1020 errors, as the Exchange Information Store often gives information about database issues well in advance of failure.
  • Diagnostics and protocol logging allow the level of logging to be tuned. Logging is an overhead on the server, so the level should only be raised when diagnosing an issue. Even when set to none, critical events are logged; minimum, medium and maximum log progressively more events.
  • Message tracking can be used to track messages through the system, optionally recording the subject of the message in the logs. The main consideration with this is the number of days for which the tracking logs should be retained.
  • Exchange System Manager’s monitoring and status tools allow monitoring of services or resources (e.g. thresholds for queue length growth) and alert notification, sending e-mail and/or executing a script (a useful alternative if e-mail is unavailable!) to notify an administrator of issues or even to automatically take corrective action.

When backing up Exchange (whether using the Backup Utility for Windows, or a third party tool), there are a number of issues to consider:

  • A backup is not complete unless it includes the mailbox and public folder stores, the transaction logs and the Windows system state. In addition, mailbox servers should never have circular logging enabled: storage prices have dropped considerably since the days of Exchange Server 4.0, and without a complete set of logs, recovery would still result in some lost data. For front end servers, Microsoft recommend that there are no stores present.
  • If co-existing with Exchange Server 5.5, the Site Replication Service (SRS) also needs to be considered.
  • Connectors may also contain information to be backed up.
  • Recovery will typically take twice as long as backup, so keep backup times short. Beware of using quotas to limit mailbox usage, as they might just lead to users storing mail in personal folder (.PST) files. That means unmanaged offline storage (i.e. not backed up), a loss of single instance storage, and possibly network bandwidth issues if the personal folders are stored on a network server.

Further information (and best practice) is contained in the Disaster Recovery Operations Guide for Exchange Server 2003.

Troubleshooting Internet E-mail

One of the things to remember when troubleshooting Exchange issues is that there are so many external factors to consider. Besides the obvious areas of Exchange and the e-mail client (usually Outlook), DNS and the underlying network can create issues.

When examining the process of receiving inbound e-mail, Exchange doesn’t actually do much! The originating server looks up the mail exchanger (MX) record for the SMTP domain in DNS and resolves the host that it names to an IP address. The message is then routed across the Internet based on that IP address and it is only once the message has been received (possibly via a smart host) that Exchange routes the message within the organisation for final message delivery. For outbound e-mail, it is the reverse process and from this we can tell that the two main areas to look at are DNS and TCP/IP.
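
To illustrate how a sending server chooses between several MX records, here is a minimal Python sketch. It assumes the records have already been retrieved (for example with NSLOOKUP) as pairs of preference value and host name; the function name and the example.com hosts are hypothetical.

```python
from typing import List, Tuple

MxRecord = Tuple[int, str]  # (preference, mail server host name)


def delivery_order(records: List[MxRecord]) -> List[str]:
    """Return hosts in the order a sending server would try them."""
    # Lower preference values are tried first. sorted() is stable, so
    # records with equal preference keep their input order here; real
    # senders typically randomise among equals.
    return [host for _, host in sorted(records, key=lambda r: r[0])]


# Hypothetical records for illustration only.
EXAMPLE_RECORDS = [(20, "backup.example.com"), (10, "mail.example.com")]
```

Calling delivery_order(EXAMPLE_RECORDS) puts mail.example.com ahead of backup.example.com, which is why a wrongly set preference can send inbound mail to the wrong server first.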

The TCP/IP troubleshooting process is well known:

  1. Check the TCP/IP properties for the Exchange server. Are they complete?
  2. Can you ping localhost (127.0.0.1)? If not, there would appear to be an issue with the network card or protocol stack.
  3. Next, can you ping the server by its own IP address? If not, there would appear to be an issue with the server’s TCP/IP address – is that configured correctly?
  4. Next, can you ping the default gateway? If not, there would appear to be either an incorrectly configured router (default gateway) address, or a physical network issue (is the cable plugged in?)
  5. Finally, can you ping other hosts on the network – e.g. the DNS server? If not, there may be a routing issue (or the DNS server addresses could be incorrect).
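
The ping sequence above lends itself to a small script. The following Python sketch runs the checks in order and reports the first one that fails; the addresses are placeholders from the documentation range and would need to be replaced with the server's real values. Note that it relies on the exit code of the operating system's ping command, which on Windows can report success for some "destination unreachable" replies, so treat a pass as a hint rather than proof.

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple


def ping(host: str) -> bool:
    """Send one echo request using the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def first_failure(steps: List[Tuple[str, str]],
                  probe: Callable[[str], bool] = ping) -> Optional[str]:
    """Run the checks in order; return the first failing label, or None."""
    for label, host in steps:
        if not probe(host):
            return label
    return None


# Placeholder addresses (assumptions): substitute the real ones.
STEPS = [
    ("loopback", "127.0.0.1"),
    ("own IP address", "192.0.2.10"),
    ("default gateway", "192.0.2.1"),
    ("DNS server", "192.0.2.53"),
]
```

Running first_failure(STEPS) returns None if every host answers, or the label of the first step that did not, which points at the layer to investigate. The probe argument exists so that the ordering logic can be exercised without touching the network.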

Additional troubleshooting steps for mail servers are:

  • Can you connect to port 25 on the mail server using Telnet? If not, then the SMTP service may not be running.
  • Are the server’s host (A) and MX records correctly recorded in both the internal and external DNS, with the correct priority (the lowest preference value is tried first)?
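
The Telnet test can also be scripted. This is a rough Python equivalent, assuming only that a healthy SMTP server greets a new connection with a line starting with 220; the function names are my own.

```python
import socket


def smtp_banner(host: str, port: int = 25, timeout: float = 5.0) -> str:
    """Connect to an SMTP port and return the server's greeting line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # An SMTP server speaks first; a healthy one greets with "220 ...".
        return conn.recv(1024).decode("ascii", errors="replace").strip()


def smtp_is_listening(host: str, port: int = 25) -> bool:
    """Return True if the port accepts a connection and greets with 220."""
    try:
        return smtp_banner(host, port).startswith("220")
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

A False result from smtp_is_listening does not say why the check failed: it could be a stopped SMTP service, a firewall, or a name that does not resolve, so it is a starting point rather than a diagnosis.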

Other areas to examine are:

  • Is a DNS suffix required and/or set in the TCP/IP properties?
  • Does the computer name have the correct fully qualified domain name (FQDN) in the system properties?
  • Is DNS working correctly? NETDIAG is a useful command: in particular, netdiag /test:dns can be used to identify DNS issues, after which NSLOOKUP can be used to query DNS.
  • Address spaces, e.g. if an organisation hosts two or more domain names, are they all configured with MX records and do users have corresponding e-mail addresses?
  • Size restrictions – both internal and external restrictions can be set. If large messages are not being received, this could be the issue. Note that SMTP virtual server settings can be overridden by global settings.
  • If the SMTP queues have a lot of retries pending, this will often indicate a DNS issue.

Recovering messages and mailboxes

There are a number of mailbox recovery tools available to an Exchange administrator:

  • Once an Active Directory account is deleted or a mailbox removed in Exchange System Manager, the mailbox is not actually removed but is tombstoned: it is shown with a red cross in System Manager and the retention time for deleted mailboxes begins. This action is carried out by the Cleanup Agent, which may be triggered manually if a mailbox needs to be recovered before the agent’s next scheduled run.
  • The mailbox recovery center allows an administrator to mount a (recovered) store and view all the disconnected mailboxes, from where a matching user account can be found using the Exchange Mailbox Matching Wizard and the mailbox reconnected using the Mailbox Reconnection Wizard.
  • The Exchange Server Mailbox Merge Wizard (ExMerge) can be used to merge data into a mailbox; however the recover mailbox data functionality (new with Exchange Server 2003 SP1) replaces the need to use ExMerge in the majority of recovery cases.
  • Offline folder (.OST) to personal folder (.PST) conversion has now been superseded by the recovery storage group.

Tip: “object not found” errors when Outlook synchronises with the Exchange Server are often caused by invalid entries in the default address book. Rebuilding this will usually resolve such issues.

Recovery storage groups

To use the recovery storage group feature, at least one Exchange Server 2003 server must be available within the organisation. This allows an administrator to create a recovery storage group, into which a database can be mounted for mailbox recovery, avoiding the need to recover on a separate recovery server and export e-mail via a .PST file; however recovery storage groups do have some limitations (after all, they are intended to be used purely for the purposes of recovering data):

  • All protocols except MAPI (required for the Microsoft Exchange Information Store service to access the storage group) are disabled.
  • Mailboxes cannot be directly connected to user accounts (except using ExMerge).
  • No management policies are available (not necessary as no live users).
  • No Exchange maintenance procedures are available (ESEUTIL/ISINTEG).
  • Databases must be mounted manually (e.g. to run ExMerge).
  • Database locations cannot be changed (but database files are not server/location specific and can be copied manually).
  • Only private mailbox stores can be recovered (i.e. not public folder stores).

Exchange Server 2003-aware backup programs will automatically restore to a recovery storage group.

In a disaster recovery scenario, an Exchange administrator could perform what is known as a dial-tone database restoration. This involves creating an empty database and mounting this so that users can continue to send and receive e-mail whilst their original data is recovered. Meanwhile, the failed database can be restored to a recovery storage group and the recover mailbox data feature or ExMerge used to restore the data to the user’s mailbox whilst both stores are online. To save time in the recovery (albeit involving some more user downtime whilst the databases are swapped), it may be appropriate to swap the database files, remount the original store and then merge in the new data from the dial-tone database. Eileen Brown’s blog features a blogcast demonstrating the recovery storage group which explains the process in further detail.

Further information on using Exchange Server 2003 recovery storage groups is available on the Microsoft website.

Database corruption and recovery

Each Exchange Server storage group has its own set of transaction logs, which can be replayed to recover data up to the point at which failure occurred. In my recent post about Exchange Server best practice and preventative maintenance, I wrote about the ESEUTIL and ISINTEG tools. Using ESEUTIL, it is possible to dump the database file header (eseutil /mh database.edb) and examine the resulting output to check the state of the store (clean or dirty shutdown) and whether or not any logs are required to be replayed.
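
If the header dump is captured to a text file, the two values of interest can be picked out with a short script. The Python sketch below assumes the dump contains lines of the form "State: ..." and "Log Required: ...", which is my recollection of the output rather than something taken from the session; the sample text is illustrative, not captured from a real server.

```python
import re
from typing import Dict


def parse_header_dump(text: str) -> Dict[str, str]:
    """Extract the State and Log Required fields from a header dump."""
    fields = {}
    for name in ("State", "Log Required"):
        match = re.search(r"^[ \t]*" + re.escape(name) + r":[ \t]*(.+?)[ \t]*$",
                          text, flags=re.MULTILINE)
        if match:
            fields[name] = match.group(1)
    return fields


def needs_log_replay(text: str) -> bool:
    """Treat anything other than a clean shutdown as needing attention."""
    return parse_header_dump(text).get("State") != "Clean Shutdown"


# Illustrative sample only (assumed layout, not real output).
SAMPLE = """
            State: Dirty Shutdown
     Log Required: 14-14 (0xe-0xe)
"""
```

For the sample above, parse_header_dump(SAMPLE) yields the state and the range of required logs, and needs_log_replay(SAMPLE) is True. A dump with no State line is also reported as needing attention, on the basis that an unreadable header should never be assumed to be clean.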

To replay the logs, simply ensure that they are available in the correct location and mount the store. Following this, the application event log should record a number of events indicating that it is initiating recovery steps and replaying the logs before recording a successful completion.
