How to use SMTP queues to troubleshoot mail flow
Find out how you can use SMTP queues to diagnose and troubleshoot Exchange Server mail flow and email performance issues.
Aside from repairing a corrupted Exchange database, mail flow and email performance issues are the most challenging Exchange Server problems to diagnose. If users aren't receiving their email, the list of possible causes is endless.
It's usually pretty easy to tell if your spam filter is too aggressive or if an Exchange server is configured to filter out certain types of email messages. On the other hand, if the problem is related to the SMTP virtual server configuration on an Exchange server within your organization, the problem is a lot more difficult to diagnose and treat.
What you might not realize though is that you can gain a wealth of insight into mail flow and email delivery problems just by looking at which SMTP queues messages are being placed into. In this tutorial, I provide an overview of Exchange Server 2003's SMTP queues and explain how to use them to determine the source of Exchange Server mail flow and performance problems.
Part 1: How to locate an email message in the SMTP queues
Before you can use a message's location to troubleshoot the Exchange server, you must locate the email message. The message can be located in any one of a server's many different SMTP queues.
To hunt for a message, go to Exchange System Manager -> Administrative Groups -> your administrative group -> Servers -> the problematic server -> Queues. When you select the Queues container, Exchange System Manager will display all of the server's SMTP queues in the console's details pane.
The initial view is a summary that tells you how many email messages are in each queue, the cumulative size of all of the messages within the queue, and the date and time that the oldest message was placed into the queue.
The summary information by itself can be valuable because it can give you hints as to which SMTP queues might be having problems. For example, if you see an extremely large queue or a queue that has held email for a long time, Exchange Server may be having problems processing the messages in that queue.
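To make the idea concrete, here is a minimal Python sketch of that first-pass check. It assumes you have copied the summary figures out of Exchange System Manager by hand. The QueueSummary type, the sample numbers, and both thresholds (500 messages, four hours) are hypothetical values chosen for illustration, not Exchange defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class QueueSummary:
    name: str
    message_count: int
    total_size_kb: int
    oldest_message: Optional[datetime]  # None when the queue is empty


def suspicious_queues(queues, now, max_messages=500, max_age=timedelta(hours=4)):
    """Return the names of queues that are unusually large or hold old mail.

    Both thresholds are illustrative assumptions; tune them to your server.
    """
    flagged = []
    for q in queues:
        too_many = q.message_count > max_messages
        too_old = q.oldest_message is not None and now - q.oldest_message > max_age
        if too_many or too_old:
            flagged.append(q.name)
    return flagged


# Hypothetical snapshot of the Queues container.
now = datetime(2006, 1, 10, 12, 0)
queues = [
    QueueSummary("Local delivery", 3, 42, now - timedelta(minutes=2)),
    QueueSummary("Messages awaiting directory lookup", 1200, 90000,
                 now - timedelta(minutes=30)),
    QueueSummary("Failed message retry", 5, 60, now - timedelta(hours=9)),
    QueueSummary("DSN messages pending submission", 0, 0, None),
]
print(suspicious_queues(queues, now))
# prints ['Messages awaiting directory lookup', 'Failed message retry']
```

A flagged queue is only a hint about where to look first; a busy but healthy server can briefly exceed either threshold.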
In Figure A above, notice that there is only one message queued on my lab server. Obviously, an active production server will have many messages in various SMTP queues at any given time, even if the Exchange server is healthy.
This brings me to my next point. If a particular user complains that email is not being received or that messages are not being delivered, the first thing you need to do is figure out where the email is going.
Determining where the messages are located is simple if your server happens to be like mine and only has one message queued. But if you have dozens of messages flowing through the SMTP queues at any given time, you will have to do a little detective work to find the missing email.
The easiest way of locating these email messages is to click the Find Messages button shown in Figure A. When you do, you will see a dialog box similar to the one that's shown in Figure B.
At first glance, the dialog box shown in Figure B is deceptively simple. It appears as though you can specify the sender and/or the recipient and click the Find Now button to locate the email message. While you can do this, tracking down a specific message is a little trickier than that.
In Figure B, you will notice that the title bar reads Find Messages – DSN messages pending submission. "DSN Messages pending submission" is the name of one of the SMTP queues. It's important to note that the Find Messages screen does not perform a global search; it searches each queue individually.
You might assume that you just search each SMTP queue until you find the specified message. In reality, when you search an SMTP queue and receive a result, it doesn't necessarily mean that you have found the problematic email message.
For example, imagine that a user named Fred is trying to send an email message to another user named Sue, but the message isn't being delivered for some reason. You could begin your search by looking for messages from Fred to Sue. The problem is that Fred sends email to Sue all the time. In such a case, the best approach is to check each SMTP queue and make a note of the email message's date/time and subject line. You can then get Fred to identify the exact problematic email message.
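The note-taking step can be sketched in Python as follows. This is only an illustration of the bookkeeping: it assumes you have already transcribed each queue's search results into a dictionary, and the QueuedMessage type, the addresses, and the sample subjects are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class QueuedMessage:
    sender: str
    recipient: str
    subject: str
    submitted: datetime


def candidates(queues, sender, recipient):
    """Search every queue separately and list matches as (queue, time, subject)."""
    hits = []
    for queue_name, messages in queues.items():
        for m in messages:
            if (m.sender.lower() == sender.lower()
                    and m.recipient.lower() == recipient.lower()):
                hits.append((queue_name, m.submitted, m.subject))
    # Oldest first, so the user can scan the list in the order they sent mail.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical search results, one entry per SMTP queue.
queues = {
    "Local delivery": [
        QueuedMessage("fred@example.com", "sue@example.com", "Lunch?",
                      datetime(2006, 1, 10, 9, 15)),
    ],
    "Failed message retry": [
        QueuedMessage("fred@example.com", "sue@example.com", "Q4 budget",
                      datetime(2006, 1, 10, 8, 40)),
        QueuedMessage("bob@example.com", "sue@example.com", "Hello",
                      datetime(2006, 1, 10, 8, 50)),
    ],
}

for queue_name, submitted, subject in candidates(queues, "fred@example.com",
                                                 "sue@example.com"):
    print(f"{submitted:%Y-%m-%d %H:%M}  {subject!r}  in {queue_name}")
```

The resulting list is what you would hand to Fred so that he can point to the one message that went missing.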
Yes, locating a specific email message within the SMTP queues can be tedious, but it is the first step in the mail-flow troubleshooting process.
Part 2: Troubleshooting the DSN Message Pending Submission queue
Exchange Server uses the DSN Message Pending Submission queue for non-delivery reports (NDRs). When Exchange Server needs to send an NDR to someone, the NDR message is placed into this SMTP queue prior to being transmitted.
Since this SMTP queue is reserved for NDR messages, you should never have an email from one user to another show up in this queue. That doesn't mean that this queue never malfunctions though. Quite often the DSN Message Pending Submission queue will clog up because NDR messages are not being transmitted.
The fact that the DSN Message Pending Submission queue fails to process messages is not the problem -- it is merely a symptom of another issue. When the DSN Message Pending Submission queue fails, it is almost always related to the message store or to the IMAIL Exchange Server store component. (IMAIL is responsible for message conversions.)
If your Exchange server's DSN Message Pending Submission queue is malfunctioning, make sure that all of the Exchange Server stores are mounted. If the stores appear to be OK, check the event logs for any store or IMAIL-related errors.
Part 3: Troubleshooting the Failed Message Retry queue
Occasionally you'll receive an email in your Inbox indicating that there has been a delay in sending a message, but that there is no need to resend the email message. This means that a message delivery failure has occurred, but this doesn't necessarily mean that there is a problem on your Exchange server.
Let's say you want to send an email to someone at another company. You send the message, but you have no idea that the other company's mail server is down. Because the server isn't working on the receiving end, your Exchange server is unable to deliver the message.
Exchange Server responds to this situation by placing the undelivered email message into the Failed Message Retry queue. Exchange then attempts to resend the message at various intervals until the message's timeout period expires.
After the email has been in this SMTP queue for a few hours, Exchange will send you a delayed delivery notification message. There is no need to resend the email message. You will only receive a non-delivery report if the message expires without being delivered.
What can cause the Failed Message Retry queue to malfunction?
It is normal for a message to sit in the Failed Message Retry queue for hours on end; that alone does not mean that the queue is malfunctioning. If, however, email stays in this SMTP queue for more than two days (the default timeout period) or if an exceptionally large number of messages are deposited into the queue, something is wrong.
If messages stay in the queue for more than two days, you should first check to make sure that the Exchange server isn't set to use an excessively long timeout period.
Here are the instructions to check the timeout period for messages in Exchange Server:
- Go to Exchange System Manager -> Administrative Groups -> your administrative group -> Servers -> your server -> Protocols -> SMTP -> Default SMTP Virtual Server.
- Right click on the Default SMTP Virtual Server -> Properties -> Delivery tab.
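- The Delivery tab shows the retry intervals, which control how often Exchange Server attempts to resend messages, and the expiration timeout, which controls how long a message can remain undelivered before Exchange gives up on it.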
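Once you know the configured values, judging a queued message comes down to comparing its age against them. The Python sketch below shows that comparison. The two-day expiration matches the default mentioned above; the 12-hour delay notification value is my assumption, so substitute whatever your Delivery tab actually shows.

```python
from datetime import timedelta


def retry_status(age,
                 delay_notification=timedelta(hours=12),  # assumed value
                 expiration=timedelta(days=2)):
    """Classify a message in the Failed Message Retry queue by its age."""
    if age >= expiration:
        # The message should already have expired; it deserves a closer look.
        return "overdue"
    if age >= delay_notification:
        # The sender should have received a delayed delivery notification.
        return "delay-notified"
    return "retrying"


print(retry_status(timedelta(hours=3)))   # prints retrying
print(retry_status(timedelta(hours=13)))  # prints delay-notified
print(retry_status(timedelta(days=3)))    # prints overdue
```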
If a message that should have timed out is still in the queue, then the email could be corrupt. Occasionally, a corrupt message may be placed into an SMTP queue. If this happens, it can hold up everything else in the queue.
To determine if a message is corrupt, look at its properties. If there are properties that are blank or that contain gibberish, it's a pretty good indication that the message is corrupt. Simply delete the corrupted message and the queue should start flowing again.
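If you prefer to think of that check as a rule, the Python sketch below is one rough way to express it. It assumes you have copied a message's properties into a dictionary. The property names are hypothetical, and treating non-printable characters as "gibberish" is a simplification of what you would judge by eye.

```python
def looks_corrupt(properties, required=("sender", "recipient")):
    """Heuristic: a required property is missing, blank, or has non-printable characters."""
    for key in required:
        value = properties.get(key)
        if value is None or str(value).strip() == "":
            return True  # blank or missing property
        if any(not ch.isprintable() for ch in str(value)):
            return True  # control characters are a crude stand-in for gibberish
    return False


healthy = {"sender": "fred@example.com", "recipient": "sue@example.com"}
blank = {"sender": "", "recipient": "sue@example.com"}
garbled = {"sender": "fr\x00\x07ed", "recipient": "sue@example.com"}

print(looks_corrupt(healthy))  # prints False
print(looks_corrupt(blank))    # prints True
print(looks_corrupt(garbled))  # prints True
```

A rule like this will miss gibberish made of ordinary printable characters, so it supplements a visual check rather than replacing it.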
Another sign of trouble is when excessive messages pile up in the Failed Message Retry queue. This usually indicates an external problem, such as link failures (router failure, unplugged cable, bad NIC, etc.) or DNS resolution failures. In this case, make sure that the Exchange server can communicate with the rest of the network and the Internet, and that it is able to perform DNS resolutions successfully.
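As a quick way to test name resolution for several destinations at once, you could run something like the Python sketch below from the Exchange server. It uses the standard library's socket.getaddrinfo, which only confirms that a host name resolves to an address; it does not look up MX records. The host names and the fake_resolver stand-in (included so the example runs without a network) are hypothetical.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the host name resolves to at least one address."""
    try:
        return len(resolver(hostname, 25)) > 0
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def dns_report(hostnames, resolver=socket.getaddrinfo):
    """Map each host name to True (resolved) or False (failed)."""
    return {name: resolves(name, resolver) for name in hostnames}


def fake_resolver(host, port):
    """Offline stand-in for socket.getaddrinfo, used only for this demo."""
    if host == "mail.example.com":
        return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", port))]
    raise socket.gaierror("name not found")


print(dns_report(["mail.example.com", "mail.example.net"], fake_resolver))
# prints {'mail.example.com': True, 'mail.example.net': False}
```

On a real server, drop the second argument so that the call uses the operating system's resolver. If every name fails, suspect the DNS server or the link to it rather than any one destination.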
Part 4: Troubleshooting the Local Delivery queue
The Local Delivery queue is used as a repository for messages that are going to be delivered to local Exchange mailboxes. You don't have to worry too much about this SMTP queue malfunctioning. If email becomes stuck in the queue, it is almost always because the Exchange server cannot accept new messages.
Conditions that would prevent an Exchange server from being able to accept email messages include:
- Dismounted Exchange database
- Denial-of-service attack
More often, messages tend to get backed up in the Local Delivery queue rather than actually getting stuck there. When this SMTP queue does become backed up, it means that the Exchange server is not performing well enough to keep up with the current workload.
This type of performance issue is usually related to the disk subsystem. The queue becomes backed up because the disk subsystem is not able to move email from the queue to the transaction logs as quickly as messages are coming into the queue.
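The arithmetic behind a backlog is simple, and a small Python sketch makes it easy to play with. The function below assumes constant arrival and drain rates, which real servers never have, and every number in it is hypothetical.

```python
def backlog_after(minutes, arrivals_per_min, drained_per_min, starting_backlog=0):
    """Estimate the queue length after a period of steady arrival and drain rates."""
    growth_per_min = arrivals_per_min - drained_per_min
    # A queue cannot hold fewer than zero messages.
    return max(0, starting_backlog + growth_per_min * minutes)


# Disks drain 40 messages a minute while 50 arrive: the queue grows by 10 a minute.
print(backlog_after(60, arrivals_per_min=50, drained_per_min=40))  # prints 600

# Once the load drops below the drain rate, the same backlog clears on its own.
print(backlog_after(60, arrivals_per_min=40, drained_per_min=50,
                    starting_backlog=300))  # prints 0
```

This is the difference between a backed-up queue and a stuck one: a backed-up queue shrinks when the load eases, while a stuck queue does not.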
Part 5: Troubleshooting the Messages Awaiting Directory Lookup queue
The Messages Awaiting Directory Lookup queue is a holding facility for email sent to recipients who have not yet been resolved against Active Directory. Likewise, if a message is sent to a distribution list, it is placed into this SMTP queue while Exchange Server determines the list's membership and resolves each recipient's address to a mail-enabled Active Directory user account.
If messages accumulate in this queue, it's because the SMTP categorizer component of Exchange Server's Advanced Queuing Engine is unable to determine how to route inbound messages across your Exchange organization. This occurs when Exchange Server has trouble communicating with a global catalog server. The global catalog server may be inaccessible, or it may be unable to keep up with the demands of the Exchange server and the rest of the machines in the forest (even workstations communicate with a global catalog server when users log on).
There are a couple of things that you can do to correct this:
- If the global catalog server has dropped offline or if there is a link failure between the Exchange server and the global catalog server, you need to restore communications between the two servers in order to fix the problem.
- If the global catalog server is having trouble keeping up with the demands of the Exchange organization, you can designate another domain controller to act as a global catalog server. Windows only designates one domain controller to act as a global catalog server by default, but it is a good idea to have multiple domain controllers configured to act as global catalog servers -- both for fault tolerance and load balancing.
In some rare cases, problems with the SMTP categorizer might not be related to a global catalog server issue. If everything seems to be OK with your global catalog server, I recommend increasing the diagnostic logging level for the SMTP categorizer to help determine the cause of the issue.
Part 6: Troubleshooting the Messages Waiting To Be Routed queue
The Messages Waiting To Be Routed queue stores email until the message's next stop en route to its final destination can be determined. If email gets hung up in this SMTP queue, it is always related to a routing problem. Unfortunately, this one's tough to troubleshoot. The issue could be related to a link failure, a corrupted routing table, or any number of other things.
Try running tracert (or traceroute) against the message's destination domain. This will help you to determine whether or not the Exchange server is able to communicate with and route packets to the destination domain.
If the tracert fails, try pinging a few other external domains. This way you can tell if the problem is specific to the destination domain or if the server is having trouble communicating with all external domains. If you are unable to ping any external domains (and firewalls are not blocking ICMP traffic), then the routing issue is occurring at either the hardware level or at the TCP/IP level.
On the other hand, if you are able to ping the destination domain and run a successful tracert, then physical connectivity exists and TCP/IP is working. Therefore, the issue is Exchange-related. In that case, I recommend increasing the Exchange server's diagnostic logging level for routing to help you to determine the cause of the problem.
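The decision process described in this part can be written down as a short Python function. It takes the results of tests you ran by hand and returns the conclusion suggested above. The function name, its parameters, and the result labels are my own illustrative choices, and the sketch adds an "inconclusive" outcome for the case in which a firewall blocks ICMP.

```python
def diagnose_routing(tracert_ok, ping_destination_ok, other_domains_pingable,
                     icmp_blocked=False):
    """Turn manual tracert and ping results into a likely problem area."""
    if tracert_ok and ping_destination_ok:
        # Connectivity and TCP/IP work, so look at Exchange itself.
        return "exchange"
    if icmp_blocked:
        # Failed pings prove nothing when a firewall drops ICMP traffic.
        return "inconclusive"
    if other_domains_pingable:
        return "destination-specific"
    return "hardware-or-tcpip"


NEXT_STEP = {
    "exchange": "Increase diagnostic logging for routing.",
    "inconclusive": "Test with a protocol that the firewall allows.",
    "destination-specific": "Investigate the route to that one domain.",
    "hardware-or-tcpip": "Check cabling, NIC, router and TCP/IP settings.",
}

result = diagnose_routing(tracert_ok=False, ping_destination_ok=False,
                          other_domains_pingable=True)
print(result, "->", NEXT_STEP[result])
# prints destination-specific -> Investigate the route to that one domain.
```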
Part 7: Troubleshooting the Final Destination Currently Unreachable queue
As its name implies, the Final Destination Currently Unreachable queue is used to store SMTP email messages that cannot be routed to their final destination. If messages begin to appear in this SMTP queue, then Exchange is having trouble routing that email to the destination domain.
First, you should check to see if all or most outbound SMTP email is ending up in this queue, or if only select messages are being placed in the queue.
If all or most of the email is being placed into the Final Destination Currently Unreachable queue, Exchange is likely having trouble communicating with external domains. This could be because of a broken or disconnected network cable, a malfunctioning router, or because your DNS server is having trouble resolving the recipients' domain names.
If only messages destined for select domains are being placed into the Final Destination Currently Unreachable queue, the problem could be DNS-related -- i.e., the DNS server might not be able to resolve that particular domain name. Another possible cause is that your router might have a corrupt routing table.
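The "all or most versus select domains" question can be answered with a simple count, sketched here in Python. It assumes you have noted the recipient domain of each message in the queue and know roughly how many outbound messages the server handled in the same period. The 80 percent cutoff for "most" and the sample figures are hypothetical.

```python
from collections import Counter


def unreachable_scope(stuck_domains, total_outbound, most=0.8):
    """Classify the problem as general or selective and count messages per domain.

    stuck_domains: recipient domain of each message in the unreachable queue.
    total_outbound: all outbound messages in the same period, stuck ones included.
    """
    if total_outbound <= 0:
        raise ValueError("total_outbound must be positive")
    share = len(stuck_domains) / total_outbound
    per_domain = Counter(stuck_domains)
    scope = "general" if share >= most else "selective"
    return scope, per_domain


stuck = ["example.net"] * 4 + ["example.org"]
scope, per_domain = unreachable_scope(stuck, total_outbound=200)
print(scope, per_domain.most_common())
# prints selective [('example.net', 4), ('example.org', 1)]
```

A "general" result points toward connectivity or your own DNS server, while a "selective" result tells you which domain names to test first.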
One thing to note: Fixing the SMTP queue issue may not restore mail flow. For example, if messages were accumulating in the Final Destination Currently Unreachable queue because of a broken network link and you fixed the broken link, Exchange Server will often continue to hold the messages in the queue even though the link is fixed.
To restore mail flow, Microsoft recommends restarting the SMTP virtual server that the queue services.
Part 8: Troubleshooting the Messages Pending Submission queue
Exchange Server uses the Messages Pending Submission queue to store outbound SMTP email messages that the server has not yet begun to process. For example, if a user were to send an SMTP message to an external recipient, Exchange Server would route the email through the various SMTP queues I've discussed as it resolves the address of the recipient's mail server and transmits the message. When the Exchange server is too busy to begin processing messages, all outbound SMTP email is placed into the Messages Pending Submission queue until it can be processed.
SMTP messages normally pass through the Messages Pending Submission queue quickly. During busier parts of the day, you might see messages briefly accumulate in the queue if users are sending SMTP email faster than the Exchange server can handle it. However, individual messages should not remain in the queue for any prolonged period of time.
If messages are held in the queue for excessive lengths of time, it usually indicates an Exchange Server performance problem. Exchange's CPU may not be able to keep up with the demand being placed on the server. Likewise, the server may have insufficient memory or the server's hard disks may be too slow. The only way to really tell for sure is to use the Performance Monitor to test for bottlenecks on the Exchange server.
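Hardware performance problems are not the only possible cause of messages accumulating in the Messages Pending Submission queue. Custom or third-party event sinks that process each message as it is submitted, such as antivirus scanners, can also hold messages in this queue if they are slow or malfunctioning.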
Part 9: Troubleshooting Remote Destination queues
Occasionally, you might see a queue bearing the name of a connector, followed by the name of a server and the name of a remote domain. These SMTP queues are created dynamically for the purpose of delivering email messages to a remote domain.
If you have such queues and they have messages accumulated within them, check the status of the SMTP queue to see if it is in Retry mode or if it's just slow. If the queue is in Retry mode, check the queue's properties for more information as to why it is in the retry state. Typically, the problem will either be DNS- or link-related.
To troubleshoot the problem, manually communicate with the remote domain by using tools such as tracert, ping and telnet.
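The telnet test amounts to opening a TCP connection to port 25 on the remote mail server and reading its greeting. Here is a minimal Python sketch of the same check using only the standard library. The host name in the usage comment is hypothetical, and the sketch reads just the first chunk of the greeting, which is enough to see the reply code.

```python
import socket


def smtp_banner(host, port=25, timeout=10.0):
    """Connect as 'telnet host 25' would and return the greeting, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(1024).decode("ascii", errors="replace").strip()
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return None


def accepts_smtp(banner):
    """A healthy SMTP server greets new connections with reply code 220."""
    return banner is not None and banner.startswith("220")


# Usage on a machine with network access (hypothetical host name):
#   banner = smtp_banner("mail.example.com")
#   print(banner, accepts_smtp(banner))

print(accepts_smtp("220 mail.example.com ESMTP ready"))  # prints True
print(accepts_smtp("554 No SMTP service here"))          # prints False
print(accepts_smtp(None))                                # prints False
```

A result of None means that the connection itself failed, which points to a DNS or link problem. A banner with a code other than 220 means that you reached the server but it is not accepting mail.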
About the author
Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Exchange Server, and has previously received Microsoft's MVP award for Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and was once responsible for the Department of Information Management at Fort Knox. As a freelance technical writer, Brien has written for Microsoft, TechTarget, CNET, ZDNet, MSD2D, Relevant Technologies and other technology companies. You can visit Brien's personal Web site at www.brienposey.com.