Dead Letter Queue
A Dead Letter Queue is a Design Pattern where one moves messages to a dedicated Queue called “the Dead Letter Queue” if the message meets one or more Exception criteria.
Exception criteria
Queue does not exist
If a message is sent to a Queue that does not exist, it could be sent to the Dead Letter Queue instead.
This would have to be implemented on the Event Plane.
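A minimal in-memory sketch of how an Event Plane could route a message addressed to a non-existent Queue to the Dead Letter Queue. The `Broker` class and its `publish` method are hypothetical and do not reflect the API of any real product.

```python
from collections import deque


class Broker:
    """Hypothetical in-memory Event Plane with a single Dead Letter Queue."""

    def __init__(self, queue_names):
        self.queues = {name: deque() for name in queue_names}
        self.dead_letter_queue = deque()

    def publish(self, queue_name, message):
        queue = self.queues.get(queue_name)
        if queue is None:
            # The target Queue does not exist: keep the message instead of dropping it,
            # together with where it was meant to go and why it ended up here.
            self.dead_letter_queue.append((queue_name, message, "queue does not exist"))
            return False
        queue.append(message)
        return True


broker = Broker(["orders"])
broker.publish("orders", {"id": 1})
broker.publish("invoices", {"id": 2})  # no such Queue
print(len(broker.queues["orders"]), len(broker.dead_letter_queue))  # prints: 1 1
```

Recording the intended Queue name alongside the message makes it possible to reprocess the message once the missing Queue has been created.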
Queue length limit exceeded
Sometimes there is a limit to how many messages a Queue can hold. If the Queue fills up because the messages are not being processed (or are not being processed as fast as they are produced), new messages cannot be placed in the Queue. To avoid losing those messages, they could be placed in the Dead Letter Queue instead.
This would have to be implemented on the Event Plane.
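The overflow behavior described above can be sketched as a bounded Queue that diverts new messages once it is full. `BoundedQueue` and `max_length` are hypothetical names. The sketch diverts the newest message, as the text describes; a real Event Plane may choose differently, for example by evicting the oldest message instead.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical Queue that holds at most `max_length` messages."""

    def __init__(self, max_length, dead_letter_queue):
        self.max_length = max_length
        self.messages = deque()
        self.dead_letter_queue = dead_letter_queue

    def put(self, message):
        if len(self.messages) >= self.max_length:
            # The Queue is full: divert the new message instead of losing it.
            self.dead_letter_queue.append(message)
            return False
        self.messages.append(message)
        return True


dlq = deque()
queue = BoundedQueue(max_length=2, dead_letter_queue=dlq)
for n in range(3):
    queue.put(n)
print(list(queue.messages), list(dlq))  # prints: [0, 1] [2]
```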
Message or Queue size limit exceeded
Sometimes there are restrictions on the size a message can have, or on the maximum size a Queue can have. In either case, if that limit is exceeded, new messages cannot be placed on the Queue. To avoid losing them, they could be placed in the Dead Letter Queue instead.
This would have to be implemented on the Event Plane. Some implementations, though, like AWS SQS, don’t support messages that exceed a certain size, period (256 KiB in the case of AWS SQS, at the time of writing). That means that if a message cannot be placed in the Queue because it is too big, it cannot be placed in the Dead Letter Queue either. To avoid losing that message, the Dead Letter Queue cannot be a Queue; something else, like a file system, has to serve as the Dead Letter Queue. That, however, would have to be implemented in the Application Layer.
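A sketch of that Application Layer fallback, assuming messages are serialized as JSON and that the size limit is 256 KiB (mirroring the AWS SQS example; the real limit depends on your Event Plane). The `send_or_spill` helper is hypothetical, and a plain list stands in for a real Queue client.

```python
import json
import tempfile
import uuid
from pathlib import Path

# Assumed limit, mirroring the AWS SQS example above; check your own Event Plane.
MAX_MESSAGE_BYTES = 256 * 1024


def send_or_spill(queue, message, spill_dir):
    """Put `message` on `queue`, or write it to `spill_dir` if it is too big.

    Returns None when the message was queued, or the Path of the spilled file.
    """
    body = json.dumps(message).encode("utf-8")
    if len(body) <= MAX_MESSAGE_BYTES:
        queue.append(body)
        return None
    # Too big for the Queue *and* for a queue-based Dead Letter Queue,
    # so a directory on the file system serves as the Dead Letter Queue.
    spill_path = Path(spill_dir)
    spill_path.mkdir(parents=True, exist_ok=True)
    path = spill_path / f"{uuid.uuid4()}.json"
    path.write_bytes(body)
    return path


queue = []  # stands in for a real Queue client
spill_root = tempfile.mkdtemp()
send_or_spill(queue, {"id": 1}, spill_root)  # small: goes on the Queue
path = send_or_spill(queue, {"blob": "x" * MAX_MESSAGE_BYTES}, spill_root)  # too big
print(len(queue), path.exists())  # prints: 1 True
```

Because the spilled files are outside the Event Plane, they also need their own Monitoring and reprocessing.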
Message is explicitly rejected
Some Event Plane implementations, like RabbitMQ, support explicitly rejecting a message (which could be done as part of Exception Handling). The Event Plane could be configured to send those messages to the Dead Letter Queue instead.
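In RabbitMQ this corresponds to a consumer rejecting a message with `requeue` set to false on a Queue that has a dead-letter exchange configured. The sketch below only imitates that behavior in memory; the `Broker` class and its `get`, `ack`, and `reject` methods are hypothetical.

```python
import json
from collections import deque


class Broker:
    """Hypothetical in-memory Event Plane that dead-letters rejected messages."""

    def __init__(self):
        self.queue = deque()
        self.dead_letter_queue = deque()
        self.unacked = {}  # delivery tag -> message handed out but not yet settled
        self.next_tag = 0

    def publish(self, message):
        self.queue.append(message)

    def get(self):
        message = self.queue.popleft()
        self.next_tag += 1
        self.unacked[self.next_tag] = message
        return self.next_tag, message

    def ack(self, tag):
        del self.unacked[tag]

    def reject(self, tag, requeue):
        message = self.unacked.pop(tag)
        if requeue:
            self.queue.appendleft(message)  # put it back at the head of the Queue
        else:
            # Explicitly rejected without requeue: route to the Dead Letter Queue.
            self.dead_letter_queue.append(message)


broker = Broker()
broker.publish("not valid json")
tag, message = broker.get()
try:
    json.loads(message)
    broker.ack(tag)
except json.JSONDecodeError:
    # Exception Handling: this message will never parse, so don't requeue it.
    broker.reject(tag, requeue=False)
print(list(broker.dead_letter_queue))  # prints: ['not valid json']
```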
Message reaches a threshold read counter
A “Read Limit” is a feature provided by some Event Plane implementations to address the Racetrack Problem in the Middleware Layer instead of the Application Layer: if a message has been picked up from the Queue and put back (because an Exception occurred while processing it, for example) more times than the limit allows, it will probably never be processed successfully. By moving it to the Dead Letter Queue at that point, one avoids running into the Racetrack Problem.
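A minimal sketch of a read limit, assuming the Event Plane keeps a read counter per message (AWS SQS, for instance, calls this setting `maxReceiveCount` in its redrive policy). `ReadLimitedQueue`, `Envelope`, and `max_reads` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Envelope:
    body: object
    read_count: int = 0


class ReadLimitedQueue:
    """Hypothetical Queue that dead-letters a message after `max_reads` deliveries."""

    def __init__(self, max_reads):
        self.max_reads = max_reads
        self.messages = deque()
        self.dead_letter_queue = deque()

    def put(self, body):
        self.messages.append(Envelope(body))

    def get(self):
        """Return the next deliverable Envelope, or None if the Queue is empty."""
        while self.messages:
            envelope = self.messages.popleft()
            if envelope.read_count >= self.max_reads:
                # Read limit reached: stop the Racetrack by parking the message.
                self.dead_letter_queue.append(envelope.body)
                continue
            envelope.read_count += 1
            return envelope
        return None

    def put_back(self, envelope):
        # Called when processing failed; the read count is preserved.
        self.messages.appendleft(envelope)


queue = ReadLimitedQueue(max_reads=3)
queue.put("poison")
while (envelope := queue.get()) is not None:
    queue.put_back(envelope)  # simulate a handler that always fails
print(list(queue.dead_letter_queue))  # prints: ['poison']
```

The message is delivered three times and then parked, so the consumer loop terminates instead of spinning forever.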
The message expires
Many Event Plane implementations support a TTL. This should always be applied when Request-Response is used, and could be applied when Publish-Subscribe is used for notifications.
One could possibly configure the Event Plane to put expired messages in a Dead Letter Queue. The real question is whether you should. I think the answer is usually “No”. After all, the TTL was deliberately set because after that time the message is no longer relevant. So if the message has been rendered irrelevant by the TTL expiring, why retain it?
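A sketch of TTL handling in which dead-lettering expired messages is an opt-in flag that defaults to off, in line with the argument above. `TtlQueue`, `dead_letter_expired`, and the injectable `clock` are hypothetical; the fake clock only makes the example deterministic.

```python
import time
from collections import deque


class TtlQueue:
    """Hypothetical Queue that drops expired messages, optionally dead-lettering them."""

    def __init__(self, dead_letter_expired=False, clock=time.monotonic):
        # Defaults to discarding expired messages rather than retaining them.
        self.dead_letter_expired = dead_letter_expired
        self.clock = clock
        self.messages = deque()
        self.dead_letter_queue = deque()

    def put(self, body, ttl_seconds):
        self.messages.append((self.clock() + ttl_seconds, body))

    def get(self):
        while self.messages:
            expires_at, body = self.messages.popleft()
            if self.clock() < expires_at:
                return body
            if self.dead_letter_expired:
                self.dead_letter_queue.append(body)
            # Otherwise the expired message is simply dropped.
        return None


now = [0.0]  # fake clock for the example
queue = TtlQueue(clock=lambda: now[0])
queue.put("reply-1", ttl_seconds=5)
queue.put("reply-2", ttl_seconds=30)
now[0] = 10.0
print(queue.get(), len(queue.dead_letter_queue))  # prints: reply-2 0
```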
The message is not processed correctly
If a message is not processed correctly, because of an Exception for example, the default behavior is to put the message back in the Queue. This, however, can easily result in the Racetrack Problem, where the message gets processed over and over again and fails every time.
Instead of putting the message back in the Queue, one could put it in the Dead Letter Queue. This, however, would have to be implemented in the Application Layer, as that is the lowest level that is aware that the message failed to process. It is somewhat tricky to implement, though:
- You are already in an Exception state.
- If Guaranteed Delivery is one of the requirements, meeting that requirement is not trivial when removing a message from one Queue and putting it in the Dead Letter Queue. It is even less trivial if Exactly Once is also a requirement.
In those cases, the simpler solution might be a read limit, as described under Message reaches a threshold read counter.
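A sketch of the Application Layer approach, illustrating one common way to keep Guaranteed Delivery: write the failed message to the Dead Letter Queue first and only then acknowledge the original. A crash between those two steps duplicates the message instead of losing it, which is at-least-once delivery, not Exactly Once. The `Queue` class and `consume_one` function are hypothetical.

```python
from collections import deque


class Queue:
    """Hypothetical Queue with explicit acknowledgements."""

    def __init__(self, messages=()):
        self.messages = deque(messages)
        self.in_flight = {}  # delivery tag -> message not yet acknowledged
        self._next_tag = 0

    def put(self, message):
        self.messages.append(message)

    def get(self):
        message = self.messages.popleft()
        self._next_tag += 1
        self.in_flight[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        del self.in_flight[tag]


def consume_one(source, dead_letter_queue, handler):
    """Process one message; on an Exception, move it to the Dead Letter Queue."""
    tag, message = source.get()
    try:
        handler(message)
    except Exception as error:
        # Step 1: store the failed message (and why it failed) first...
        dead_letter_queue.put({"message": message, "error": repr(error)})
    # Step 2: ...and only then acknowledge the original. If step 1 raised,
    # we never get here and the message stays in flight for redelivery.
    source.ack(tag)


def handler(amount):
    if amount < 0:
        raise ValueError("negative amount")


source, dlq = Queue([5, -1]), Queue()
consume_one(source, dlq, handler)  # succeeds
consume_one(source, dlq, handler)  # fails and is dead-lettered
print(len(dlq.messages), len(source.in_flight))  # prints: 1 0
```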
Handling the Dead Letter Queue
Redirecting messages to a Dead Letter Queue is only useful if one fixes the issues that got the messages into the Dead Letter Queue in the first place, and then either reprocesses them or deliberately decides to discard them.
Especially if an Incident causes messages to be directed to the Dead Letter Queue, there could be a lot of messages there that need to be processed. One should think carefully about how to implement that.
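A sketch of a reprocessing tool, assuming each dead letter records its original Queue. It works in batches so that, after an Incident, a large backlog is not pushed onto the consumers all at once, and it lets the operator deliberately discard messages. The `redrive` function, the `"queue"`/`"message"` keys, and `should_discard` are hypothetical.

```python
from collections import deque


def redrive(dead_letter_queue, queues, should_discard, batch_size=100):
    """Move up to `batch_size` dead letters back to their original Queue.

    Each dead letter is assumed to be a dict with "queue" and "message" keys.
    Returns a (redriven, discarded) tuple of counts.
    """
    redriven = discarded = 0
    for _ in range(min(batch_size, len(dead_letter_queue))):
        entry = dead_letter_queue.popleft()
        if should_discard(entry):
            discarded += 1  # deliberate decision to drop this message
            continue
        queues[entry["queue"]].append(entry["message"])
        redriven += 1
    return redriven, discarded


queues = {"orders": deque()}
dlq = deque([
    {"queue": "orders", "message": {"id": 1}},
    {"queue": "orders", "message": {"id": 2, "test": True}},
    {"queue": "orders", "message": {"id": 3}},
])
print(redrive(dlq, queues, lambda e: e["message"].get("test", False), batch_size=2))
# prints: (1, 1)
```

The third message stays in the Dead Letter Queue until the next batch is run.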
There are two different approaches to handling the Dead Letter Queue.
One Dead Letter Queue for all messages
The advantage of this approach is that one would have one place where failed messages end up. This means that one also only has to Monitor one Queue for failed messages.
Because every kind of failed message now ends up in one Queue, and because there can be a lot of them, Tooling is required to fix and reprocess messages efficiently. If one chooses this approach, one should pick an Event Plane implementation that provides adequate Tooling.
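With one shared Dead Letter Queue, such Tooling needs to know where each message came from and why it failed. A sketch, assuming each dead letter is wrapped in an envelope with that metadata; `DeadLetter` and `summarize` are hypothetical names.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DeadLetter:
    """Hypothetical envelope: records where a message came from and why it failed."""
    origin_queue: str
    reason: str
    message: object
    failed_at: float = field(default_factory=time.time)


def summarize(dead_letter_queue):
    """Count dead letters per (origin Queue, reason), most common first."""
    return Counter((d.origin_queue, d.reason) for d in dead_letter_queue).most_common()


shared_dlq = [
    DeadLetter("orders", "read limit reached", {"id": 1}),
    DeadLetter("invoices", "message too large", {"id": 7}),
    DeadLetter("orders", "read limit reached", {"id": 2}),
]
print(summarize(shared_dlq))
# prints: [(('orders', 'read limit reached'), 2), (('invoices', 'message too large'), 1)]
```

Grouping by origin and reason turns one mixed Queue into a triage list, which is exactly what this approach otherwise lacks.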
Multiple dedicated Dead Letter Queues
In this scenario, every Queue that makes use of Dead Letter Queues gets its own dedicated Dead Letter Queue.
The advantage of this approach is that you always know what kind of message you’re dealing with, because each Dead Letter Queue only contains one kind of message. If the problem is fixed and the messages need to be reprocessed, that, too, is easy, because one knows which Queue the messages came from.
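A simple naming convention is one way to make the link between a Queue and its dedicated Dead Letter Queue explicit, so the source Queue can be derived when reprocessing. The `.dlq` suffix and both helper functions are hypothetical; use whatever convention your Event Plane or team prefers.

```python
DLQ_SUFFIX = ".dlq"  # hypothetical naming convention


def dead_letter_queue_for(queue_name):
    """Name of the dedicated Dead Letter Queue for `queue_name`."""
    return queue_name + DLQ_SUFFIX


def source_queue_of(dlq_name):
    """Recover the original Queue name from a Dead Letter Queue name."""
    if not dlq_name.endswith(DLQ_SUFFIX):
        raise ValueError(f"{dlq_name!r} is not a Dead Letter Queue")
    return dlq_name[: -len(DLQ_SUFFIX)]


queues = ["orders", "invoices"]
dlqs = {q: dead_letter_queue_for(q) for q in queues}
print(dlqs)  # prints: {'orders': 'orders.dlq', 'invoices': 'invoices.dlq'}
print(source_queue_of("orders.dlq"))  # prints: orders
```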
The disadvantage of this approach is that you will have as many Dead Letter Queues as there are Queues. That many Queues are harder to Monitor than just one. Also, if there are many Dead Letter Queues and many different kinds of failed messages, one has to jump back and forth between a lot of Dead Letter Queues to resolve all issues.
Another thing to take into account is that not all Event Plane implementations allow configuring which Queue to use as the Dead Letter Queue in different scenarios. This may mean that the Dead Letter Queue Pattern has to be implemented in the Application Layer instead of in the Middleware Layer.