Clarification on DLQ usage and handling of user-caused delivery failures in SQS setups #571

@miekassu

We’re running Outpost with SQS and noticed some DLQ behavior we’d like to understand better.

Context

Our SQS DLQ regularly receives messages that correspond to normal user-caused delivery failures (e.g., invalid webhook URLs and endpoints returning 4xx). These aren’t system issues; they are expected delivery failures that Outpost retries and tracks.

However, because the worker never reaches a “successful processing” state for these events, SQS eventually moves them to the DLQ.

The question

We think the DLQ should represent pipeline-level failures (parsing errors, crashes, internal outages), while user misconfigurations should be treated as handled delivery failures within Outpost.

So we want to understand:

  1. Is it expected that user-level delivery failures end up in the DLQ?
  2. Should these failures be acknowledged as “processed” from the queue’s perspective, even if the webhook call failed?
  3. Is there a recommended approach to avoid mixing user misconfigurations with actual Outpost/system failures in the DLQ?
  4. Would enabling failure alerts (ALERT_CALLBACK_URL) change how messages are acknowledged?
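
To make the distinction in questions 1 to 3 concrete, here is a rough sketch of the acknowledgement policy we have in mind. It is not Outpost's actual code or API: `Outcome`, `Decision`, `decide`, and `classify_http_status` are hypothetical names used only for illustration, and the only SQS behavior assumed is the standard redrive policy (a message that is received `maxReceiveCount` times without being deleted is moved to the DLQ).

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    DELIVERED = "delivered"
    DESTINATION_ERROR = "destination_error"  # e.g. invalid URL, endpoint returned 4xx
    SYSTEM_ERROR = "system_error"            # e.g. parsing error, crash, internal outage


@dataclass
class Decision:
    ack: bool             # True: delete the message from the queue
    record_failure: bool  # True: track the failed attempt (retry schedule, alerts)


def classify_http_status(status: int) -> Outcome:
    """Hypothetical mapping: any non-2xx response is the destination's problem."""
    if 200 <= status < 300:
        return Outcome.DELIVERED
    return Outcome.DESTINATION_ERROR


def decide(outcome: Outcome) -> Decision:
    """Hypothetical policy: only system errors stay unacknowledged."""
    if outcome is Outcome.DELIVERED:
        return Decision(ack=True, record_failure=False)
    if outcome is Outcome.DESTINATION_ERROR:
        # The pipeline did its job; the failure is tracked at the delivery level.
        return Decision(ack=True, record_failure=True)
    # Not acknowledged: SQS redelivers the message and, once maxReceiveCount
    # is exceeded, moves it to the DLQ.
    return Decision(ack=False, record_failure=False)
```

Under this assumed policy, a 404 from a tenant's endpoint would be acknowledged and recorded as a failed delivery, and only messages the worker could not process at all would ever reach the DLQ.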

Goal

We would like the DLQ to signal real processing issues, not tenants misconfiguring their webhook destinations. Any clarification or guidance would be greatly appreciated!

Thanks for the great project!
