Recovering from "Too many messages have been received without being deleted" #12

@danieroux

When many peers hammer a full queue to pull messages off as fast as possible, Amazon throws an exception that kills the job.

I would like to know how to recover from this. My stopgap is to use fewer peers:

 [{:type clojure.lang.ExceptionInfo
   :message "Too many messages have been received without being deleted.\nPlease delete your received messages or let them timeout before receiving more. (Service: AmazonSQS; Status Code: 403; Error Code: OverLimit; Request ID: f53c9b7c-76d4-57a4-9e55-cf1e77e7e885)"
   :data {:original-exception :com.amazonaws.services.sqs.model.OverLimitException}
   :at [com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1639]}]

As far as I can figure out:

  • sqs/delete-message-async-batch gets called in checkpointed!. With many peers, this only happens after 100k messages have already been read off the queue.
  • As a result, poll! fails once more than 100k messages are in flight.
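
To make "recover" concrete, here is a minimal sketch of backing off and retrying instead of letting the exception kill the job. It assumes the error reaches the calling code as an ExceptionInfo whose ex-data carries :original-exception, as in the trace above; inside the plugin the raw SDK exception may surface instead, in which case the predicate would need to check the exception class. The names over-limit? and receive-with-backoff, the receive-fn argument, and the option keys are all hypothetical and not part of the plugin.

```clojure
;; Hypothetical sketch: none of these names come from the plugin.

(defn over-limit?
  "True when `e` carries the OverLimit marker seen in the trace above."
  [e]
  (= :com.amazonaws.services.sqs.model.OverLimitException
     (:original-exception (ex-data e))))

(defn receive-with-backoff
  "Calls `receive-fn`, a zero-argument stand-in for the real receive call.
  On an over-limit error it sleeps and retries, doubling the delay each time,
  for at most `max-retries` retries. Any other error is rethrown unchanged."
  [receive-fn {:keys [max-retries initial-delay-ms]}]
  (loop [attempt 0
         delay-ms initial-delay-ms]
    ;; recur is not allowed inside try/catch, so the catch returns a marker
    ;; and the retry decision is made outside the try.
    (let [result (try
                   {:ok (receive-fn)}
                   (catch clojure.lang.ExceptionInfo e
                     (if (and (over-limit? e) (< attempt max-retries))
                       ::retry
                       (throw e))))]
      (if (= ::retry result)
        (do (Thread/sleep (long delay-ms))
            (recur (inc attempt) (* 2 delay-ms)))
        (:ok result)))))
```

Sleeping gives in-flight messages time to be deleted or to time out, which is what the error message asks for. It does not reduce the number of messages in flight by itself, so it only helps if deletes keep happening while the peer waits.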

Can I get some guidance on how to handle it?

  • Is it as simple as only doing a sqs/receive-messages when (< (count @processing) 100000)?
  • Or would a separate counter be more useful/efficient?
  • What else should I be aware of before I touch the code?
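
The two options above (checking before each receive, and keeping a separate counter) can be sketched together. This is a minimal illustration, not the plugin's code: max-in-flight, can-receive?, try-reserve! and release! are hypothetical names, and the 100000 ceiling is simply the figure quoted in this issue rather than a verified SQS limit.

```clojure
;; Hypothetical sketch: none of these names come from the plugin.

(def max-in-flight
  "Assumed ceiling, taken from the 100k figure quoted in this issue."
  100000)

(defn can-receive?
  "True when another batch of `batch-size` messages still fits under the ceiling."
  [in-flight batch-size]
  (<= (+ in-flight batch-size) max-in-flight))

(defn try-reserve!
  "Atomically adds `batch-size` to the `in-flight` atom if there is room.
  Returns true when the slot was reserved, false when this poll should be skipped."
  [in-flight batch-size]
  (loop []
    (let [current @in-flight]
      (cond
        (not (can-receive? current batch-size)) false
        (compare-and-set! in-flight current (+ current batch-size)) true
        ;; Another thread changed the counter in between: re-read and retry.
        :else (recur)))))

(defn release!
  "Subtracts `n` from the counter once messages have been deleted,
  or when a reservation was only partly used."
  [in-flight n]
  (swap! in-flight - n))
```

A poll would call (try-reserve! in-flight 10) before receiving and return no messages when it yields false; the delete path would call release! with the number of messages removed. A plain counter avoids walking a large collection with count on every poll. Note that an atom like this only tracks one peer's messages, while the limit in the error applies to the queue as a whole, so with several peers each one would need a share of the ceiling.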
