cnfait commented May 20, 2025

Issue #, if available:
Would close #514 by giving a clearer indication of failure or partial failure.

Description of changes:
Improve error handling for sdlf-stage-lambda and sdlf-stage-glue:

  • send processing failures to an SQS dead-letter queue (DLQ)
  • visually show processing failures with the post lambda
  • avoid confusion about where errors happened (caused by the parallel state wrapper)
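
As an illustration of the second point, here is a minimal sketch of how a post-processing lambda could make failures visible: it raises when any item failed, so the task state is marked as failed in the Step Functions console. This is a guess at the mechanism, not the code in this PR. The event shape (`failed` and `succeeded` counts) and the `ProcessingFailed` exception are hypothetical.

```python
class ProcessingFailed(Exception):
    """Hypothetical exception type used to mark the execution as failed."""


def lambda_handler(event, context):
    # Hypothetical input shape: item counts produced by an earlier state.
    failed = int(event.get("failed", 0))
    succeeded = int(event.get("succeeded", 0))
    if failed:
        # An unhandled exception fails the Lambda task state, which the
        # Step Functions console highlights in the execution graph.
        total = failed + succeeded
        raise ProcessingFailed(f"{failed} of {total} items failed processing")
    return {"status": "COMPLETED", "processed": succeeded}
```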

To that end:

  • add sqs dead-letter queue url to sdlf-pipeline stack outputs
  • align dlq and queue content dedup configuration
  • send failures to sqs dlq in the distributed map runs
    • collecting failures and sending them to the dlq in the post-processing lambda is not scalable due to sfn output size limits (and other reasons)
  • add a message attribute containing the state machine execution id for easier debugging
  • align peh_id and sfn state machine execution name
    • both are uuid4 anyway and this helps debugging and updating the peh dynamodb table
  • remove the error lambda, subsumed entirely by the state machine itself and the post lambda
  • remove the parallel state wrapper
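
To make two of the items above concrete, here is a rough Python sketch (not the code in this PR) of reusing one uuid4 as both the peh_id and the execution name, and of tagging a DLQ message with the execution id. The helper names, the `ExecutionId` attribute name, and putting `peh_id` into the execution input are assumptions for illustration. In the PR itself the DLQ send happens inside the distributed map run rather than in Python.

```python
import json
import uuid


def start_execution_params(state_machine_arn, payload):
    """Build keyword arguments for the Step Functions start_execution call."""
    # One uuid4 serves as both the peh_id and the execution name, so a row
    # in the peh table can be matched to an execution directly.
    peh_id = str(uuid.uuid4())
    return {
        "stateMachineArn": state_machine_arn,
        "name": peh_id,
        # Assumption: the peh_id is also passed along in the execution input.
        "input": json.dumps({**payload, "peh_id": peh_id}),
    }


def dlq_message_params(dlq_url, failed_item, execution_id):
    """Build keyword arguments for the SQS send_message call."""
    return {
        "QueueUrl": dlq_url,
        "MessageBody": json.dumps(failed_item),
        # Hypothetical attribute name; it carries the execution id so a
        # message in the DLQ can be traced back to the run that produced it.
        "MessageAttributes": {
            "ExecutionId": {"DataType": "String", "StringValue": execution_id},
        },
    }
```

With boto3, these dictionaries would be passed as `boto3.client("stepfunctions").start_execution(**params)` and `boto3.client("sqs").send_message(**params)`. If the DLQ is a FIFO queue, `send_message` also requires a `MessageGroupId`.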

The lambda and glue stages are the only ones updated in this PR. The others work a bit differently, and I plan to align all of them in a future PR. This PR is not meant to be a perfect solution; the goal is to keep the changes and the workload manageable.

A couple of example screenshots to show what it may look like and the testing done (I've also rerun the workshop to be sure):
[3 screenshots]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

cnfait added 2 commits May 20, 2025 15:08
Development

Successfully merging this pull request may close these issues.

sdlf-legislators-main-A-sm Step Functions state machine fails due to malformed json (regions.json)