feat: Add ability to configure Outbox processor to use batches or guaranteed ordering#58

Merged
effron merged 9 commits into master from effron/main/outbox-batch-api on Nov 24, 2025

Conversation

@effron (Contributor) commented Nov 21, 2025

Summary

Add a config option to Journaled that allows the batch outbox to be processed using `:guaranteed_order` (the current approach) or `:batch` (the new approach). The `:batch` option uses `SKIP LOCKED` in the query and the Kinesis `put_records` batch API to increase throughput, at the expense of guaranteeing that events are delivered in the order they were enqueued.
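The summary can be sketched as a configuration snippet. This is illustrative only: the option name `outbox_processing_mode` is an assumption, not taken from the PR diff.

```ruby
# config/initializers/journaled.rb
# NOTE: hypothetical option name -- consult the gem's README for the real setting.
Journaled.outbox_processing_mode = :batch               # SKIP LOCKED + Kinesis batch puts; higher throughput
# Journaled.outbox_processing_mode = :guaranteed_order  # strict enqueue order; single worker
```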

effron marked this pull request as ready for review November 21, 2025 20:34
effron requested a review from a team as a code owner November 21, 2025 20:34
effron requested a review from smudge November 21, 2025 20:34
@jmileham (Member)

Might want to tease that guaranteed_order's current single-threaded constraint is not a philosophical hard stop - we may invest in making guaranteed_order capable of batching and threading by partition key, it's just that the current implementation does not do any of those optimizations.

failed = stream_events.map do |event|
  Journaled::KinesisFailedEvent.new(
    event:,
    error_code: error.class.to_s,
  )
end
Member

Why does this produce "ValidationException" and not "Aws::Kinesis::Errors::ValidationException"?

@effron (Contributor, Author) Nov 21, 2025

It's an error code on the event returned by Kinesis in the batch response, not an actual exception raised by the Ruby SDK.
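To illustrate the distinction (a minimal sketch; the `Struct` below only mimics the shape of the SDK's per-record result entries): a failed record in a `put_records` batch response carries a plain string `error_code`, whereas a rejected request as a whole would surface as a raised `Aws::Kinesis::Errors::*` exception.

```ruby
# Mimic the shape of Kinesis PutRecords per-record results: failures are
# reported inline via a string error_code, with no Ruby exception raised.
ResultEntry = Struct.new(:sequence_number, :error_code, :error_message, keyword_init: true)

records = [
  ResultEntry.new(sequence_number: "seq-1"),                                        # success
  ResultEntry.new(error_code: "ValidationException", error_message: "bad record"),  # failure
]

failed = records.select(&:error_code)
failed.map(&:error_code)  # => ["ValidationException"] -- a bare string, not an exception class
```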

Member

Ah, okay. So the library just uses the same names for its Errors:: classes as the possible error codes the batch API might return.

@effron (Contributor, Author) commented Nov 21, 2025

@jmileham good call, updated the README to reflect that future optimizations could make the guaranteed-order mode parallelizable

README.md Outdated
```bash
bundle exec rake journaled_worker:work
```

**Note:** In `:batch` mode (the default), you can run multiple worker processes concurrently for horizontal scaling. In `:guaranteed_order` mode, only run a single worker to maintain ordering guarantees.
Member

Just to make sure I'm not missing something, if you run more than one worker, the locking strategy means only one will ever be able to act at a time. So it won't break anything, it's just inefficient and entirely unnecessary to run more than one in the guaranteed order mode.

Contributor (Author)

Yeah, it's worded a bit confusingly. It's designed to be processed by a single worker at a time, but it would be resilient to multiple workers running concurrently.

Member

Oh just saw @jmileham's top-level comment -- yeah ditto.
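For readers following along, the locking difference discussed in this thread can be sketched in SQL terms (table name and ordering column are assumptions for illustration, not the gem's actual schema):

```ruby
# Hypothetical sketch of how each mode could claim outbox rows.
def claim_sql(mode, limit: 100)
  base = "SELECT * FROM journaled_outbox_events ORDER BY id LIMIT #{limit} FOR UPDATE"
  case mode
  when :guaranteed_order
    base                   # plain FOR UPDATE: a second worker blocks on the same rows
  when :batch
    "#{base} SKIP LOCKED"  # concurrent workers each claim a disjoint set of rows
  else
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end
```

This is why extra workers in `:guaranteed_order` mode are merely idle rather than harmful: they queue up behind the lock instead of processing out of order.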

@smudge (Member) left a comment

The slight differences in error-handling threw me off a little because they feel unnecessarily duplicative, but I get how in one mode Kinesis is handing you a string, and in the other the SDK is bubbling up a Ruby exception.

domain LGTM && platform LGTM on the rest -- thanks for breaking this apart into two different strategies -- I'll be eager to see how much better the batch PUTs API performs without any additional worker concurrency on our end.

@smudge (Member) left a comment

domain LGTM

@effron (Contributor, Author) commented Nov 24, 2025

After some testing on staging, I changed the behavior to always sleep between cycles. We were easily hitting Kinesis rate limits, and I didn't want spiky periods of job processing to hit the database unnecessarily often.
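A minimal sketch of the behavior change described above (all names here are hypothetical, for illustration): the pause happens every cycle, regardless of whether the batch was empty, which caps both the Kinesis request rate and database polling under spiky load.

```ruby
# Hypothetical worker loop: the sleep is unconditional, so a burst of enqueued
# events cannot turn into a burst of back-to-back Kinesis calls.
def run_cycles(batches, interval: 0.1, sleeper: ->(s) { sleep(s) })
  delivered = 0
  batches.each do |batch|
    delivered += batch.size  # stand-in for the Kinesis put
    sleeper.call(interval)   # always pause, even right after a full batch
  end
  delivered
end

pauses = 0
run_cycles([["e1", "e2"], [], ["e3"]], sleeper: ->(_s) { pauses += 1 })
# pauses is now 3: one pause per cycle, including the empty one
```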

@smudge (Member) left a comment

domain LGTM

effron merged commit 92e71c8 into master Nov 24, 2025
29 checks passed
effron deleted the effron/main/outbox-batch-api branch November 24, 2025 15:51
3 participants