Skip to content
This repository was archived by the owner on Jul 30, 2025. It is now read-only.

Conversation

@travisdowns
Copy link
Member

Listen for SIGTERM and call all Shutdown() in the verifiers.

See commits and CORE-8470 for additional context.

We listen for SIGINT and do a graceful shutdown, but we should do the
same thing on SIGTERM too, as that's what systemd will send a
service, and we run kgo as a service (and anyway it just makes sense
to do graceful shutdown on SIGTERM, that's kind of what it's there
for).

The main reason to do graceful shutdown is to commit any cached offsets
and (especially) to leave the consumer group explicitly, otherwise the
CG will stay in the Stable state until heartbeat timeout, which causes
problems for re-joining members.
The verifiers expose a Shutdown() method but we never call it.

This means that the clients are never closed, so consumers never
explicitly leave their CG, which causes problems.

Fixes CORE-8470.
@travisdowns travisdowns changed the title Td kgo leave group Repeater should leave CG on graceful shutdown Dec 6, 2024
@travisdowns
Copy link
Member Author

Example of pre-fix and post-fix restarts:

image

Note that many of the topics have 0 batches written on the left, these are hung (lines are overlapping so it's hard to see, but it's about half the topics). The first two red lines, are two restarts without the fix. Note that all topics go non-zero (since we always write the initial number of tokens), but many go back to zero (not all, it's racy so you can see some which "cross over" and keep running).

The third restart is after the fix, and all topics have load now. Interestingly, the batch rate of the highest throughput topics drops (e.g., blue series at the top) goes down once all generators are up: I attribute this to increased latency due to more load generators which directly impacts these "repeater" loads since the throughput (if not throttled) is directly related to latency as there are a fixed number of tokens in flight.

@travisdowns
Copy link
Member Author

@StephanDollberg I added you on here for a lack of a better idea, feel free to recommend other/alternate reviewers as well.

@travisdowns travisdowns merged commit bdc41ee into main Dec 9, 2024
1 check passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants