admin: fix use-after-free in coord_request error path #5397
Open
Piotr WOLSKI (piochelepiotr) wants to merge 1 commit into confluentinc:master
🎉 All Contributor License Agreements have been signed. Ready to merge.
When rd_kafka_admin_coord_request()'s request() call fails, the error path called rd_kafka_admin_common_worker_destroy() which freed the eonce object. However, the caller (rd_kafka_coord_req_fsm) still holds a reference to the eonce and passes it to rd_kafka_coord_req_fail(), which enqueues a dummy error response carrying the (now-freed) eonce as opaque.

When rd_kafka_admin_coord_response_parse() later processes that response and calls rd_kafka_enq_once_del_source_return(), it accesses freed memory, triggering an assertion failure and abort:

    rd_kafka_enq_once_del_source_return: Assertion `eonce->refcnt > 0'

Fix by not calling worker_destroy() in the error path of rd_kafka_admin_coord_request(). Instead, let the error propagate through the normal coord_req_fail -> coord_response_parse path, which already handles cleanup correctly. This matches the pattern used by rd_kafka_txn_send_TxnOffsetCommitRequest(), which has explicit comments documenting the same constraint.

This affects all coordinator-targeted Admin API operations: DescribeConsumerGroups, DeleteConsumerGroupOffsets, ListConsumerGroupOffsets, and similar.

Fixes confluentinc#4605
Fixes confluentinc#3663

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6f5f7a3 to c6b26c7
Piotr WOLSKI (piochelepiotr) added a commit to DataDog/integrations-core that referenced this pull request on Apr 9, 2026
…or path

Apply upstream fix (confluentinc/librdkafka#5397) for a use-after-free bug in rd_kafka_admin_coord_request() that causes process abort with assertion failure on eonce->refcnt. Affects DescribeConsumerGroups, DeleteConsumerGroupOffsets, ListConsumerGroupOffsets and similar coordinator-targeted Admin API operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
Fix a use-after-free bug in rd_kafka_admin_coord_request() that causes a process abort with:

    rd_kafka_enq_once_del_source_return: Assertion `eonce->refcnt > 0'

This affects all coordinator-targeted Admin API operations: DescribeConsumerGroups, DeleteConsumerGroupOffsets, ListConsumerGroupOffsets, and similar.

Root cause
When rd_kafka_admin_coord_request()'s inner request() call fails (e.g., API not supported by broker, connection dropped), the error path called rd_kafka_admin_common_worker_destroy(), which freed the eonce object. However, the caller (rd_kafka_coord_req_fsm) still holds a reference to the eonce and passes it to rd_kafka_coord_req_fail(), which enqueues a dummy error response with the now-freed eonce as opaque.

When rd_kafka_admin_coord_response_parse() later processes that response and calls rd_kafka_enq_once_del_source_return(), it accesses freed memory, triggering the assertion failure.
Fix
Remove the premature worker_destroy() call from the error path of rd_kafka_admin_coord_request(). The error is returned to the caller, which calls coord_req_fail() → coord_response_parse(). That function already handles the error correctly: it calls del_source_return() to retrieve the rko, sees the error, and calls worker_destroy() itself.

This matches the pattern already used by rd_kafka_txn_send_TxnOffsetCommitRequest(), which has explicit comments on its error paths documenting the same constraint.
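
For reviewers who want the ownership rule in isolation, here is a toy model of the two orderings. It is not librdkafka code and not the comment from the transaction manager: every name (toy_once_t, toy_run, and so on) is hypothetical, and the object is never actually freed so that a stale access can be counted instead of invoking undefined behavior. The flag callee_cleans_up selects the old behavior (the request function destroys the object on failure) or the behavior proposed here (cleanup happens only in the response handler).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the eonce; NOT the librdkafka type. */
typedef struct {
        int refcnt;     /* outstanding source references */
        int destroyed;  /* set once the owner has torn the object down */
        int stale_uses; /* accesses that would have hit freed memory */
} toy_once_t;

/* Models worker_destroy(): tear the object down exactly once. */
static void toy_worker_destroy(toy_once_t *o) {
        assert(o != NULL);
        if (o->destroyed) {
                o->stale_uses++; /* second teardown of the same object */
                return;
        }
        o->destroyed = 1;
        o->refcnt    = 0;
}

/* Models del_source_return(): a real build would assert refcnt > 0 here;
 * the toy records the violation and returns 0 instead of aborting. */
static int toy_del_source_return(toy_once_t *o) {
        assert(o != NULL);
        if (o->refcnt <= 0) {
                o->stale_uses++;
                return 0;
        }
        o->refcnt--;
        return 1;
}

/* Models the request send. It always fails in this toy. */
static int toy_send_request(toy_once_t *o, int callee_cleans_up) {
        if (callee_cleans_up)
                toy_worker_destroy(o); /* the premature cleanup */
        return -1;
}

/* Models the response handler, the single intended cleanup point. */
static void toy_response_parse(toy_once_t *o) {
        if (!toy_del_source_return(o))
                return;
        toy_worker_destroy(o);
}

/* Models the caller: on send failure it still hands the object to the
 * response handler via a dummy error response. */
static void toy_coord_req_fsm(toy_once_t *o, int callee_cleans_up) {
        if (toy_send_request(o, callee_cleans_up) != 0)
                toy_response_parse(o);
}

/* Runs one failing request and reports how many stale accesses occurred. */
int toy_run(int callee_cleans_up) {
        toy_once_t o = {1, 0, 0};
        toy_coord_req_fsm(&o, callee_cleans_up);
        return o.stale_uses;
}
```

In this model toy_run(1) returns 1, because the response handler touches an object the request function already destroyed, while toy_run(0) returns 0, because the response handler is the only place that releases it.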
How to reproduce
The bug triggers under two conditions:
1. The broker does not support the requested API (e.g., OffsetDelete on broker < 2.4)
2. The connection drops during an Admin API fanout operation
Higher request frequency increases the likelihood of hitting the race.
Testing
This is a race condition on the error path of coordinator-targeted admin
requests. It requires either an API version mismatch or a connection failure
during the request send, making it difficult to reproduce deterministically
in a unit test. Happy to add a test if maintainers can suggest an approach
for reliably triggering the send failure.
Fixes #4605
Fixes #3663