KAFKA-17862: [buffer pool] corruption during buffer reuse from the pool #9
base: trunk
Conversation
Walkthrough: A new integration test class, `ProducerIntegrationTest`, verifies that records survive intentional buffer corruption during reuse from the pool.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as ProducerIntegrationTest
    participant Producer as EvilKafkaProducer
    participant Sender as CustomSender
    participant BufferPool as EvilBufferPool
    participant Kafka as KafkaCluster
    participant Consumer as KafkaConsumer
    Test->>Producer: Build producer with custom Sender and BufferPool
    Producer->>Sender: Send record
    Sender->>BufferPool: Allocate buffer
    Sender->>Sender: Sleep to force batch expiration
    BufferPool->>Sender: Deallocate (corrupt buffer)
    Sender->>Kafka: Send batch
    Kafka->>Consumer: Record available
    Test->>Consumer: Consume record
    Consumer-->>Test: Return record (verify integrity)
```
@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 0
🧹 Nitpick comments (4)
`clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java` (1)

51-53: Visibility relaxation of `lock` widens the surface for concurrency misuse

Changing `lock` from `private` to `protected` gives every subclass full control over the internal locking mechanism. While this enables the new `EvilBufferPool` test helper, it also:

- Exposes a core synchronisation primitive to arbitrary external subclasses (the class is `public`), risking accidental deadlocks or missed unlocks that violate the pool's invariants.
- Couples the subclass implementation to the exact locking strategy; future refactors of `BufferPool` would become breaking changes.

Consider keeping `lock` private and offering a narrower protected utility instead, e.g.:

```java
// inside BufferPool
protected <T> T withLock(Supplier<T> action) {
    lock.lock();
    try {
        return action.get();
    } finally {
        lock.unlock();
    }
}
```

Subclasses then wrap their critical section without needing direct access. This keeps the contract stable while still unblocking tests.
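To make the suggested pattern concrete outside Kafka's codebase, here is a minimal, self-contained sketch; `GuardedPool`, `freeMemory`, and `reserve` are hypothetical stand-ins, not Kafka's actual `BufferPool` API:

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Minimal sketch of the suggested withLock helper. GuardedPool and its
// freeMemory field are hypothetical stand-ins, not Kafka's BufferPool.
public class GuardedPool {
    private final ReentrantLock lock = new ReentrantLock();
    private long freeMemory = 1024;

    // The narrow protected utility: subclasses run a critical section
    // without ever touching the lock field directly.
    protected <T> T withLock(Supplier<T> action) {
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    long reserve(long bytes) {
        return withLock(() -> {
            freeMemory -= bytes;
            return freeMemory;
        });
    }

    public static void main(String[] args) {
        GuardedPool pool = new GuardedPool();
        System.out.println(pool.reserve(256)); // prints 768
    }
}
```

Because the lock stays private, a subclass cannot unlock at the wrong time or hold the lock across arbitrary code; it can only run a complete critical section through `withLock`.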
`clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java` (1)

321-323: Method made `protected`; please annotate and seal extension points

`sendProducerData` switched from private to protected to allow test overrides. Two follow-ups help avoid unintended use in production code:

- Annotate the method with `@org.apache.kafka.common.annotation.InterfaceStability.Unstable`, or at least add a Javadoc note: "For test purposes only - not a public API".
- Alternatively mark `Sender` itself `final` and expose a package-private subclass used only in tests to avoid external extension of a critical component.

This limits the blast radius while still supporting the new integration tests.
`clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/producer/ProducerIntegrationTest.java` (2)

171-178: Avoid hard-coded `Thread.sleep` in timing-sensitive tests

Injecting a real 500 ms sleep increases test runtime and introduces flakiness on congested CI hosts. Replace with the controllable `Time` abstraction already used elsewhere:

```diff
-Thread.sleep(500);
+Time.SYSTEM.sleep(500); // or better: pass in a MockTime and advance it
```

Even better, pass a `Time` instance into the producer builder and advance a `MockTime` to force expiry deterministically.
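To sketch why a controllable clock removes the flakiness, here is a simplified stand-in for the idea behind Kafka's `MockTime` (an illustrative toy, not the real `org.apache.kafka.common.utils.MockTime`; all names here are hypothetical):

```java
public class ExpiryDemo {
    // Simplified stand-in for a MockTime-style controllable clock: "sleeping"
    // advances the clock instantly, so expiry can be forced deterministically.
    interface Clock {
        long milliseconds();
        void sleep(long ms);
    }

    static class FakeClock implements Clock {
        private long nowMs = 0;
        public long milliseconds() { return nowMs; }
        public void sleep(long ms) { nowMs += ms; } // no real wait
    }

    public static void main(String[] args) {
        Clock clock = new FakeClock();
        long createdMs = clock.milliseconds();
        long deliveryTimeoutMs = 300;

        clock.sleep(500); // "waits" 500 ms instantly, regardless of CI load

        boolean expired = clock.milliseconds() - createdMs >= deliveryTimeoutMs;
        System.out.println("expired: " + expired);
    }
}
```

The test then asserts on the expiry condition without ever blocking a real thread, so it runs in microseconds and cannot flake under host congestion.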
259-267: Double locking and a potential `UnsupportedOperationException` in `EvilBufferPool.deallocate`

- `lock.lock()`/`unlock()` surrounds a call to `super.deallocate`, which also locks. Re-entrancy works, but the extra acquire adds overhead and obscures the locking hierarchy.
- `buffer.array()` throws `UnsupportedOperationException` for direct or read-only buffers. Today the pool only allocates heap buffers, but guarding makes the helper future-proof.

Suggested simplification:

```diff
-lock.lock();
-try {
-    Arrays.fill(buffer.array(), (byte) 0);
-    super.deallocate(buffer, size);
-} finally {
-    lock.unlock();
-}
+if (buffer.hasArray() && !buffer.isReadOnly()) {
+    Arrays.fill(buffer.array(), (byte) 0);
+}
+super.deallocate(buffer, size);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/producer/ProducerIntegrationTest.java` (1 hunks)
- `clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java` (1 hunks)
- `clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java` (1 hunks)
/gemini review
Code Review
The pull request addresses a critical issue of potential data corruption in the Kafka producer when expired batches are prematurely released back to the buffer pool. The changes include removing expiration handling before send and deferring deallocation to the response handling logic, along with adding an integration test to verify batch integrity. Overall, the changes seem well-reasoned and the added test is a good way to prevent regressions. However, there are a few areas that could be improved for clarity and maintainability.
Summary of Findings
- Visibility of `sendProducerData`: The change in visibility of the `sendProducerData` method from `private` to `protected` seems reasonable for testing purposes. However, it's important to ensure that this change doesn't inadvertently expose internal implementation details to subclasses or other parts of the codebase. Consider adding a comment explaining why the visibility was changed and how it should be used.
- Redundant Batch Expiry Handling: The removal of `getExpiredInflightBatches` and the consolidation of expiry handling within `accumulator.expiredBatches` simplifies the logic and reduces the risk of inconsistencies. This is a positive change that improves the overall clarity of the code.
- EvilBufferPool Corruption: The `EvilBufferPool` class intentionally corrupts the ByteBuffer during deallocation to simulate buffer reuse bugs. While this is a clever approach for testing, it's important to ensure that this class is only used in testing environments and doesn't accidentally make its way into production code.
Merge Readiness
The pull request addresses a critical issue and includes a new integration test, which is a significant improvement. However, before merging, it's important to address the high severity comment regarding the visibility of sendProducerData and ensure that the EvilBufferPool is strictly limited to testing environments. I am unable to approve this pull request, and recommend that other reviewers also examine this code before merging. At a minimum, the high severity issue should be addressed before merging.
```java
// Visible for testing
protected long sendProducerData(long now) {
```

Making this method protected increases its visibility. It's good that it's visible for testing, but it's important to document why this change was made and how it should be used. This is particularly important since this method is part of the core sending logic. Suggested change:

```java
/**
 * Send data in the accumulator to the brokers.
 * Visible for testing.
 */
@VisibleForTesting
protected long sendProducerData(long now) {
```
```diff
 private final long totalMemory;
 private final int poolableSize;
-private final ReentrantLock lock;
+protected final ReentrantLock lock;
```

Changing the lock to protected seems reasonable to allow subclasses to access it, but consider if there are other ways to achieve the same goal without exposing the lock directly. If this is the best approach, please add a comment explaining why this change was necessary and how subclasses should use the lock safely. Suggested change:

```java
/**
 * The lock protecting all mutable state in the pool.
 * It is protected to allow subclasses in testing to access it, but should be used with caution.
 */
protected final ReentrantLock lock;
```
```java
List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
expiredBatches.addAll(expiredInflightBatches);
```

Removing `expiredInflightBatches` simplifies the logic, which is good. Can you add a comment explaining why this was removed, and how the expiry is now handled? Suggested change:

```java
// Remove expiredInflightBatches since expiry is now handled in the accumulator directly.
List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
```
/do-refacto

/refacto-test-arvi
Code Review: Buffer Corruption Fix
```java
List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
expiredBatches.addAll(expiredInflightBatches);
```

Buffer Corruption Vulnerability

The code was incorrectly handling expired batches by combining those from the accumulator with in-flight batches. This could lead to buffer corruption, as in-flight batches might still be in use when their buffers are returned to the pool and reused.

Standards
- CWE-672
- OWASP-A06
- NIST-SSDF-PW.1
```diff
 private final long totalMemory;
 private final int poolableSize;
-private final ReentrantLock lock;
+protected final ReentrantLock lock;
```

Lock Visibility Change

Changing the lock's visibility from private to protected exposes the internal synchronization mechanism. This creates maintenance risk, as subclasses can now directly access and manipulate the lock, potentially breaking the thread-safety guarantees of the `BufferPool` class.

Standards
- Clean-Code-Encapsulation
- Maintainability-Quality-Thread-Safety
- Design-Pattern-Information-Hiding
```java
// Visible for testing
protected long sendProducerData(long now) {
```

Sender Method Visibility

Changing the method's visibility from private to protected increases coupling by exposing internal implementation details. While the change enables testing, it creates a maintenance burden, as this method now becomes part of the protected API that must be maintained for compatibility.

Standards
- Clean-Code-Encapsulation
- Maintainability-Quality-API-Design
- Design-Pattern-Information-Hiding
/refacto-test-arvi

Code Review: Buffer Pool Corruption Fix
```java
@Override
public void deallocate(ByteBuffer buffer, int size) {
    // Ensure atomicity using reentrant behavior
    lock.lock();
    try {
        Arrays.fill(buffer.array(), (byte) 0);
        super.deallocate(buffer, size);
    } finally {
        lock.unlock();
    }
}
```

Buffer Reuse Vulnerability

The test demonstrates a critical buffer reuse vulnerability: deallocated buffers are zeroed out before returning to the pool, but in-flight batches may still reference these buffers. This can lead to data corruption when buffers are reused while still referenced by in-flight requests, potentially causing message integrity issues or information disclosure.

Standards
- CWE-672
- OWASP-A04
- NIST-SSDF-PW.1
```java
protected long sendProducerData(long now) {
    long result = super.sendProducerData(now);
    try {
        // Ensure the batch expires.
        Thread.sleep(500);
        return result;
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}
```

Thread Sleep Injection

The test overrides `sendProducerData` to introduce a 500 ms delay, ensuring batches expire while still in flight. This deliberate timing manipulation creates the conditions necessary to reproduce the race condition where buffers are deallocated while still being used by the network layer.

Standards
- Algorithm-Correctness-Concurrency
- Logic-Verification-Race-Condition
- Business-Rule-Timing
```java
// Visible for testing
protected long sendProducerData(long now) {
    MetadataSnapshot metadataSnapshot = metadata.fetchMetadataSnapshot();
```

Protected Method Visibility

Changing the method's visibility from private to protected solely for testing purposes exposes internal implementation details. This creates tighter coupling between test and implementation, making future refactoring more difficult, as tests now depend on implementation details.

Standards
- Clean-Code-Encapsulation
- SOLID-OCP
- Maintainability-Quality-Coupling
```diff
 private final long totalMemory;
 private final int poolableSize;
-private final ReentrantLock lock;
+protected final ReentrantLock lock;
```

Protected Field Access

The lock field's visibility is changed from private to protected to enable the test class to properly synchronize buffer-zeroing operations. This change ensures that the test can accurately simulate and detect the buffer corruption issue while maintaining thread safety.

Standards
- Algorithm-Correctness-Testability
- Logic-Verification-Access-Control
- Business-Rule-Thread-Safety
```java
static class EvilKafkaProducerBuilder {

    Serializer<byte[]> serializer = new ByteArraySerializer();
    ApiVersions apiVersions = new ApiVersions();
    LogContext logContext = new LogContext("[expire Producer test ]");
    Metrics metrics = new Metrics(Time.SYSTEM);
```

Test Class Complexity

The `EvilKafkaProducerBuilder` test class directly constructs complex KafkaProducer internals, creating tight coupling to implementation details. This approach makes tests brittle to internal changes and increases the maintenance burden as the producer implementation evolves.

Standards
- Clean-Code-Test-Design
- SOLID-SRP
- Maintainability-Quality-Coupling
```java
static class EvilKafkaProducerBuilder {

    Serializer<byte[]> serializer = new ByteArraySerializer();
    ApiVersions apiVersions = new ApiVersions();
    LogContext logContext = new LogContext("[expire Producer test ]");
    Metrics metrics = new Metrics(Time.SYSTEM);

    String clientId;
    String transactionalId;
    ProducerConfig config;
    ProducerMetadata metadata;
    RecordAccumulator accumulator;
    Partitioner partitioner;
    Sender sender;
    ProducerInterceptors<String, String> interceptors;

    @SuppressWarnings({"unchecked", "this-escape"})
    Producer<byte[], byte[]> build(Map<String, Object> configs) {
        this.config = new ProducerConfig(ProducerConfig.appendSerializerToConfig(configs, null, null));
        transactionalId = config.getString(ProducerConfig.TRANSACTIONAL_ID_CONFIG);
        clientId = config.getString(ProducerConfig.CLIENT_ID_CONFIG);
        return new KafkaProducer<>(
                config,
                logContext,
                metrics,
                serializer,
                serializer,
                buildMetadata(),
                buildAccumulator(),
                null,
                buildSender(),
                buildInterceptors(),
                buildPartition(),
                Time.SYSTEM,
                ioThread(),
                Optional.empty()
        );
    }

    private ProducerInterceptors buildInterceptors() {
        this.interceptors = new ProducerInterceptors<>(List.of(), metrics);
        return this.interceptors;
    }

    private Partitioner buildPartition() {
        this.partitioner = config.getConfiguredInstance(
                ProducerConfig.PARTITIONER_CLASS_CONFIG,
                Partitioner.class,
                Collections.singletonMap(ProducerConfig.CLIENT_ID_CONFIG, clientId));
        return this.partitioner;
    }

    private Sender buildSender() {
        int maxInflightRequests = config.getInt(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION);
        int requestTimeoutMs = config.getInt(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG);
        ProducerMetrics metricsRegistry = new ProducerMetrics(this.metrics);
        Sensor throttleTimeSensor = Sender.throttleTimeSensor(metricsRegistry.senderMetrics);
        KafkaClient client = ClientUtils.createNetworkClient(config,
                this.metrics,
                "producer",
                logContext,
                apiVersions,
                Time.SYSTEM,
                maxInflightRequests,
                metadata,
                throttleTimeSensor,
                null);

        short acks = Short.parseShort(config.getString(ProducerConfig.ACKS_CONFIG));
        this.sender = new Sender(logContext,
                client,
                metadata,
                this.accumulator,
                maxInflightRequests == 1,
                config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG),
                acks,
                config.getInt(ProducerConfig.RETRIES_CONFIG),
                metricsRegistry.senderMetrics,
                Time.SYSTEM,
                requestTimeoutMs,
                config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG),
                null) {
            @Override
            protected long sendProducerData(long now) {
                long result = super.sendProducerData(now);
                try {
                    // Ensure the batch expires.
                    Thread.sleep(500);
                    return result;
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        };
        return this.sender;
    }

    private RecordAccumulator buildAccumulator() {
        long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);
        long retryBackoffMaxMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MAX_MS_CONFIG);
        int batchSize = Math.max(1, config.getInt(ProducerConfig.BATCH_SIZE_CONFIG));
        Plugin<Partitioner> partitionerPlugin = Plugin.wrapInstance(
                config.getConfiguredInstance(
                        ProducerConfig.PARTITIONER_CLASS_CONFIG,
                        Partitioner.class,
                        Collections.singletonMap(ProducerConfig.CLIENT_ID_CONFIG, clientId)),
                metrics,
                ProducerConfig.PARTITIONER_CLASS_CONFIG);
        boolean enableAdaptivePartitioning = partitionerPlugin.get() == null &&
                config.getBoolean(ProducerConfig.PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG);
        this.accumulator = new RecordAccumulator(logContext,
                batchSize,
                NoCompression.NONE,
                (int) Math.min(config.getLong(ProducerConfig.LINGER_MS_CONFIG), Integer.MAX_VALUE),
                retryBackoffMs,
                retryBackoffMaxMs,
                config.getInt(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG),
                new RecordAccumulator.PartitionerConfig(
                        enableAdaptivePartitioning,
                        config.getLong(ProducerConfig.PARTITIONER_AVAILABILITY_TIMEOUT_MS_CONFIG)
                ),
                metrics,
                "producer-metrics",
                Time.SYSTEM,
                null,
                new EvilBufferPool(config.getLong(ProducerConfig.BUFFER_MEMORY_CONFIG), batchSize, metrics,
                        Time.SYSTEM, "producer-metrics"));
        return accumulator;
    }

    private ProducerMetadata buildMetadata() {
        long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);
        long retryBackoffMaxMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MAX_MS_CONFIG);
        List<MetricsReporter> reporters = CommonClientConfigs.metricsReporters(clientId, config);
        ClusterResourceListeners clusterResourceListeners = ClientUtils.configureClusterResourceListeners(
                List.of(),
                reporters,
                List.of(
                        Plugin.wrapInstance(serializer, metrics, ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).get(),
                        Plugin.wrapInstance(serializer, metrics, ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).get()));
        this.metadata = new ProducerMetadata(retryBackoffMs,
                retryBackoffMaxMs,
                config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG),
                config.getLong(ProducerConfig.METADATA_MAX_IDLE_CONFIG),
                logContext,
                clusterResourceListeners,
                Time.SYSTEM);
        metadata.bootstrap(ClientUtils.parseAndValidateAddresses(config));
        return metadata;
    }

    private KafkaThread ioThread() {
        KafkaThread ioThread = new KafkaThread("test_io_thread", sender, true);
        ioThread.start();
        return ioThread;
    }
}

static class EvilBufferPool extends BufferPool {
```

Evil Class Naming

Using an 'Evil' prefix in class names doesn't clearly communicate the test class's purpose or behavior. More descriptive names like `CorruptingBufferPool` or `DelayingSenderProducerBuilder` would better communicate the test component's intended behavior and failure mode.

Standards
- Clean-Code-Naming
- Maintainability-Quality-Clarity
/refacto-test-arvi

Code Review: Buffer Pool Implementation
```java
Arrays.fill(buffer.array(), (byte) 0);
super.deallocate(buffer, size);
```

Buffer Corruption Risk

The `EvilBufferPool` implementation zeroes out buffer contents before returning them to the pool, but after acquiring the lock. This creates a race condition where a buffer could be corrupted while still in use by another thread, causing data corruption.

Standards
- ISO-IEC-25010-Reliability-Fault-Tolerance
- ISO-IEC-25010-Functional-Correctness-Appropriateness
- SRE-Error-Handling
```java
} catch (InterruptedException e) {
    throw new RuntimeException(e);
```

Interrupted Thread Handling

The thread's interruption status is lost when `InterruptedException` is caught and wrapped in a `RuntimeException`. This violates the thread interruption contract and can lead to threads that cannot be properly terminated, potentially causing resource leaks.

Standards
- ISO-IEC-25010-Reliability-Fault-Tolerance
- SRE-Error-Handling
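A common remedy, sketched below with hypothetical names (`InterruptDemo`, `sleepQuietly` are illustrative, not from the PR), is to re-assert the interrupt flag before wrapping the exception, so callers up the stack can still observe the interruption:

```java
public class InterruptDemo {
    // Hypothetical helper showing the fix: re-assert the interrupt flag,
    // which Thread.sleep cleared when it threw InterruptedException.
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interruption contract
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            try {
                sleepQuietly(5_000);
            } catch (RuntimeException e) {
                // The flag survives the wrapping, so shutdown logic can see it.
                System.out.println("flag restored: " + Thread.currentThread().isInterrupted());
            }
        });
        worker.start();
        Thread.sleep(100);
        worker.interrupt();
        worker.join();
    }
}
```

Without the `interrupt()` call, `isInterrupted()` would report false here, and loops that poll the flag for shutdown would spin forever.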
```java
static class EvilBufferPool extends BufferPool {

    public EvilBufferPool(long memory, int poolableSize, Metrics metrics, Time time, String metricGrpName) {
        super(memory, poolableSize, metrics, time, metricGrpName);
    }

    /**
     * Override deallocate to intentionally corrupt the ByteBuffer being returned to the pool.
     * This is used to simulate a scenario where an in-flight buffer is mistakenly reused
     * and its contents are unexpectedly modified, helping expose buffer reuse bugs.
     */
    @Override
    public void deallocate(ByteBuffer buffer, int size) {
        // Ensure atomicity using reentrant behavior
        lock.lock();
        try {
            Arrays.fill(buffer.array(), (byte) 0);
            super.deallocate(buffer, size);
        } finally {
            lock.unlock();
        }
    }
}
```

Improve Test Documentation

While the test class has good documentation about simulating buffer corruption, it should clarify that this implementation intentionally introduces a race condition to test the system's resilience against buffer corruption scenarios.

Standards
- ISO-IEC-25010-Maintainability
- SRE-Observability
Issue: see https://issues.apache.org/jira/browse/KAFKA-17862

🔍 Problem Summary

When an expired batch is still part of an in-flight request, we prematurely release the ByteBuffer back to the BufferPool. This leads to two critical issues:

It may be re-allocated for another producer batch while still being referenced by the in-flight request, potentially causing data corruption.

We can tolerate Issue 1, but Issue 2 is critical: we cannot allow it to happen.

Therefore, we remove the expiration handling of ProducerBatch before send and instead defer the ByteBuffer deallocation to the response-handling logic.
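The deferred-deallocation idea described above can be sketched with a simple reference count; all names here (`RefCountedBatch`, `onSend`, `release`) are illustrative stand-ins, not Kafka's actual implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: a batch's buffer is returned to the pool only when the
// last holder (accumulator or in-flight request) drops its reference.
public class RefCountedBatch {
    private final AtomicInteger refs = new AtomicInteger(1); // held by the accumulator
    private volatile boolean releasedToPool = false;

    void onSend() {
        refs.incrementAndGet(); // the in-flight request takes a reference
    }

    // Called on expiry AND on response completion; only the last call frees the buffer.
    void release() {
        if (refs.decrementAndGet() == 0) {
            releasedToPool = true; // stands in for bufferPool.deallocate(buffer)
        }
    }

    boolean isReleased() {
        return releasedToPool;
    }

    public static void main(String[] args) {
        RefCountedBatch batch = new RefCountedBatch();
        batch.onSend();  // batch goes in flight: refs = 2
        batch.release(); // batch expires: refs = 1, buffer still safe to reference
        System.out.println("after expiry: " + batch.isReleased());
        batch.release(); // response handled: refs = 0, now deallocate
        System.out.println("after response: " + batch.isReleased());
    }
}
```

The key property is that expiry alone can no longer free a buffer that the network layer still references; the pool only sees the buffer again once the in-flight request has completed.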