Offset handling, data loss, fault tolerance added #48
base: trunk
Conversation
…ment (apache#20286) The PR fixes the following: 1. If a share partition arrives at a state that should be treated as the final state of a batch/offset (for example, LSO movement which causes the offset/batch to be ARCHIVED permanently), the result of pending write state RPCs for that offset/batch can override the ARCHIVED state. Hence, track such updates and apply them once the transition is completed. 2. If an acquisition lock timeout occurs while an offset/batch is undergoing a transition, followed by a write state RPC failure, then the respective batch/offset can land in a scenario where the offset stays in ACQUIRED state with no acquisition lock timeout task. 3. If a timer task is cancelled, concurrent execution of the timer task and an acknowledgement can lead to a scenario where the timer task runs after cancellation. Hence it can mark the offset/batch re-available despite it already being acknowledged. Reviewers: Andrew Schofield <[email protected]>, Abhinav Dixit <[email protected]>
…validation (apache#20177) The bug was a duplicate parameter validation in the `DelegationTokenCommand` class. The `checkInvalidArgs` method for the `describeOpt` was incorrectly including `renewTimePeriodOpt` twice in the set of invalid arguments. This bug caused unexpected command errors during E2E testing. ### Before the fix: The following command would fail due to the duplicate validation logic: ``` TC_PATHS="tests/kafkatest/tests/core/delegation_token_test.py::DelegationTokenTest" /bin/bash tests/docker/run_tests.sh ``` ### Error output: ``` ducktape.cluster.remoteaccount.RemoteCommandError: ducker@ducker03: Command 'KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/jaas.conf -Djava.security.krb5.conf=/mnt/security/krb5.conf" /opt/kafka-dev/bin/kafka-delegation-tokens.sh --bootstrap-server ducker03:9094 --create --max-life-time-period -1 --command-config /mnt/kafka/client.properties > /mnt/kafka/delegation_token.out' returned non-zero exit status 1. Remote error message: b'duplicate element: [renew-time-period]\njava.lang.IllegalArgumentException: duplicate element: [renew-time-period]\n\tat java.base/java.util.ImmutableCollections$SetN.<init>(ImmutableCollections.java:918)\n\tat java.base/java.util.Set.of(Set.java:544)\n\tat org.apache.kafka.tools.DelegationTokenCommand$DelegationTokenCommandOptions.checkArgs(DelegationTokenCommand.java:304)\n\tat org.apache.kafka.tools.DelegationTokenCommand.execute(DelegationTokenCommand.java:79)\n\tat org.apache.kafka.tools.DelegationTokenCommand.mainNoExit(DelegationTokenCommand.java:57)\n\tat org.apache.kafka.tools.DelegationTokenCommand.main(DelegationTokenCommand.java:52)\n\n' [INFO:2025-07-31 11:27:25,531]: RunnerClient: kafkatest.tests.core.delegation_token_test.DelegationTokenTest.test_delegation_token_lifecycle.metadata_quorum=ISOLATED_KRAFT: Data: None ================================================================================ SESSION REPORT (ALL TESTS) ducktape version: 0.12.0 session_id: 
2025-07-31--002 run time: 33.213 seconds tests run: 1 passed: 0 flaky: 0 failed: 1 ignored: 0 ================================================================================ test_id: kafkatest.tests.core.delegation_token_test.DelegationTokenTest.test_delegation_token_lifecycle.metadata_quorum=ISOLATED_KRAFT status: FAIL run time: 33.090 seconds ``` ### After the fix: The same command now executes successfully: ``` TC_PATHS="tests/kafkatest/tests/core/delegation_token_test.py::DelegationTokenTest" /bin/bash tests/docker/run_tests.sh ``` ### Success output: ``` ================================================================================ SESSION REPORT (ALL TESTS) ducktape version: 0.12.0 session_id: 2025-07-31--001 run time: 35.488 seconds tests run: 1 passed: 1 flaky: 0 failed: 0 ignored: 0 ================================================================================ test_id: kafkatest.tests.core.delegation_token_test.DelegationTokenTest.test_delegation_token_lifecycle.metadata_quorum=ISOLATED_KRAFT status: PASS run time: 35.363 seconds -------------------------------------------------------------------------------- ``` Reviewers: Jhen-Yung Hsu <[email protected]>, TengYao Chi <[email protected]>, Ken Huang <[email protected]>, PoAn Yang <[email protected]>, Chia-Ping Tsai <[email protected]>
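The root cause above comes down to `Set.of` rejecting duplicate elements at construction time. A minimal standalone reproduction of that behavior (outside the Kafka tooling, using the same option string purely for illustration):

```java
import java.util.Set;

public class SetOfDuplicateDemo {
    public static void main(String[] args) {
        try {
            // Set.of validates its arguments eagerly and throws
            // IllegalArgumentException when an element appears twice,
            // which is what surfaced as the command error above.
            Set.of("renew-time-period", "renew-time-period");
            System.out.println("no exception");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected duplicate element");
        }
    }
}
```

This is why passing `renewTimePeriodOpt` twice to `checkInvalidArgs` failed at runtime rather than at compile time.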
…culation (apache#18526) The timeout value may overflow if users set a large expiration time. ``` sessionExpirationTimeNanos = authenticationEndNanos + 1000 * 1000 * sessionLifetimeMs; ``` Fixed by throwing an exception if the value overflows. Reviewers: TaiJuWu <[email protected]>, Luke Chen <[email protected]>, TengYao Chi <[email protected]> Signed-off-by: PoAn Yang <[email protected]>
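The overflow-checked arithmetic described above can be sketched with `Math.multiplyExact`/`Math.addExact`, which throw `ArithmeticException` instead of silently wrapping around. This is a hedged sketch mirroring the expression in the commit message, not the actual Kafka implementation:

```java
public class SessionExpirationDemo {
    // Names mirror the snippet above for readability; this is illustrative only.
    static long sessionExpirationTimeNanos(long authenticationEndNanos, long sessionLifetimeMs) {
        // 1_000_000 nanoseconds per millisecond; the *Exact variants
        // raise ArithmeticException on long overflow.
        return Math.addExact(authenticationEndNanos,
                Math.multiplyExact(1_000_000L, sessionLifetimeMs));
    }

    public static void main(String[] args) {
        // Normal case: 1000 ms lifetime starting at t=0.
        System.out.println(sessionExpirationTimeNanos(0L, 1_000L)); // 1000000000
        try {
            // A huge lifetime overflows the multiplication.
            sessionExpirationTimeNanos(0L, Long.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```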
This patch updates the code and the dependency with the latest namespace and version. Signed-off-by: Federico Valeri <[email protected]> Reviewers: Mickael Maison <[email protected]>
## Changes: - Replaced all references to boundPort with brokerBoundPort. ## Reasons - boundPort and brokerBoundPort share the same definition and behavior. Reviewers: TaiJuWu <[email protected]>, Ken Huang <[email protected]>, Chia-Ping Tsai <[email protected]>
apache#20263) OffsetFetchResponses can have three different error structures depending on the version. Version 2 adds a top level error code for group-level errors. Version 8 adds support for querying multiple groups at a time and nests the fields within a groups array. Add a test for the errorCounts implementation since it varies depending on the version. Reviewers: Dongnuo Lyu <[email protected]>, Chia-Ping Tsai <[email protected]>
Along with the change apache#17952 ([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)), the semantics of the `min.insync.replicas` config changed slightly and gained some constraints. We should document them clearly. Reviewers: Jun Rao <[email protected]>, Calvin Liu <[email protected]>, Mickael Maison <[email protected]>, Paolo Patierno <[email protected]>, Federico Valeri <[email protected]>, Chia-Ping Tsai <[email protected]>
Add exclusions for new warnings to allow this upgrade. Reviewers: Chia-Ping Tsai <[email protected]>
This flag also skips control records, so the description needs to be updated. --------- Signed-off-by: Federico Valeri <[email protected]> Reviewers: Luke Chen <[email protected]>, Jhen-Yung Hsu <[email protected]>, Vincent Potucek
…e#20269) The `state-change.log` file is being incorrectly rotated to `stage-change.log.[date]`. This change fixes the typo to have the log file correctly rotated to `state-change.log.[date]` _No functional changes._ Reviewers: Mickael Maison <[email protected]>, Christo Lolov <[email protected]>, Luke Chen <[email protected]>, Ken Huang <[email protected]>, TengYao Chi <[email protected]>, Chia-Ping Tsai <[email protected]>
Add missing tiered storage metrics to the documentation: - kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs: Introduced in [KIP-1018](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests) - kafka.server:type=DelayedRemoteListOffsetsMetrics,name=ExpiresPerSec,topic=([-.\w]),partition=([0-9]): Introduced in [KIP-1075](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1075%3A+Introduce+delayed+remote+list+offsets+purgatory+to+make+LIST_OFFSETS+async) Reviewers: Chia-Ping Tsai <[email protected]>, Lan Ding <[email protected]>, Kamal Chandraprakash <[email protected]>
When using a connector that requires a schema, such as JDBC connectors, with JSON messages, the current JSONConverter necessitates including the schema within every message. To address this, we are introducing a new parameter, schema.content, which allows you to provide the schema externally. This approach not only reduces the size of the messages but also facilitates the use of more complex schemas. KIP : https://cwiki.apache.org/confluence/display/KAFKA/KIP-1054%3A+Support+external+schemas+in+JSONConverter Reviewers: Mickael Maison <[email protected]>, TengYao Chi <[email protected]>, Edoardo Comar <[email protected]>
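A hedged sketch of how such a converter configuration might look with an externally supplied schema. The connector class and schema below are illustrative assumptions, not taken from this PR; only `schema.content` itself is named by KIP-1054:

```properties
# Illustrative sink connector config (connector class and schema are examples).
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
# KIP-1054: supply the schema once, externally, instead of embedding it
# in every JSON message on the topic.
value.converter.schema.content={"type":"struct","fields":[{"field":"id","type":"int64"}],"optional":false}
```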
…0275) The compiler warning is due to a lack of import. This patch imports the ApiException to fix it. Reviewers: TengYao Chi <[email protected]>, Yung <[email protected]>
Now that Kafka supports Java 17, this PR makes some changes in the tools module. The changes in this PR are limited to only some files; future PRs will follow. The changes mostly include: - Collections.emptyList(), Collections.singletonList() and Arrays.asList() are replaced with List.of() - Collections.emptyMap() and Collections.singletonMap() are replaced with Map.of() - Collections.singleton() is replaced with Set.of() Some minor changes to use the enhanced switch. Sub modules targeted: tools/src/test Reviewers: Chia-Ping Tsai <[email protected]>
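The replacements above can be sketched side by side. One caveat worth noting: the Java 9+ factory methods are not drop-in equivalents in every case, since they reject `null` elements and duplicate keys/elements:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectionFactoryMigration {
    public static void main(String[] args) {
        // Before: pre-Java 9 factory styles.
        List<String> oldList = Arrays.asList("a", "b");
        Map<String, Integer> oldMap = Collections.singletonMap("a", 1);
        Set<String> oldSet = Collections.singleton("a");

        // After: Java 9+ immutable collection factories.
        List<String> newList = List.of("a", "b");
        Map<String, Integer> newMap = Map.of("a", 1);
        Set<String> newSet = Set.of("a");

        // Contents are equal; only construction style (and null/duplicate
        // handling) differs.
        System.out.println(oldList.equals(newList)
                && oldMap.equals(newMap)
                && oldSet.equals(newSet)); // true
    }
}
```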
This commit updates CI to test against Java 24 instead of Java 23 which is EOL. Due to Spotbugs not having released version 4.9.4 yet, we can't run Spotbugs on Java 24. Instead, we are choosing to run Spotbugs, and the rest of the compile and validate build step, on Java 17 for now. Once 4.9.4 has released, we will switch to using Java 24 for this. Exclude spotbugs from the run-tests gradle action. Spotbugs is already being run once in the build by "compile and validate", there is no reason to run it again as part of executing tests. Reviewers: Chia-Ping Tsai <[email protected]>
based on https://github.com/apache/kafka/graphs/contributors?from=2024%2F8%2F3 Reviewers: PoAn Yang <[email protected]>, Jhen-Yung Hsu <[email protected]>, TengYao Chi <[email protected]>, Ken Huang <[email protected]>, Nick Guo <[email protected]>
Add new group coordinator metrics section to TOC. Reviewers: Chia-Ping Tsai <[email protected]>
cleanup in QuotaFactory. Reviewers: Chia-Ping Tsai <[email protected]>
….html (apache#20313) The link for the heading Errant Record Reporter is missing the # symbol, which is causing it to redirect to a 404 Not Found page. Please refer to the updated preview. <img width="665" height="396" alt="kafka-site-preview" src="https://github.com/user-attachments/assets/1c6f3ea9-de9b-4b2c-a4d6-919199a6ff6f" /> Reviewers: PoAn Yang <[email protected]>, Ken Huang <[email protected]>, Chia-Ping Tsai <[email protected]>
Implements a timeout mechanism (using maxPollTimeMs) that waits for missing source topics to be created before failing, instead of immediately throwing exceptions in the new Streams protocol. Additionally, throw TopologyException when partition count mismatch is detected. Reviewers: Lucas Brutschy <[email protected]>, Alieh Saeedi <[email protected]>, Matthias J. Sax <[email protected]>
…#20299) The default value of `num.recovery.threads.per.data.dir` is now 2 according to KIP-1030. We should update config files which are still setting 1. --------- Signed-off-by: Federico Valeri <[email protected]> Reviewers: Luke Chen <[email protected]>
…in ProducerConfig (apache#20317) Fixes a typo in ProducerConfig: Renames `PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG` → `PARTITIONER_ADAPTIVE_PARTITIONING_ENABLE_CONFIG` The old key is retained for backward compatibility. See: [KIP-1175: Fix the typo `PARTITIONER_ADPATIVE_PARTITIONING_ENABLE` in ProducerConfig](https://cwiki.apache.org/confluence/x/KYogFQ) Reviewers: Yung <[email protected]>, TengYao Chi <[email protected]>, Ken Huang <[email protected]>, Nick Guo <[email protected]>, Ranuga Disansa <[email protected]>
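The "old key retained for backward compatibility" pattern can be sketched as a deprecated alias constant. This is an illustrative standalone sketch, not the actual `ProducerConfig` source:

```java
public class ProducerConfigAliasDemo {
    /** Corrected constant name; the key string itself is unchanged. */
    public static final String PARTITIONER_ADAPTIVE_PARTITIONING_ENABLE_CONFIG =
            "partitioner.adaptive.partitioning.enable";

    /** Misspelled name kept as an alias so existing code still compiles. */
    @Deprecated
    public static final String PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG =
            PARTITIONER_ADAPTIVE_PARTITIONING_ENABLE_CONFIG;

    public static void main(String[] args) {
        // Both constants resolve to the same config key, so configs built
        // against either name behave identically.
        System.out.println(PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG
                .equals(PARTITIONER_ADAPTIVE_PARTITIONING_ENABLE_CONFIG)); // true
    }
}
```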
…ache#20315) The PR removes unnecessary updates for the next fetch offset. When the state is in transition and not yet completed, the respective offsets should not be considered for acquisition anyway. The next fetch offset is finally updated when the transition is completed. Reviewers: Manikumar Reddy <[email protected]>, Abhinav Dixit <[email protected]>
) Minor PR to move persister call outside of the lock. The lock is not required while making the persister call. Reviewers: Manikumar Reddy <[email protected]>, Abhinav Dixit <[email protected]>
Now that Kafka supports Java 17, this PR makes some changes in the `trogdor` module. The changes mostly include: - Collections.emptyList(), Collections.singletonList() and Arrays.asList() are replaced with List.of() - Collections.emptyMap() and Collections.singletonMap() are replaced with Map.of() - Collections.singleton() is replaced with Set.of() Some minor cleanups around use of enhanced switch blocks and conversion of classes to record classes. Reviewers: Ken Huang <[email protected]>, Vincent Jiang <[email protected]>, Chia-Ping Tsai <[email protected]>
…timeout (apache#20310) Fixes the max delivery check on acquisition lock timeout and write state RPC failure. When the acquisition lock has already timed out and a write state RPC failure occurs, we need to check whether records should be archived. With the fix, however, we do not persist this information immediately, even though it is relevant (some records may be archived or have their delivery count bumped); it will be persisted eventually. Since the persister call has already failed, issuing another persister call in response to that failure is not correct. Rather, let the data be persisted by future persister calls. Reviewers: Manikumar Reddy <[email protected]>, Abhinav Dixit <[email protected]>
…_ENABLE_CONFIG (apache#20322) Document deprecation of PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG in `upgrade.html`, which was missed in apache#20317 Reviewers: Ken Huang <[email protected]>, Chia-Ping Tsai <[email protected]>
…ose (apache#20290) *What* https://issues.apache.org/jira/browse/KAFKA-19572 - If a `ShareConsumer` constructor fails due to any exception, we call `close()` in the catch block. - If uninitialized members are accessed during `close()`, it throws an NPE. Currently there are no null checks, so we were attempting to use these fields during `close()` execution. - To avoid this, the PR adds null checks in `close()` before accessing fields that could be null. Reviewers: Apoorv Mittal <[email protected]>, Lianet Magrans <[email protected]>
Improve RLMM doc: 1. Distinguish RLMM configs from other tiered storage configs: all RLMM configs need to start with a specific prefix, but the original documentation missed this description. 2. Added a description of the additional client configs that are required when configuring authentication information. This can confuse users, for example: Aiven-Open/tiered-storage-for-apache-kafka#681 Reviewers: Luke Chen <[email protected]>, TengYao Chi <[email protected]>, Chia-Ping Tsai <[email protected]>
The previous URL http://lambda-architecture.net/ seems to now be controlled by spammers Co-authored-by: Shashank <[email protected]> Reviewers: Mickael Maison <[email protected]>
… user commit requests (apache#20234) Repartition topic records should be purged up to the currently committed offset once `repartition.purge.interval.ms` duration has passed. Reviewers: Matthias J. Sax <[email protected]>
Partially addresses KAFKA-15873. When filtering and sorting, we should apply the filter before sorting the topics. The order in which unauthorizedForDescribeTopicMetadata is added is not relevant, as it is a HashSet. Reviewers: TaiJuWu <[email protected]>, Calvin Liu <[email protected]>, Matthias J. Sax <[email protected]>
…nd RemoveRaftVoterOptions (apache#20555) Improves the documentation of the clusterId field in AddRaftVoterOptions and RemoveRaftVoterOptions. The changes include: 1. Adding Javadoc to both addRaftVoter and removeRaftVoter methods to explain the behavior of the optional clusterId. 2. Integration tests have been added to verify the correct behavior of add and remove voter operations with and without clusterId, including scenarios with inconsistent cluster ids. Reviewers: TengYao Chi <[email protected]>, Chia-Ping Tsai <[email protected]>
## Summary Quota test isn't testing anything on the client side, but rather enforcing server-side quotas, so moving it out of the clients directory into the core directory. Reviewers: Lianet Magrans <[email protected]>
…he#20606) Update KIP-1147 changes (renaming --property to --formatter-property) in the ops and streams documentation. Reviewers: Andrew Schofield <[email protected]>
Fixed a tiny javadoc typo. Reviewers: Apoorv Mittal <[email protected]>
…tagged fields (apache#20536) # Description The [protocol guide](https://kafka.apache.org/protocol) 1) doesn't display tagged fields in BNF, and 2) includes empty tagged fields and redundant nested tables in tables. # Change ## BNF Now tagged fields are displayed as FIELD_NAME<tag number> Old: <img width="1316" height="275" alt="Screenshot 2025-09-13 at 5 34 28 PM" src="https://github.com/user-attachments/assets/c3e59382-7a6b-43f3-bc7a-893fb27d524d" /> New: <img width="1386" height="328" alt="Screenshot 2025-09-24 at 12 50 34 PM" src="https://github.com/user-attachments/assets/1ddbc95e-b0a7-4cd5-a5e0-e1303ffd2d06" /> Array Field: <img width="914" height="275" alt="Screenshot 2025-09-24 at 12 52 19 PM" src="https://github.com/user-attachments/assets/cfe66a21-0d66-4f23-8e5d-1d5dac8e4c9b" /> ## Table Empty tagged fields are removed from the table. Nested tables for tagged fields are removed. The tag of the field is shown in the "Field" column. Old: <img width="805" height="506" alt="Screenshot 2025-09-28 at 11 07 01 PM" src="https://github.com/user-attachments/assets/0669c2f3-150c-479d-b6ff-1d2857540fef" /> New: <img width="1371" height="727" alt="Screenshot 2025-09-28 at 11 10 30 PM" src="https://github.com/user-attachments/assets/030abde6-60ec-4195-9778-da48ebd01084" /> Reviewers: Andrew Schofield <[email protected]>
Summary of Changes: This pull request enhances the Kafka client ecosystem by introducing data loss detection for consumers, adding two-phase commit support for producers, and updating numerous dependencies and build configurations. It also expands the integration test suite to cover a wider array of client behaviors and fault tolerance scenarios.
Code Review
This pull request introduces a significant number of changes, including a new data loss detection feature for consumers, two-phase commit support for producers, and numerous dependency updates and refactorings across the codebase. My review focuses on potential issues related to these changes, including API design, build configuration, and thread safety. Overall, the changes are extensive and touch many parts of the system. I've provided specific feedback on a few areas that could be improved or clarified.
```diff
  * @return The DeleteShareGroupOffsetsResult.
  */
-DeleteShareGroupOffsetsResult deleteShareGroupOffsets(String groupId, Set<TopicPartition> partitions, DeleteShareGroupOffsetsOptions options);
+DeleteShareGroupOffsetsResult deleteShareGroupOffsets(String groupId, Set<String> topics, DeleteShareGroupOffsetsOptions options);
```
The `deleteShareGroupOffsets` method signature has been changed to accept a `Set<String>` of topic names instead of a `Set<TopicPartition>`. This is a breaking API change that removes the ability for users to delete offsets for specific partitions, forcing them to delete offsets for all partitions of a topic. This seems like a significant reduction in functionality. If this was intentional, the rationale should be clearly documented. If not, the previous partition-level granularity should be restored.
```diff
 # MaxInlineLevel=15 is the default since JDK 14 and can be removed once older JDKs are no longer supported
 if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
-  KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true"
+  KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15"
```
The `-Djava.awt.headless=true` property has been removed from `KAFKA_JVM_PERFORMANCE_OPTS`. While many server environments default to headless mode, explicitly setting this property is the safer practice, as it prevents unexpected AWT initialization issues on systems that have a graphical environment. Was this removal intentional? If so, it would be helpful to understand the reasoning. A similar change was also made in bin/windows/kafka-run-class.bat.
```diff
+compileOnly libs.bndlib
+compileOnly libs.spotbugs
```
The :server project now includes compileOnly dependencies on bndlib and spotbugs. This is an unusual dependency configuration, as bndlib is for OSGi bundles and spotbugs is a static analysis tool. Could you provide some context on why these are necessary as compileOnly dependencies for the server module? Understanding the purpose will help maintainers with future build file updates.
This enhancement adds configurable data loss detection to Kafka consumers to prevent silent data loss scenarios. The feature integrates with the existing `auto.offset.reset` strategies to provide enhanced fault tolerance, including detection of silent data loss caused by retention policies, service disruptions from topic recreation, and handling of subtle edge cases.
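A hedged sketch of how such a consumer might be configured. `auto.offset.reset` is a real consumer config, but the PR text does not name the new configuration keys, so the detection property below is hypothetical:

```properties
# Standard consumer behavior when no committed offset exists or the
# committed offset is out of range.
auto.offset.reset=earliest
# Hypothetical knob (illustrative only): enables the data loss detection
# described above, layered on top of the reset strategy.
data.loss.detection.enable=true
```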