
Conversation


@visz11 visz11 commented Oct 15, 2025

This enhancement adds configurable data loss detection to Kafka consumers to prevent silent data loss scenarios. The feature integrates with existing auto.offset.reset strategies to provide enhanced fault tolerance capabilities, including detection of silent data loss from retention policies, service disruptions from topic recreation, and sophisticated edge case handling.
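As a rough sketch of how a consumer might opt in (the `data.loss.detection.*` keys below are hypothetical placeholders for this PR's new options; only `auto.offset.reset` is an existing Kafka config):

```
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class DataLossDetectionConfigDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        // Existing reset strategy the detector integrates with.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Hypothetical keys illustrating the new feature; not actual
        // Kafka configs unless this PR defines them.
        props.put("data.loss.detection.enable", "true");
        props.put("data.loss.detection.action", "warn"); // or "fail"
    }
}
```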

apoorvmittal10 and others added 30 commits August 1, 2025 23:20
…ment (apache#20286)

The PR fixes the following:

1. In case a share partition arrives at a state which should be treated
as the final state of that batch/offset (for example, LSO movement which
causes an offset/batch to be ARCHIVED permanently), the result of
pending write state RPCs for that offset/batch can override the ARCHIVED
state. Hence such updates are tracked and applied when the transition is
completed.

2. If an acquisition lock timeout occurs while an offset/batch is
undergoing a transition, followed by a write state RPC failure, then the
respective batch/offset can land in a scenario where the offset stays in
the ACQUIRED state with no acquisition lock timeout task.

3. If a timer task is cancelled, then due to concurrent execution of the
timer task and an acknowledgement, there can be a scenario where the
timer task is processed post cancellation. Hence it can mark the
offset/batch re-available despite it already being acknowledged.

Reviewers: Andrew Schofield <[email protected]>, Abhinav Dixit
 <[email protected]>
…validation (apache#20177)

The bug was a duplicate parameter validation in the
`DelegationTokenCommand` class.  The `checkInvalidArgs` method for the
`describeOpt` was incorrectly including `renewTimePeriodOpt` twice in
the set of invalid arguments.

This bug caused unexpected command errors during E2E testing.
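For context, the failure mode is a property of `Set.of`, which rejects duplicate elements at construction time; a minimal reproduction:

```
import java.util.Set;

public class DuplicateSetDemo {
    public static void main(String[] args) {
        // Set.of throws IllegalArgumentException on duplicates, which is
        // what listing renewTimePeriodOpt twice triggered in checkInvalidArgs:
        // java.lang.IllegalArgumentException: duplicate element: [renew-time-period]
        Set<String> opts = Set.of("renew-time-period", "renew-time-period");
        System.out.println(opts); // never reached
    }
}
```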

### Before the fix:
The following command would fail due to the duplicate validation logic:
```
TC_PATHS="tests/kafkatest/tests/core/delegation_token_test.py::DelegationTokenTest"
/bin/bash tests/docker/run_tests.sh
```

### Error output:
```
ducktape.cluster.remoteaccount.RemoteCommandError: ducker@ducker03:
Command
'KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/jaas.conf
-Djava.security.krb5.conf=/mnt/security/krb5.conf"
/opt/kafka-dev/bin/kafka-delegation-tokens.sh --bootstrap-server
ducker03:9094  --create  --max-life-time-period -1  --command-config
/mnt/kafka/client.properties > /mnt/kafka/delegation_token.out' returned
non-zero exit status 1. Remote error message: b'duplicate element:
[renew-time-period]\njava.lang.IllegalArgumentException: duplicate
element: [renew-time-period]\n\tat
java.base/java.util.ImmutableCollections$SetN.<init>(ImmutableCollections.java:918)\n\tat
java.base/java.util.Set.of(Set.java:544)\n\tat
org.apache.kafka.tools.DelegationTokenCommand$DelegationTokenCommandOptions.checkArgs(DelegationTokenCommand.java:304)\n\tat
org.apache.kafka.tools.DelegationTokenCommand.execute(DelegationTokenCommand.java:79)\n\tat
org.apache.kafka.tools.DelegationTokenCommand.mainNoExit(DelegationTokenCommand.java:57)\n\tat
org.apache.kafka.tools.DelegationTokenCommand.main(DelegationTokenCommand.java:52)\n\n'

[INFO:2025-07-31 11:27:25,531]: RunnerClient:
kafkatest.tests.core.delegation_token_test.DelegationTokenTest.test_delegation_token_lifecycle.metadata_quorum=ISOLATED_KRAFT:
Data: None
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-07-31--002
run time:         33.213 seconds
tests run:        1
passed:           0
flaky:            0
failed:           1
ignored:          0
================================================================================
test_id:
kafkatest.tests.core.delegation_token_test.DelegationTokenTest.test_delegation_token_lifecycle.metadata_quorum=ISOLATED_KRAFT
status:     FAIL
run time:   33.090 seconds
```

### After the fix:
The same command now executes successfully:
```
TC_PATHS="tests/kafkatest/tests/core/delegation_token_test.py::DelegationTokenTest"
/bin/bash tests/docker/run_tests.sh
```

### Success output:
```
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-07-31--001
run time:         35.488 seconds
tests run:        1
passed:           1
flaky:            0
failed:           0
ignored:          0
================================================================================
test_id:
kafkatest.tests.core.delegation_token_test.DelegationTokenTest.test_delegation_token_lifecycle.metadata_quorum=ISOLATED_KRAFT
status:     PASS
run time:   35.363 seconds
--------------------------------------------------------------------------------
```

Reviewers: Jhen-Yung Hsu <[email protected]>, TengYao Chi
 <[email protected]>, Ken Huang <[email protected]>, PoAn Yang
 <[email protected]>, Chia-Ping Tsai <[email protected]>
…culation (apache#18526)

The timeout value may overflow if users set a large expiration time.

```
sessionExpirationTimeNanos = authenticationEndNanos + 1000 * 1000 *
sessionLifetimeMs;
```

Fixed it by throwing an exception if the value overflows.
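A minimal sketch of the overflow-safe pattern using exact arithmetic (illustrative only; the method and variable names here are assumptions, not the actual patch):

```
public class SessionExpirationDemo {
    // Math.multiplyExact / Math.addExact throw ArithmeticException on long
    // overflow instead of silently wrapping around, so the overflow can be
    // surfaced as a meaningful error.
    static long sessionExpirationTimeNanos(long authenticationEndNanos, long sessionLifetimeMs) {
        try {
            return Math.addExact(authenticationEndNanos,
                    Math.multiplyExact(1_000_000L, sessionLifetimeMs));
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException(
                    "Session lifetime " + sessionLifetimeMs + " ms overflows the expiration time", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sessionExpirationTimeNanos(System.nanoTime(), 60_000L));
    }
}
```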

Reviewers: TaiJuWu <[email protected]>, Luke Chen <[email protected]>,
 TengYao Chi <[email protected]>

Signed-off-by: PoAn Yang <[email protected]>
This patch updates the code and the dependency with the latest namespace
and version.

Signed-off-by: Federico Valeri <[email protected]>

Reviewers: Mickael Maison <[email protected]>
## Changes:
- Replaced all references to boundPort with brokerBoundPort.

## Reasons
- boundPort and brokerBoundPort share the same definition and behavior.

Reviewers: TaiJuWu <[email protected]>, Ken Huang <[email protected]>,
 Chia-Ping Tsai <[email protected]>
apache#20263)

OffsetFetchResponses can have three different error structures depending
on the version. Version 2 adds a top level error code for group-level
errors. Version 8 adds support for querying multiple groups at a time
and nests the fields within a groups array. Add a test for the
errorCounts implementation since it varies depending on the version.

Reviewers: Dongnuo Lyu <[email protected]>, Chia-Ping Tsai
 <[email protected]>
Along with the change apache#17952
([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)),
the semantics of the `min.insync.replicas` config have changed slightly
and some constraints were added. We should document them clearly.

Reviewers: Jun Rao <[email protected]>, Calvin Liu <[email protected]>,
 Mickael Maison <[email protected]>, Paolo Patierno
 <[email protected]>, Federico Valeri <[email protected]>, Chia-Ping
 Tsai <[email protected]>
Add exclusions for new warnings to allow this upgrade.

Reviewers: Chia-Ping Tsai <[email protected]>
This flag also skips control records, so the description needs to be
updated.

---------

Signed-off-by: Federico Valeri <[email protected]>

Reviewers: Luke Chen <[email protected]>, Jhen-Yung Hsu <[email protected]>, Vincent Potucek
…e#20269)

The `state-change.log` file is being incorrectly rotated to
`stage-change.log.[date]`. This change fixes the typo to have the log
file correctly rotated to `state-change.log.[date]`

_No functional changes._

Reviewers: Mickael Maison <[email protected]>, Christo Lolov
 <[email protected]>, Luke Chen <[email protected]>, Ken Huang
 <[email protected]>, TengYao Chi <[email protected]>, Chia-Ping Tsai
 <[email protected]>
Add missing metrics to the tiered storage documentation:

- kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs: introduced in [KIP-1018](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests)

- kafka.server:type=DelayedRemoteListOffsetsMetrics,name=ExpiresPerSec,topic=([-.\w]),partition=([0-9]): introduced in [KIP-1075](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1075%3A+Introduce+delayed+remote+list+offsets+purgatory+to+make+LIST_OFFSETS+async)

Reviewers: Chia-Ping Tsai <[email protected]>, Lan Ding
 <[email protected]>, Kamal Chandraprakash
 <[email protected]>
When using a connector that requires a schema, such as JDBC connectors,
with JSON messages, the current JSONConverter necessitates including the
schema within every message. To address this, we are introducing a new
parameter, schema.content, which allows you to provide the schema
externally. This approach not only reduces the size of the messages but
also facilitates the use of more complex schemas.

KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1054%3A+Support+external+schemas+in+JSONConverter
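A sketch of how a connector might use it (the property values and the schema body are illustrative; `schema.content` is the new parameter from KIP-1054):

```
# Hypothetical sink connector snippet: the Connect schema is supplied once
# via schema.content instead of being embedded in every JSON message.
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schema.content={"type":"struct","fields":[{"field":"id","type":"int64"}],"name":"record"}
```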

Reviewers: Mickael Maison <[email protected]>, TengYao Chi <[email protected]>, Edoardo Comar <[email protected]>
…0275)

The compiler warning is due to a missing import. This patch imports ApiException to fix it.

Reviewers: TengYao Chi <[email protected]>, Yung
<[email protected]>
Now that Kafka supports Java 17, this PR makes some changes in the
tools module. The changes in this PR are limited to only some files;
future PRs shall follow. The changes mostly include:

- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()

Some minor changes to use the enhanced switch.

Sub modules targeted: tools/src/test
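For illustration, the flavor of these replacements (a generic sketch, not a diff from this PR):

```
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectionsMigrationDemo {
    public static void main(String[] args) {
        // Before: pre-Java-9 factory helpers.
        List<String> oldList = Arrays.asList("a", "b");
        Map<String, Integer> oldMap = Collections.singletonMap("a", 1);
        Set<String> oldSet = Collections.singleton("a");

        // After: immutable factories available since Java 9.
        List<String> newList = List.of("a", "b");
        Map<String, Integer> newMap = Map.of("a", 1);
        Set<String> newSet = Set.of("a");

        // Equal contents, more concise construction.
        System.out.println(oldList.equals(newList) && oldMap.equals(newMap) && oldSet.equals(newSet));
    }
}
```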

Reviewers: Chia-Ping Tsai <[email protected]>
This commit updates CI to test against Java 24 instead of Java 23 which
is EOL.

Due to Spotbugs not having released version 4.9.4 yet, we can't run
Spotbugs on Java 24. Instead, we are choosing to run Spotbugs, and the
rest of the compile and validate build step, on Java 17 for now.

Once 4.9.4 has been released, we will switch to using Java 24 for this.

Exclude spotbugs from the run-tests gradle action. Spotbugs is already
run once in the build by "compile and validate"; there is no reason to
run it again as part of executing tests.

Reviewers: Chia-Ping Tsai <[email protected]>
Add new group coordinator metrics section to TOC.

Reviewers: Chia-Ping Tsai <[email protected]>
cleanup in QuotaFactory.

Reviewers: Chia-Ping Tsai <[email protected]>
….html (apache#20313)

The link for the heading Errant Record Reporter is missing the # symbol,
which is causing it to redirect to a 404 Not Found page. Please refer to
the updated preview (screenshot: kafka-site-preview).

Reviewers: PoAn Yang <[email protected]>, Ken Huang
 <[email protected]>, Chia-Ping Tsai <[email protected]>
Implements a timeout mechanism (using maxPollTimeMs) that waits for
missing source topics to be created before failing, instead of
immediately throwing exceptions in the new Streams protocol.
Additionally, throws a TopologyException when a partition count mismatch
is detected.

Reviewers: Lucas Brutschy <[email protected]>, Alieh Saeedi
 <[email protected]>, Matthias J. Sax <[email protected]>
…#20299)

The default value of `num.recovery.threads.per.data.dir` is now 2
according to KIP-1030. We should update config files that still set it
to 1.
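For instance, the relevant `server.properties` line now reads (value per KIP-1030):

```
# Threads per data directory used for log recovery at startup and
# flushing at shutdown; the default was raised from 1 to 2 by KIP-1030.
num.recovery.threads.per.data.dir=2
```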

---------

Signed-off-by: Federico Valeri <[email protected]>

Reviewers: Luke Chen <[email protected]>
…in ProducerConfig (apache#20317)

Fixes a typo in ProducerConfig:  Renames
`PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG` →
`PARTITIONER_ADAPTIVE_PARTITIONING_ENABLE_CONFIG`

The old key is retained for backward compatibility.

See: [KIP-1175: Fix the typo `PARTITIONER_ADPATIVE_PARTITIONING_ENABLE`
in ProducerConfig](https://cwiki.apache.org/confluence/x/KYogFQ)
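A minimal sketch of switching to the corrected constant (per the KIP, both constants map to the same `partitioner.adaptive.partitioning.enable` key):

```
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class AdaptivePartitioningConfigDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // New, correctly spelled constant; the deprecated
        // PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG is retained
        // for backward compatibility.
        props.put(ProducerConfig.PARTITIONER_ADAPTIVE_PARTITIONING_ENABLE_CONFIG, "true");
    }
}
```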

Reviewers: Yung <[email protected]>, TengYao Chi
<[email protected]>, Ken Huang <[email protected]>, Nick Guo
<[email protected]>, Ranuga Disansa <[email protected]>
…ache#20315)

The PR removes unnecessary updates of the find-next-fetch-offset value.
When the state is in transition and not yet completed, the respective
offsets should not be considered for acquisition anyway. The
find-next-fetch-offset value is updated once the transition is
completed.

Reviewers: Manikumar Reddy <[email protected]>, Abhinav Dixit
<[email protected]>
)

Minor PR to move persister call outside of the lock. The lock is not
required while making the persister call.

Reviewers: Manikumar Reddy <[email protected]>, Abhinav Dixit <[email protected]>
Now that Kafka supports Java 17, this PR makes some changes in the
`trogdor` module. The changes mostly include:

- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()

Some minor cleanups around use of enhanced switch blocks and conversion
of classes to record classes.

Reviewers: Ken Huang <[email protected]>, Vincent Jiang
 <[email protected]>, Chia-Ping Tsai <[email protected]>
…timeout (apache#20310)

Fixing max delivery check on acquisition lock timeout and write state
RPC failure.

When the acquisition lock has already timed out and a write state RPC
failure occurs, we need to check whether records should be archived.
With the fix, however, we do not immediately persist this information,
even though it is relevant (some records may be archived or have their
delivery count bumped); the information will be persisted eventually.

The persister call has already failed, hence issuing another persister
call in response to that failure is not correct. Rather, let the data be
persisted by future persister calls.

Reviewers: Manikumar Reddy <[email protected]>, Abhinav Dixit
 <[email protected]>
…_ENABLE_CONFIG (apache#20322)

Document deprecation of PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG
in `upgrade.html`, which was missed in apache#20317

Reviewers: Ken Huang <[email protected]>, Chia-Ping Tsai
 <[email protected]>
…ose (apache#20290)

*What*
https://issues.apache.org/jira/browse/KAFKA-19572

- If a `ShareConsumer` constructor failed due to any exception, then we
call `close()` in the catch block.

- If there were uninitialized members accessed during `close()`, then it
would throw an NPE. Currently there are no null checks, hence we were
attempting to use these fields during `close()` execution.

- To avoid this, the PR adds null checks in the `close()` function
before we access fields which could possibly be null (see the sketch
below).
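A minimal sketch of the defensive pattern (the class and field names here are hypothetical, not the actual `ShareConsumer` internals):

```
// Hypothetical illustration: fields may still be null if the constructor
// threw before assigning them, so close() guards each access.
class PartiallyConstructed implements AutoCloseable {
    private AutoCloseable metrics;
    private AutoCloseable networkClient;

    @Override
    public void close() throws Exception {
        if (metrics != null) metrics.close();
        if (networkClient != null) networkClient.close();
    }
}
```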

Reviewers: Apoorv Mittal <[email protected]>, Lianet Magrans
 <[email protected]>
Improve RLMM doc:
1. Distinguish RLMM configs from other tiered storage configs: all RLMM
configs need to start with a specific prefix, but the original
documentation misses this description.
2. Added a description of the additional configs for the client, which
are required when configuring authentication information. The missing
description can confuse users, for example:
Aiven-Open/tiered-storage-for-apache-kafka#681

Reviewers: Luke Chen <[email protected]>, TengYao Chi
<[email protected]>, Chia-Ping Tsai <[email protected]>
The previous URL http://lambda-architecture.net/ seems to now be controlled by spammers

Co-authored-by: Shashank <[email protected]>
Reviewers: Mickael Maison <[email protected]>
DL1231 and others added 8 commits September 29, 2025 08:26
… user commit requests (apache#20234)

Repartition topic records should be purged up to the currently committed
offset once `repartition.purge.interval.ms` duration has passed.

Reviewers: Matthias J. Sax <[email protected]>
Partially addresses KAFKA-15873. When filtering and sorting, we should
be applying the filter before sorting the topics. The order in which
unauthorizedForDescribeTopicMetadata is added is not relevant as it is a
HashSet.

Reviewers: TaiJuWu <[email protected]>, Calvin Liu
 <[email protected]>, Matthias J. Sax <[email protected]>
…nd RemoveRaftVoterOptions (apache#20555)

Improves the documentation of the clusterId field in AddRaftVoterOptions
and RemoveRaftVoterOptions.

The changes include:
1. Adding Javadoc to both addRaftVoter and removeRaftVoter methods to
explain the behavior of the optional clusterId.
2. Integration tests have been added to verify the correct behavior of
add and remove voter operations with and without clusterId, including
scenarios with inconsistent cluster ids.

Reviewers: TengYao Chi <[email protected]>, Chia-Ping Tsai
 <[email protected]>
## Summary

Quota test isn't testing anything on the client side, but rather
enforcing server-side quotas, so moving it out of the clients directory
into the core directory.

Reviewers: Lianet Magrans <[email protected]>
…he#20606)

Update KIP-1147 changes (renaming --property to --formatter-property) in
the ops and streams documentation.
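For example, a console-consumer invocation under the new naming would look like (the formatter property shown is illustrative):

```
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic demo \
  --formatter-property print.timestamp=true
```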

Reviewers: Andrew Schofield <[email protected]>
…tagged fields (apache#20536)

# Description
The [protocol guide](https://kafka.apache.org/protocol) 1) doesn't
display
tagged fields in BNF, and 2) includes empty tagged fields and redundant
nested tables in tables.

# Change
## BNF
Now tagged fields are displayed as FIELD_NAME<tag number>

Old:  <img width="1316" height="275" alt="Screenshot 2025-09-13 at 5 34
28 PM"

src="https://github.com/user-attachments/assets/c3e59382-7a6b-43f3-bc7a-893fb27d524d"
/>

New:  <img width="1386" height="328" alt="Screenshot 2025-09-24 at 12 50
34 PM"

src="https://github.com/user-attachments/assets/1ddbc95e-b0a7-4cd5-a5e0-e1303ffd2d06"
/>

Array Field: <img width="914" height="275" alt="Screenshot 2025-09-24 at
12 52 19 PM"

src="https://github.com/user-attachments/assets/cfe66a21-0d66-4f23-8e5d-1d5dac8e4c9b"
/>

## Table
Empty tagged fields are removed from the table.

Nested tables for tagged fields are removed. The tag of the field is
shown in the "Field" column.

Old: (screenshot of the previous table rendering)

New: (screenshot of the new table rendering)

Reviewers: Andrew Schofield <[email protected]>

codeant-ai bot commented Oct 15, 2025

CodeAnt AI is reviewing your PR.




coderabbitai bot commented Oct 15, 2025

Important

Review skipped

More than 25% of the files skipped due to max files limit. The review is being skipped to prevent a low-quality review.

85 files out of 300 files are above the max files limit of 100. Please upgrade to Pro plan to get higher limits.


@gemini-code-assist

Summary of Changes

Hello @visz11, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Kafka client ecosystem by introducing advanced data loss detection for consumers, bolstering transactional capabilities with two-phase commit support for producers, and updating numerous dependencies and build configurations. It also expands the integration test suite to cover a wider array of client behaviors and fault tolerance scenarios, ensuring a more robust and reliable messaging system.

Highlights

  • Dependency and Build System Updates: Numerous dependencies across the project have been updated, including various Jackson libraries, Log4j, Jetty, Jose4j, RocksDB, Scala, Snappy-Java, SnakeYAML, OpenTelemetry-Proto, JLine, RE2J, Classgraph, PCollections, and Jersey libraries. GitHub Actions workflows have been refined, with Java setup updated to v5, Python setup to v6, and Gradle build logic adjusted to exclude redundant SpotBugs tasks and incorporate a Mockito Java Agent workaround for Java 21+.
  • New Integration Tests and Refactorings: A substantial number of new integration tests have been introduced, covering diverse scenarios such as policy-based topic creation, metadata version upgrades, metrics during topic operations, consumer bounce behavior, legacy message format handling, and various consumer functionalities (assign, commit, fetch, poll, subscription). Existing test utilities have been refactored and consolidated into ClientsTestUtils.java for better reusability and clarity.
  • Consumer Data Loss Detection and Fault Tolerance: A new DataLossDetector class has been added to enhance Kafka consumer fault tolerance. This feature introduces configurable detection for silent data loss scenarios, including offset gaps, topic recreation, and out-of-range offsets, integrating with existing auto.offset.reset strategies to provide warnings or exceptions based on configuration.
  • Producer Transactional Enhancements: The producer API now supports two-phase commit (2PC) transactions with the introduction of prepareTransaction() and completeTransaction(PreparedTxnState) methods. This allows external transaction managers to coordinate Kafka transactions more robustly, with transactions remaining active until explicitly finalized. A new PreparedTxnState class encapsulates the necessary producer ID and epoch for 2PC operations.
  • Consumer Rebalance Protocol and Metrics Improvements: The consumer rebalance protocol has seen several refinements, including clearer Javadoc for ConsumerRebalanceListener callbacks, explicit handling of rackId in GroupRebalanceConfig and ConsumerMembershipManager, and improved metric tracking for assigned partitions and background event processing. The FetchBuffer now uses a more robust awaitWakeup mechanism.
Ignored Files
  • Ignored by pattern: .github/workflows/** (13)
    • .github/workflows/build.yml
    • .github/workflows/ci-complete.yml
    • .github/workflows/deflake.yml
    • .github/workflows/docker_build_and_test.yml
    • .github/workflows/docker_official_image_build_and_test.yml
    • .github/workflows/docker_rc_release.yml
    • .github/workflows/docker_scan.yml
    • .github/workflows/generate-reports.yml
    • .github/workflows/pr-labeled.yml
    • .github/workflows/pr-linter.yml
    • .github/workflows/pr-update.yml
    • .github/workflows/prepare_docker_official_image_source.yml
    • .github/workflows/workflow-requested.yml

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant number of changes, including a new data loss detection feature for consumers, two-phase commit support for producers, and numerous dependency updates and refactorings across the codebase. My review focuses on potential issues related to these changes, including API design, build configuration, and thread safety. Overall, the changes are extensive and touch many parts of the system. I've provided specific feedback on a few areas that could be improved or clarified.

```
 * @return The DeleteShareGroupOffsetsResult.
 */
DeleteShareGroupOffsetsResult deleteShareGroupOffsets(String groupId, Set<TopicPartition> partitions, DeleteShareGroupOffsetsOptions options);
DeleteShareGroupOffsetsResult deleteShareGroupOffsets(String groupId, Set<String> topics, DeleteShareGroupOffsetsOptions options);
```


high

The deleteShareGroupOffsets method signature has been changed to accept a Set<String> of topic names instead of a Set<TopicPartition>. This is a breaking API change that removes the ability for users to delete offsets for specific partitions, forcing them to delete offsets for all partitions of a topic. This seems like a significant reduction in functionality. If this was intentional, the rationale should be clearly documented. If not, the previous partition-level granularity should be restored.

```
# MaxInlineLevel=15 is the default since JDK 14 and can be removed once older JDKs are no longer supported
if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
  KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true"
  KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15"
```


medium

The -Djava.awt.headless=true property has been removed from KAFKA_JVM_PERFORMANCE_OPTS. While many server environments default to headless mode, explicitly setting this property is a safer practice to prevent unexpected issues with AWT initialization on systems that might have a graphical environment. Was this removal intentional? If so, it would be helpful to understand the reasoning. A similar change was also made in bin/windows/kafka-run-class.bat.

Comment on lines +927 to +928
compileOnly libs.bndlib
compileOnly libs.spotbugs


medium

The :server project now includes compileOnly dependencies on bndlib and spotbugs. This is an unusual dependency configuration, as bndlib is for OSGi bundles and spotbugs is a static analysis tool. Could you provide some context on why these are necessary as compileOnly dependencies for the server module? Understanding the purpose will help maintainers with future build file updates.
