@xiangfu0

  • pinot-kafka-base: Add stream.kafka.partition.ids config and KafkaPartitionSubsetUtils parser
  • pinot-kafka-2.0/3.0: Override fetchPartitionCount, fetchPartitionIds, computePartitionGroupMetadata to respect partition subset; validate configured IDs against topic
  • Add unit tests for subset parsing and metadata provider (KafkaPartitionSubsetUtilsTest, KafkaPartitionLevelConsumerTest)
  • Add stream subset example (subsetPartitions) and Kafka 2.0 README docs
  • QuickStart: Add fineFoodReviews-part-0 and fineFoodReviews-part-1 realtime tables, each consuming one partition of fineFoodReviews topic
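
A minimal sketch of how a partition ID list could be parsed, assuming the value of stream.kafka.partition.ids is a comma-separated list of non-negative integers. The class and method names here are hypothetical and are not the PR's KafkaPartitionSubsetUtils; the exact accepted format is defined by the PR itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not the PR's KafkaPartitionSubsetUtils.
final class PartitionIdsSketch {
  // Config key named in the PR description.
  static final String PARTITION_IDS_KEY = "stream.kafka.partition.ids";

  private PartitionIdsSketch() {
  }

  /**
   * Returns the configured partition IDs sorted and de-duplicated.
   * An absent or blank value yields an empty list, read here as "no subset configured".
   * Assumption: the value is a comma-separated list of non-negative integers.
   */
  static List<Integer> parse(Map<String, String> streamConfig) {
    String raw = streamConfig.get(PARTITION_IDS_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return new ArrayList<>();
    }
    TreeSet<Integer> ids = new TreeSet<>();
    for (String token : raw.split(",")) {
      String trimmed = token.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas such as "0,,1"
      }
      int id;
      try {
        id = Integer.parseInt(trimmed);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException(
            "Invalid partition ID '" + trimmed + "' in " + PARTITION_IDS_KEY, e);
      }
      if (id < 0) {
        throw new IllegalArgumentException(
            "Negative partition ID " + id + " in " + PARTITION_IDS_KEY);
      }
      ids.add(id); // the TreeSet drops duplicates and keeps ascending order
    }
    return new ArrayList<>(ids);
  }
}
```

Returning a sorted, de-duplicated list keeps the result deterministic regardless of how the IDs were written in the table config.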

@xiangfu0 requested a review from Copilot January 27, 2026 16:30
@xiangfu0 added the feature, release-notes, kafka, and ingestion labels Jan 27, 2026

Copilot AI left a comment

Pull request overview

This PR adds support for configuring Kafka ingestion to consume only a subset of topic partitions via the stream.kafka.partition.ids configuration property. This enables multiple tables to share a single Kafka topic by consuming different partitions.

Changes:

  • Added stream.kafka.partition.ids configuration property and parsing utilities
  • Modified Kafka metadata providers (2.0 and 3.0) to validate and respect partition subsets
  • Updated instance assignment logic to support non-contiguous partition IDs
  • Added comprehensive unit tests and example configurations
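
The validation step in the changes above can be sketched as follows. This is an illustration under stated assumptions, not the code in KafkaStreamMetadataProvider: the class and method names are hypothetical, the topic's partition count is passed in as a plain integer instead of being fetched from Kafka, and it relies on Kafka numbering a topic's partitions from 0 to count - 1.

```java
import java.util.List;

// Hypothetical sketch; the PR's metadata providers obtain the topic metadata from Kafka.
final class PartitionSubsetValidationSketch {
  private PartitionSubsetValidationSketch() {
  }

  /** Rejects any configured ID that does not exist in a topic with the given partition count. */
  static void validate(List<Integer> configuredIds, int topicPartitionCount) {
    for (int id : configuredIds) {
      if (id < 0 || id >= topicPartitionCount) {
        throw new IllegalStateException("Configured partition ID " + id
            + " does not exist in a topic with " + topicPartitionCount + " partitions");
      }
    }
  }

  /**
   * Assumed behavior: with a subset configured, the reported partition count is the
   * subset size; with no subset, it is the topic's full partition count.
   */
  static int effectivePartitionCount(List<Integer> configuredIds, int topicPartitionCount) {
    return configuredIds.isEmpty() ? topicPartitionCount : configuredIds.size();
  }
}
```

Failing fast on an unknown ID surfaces a misconfigured table at metadata-fetch time instead of leaving a partition silently unconsumed.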

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pinot-kafka-base/KafkaStreamConfigProperties.java | Defines new PARTITION_IDS constant for subset configuration |
| pinot-kafka-base/KafkaPartitionSubsetUtils.java | Implements parsing, validation, and deduplication of partition ID lists |
| pinot-kafka-base/KafkaPartitionSubsetUtilsTest.java | Comprehensive unit tests for partition ID parsing |
| pinot-kafka-base/KafkaPartitionLevelStreamConfig.java | Exposes stream config map for partition subset utilities |
| pinot-kafka-2.0/KafkaStreamMetadataProvider.java | Overrides partition methods to validate and return subset partitions |
| pinot-kafka-3.0/KafkaStreamMetadataProvider.java | Mirrors 2.0 implementation for Kafka 3.0 compatibility |
| pinot-kafka-2.0/KafkaPartitionLevelConsumerTest.java | Tests subset validation and partition count/ID fetching |
| pinot-kafka-2.0/README.md | Documents the partition subset feature |
| InstanceReplicaGroupPartitionSelector.java | Supports explicit partition IDs in instance assignment |
| ImplicitRealtimeTablePartitionSelector.java | Fetches and uses stream partition IDs for instance assignment |
| RealtimeSegmentAssignment.java | Updates segment assignment to handle non-contiguous partition IDs |
| InstanceAssignmentTest.java | Tests single-partition subset with non-zero ID |
| QuickStartBase.java | Adds fineFoodReviews-part-0 and fineFoodReviews-part-1 examples |
| examples/stream/subsetPartitions/* | Example configuration and documentation |
| examples/stream/fineFoodReviews-part-/ | Demo tables consuming single partitions |

@codecov-commenter
codecov-commenter commented Jan 27, 2026

❌ 2 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 9336 | 2 | 9334 | 32 |
View the top 2 failed test(s) by shortest run time
org.apache.pinot.controller.helix.core.assignment.segment.RealtimeReplicaGroupSegmentAssignmentTest::testExplicitPartition
Stack Traces | 0s run time
No instances for partition 3 in CONSUMING instance partitions (table: testTable_REALTIME). Check that the stream partition subset configuration (e.g. 'stream.kafka.partition.ids') matches the instance partition selection in the table configuration.
org.apache.pinot.controller.helix.core.assignment.segment.RealtimeReplicaGroupSegmentAssignmentTest::testExplicitPartition
Stack Traces | 0s run time
No instances for partition 3 in CONSUMING instance partitions (table: testTable_REALTIME). Check that the stream partition subset configuration (e.g. 'stream.kafka.partition.ids') matches the instance partition selection in the table configuration.
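
The failure message above implies a lookup of consuming instances by explicit partition ID. A minimal sketch of such a lookup follows, assuming the instance partitions are held in a map keyed by stream partition ID; the class, method, and map layout are hypothetical and do not reproduce RealtimeSegmentAssignment.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration; Pinot's actual instance-partitions structure differs.
final class ConsumingInstanceLookupSketch {
  private ConsumingInstanceLookupSketch() {
  }

  /** Looks up instances by explicit partition ID instead of a contiguous 0..n-1 index. */
  static List<String> instancesFor(int partitionId,
      Map<Integer, List<String>> instancesByPartitionId, String tableNameWithType) {
    List<String> instances = instancesByPartitionId.get(partitionId);
    if (instances == null || instances.isEmpty()) {
      throw new IllegalStateException("No instances for partition " + partitionId
          + " in CONSUMING instance partitions (table: " + tableNameWithType + ")");
    }
    return instances;
  }
}
```

With this layout, a table that consumes only partition 3 needs an entry under key 3; a structure built for partitions 0 to n - 1 would have no such entry and would hit the error path.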


@xiangfu0 force-pushed the kafka-subset-partitions branch 2 times, most recently from 82ba313 to ab92eca, January 28, 2026 04:00
@xiangfu0 force-pushed the kafka-subset-partitions branch from ab92eca to 3760d06, January 28, 2026 05:45