@yashmayya yashmayya commented Jan 15, 2026

codecov-commenter commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 83.61582% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.18%. Comparing base (7a96b42) to head (dae7ed4).
⚠️ Report is 57 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ntroller/helix/core/rebalance/TableRebalancer.java | 64.70% | 8 Missing and 4 partials ⚠️ |
| ...ntroller/helix/core/PinotHelixResourceManager.java | 75.60% | 6 Missing and 4 partials ⚠️ |
| ...not/common/assignment/InstancePartitionsUtils.java | 91.11% | 2 Missing and 2 partials ⚠️ |
| ...ker/routing/instanceselector/InstanceSelector.java | 0.00% | 1 Missing ⚠️ |
| ...ing/instanceselector/SegmentInstanceCandidate.java | 75.00% | 1 Missing ⚠️ |
| .../core/realtime/PinotLLCRealtimeSegmentManager.java | 95.65% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17515      +/-   ##
============================================
- Coverage     63.25%   63.18%   -0.07%     
- Complexity     1477     1479       +2     
============================================
  Files          3170     3173       +3     
  Lines        189469   190072     +603     
  Branches      28988    29090     +102     
============================================
+ Hits         119840   120099     +259     
- Misses        60339    60632     +293     
- Partials       9290     9341      +51     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.15% <83.61%> (-0.06%) ⬇️ |
| java-21 | 63.14% <83.61%> (-0.05%) ⬇️ |
| temurin | 63.18% <83.61%> (-0.07%) ⬇️ |
| unittests | 63.18% <83.61%> (-0.07%) ⬇️ |
| unittests1 | 55.54% <87.50%> (-0.04%) ⬇️ |
| unittests2 | 34.07% <79.09%> (+0.03%) ⬆️ |

@yashmayya yashmayya force-pushed the ideal-state-instance-partitions branch 2 times, most recently from 2065108 to b95aee0 Compare January 15, 2026 18:56
@yashmayya yashmayya force-pushed the ideal-state-instance-partitions branch from b95aee0 to 5117d78 Compare January 16, 2026 01:29
@yashmayya yashmayya force-pushed the ideal-state-instance-partitions branch from 820e6a0 to e642147 Compare January 23, 2026 01:18

  public TableRebalancer(HelixManager helixManager) {
-   this(helixManager, null, null, null, null, null);
+   this(helixManager, null, null, null, null, null, true);
Contributor

Should this be true or false?

Contributor Author

This path is mainly used in tests, so I think it makes sense to keep it enabled to help catch regressions.

"Cannot rebalance disabled table without downtime", null, null, null, null, null);
}

// Wipe out ideal state instance partitions metadata
Contributor

We shouldn't wipe it until a rebalance is actually required.
E.g. when segmentAssignmentUnchanged, we should check whether the instance partitions changed, then modify them accordingly.
If we wipe it here and the following part throws an exception, we might end up with an IS without instance partitions.

Contributor Author

> If we wipe it here and the following part throws an exception, we might end up with an IS without instance partitions.

That would cause the rebalance to fail, in which case it will be retried anyway, right?

> E.g. when segmentAssignmentUnchanged, we should check whether the instance partitions changed, then modify them accordingly.

We already update the ideal state instance partitions when segmentAssignmentUnchanged is true but the instance partitions changed. Good point about segmentAssignmentUnchanged, though: when the instance partitions don't change either, I think the ideal state instance partitions would have been wiped out. I've updated the logic.

Map<String, List<String>> idealStateListFields = currentIdealState.getRecord().getListFields();
InstancePartitionsUtils.replaceInstancePartitionsInIdealState(currentIdealState, instancePartitionsList);

return HelixHelper.updateIdealState(_helixManager, tableNameWithType, is -> {
Contributor

We shouldn't retry here. The update needs to be a version-checked update to ensure the consistency of the IS.
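
To illustrate the difference, here is a minimal sketch of a version-checked (compare-and-set) write. `VersionedStore` and `writeIfVersionMatches` are hypothetical names for an in-memory stand-in, not Helix or Pinot APIs. The only point is that a write based on a stale read is rejected rather than silently retried against newer state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a versioned ideal-state store; not a Helix or Pinot API. */
final class VersionedStore {
  private Map<String, String> _record = new HashMap<>();
  private int _version = 0;

  /** Returns the version of the currently stored record. */
  synchronized int getVersion() {
    return _version;
  }

  /** Returns a copy of the stored record so callers cannot mutate it in place. */
  synchronized Map<String, String> read() {
    return new HashMap<>(_record);
  }

  /**
   * Writes the new record only if the caller's expected version still matches.
   * Returns false on a mismatch and never retries on its own.
   */
  synchronized boolean writeIfVersionMatches(int expectedVersion, Map<String, String> newRecord) {
    if (expectedVersion != _version) {
      return false;
    }
    _record = new HashMap<>(newRecord);
    _version++;
    return true;
  }
}
```

A caller that reads at version `v` and later has its write rejected knows the IS changed underneath it, and can decide whether to re-read and recompute or to abort.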

Contributor

We should wipe the IP with the first IS change and restore it with the last IS change. Replacing the IP as a separate step can cause inconsistency.

Contributor Author

There are three callers of this method:

  1. When segmentAssignmentUnchanged is true but the instance partitions have changed, this method is called to update the ideal state instance partitions just before completing the rebalance. I think this one is safe.
  2. At the end of the rebalance, when the current assignment matches the target assignment. I think this one is safe too?
  3. Before starting the actual rebalance. I guess this is the one you're concerned about?

I've updated the wipe-out logic so that it is performed alongside the IS change, but the restoration at the end is still a separate step, because it looks like there could be cases where the assignment reaches the target assignment outside of the rebalance-initiated IS updates.

if (_updateIdealStateInstancePartitions) {
  // Rebalance completed successfully, so we can update the instance partitions in the ideal state to reflect
  // the new set of instance partitions.
  List<InstancePartitions> instancePartitionsList = new ArrayList<>(instancePartitionsMap.values());
Contributor

Consider making the order of this list deterministic, so that we can check whether it is identical to the existing one.
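
One way to get a deterministic order is to sort the values by their map key before building the list. The sketch below assumes the map is keyed by a string; `DeterministicOrder` and `valuesSortedByKey` are hypothetical names, and sorting by `getInstancePartitionsName()` would work equally well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of the PR. */
final class DeterministicOrder {
  private DeterministicOrder() {
  }

  /** Returns the map's values sorted by key, so two equal maps always yield equal lists. */
  static <V> List<V> valuesSortedByKey(Map<String, V> map) {
    List<Map.Entry<String, V>> entries = new ArrayList<>(map.entrySet());
    entries.sort(Map.Entry.comparingByKey());
    List<V> values = new ArrayList<>(entries.size());
    for (Map.Entry<String, V> entry : entries) {
      values.add(entry.getValue());
    }
    return values;
  }
}
```

With this, two lists built from maps with the same contents compare equal with `List.equals`, regardless of the maps' iteration order.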

Contributor Author

The existing one should've been wiped out before this point though.

Integer replicaGroup = Integer.parseInt(key.substring(separatorIndex + 1));
listFields.getValue().forEach(value -> {
  if (serverToReplicaGroupMap.containsKey(value)) {
    LOGGER.warn("Server {} assigned to multiple replica groups ({}, {})", value, replicaGroup,
Contributor

Is it possible for one server to be assigned to multiple replica groups? If so, will this break routing?
Should we consider throwing an exception and falling back when this happens?
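
A minimal sketch of the fail-fast variant, for discussion. It assumes list-field keys end in `_<replicaGroup>` with a numeric suffix. `ReplicaGroupIndex` and `build` are hypothetical names, and the caller is expected to catch the exception and fall back.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; the key format "<name>_<replicaGroup>" is an assumption of this sketch. */
final class ReplicaGroupIndex {
  private ReplicaGroupIndex() {
  }

  /** Builds a server -> replica group map and throws if a server appears in two different groups. */
  static Map<String, Integer> build(Map<String, List<String>> listFields) {
    Map<String, Integer> serverToReplicaGroup = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : listFields.entrySet()) {
      String key = entry.getKey();
      int separatorIndex = key.lastIndexOf('_');
      if (separatorIndex < 0) {
        continue; // Not a replica-group entry in this sketch
      }
      // Assumes the suffix is numeric; a malformed key would raise NumberFormatException
      int replicaGroup = Integer.parseInt(key.substring(separatorIndex + 1));
      for (String server : entry.getValue()) {
        Integer previous = serverToReplicaGroup.putIfAbsent(server, replicaGroup);
        if (previous != null && previous != replicaGroup) {
          throw new IllegalStateException(
              String.format("Server %s assigned to multiple replica groups (%d, %d)", server, previous,
                  replicaGroup));
        }
      }
    }
    return serverToReplicaGroup;
  }
}
```

Throwing here makes the inconsistency visible to the caller instead of letting the last-seen replica group win silently.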

for (InstancePartitions instancePartitions : instancePartitionsMap.values()) {
  if (!instancePartitions.equals(
      idealStateInstancePartitions.get(instancePartitions.getInstancePartitionsName()))) {
    LOGGER.warn(
Contributor

We should add a table-level gauge to reflect whether the IP is wiped for an IP-enabled table.


  // Assign instances
- assignInstances(tableConfig, true);
+ assignInstances(tableConfig, idealState, true);
Contributor

Should we revert the changes for instance assignment?
We should modify the IS when assigning segments, not instances.

Contributor Author

Hm, I can revert it for the update table path (when override is false), which only does instance assignment for new instance partitions and relies on table rebalance to update the IS, as in the other paths. But for new tables this is required, since they wouldn't have this ideal state instance partitions metadata otherwise.
