
fix: score corruption caused by duplicate, implement randomIterator #2161

Open
Christopher-Chianelli wants to merge 3 commits into TimefoldAI:main from Christopher-Chianelli:feat/8

Conversation

@Christopher-Chianelli
Contributor

  • Move UnfinishedJoiners to Joiners
  • Score corruption was caused when a duplicate was encountered, causing the forEach iterator to skip the remaining elements in its current downstream iterator and proceed to the next downstream iterator instead. It now exhausts the current downstream iterator before proceeding to the next one.
  • Due to the existence of duplicates, the implemented randomIterator is not completely fair; to create a fair random iterator, all the elements must be put into a set first.

Draft, since it probably needs more tests, but the modified examples pass FULL_ASSERT.
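The fix described in the second bullet can be sketched in isolation. Below is a hypothetical, simplified model of the corrected behavior (DeduplicatingConcatIterator is an illustrative name, not a class from this PR): on encountering a duplicate, only that element is skipped, and the current downstream iterator is fully exhausted before moving on to the next one.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical sketch of the fixed forEach-style iteration: when a duplicate
// is found, only that element is skipped; the current downstream iterator is
// still exhausted before advancing to the next downstream iterator.
public final class DeduplicatingConcatIterator<T> implements Iterator<T> {

    private final Iterator<Iterator<T>> downstreamIterators;
    private final Set<T> seen = new HashSet<>();
    private Iterator<T> current;
    private T next;
    private boolean hasNext;

    public DeduplicatingConcatIterator(List<Iterator<T>> downstreams) {
        this.downstreamIterators = downstreams.iterator();
        advance();
    }

    private void advance() {
        hasNext = false;
        while (true) {
            if (current != null && current.hasNext()) {
                T candidate = current.next();
                if (seen.add(candidate)) { // Skip the duplicate, not the iterator.
                    next = candidate;
                    hasNext = true;
                    return;
                }
                continue; // Stay on the current downstream iterator.
            }
            if (!downstreamIterators.hasNext()) {
                return; // All downstream iterators are exhausted.
            }
            current = downstreamIterators.next();
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public T next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        T result = next;
        advance();
        return result;
    }
}
```

Given downstreams [1, 2] and [2, 3], this yields 1, 2, 3: the duplicate 2 is dropped without abandoning the second iterator's remaining element.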

Collaborator

@triceo triceo left a comment


Adding some comments after first reading.

@triceo
Collaborator

triceo commented Mar 6, 2026

One more thing that crossed my mind: the indexer for intersections is expensive, because it has to aggregate things from multiple collections and keep track of that. I'm wondering - would it make sense to create a specialized implementation for cases of only 1 collection? It's a question of whether this situation is likely to happen.
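The specialization being floated here could be dispatched with a one-time size check. This is only an illustrative sketch with made-up names (none of these types are from the actual codebase): a single-key collection gets a cheap direct path, everything else falls back to the aggregating intersection path.

```java
import java.util.Collection;
import java.util.function.Function;

// Illustrative dispatch between a cheap single-collection indexer and the
// heavier intersection indexer. All names here are hypothetical.
public final class IndexerDispatch {

    public static <K, R> R forKeys(Collection<K> indexKeys,
            Function<K, R> singleKeyIndexer,
            Function<Collection<K>, R> intersectionIndexer) {
        if (indexKeys.size() == 1) {
            // Only one collection: no aggregation bookkeeping needed.
            return singleKeyIndexer.apply(indexKeys.iterator().next());
        }
        return intersectionIndexer.apply(indexKeys);
    }
}
```

Whether this pays off depends, as noted, on how often the single-collection case actually occurs in practice.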

Collaborator

@triceo triceo left a comment


LGTM after comments are addressed.
(There are some valid ones in Sonar as well.)

Comment on lines +307 to +314
        this.downstreamIteratorSupplierList = new ArrayList<>(indexKeyCollection.size());
        this.removedSet = new HashSet<>();
        this.workingRandom = workingRandom;
        this.downstreamIndexerIteratorFunction = downstreamIndexerIteratorFunction;
        for (var indexKey : indexKeyCollection) {
            this.downstreamIteratorSupplierList.add(new DownstreamIteratorSupplier(indexKey));
        }
    }
Collaborator

@triceo triceo Mar 10, 2026


Consider the main (and currently the only) use case of the random iterator, Neighborhoods.

Late Acceptance is a fast-stepping algorithm, therefore typically it will only select 1 move in a step. This means this iterator will be created once, its next() will be called once, and then it will be thrown away.

In this light, this constructor is still far too heavy. It creates one possibly large collection, and fills it entirely. It also instantiates another hash set. This is quite the overhead for something which will be happening all the time, and only be used once.

IMO this constructor needs to be as lazy as it can possibly be. This class needs to be optimized for the 1-time use path. And then possibly delegate to a different implementation which is optimized for the other path, where there are multiple selections.
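The create-once/use-once optimization being asked for could, for instance, take the shape of a wrapper that defers all heavy setup until the first call. This is a hypothetical sketch of the pattern, not the PR's implementation:

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical lazy wrapper: the constructor does no real work, so the
// common create-once/use-once path pays almost nothing up front. The heavy
// setup (building collections, hash sets, etc.) runs only on first use.
public final class LazyIterator<T> implements Iterator<T> {

    private final Supplier<Iterator<T>> heavySetup;
    private Iterator<T> delegate;

    public LazyIterator(Supplier<Iterator<T>> heavySetup) {
        this.heavySetup = heavySetup; // Nothing is built yet.
    }

    private Iterator<T> delegate() {
        if (delegate == null) {
            delegate = heavySetup.get(); // Heavy work deferred to here.
        }
        return delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate().hasNext();
    }

    @Override
    public T next() {
        return delegate().next();
    }
}
```

A truly lazy one-shot path would ideally avoid the full materialization even on the first next(), delegating to the heavier implementation only when a second selection is requested.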

Contributor Author


You do realize SequencedCollection cannot be accessed by index, nor randomly accessed in any other way? The only way to randomly select an element from a SequencedCollection is to store it inside a list or another RandomAccess structure. If we always picked the first element, it would no longer be a randomIterator.
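For context, this is the constraint being described: SequencedCollection offers getFirst()/getLast() and ordered iteration, but no get(int), so a uniform random pick has to copy the elements into a RandomAccess list first. A minimal illustration (hypothetical helper, not solver code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.SequencedCollection;

// SequencedCollection has no index-based access, so a uniform random pick
// requires an O(n) copy into a RandomAccess list first.
public final class RandomPick {

    public static <T> T pickRandom(SequencedCollection<T> collection, Random random) {
        List<T> copy = new ArrayList<>(collection); // The unavoidable copy.
        return copy.get(random.nextInt(copy.size()));
    }
}
```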

Collaborator


From this point of view, it matters not if it's a Collection or SequencedCollection; random access is not possible either way. Sequenced gives us more information without taking any away.

I'm probably missing something in your comment, because I don't see how it's relevant to this code.

Contributor Author


"In this light, this constructor is still far too heavy. It creates one possibly large collection, and fills it entirely. It also instantiates another hash set. This is quite the overhead for something which will be happening all the time, and only be used once."

i.e. you need to create one possibly large collection and fill it regardless, so that you have random access to the keys.

Collaborator


But you don't need this one large collection. The logic is pretty much what it is now - pick one iterator, pick one random thing from it. If more values are needed, that's when the switch happens to the heavier logic.

That said, this may be a moot point once we address the comment on fair random selection and kill performance doing it.

@triceo triceo requested a review from Copilot March 10, 2026 06:29
@triceo triceo marked this pull request as ready for review March 10, 2026 06:29
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes duplicate-handling issues in Bavet “containing/contained-in/containing-any-of” indexers (preventing skipped elements that can corrupt scoring), adds randomIterator support for ContainingAnyOfIndexer, and promotes the previously “unfinished” joiners into the public Joiners API.

Changes:

  • Move containing, containedIn, and containingAnyOf joiners from UnfinishedJoiners into ai.timefold.solver.core.api.score.stream.Joiners and remove UnfinishedJoiners (including JPMS export cleanup).
  • Fix duplicate iteration behavior in ContainingAnyOfIndexer so encountering a duplicate does not prematurely advance to the next downstream iterator.
  • Implement and test randomIterator for containing/contained-in/containing-any-of indexers (including new shared test helpers).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File: Description
core/src/main/java/ai/timefold/solver/core/api/score/stream/Joiners.java: Adds public API joiners for containing/contained-in/containing-any-of across Bi/Tri/Quad/Penta.
core/src/main/java/ai/timefold/solver/core/impl/score/stream/UnfinishedJoiners.java: Removes the internal placeholder joiners class now that functionality is in Joiners.
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java: Fixes duplicate iteration logic and implements randomIterator with duplicate-aware removal.
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingIndexer.java: Updates documentation/imports to reference Joiners.
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainedInIndexer.java: Updates documentation/imports to reference Joiners.
core/src/main/java/module-info.java: Removes export of ai.timefold.solver.core.impl.score.stream package (now empty).
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/AbstractIndexerTest.java: Adds utilities to exercise randomIterator deterministically (seeded) and helper tuple insertion methods.
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingIndexerTest.java: Switches to Joiners and adds randomIterator test coverage for containing.
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexerTest.java: Switches to Joiners, adds duplicate regression test, and adds randomIterator tests for containing-any-of.
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainedInIndexerTest.java: Switches to Joiners and adds randomIterator tests for contained-in.

Comment on lines +334 to +337
        while (!downstreamIteratorSupplierList.isEmpty()) {
            var remainingIteratorCount = downstreamIteratorSupplierList.size();
            var selectedIndex = workingRandom.nextInt(remainingIteratorCount);
            currentIteratorSupplier = downstreamIteratorSupplierList.get(selectedIndex);

Copilot AI Mar 10, 2026


RandomIterator selects a downstream iterator uniformly via workingRandom.nextInt(downstreamIteratorSupplierList.size()). This biases selection toward tuples in smaller downstream iterators and violates the Indexer.randomIterator uniform-probability contract. Pick downstream iterators proportionally to their remaining element counts (or pre-flatten the distinct elements before sampling when duplicates exist).

Collaborator


Random selection is hard. :-( I think there is some logic to this comment.

We cannot do this absolutely perfectly, the possibility of duplicates screws us here. But selecting the iterators proportionally makes sense; if something has 2x the size, it should be selected 2x as likely.

Contributor Author


But isn't size() expensive (or expensive enough to justify removing it from the constructor)?

Collaborator


It is expensive. But:

  • Maybe it needn't be. Perhaps the indexer can maintain a counter, instead of always calling size() on the downstreams. It knows when it inserts and when it removes. Maybe it has no way of knowing when it removes a duplicate?
  • Correctness over performance; Copilot is right here, we skew the selection probability towards the smaller indexers.

It sucks though, because if we can't make size() perform reasonably, this really will be slow.
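Assuming the indexer can maintain per-downstream element counters as suggested (so size() is never called on the downstreams), the size-proportional selection step itself is cheap. A sketch of that step, with illustrative names and the assumption that the counts are non-negative with a positive total:

```java
import java.util.List;
import java.util.Random;

// Size-proportional selection: index i is chosen with probability
// counts.get(i) / total, so a downstream with 2x the elements is picked
// 2x as often. Assumes maintained counters, not repeated size() calls.
public final class WeightedChoice {

    public static int pickProportionally(List<Integer> counts, Random random) {
        int total = counts.stream().mapToInt(Integer::intValue).sum();
        int roll = random.nextInt(total); // Uniform over all elements.
        for (int i = 0; i < counts.size(); i++) {
            roll -= counts.get(i);
            if (roll < 0) {
                return i;
            }
        }
        throw new IllegalStateException("Impossible state: roll exceeded total.");
    }
}
```

As noted above, duplicates across downstreams still skew this slightly; the counters weight by stored elements, not by distinct ones.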

Comment on lines +296 to +301
    Iterator<T> iterator() {
        if (cachedDownstreamIterator != null) {
            return cachedDownstreamIterator;
        }
        cachedDownstreamIterator = downstreamIndexerIteratorFunction.apply(downstreamIndexerMap.get(key));
        return cachedDownstreamIterator;

Copilot AI Mar 10, 2026


DownstreamIteratorSupplier.iterator() calls downstreamIndexerIteratorFunction.apply(downstreamIndexerMap.get(key)) without handling missing keys. This will throw a NullPointerException when the query key collection contains a key that has no downstream indexer (or when downstreamIndexerMap is empty). Skip absent keys or return an empty iterator instead.
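The suggested guard could look roughly like this. The helper below is a standalone illustration mirroring the names in the snippet, not the PR's actual code:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Illustrative null-safe lookup: absent keys yield an empty iterator
// instead of a NullPointerException inside the iterator function.
public final class SafeDownstreamLookup {

    public static <K, I, T> Iterator<T> iteratorFor(Map<K, I> downstreamIndexerMap, K key,
            Function<I, Iterator<T>> downstreamIndexerIteratorFunction) {
        I downstreamIndexer = downstreamIndexerMap.get(key);
        if (downstreamIndexer == null) {
            return Collections.emptyIterator(); // Absent key: nothing to iterate.
        }
        return downstreamIndexerIteratorFunction.apply(downstreamIndexer);
    }
}
```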

@sonarqubecloud

Quality Gate failed

Failed conditions
69.8% Coverage on New Code (required ≥ 70%)

See analysis details on SonarQube Cloud

@Christopher-Chianelli
Contributor Author

I have a suggestion: since Neighborhoods apparently require Joiners from NeighborhoodJoiners instead of Joiners, make NeighborhoodJoiners use SequencedCollection, and regular Joiners use Collection. Forcing SequencedCollection means users won't be able to use Set (which may be better for their domain model) even if they don't use neighborhoods.

@triceo
Collaborator

triceo commented Mar 10, 2026

Forcing SequencedCollection means users won't be able to use Set (which may be better for their domain model) even if they don't use neighborhoods.

This is the bit I don't understand. SequencedSet extends SequencedCollection - so they can always use LinkedHashSet. Or TreeSet. Yes, they can no longer use HashSet; but most uses of HashSet around the solver are wrong anyway, because they introduce non-reproducibility. IMO nothing bad about forcing (teaching) this when we can now.

(For example: we recently found non-reproducibility in one of the examples, because Jackson was deserializing JSON into a Set and decided to use HashSet. The collection then ended up as a value range, and that caused non-determinism. Had we forced SequencedSet, Jackson would have no choice but to use LinkedHashSet and this tricky well-hidden bug would never have shown up.)
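To make the reproducibility argument concrete: LinkedHashSet implements SequencedSet and iterates in insertion order, so its iteration order is stable across runs, whereas HashSet's order depends on hash codes and capacity. A small demo (illustrative only):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.SequencedSet;

// LinkedHashSet is a SequencedSet: iteration order equals insertion order,
// which keeps downstream behavior (e.g. value ranges) reproducible.
public final class OrderDemo {

    public static List<String> iterationOrder(SequencedSet<String> set) {
        return List.copyOf(set);
    }

    public static void main(String[] args) {
        SequencedSet<String> set = new LinkedHashSet<>(List.of("b", "a", "c"));
        System.out.println(iterationOrder(set)); // Always [b, a, c].
    }
}
```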

@Christopher-Chianelli
Contributor Author


Set.of(...) returns a Set and not a SequencedSet, and there is no of method for SequencedSet.
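For reference: since the JDK offers no SequencedSet.of(...), a project-local helper built on LinkedHashSet and Collections.unmodifiableSequencedSet (added in Java 21) is a common workaround. A hypothetical sketch:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.SequencedSet;

// Hypothetical stand-in for the missing SequencedSet.of(...): preserves
// insertion order, drops duplicates, and returns an unmodifiable view.
public final class SequencedSets {

    @SafeVarargs
    public static <T> SequencedSet<T> of(T... elements) {
        return Collections.unmodifiableSequencedSet(new LinkedHashSet<>(List.of(elements)));
    }
}
```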

@Christopher-Chianelli
Contributor Author

(Java unfortunately makes using SequencedSet extremely annoying.)


3 participants