fix: score corruption caused by duplicate, implement randomIterator by Christopher-Chianelli · Pull Request #2161 · TimefoldAI/timefold-solver

Christopher-Chianelli · 2026-03-04T20:57:16Z

Move UnfinishedJoiners to Joiners
Score corruption was caused when a duplicate was encountered, causing the forEach iterator to skip the remaining elements in its current downstream iterator and proceed to the next downstream iterator instead. It now exhausts the current downstream iterator before proceeding to the next one.
Due to the existence of duplicates, the implemented randomIterator is not completely fair; to create a fair random iterator, all the elements must be put into a set first.
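
To illustrate the duplicate fix described above, here is a minimal sketch, not the actual ContainingAnyOfIndexer code. The class name, the method names, and the use of plain Iterable instances as stand-ins for downstream indexers are all assumptions made for illustration. The point is that a duplicate skips only one element, and the current downstream iterator is still drained before the next one is started.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical illustration; names do not come from the Timefold code base.
final class DedupForEachSketch {

    // Visits every distinct element across all downstream iterables exactly once.
    static <T> void forEachDistinct(List<? extends Iterable<T>> downstreams, Consumer<T> consumer) {
        Set<T> seen = new HashSet<>();
        for (Iterable<T> downstream : downstreams) {
            Iterator<T> iterator = downstream.iterator();
            while (iterator.hasNext()) {
                T element = iterator.next();
                if (!seen.add(element)) {
                    // Duplicate: skip only this element and keep draining the same iterator.
                    // Leaving the inner loop here would drop the rest of this downstream.
                    continue;
                }
                consumer.accept(element);
            }
        }
    }

    // Convenience wrapper that collects the distinct elements in visiting order.
    static <T> List<T> collectDistinct(List<? extends Iterable<T>> downstreams) {
        List<T> result = new ArrayList<>();
        forEachDistinct(downstreams, result::add);
        return result;
    }
}
```

With downstreams `[a, b]` and `[b, c, d]`, the elements `c` and `d` are still visited even though `b` is a duplicate.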

Draft, since it probably needs more tests, but the modified examples pass FULL_ASSERT.

triceo

Adding some comments after first reading.

core/src/main/java/ai/timefold/solver/core/api/score/stream/Joiners.java

core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java

core/src/main/java/module-info.java

triceo · 2026-03-06T19:56:06Z

One more thing that crossed my mind: the indexer for intersections is expensive, because it has to aggregate things from multiple collections and keep track of that. I'm wondering - would it make sense to create a specialized implementation for cases of only 1 collection? It's a question of whether this situation is likely to happen.

triceo

LGTM after comments are addressed.
(There are some valid ones in Sonar as well.)

core/src/main/java/ai/timefold/solver/core/api/score/stream/Joiners.java

triceo · 2026-03-10T06:22:32Z

core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java

+            this.downstreamIteratorSupplierList = new ArrayList<>(indexKeyCollection.size());
+            this.removedSet = new HashSet<>();
+            this.workingRandom = workingRandom;
+            this.downstreamIndexerIteratorFunction = downstreamIndexerIteratorFunction;
+            for (var indexKey : indexKeyCollection) {
+                this.downstreamIteratorSupplierList.add(new DownstreamIteratorSupplier(indexKey));
+            }
+        }


Consider the main (and currently the only) use case of the random iterator, Neighborhoods.

Late Acceptance is a fast-stepping algorithm, therefore typically it will only select 1 move in a step. This means this iterator will be created once, its next() will be called once, and then it will be thrown away.

In this light, this constructor is still far too heavy. It creates one possibly large collection, and fills it entirely. It also instantiates another hash set. This is quite the overhead for something which will be happening all the time, and only be used once.

IMO this constructor needs to be as lazy as it can possibly be. This class needs to be optimized for the 1-time use path. And then possibly delegate to a different implementation which is optimized for the other path, where there are multiple selections.

You do realize that a SequencedCollection cannot be accessed by index, and offers no other way of randomly accessing its elements? The only way to randomly select an element from a SequencedCollection is to store it inside a list or another RandomAccess structure. If we always picked the first element, it would no longer be a randomIterator.

From this point of view, it matters not if it's a Collection or SequencedCollection; random access is not possible either way. Sequenced gives us more information without taking any away.

I'm probably missing something in your comment, because I don't see how it's relevant to this code.

"In this light, this constructor is still far too heavy. It creates one possibly large collection, and fills it entirely. It also instantiates another hash set. This is quite the overhead for something which will be happening all the time, and only be used once."

I.e., you need to create one possibly large collection and fill it regardless, so that you have random access to the keys.

But you don't need this one large collection. The logic is pretty much what it is now - pick one iterator, pick one random thing from it. If more values are needed, that's when the switch happens to the heavier logic.
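
One way to read the suggestion above is sketched below. This is an assumption-laden illustration, not the reviewed implementation: the class name, the fields, and the use of `List<List<T>>` as a randomly accessible stand-in for the downstream indexers are all hypothetical. The constructor does no copying; the first `next()` picks one bucket and one element from it; only a second request materializes the heavier, deduplicated list. The cheap first pick is not uniformly fair, which is the caveat raised elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration; names do not come from the Timefold code base.
final class LazyRandomPickSketch<T> implements Iterator<T> {

    private final List<List<T>> buckets; // stand-in for the downstream indexers
    private final Random random;
    private boolean firstPickDone;
    private T firstPick;
    private List<T> remaining; // built only if a second element is requested

    LazyRandomPickSketch(List<List<T>> buckets, Random random) {
        // Cheap constructor: no copying, no hash set.
        this.buckets = buckets;
        this.random = random;
    }

    @Override
    public boolean hasNext() {
        if (!firstPickDone) {
            for (List<T> bucket : buckets) {
                if (!bucket.isEmpty()) {
                    return true;
                }
            }
            return false;
        }
        materializeRemaining();
        return !remaining.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (!firstPickDone) {
            firstPickDone = true;
            List<T> bucket;
            do { // hasNext() guarantees at least one non-empty bucket
                bucket = buckets.get(random.nextInt(buckets.size()));
            } while (bucket.isEmpty());
            firstPick = bucket.get(random.nextInt(bucket.size()));
            return firstPick;
        }
        // Heavy path: swap-remove a random element from the materialized list.
        int index = random.nextInt(remaining.size());
        int last = remaining.size() - 1;
        T picked = remaining.get(index);
        remaining.set(index, remaining.get(last));
        remaining.remove(last);
        return picked;
    }

    private void materializeRemaining() {
        if (remaining == null) {
            Set<T> distinct = new LinkedHashSet<>();
            buckets.forEach(distinct::addAll);
            distinct.remove(firstPick); // already returned by the cheap path
            remaining = new ArrayList<>(distinct);
        }
    }
}
```

A caller that takes a single element never pays for the set or the list. Calling `hasNext()` again after the first pick does trigger the heavy path in this sketch.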

That said, this may be a moot point once we address the comment on fair random selection and kill performance doing it.

core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java

Copilot

Pull request overview

This PR fixes duplicate-handling issues in Bavet “containing/contained-in/containing-any-of” indexers (preventing skipped elements that can corrupt scoring), adds randomIterator support for ContainingAnyOfIndexer, and promotes the previously “unfinished” joiners into the public Joiners API.

Changes:

Move containing, containedIn, and containingAnyOf joiners from UnfinishedJoiners into ai.timefold.solver.core.api.score.stream.Joiners and remove UnfinishedJoiners (including JPMS export cleanup).
Fix duplicate iteration behavior in ContainingAnyOfIndexer so encountering a duplicate does not prematurely advance to the next downstream iterator.
Implement and test randomIterator for containing/contained-in/containing-any-of indexers (including new shared test helpers).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
core/src/main/java/ai/timefold/solver/core/api/score/stream/Joiners.java	Adds public API joiners for containing/contained-in/containing-any-of across Bi/Tri/Quad/Penta.
core/src/main/java/ai/timefold/solver/core/impl/score/stream/UnfinishedJoiners.java	Removes the internal placeholder joiners class now that functionality is in `Joiners`.
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java	Fixes duplicate iteration logic and implements `randomIterator` with duplicate-aware removal.
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingIndexer.java	Updates documentation/imports to reference `Joiners`.
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainedInIndexer.java	Updates documentation/imports to reference `Joiners`.
core/src/main/java/module-info.java	Removes export of `ai.timefold.solver.core.impl.score.stream` package (now empty).
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/AbstractIndexerTest.java	Adds utilities to exercise `randomIterator` deterministically (seeded) and helper tuple insertion methods.
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingIndexerTest.java	Switches to `Joiners` and adds `randomIterator` test coverage for containing.
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexerTest.java	Switches to `Joiners`, adds duplicate regression test, and adds `randomIterator` tests for containing-any-of.
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainedInIndexerTest.java	Switches to `Joiners` and adds `randomIterator` tests for contained-in.

Copilot · 2026-03-10T06:34:38Z

core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java

+            while (!downstreamIteratorSupplierList.isEmpty()) {
+                var remainingIteratorCount = downstreamIteratorSupplierList.size();
+                var selectedIndex = workingRandom.nextInt(remainingIteratorCount);
+                currentIteratorSupplier = downstreamIteratorSupplierList.get(selectedIndex);


RandomIterator selects a downstream iterator uniformly via workingRandom.nextInt(downstreamIteratorSupplierList.size()). This biases selection toward tuples in smaller downstream iterators and violates the uniform-probability contract of Indexer.randomIterator. Consider picking downstream iterators proportionally to their remaining element counts, or pre-flattening the distinct elements before sampling when duplicates exist.

Random selection is hard. :-( I think there is some logic to this comment.

We cannot do this absolutely perfectly; the possibility of duplicates screws us here. But selecting the iterators proportionally makes sense: if something has 2x the size, it should be twice as likely to be selected.
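
Proportional selection as discussed here could look roughly like the following sketch. It assumes the sizes of the downstream iterators are already known and passed in as an array; the class and method names are hypothetical and not part of the PR. Duplicates across buckets are counted once per bucket, so the result is still only approximately fair.

```java
import java.util.Random;

// Hypothetical illustration; names do not come from the Timefold code base.
final class WeightedBucketPickSketch {

    // Returns index i with probability sizes[i] / sum(sizes).
    static int pickBucketIndex(int[] sizes, Random random) {
        int total = 0;
        for (int size : sizes) {
            total += size;
        }
        if (total == 0) {
            throw new IllegalArgumentException("All buckets are empty.");
        }
        // Draw a "ticket" among all elements, then find the bucket that owns it.
        int ticket = random.nextInt(total);
        for (int i = 0; i < sizes.length; i++) {
            if (ticket < sizes[i]) {
                return i;
            }
            ticket -= sizes[i];
        }
        throw new IllegalStateException("Unreachable: ticket is always below the total.");
    }
}
```

Empty buckets are never selected, and a bucket with twice the elements is selected twice as often on average.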

But isn't size() expensive (or expensive enough to justify removing it from the constructor)?

It is expensive. But:

Maybe it needn't be. Perhaps the indexer can maintain a counter, instead of always calling size() on the downstreams. It knows when it inserts and when it removes. Maybe it has no way of knowing when it removes a duplicate?
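
The counter idea might be sketched as follows. This is a simplified assumption, with a plain map of lists standing in for the real indexer structure and with hypothetical class and method names. The counter is updated on every insert and successful removal, so reading the total is O(1). As the comment above suspects, it counts a value once per key it is stored under, so duplicates across keys are counted multiple times.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names do not come from the Timefold code base.
final class CountingIndexSketch<K, V> {

    private final Map<K, List<V>> buckets = new LinkedHashMap<>();
    private int totalSize; // maintained incrementally instead of summing size() calls

    void put(K key, V value) {
        buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        totalSize++;
    }

    boolean remove(K key, V value) {
        List<V> bucket = buckets.get(key);
        if (bucket == null || !bucket.remove(value)) {
            return false; // nothing removed, counter stays untouched
        }
        totalSize--;
        if (bucket.isEmpty()) {
            buckets.remove(key);
        }
        return true;
    }

    int sizeOf(K key) {
        List<V> bucket = buckets.get(key);
        return bucket == null ? 0 : bucket.size();
    }

    int totalSize() {
        return totalSize;
    }
}
```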

Correctness over performance; Copilot is right here, we skew the selection probability towards the smaller indexers.

It sucks though, because if we can't make size() perform reasonably, this really will be slow.

core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainedInIndexerTest.java

Copilot · 2026-03-10T06:34:39Z

core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java

+            Iterator<T> iterator() {
+                if (cachedDownstreamIterator != null) {
+                    return cachedDownstreamIterator;
+                }
+                cachedDownstreamIterator = downstreamIndexerIteratorFunction.apply(downstreamIndexerMap.get(key));
+                return cachedDownstreamIterator;


DownstreamIteratorSupplier.iterator() calls downstreamIndexerIteratorFunction.apply(downstreamIndexerMap.get(key)) without handling missing keys. This will throw a NullPointerException when the query key collection contains a key that has no downstream indexer, or when downstreamIndexerMap is empty. Skip absent keys or return an empty iterator instead.
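
The "return an empty iterator" option could be sketched like this. The helper below is hypothetical and uses a plain `Map` of `Iterable` values as a stand-in for downstreamIndexerMap; it is not the signature used in the PR.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Hypothetical illustration; names do not come from the Timefold code base.
final class NullSafeDownstreamSketch {

    // Returns the downstream iterator for the key, or an empty iterator if the key is absent.
    static <K, T> Iterator<T> iteratorFor(Map<K, ? extends Iterable<T>> downstreamIndexerMap, K key) {
        Iterable<T> downstream = downstreamIndexerMap.get(key);
        return downstream == null ? Collections.emptyIterator() : downstream.iterator();
    }
}
```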

sonarqubecloud · 2026-03-10T19:01:05Z

Quality Gate failed

Failed conditions
69.8% Coverage on New Code (required ≥ 70%)

See analysis details on SonarQube Cloud

Christopher-Chianelli · 2026-03-10T19:01:42Z

I have a suggestion: since Neighborhoods apparently require Joiners from NeighborhoodJoiners instead of Joiners, make NeighborhoodJoiners use SequencedCollection, and regular Joiners use Collection. Forcing SequencedCollection means users won't be able to use Set (which may be better for their domain model) even if they don't use neighborhoods.

triceo · 2026-03-10T19:20:48Z

"Forcing SequencedCollection means users won't be able to use Set (which may be better for their domain model) even if they don't use neighborhoods."

This is the bit I don't understand. SequencedSet extends SequencedCollection - so they can always use LinkedHashSet. Or TreeSet. Yes, they can no longer use HashSet; but most uses of HashSet around the solver are wrong anyway, because they introduce non-reproducibility. IMO nothing bad about forcing (teaching) this when we can now.

(For example: we recently found non-reproducibility in one of the examples, because Jackson was deserializing JSON into a Set and decided to use HashSet. The collection then ended up as a value range, and that caused non-determinism. Had we forced SequencedSet, Jackson would have no choice but to use LinkedHashSet and this tricky well-hidden bug would never have shown up.)

Christopher-Chianelli · 2026-03-10T19:22:49Z

Set.of(...) returns a Set and not a SequencedSet, and there is no of method for SequencedSet.
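
A small helper can work around the missing factory method. The sketch below is hypothetical (the class and method names are not part of the JDK or of Timefold) and relies only on LinkedHashSet, which implements SequencedSet as of Java 21 and keeps insertion order.

```java
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper; not part of the JDK or of Timefold.
final class OrderedSetSketch {

    // Builds an insertion-ordered set; later duplicates are dropped, null elements are rejected.
    @SafeVarargs
    static <T> LinkedHashSet<T> orderedSetOf(T... elements) {
        return new LinkedHashSet<>(List.of(elements));
    }
}
```

On Java 21 and later, the result can be assigned to a `SequencedSet<T>` variable. Unlike `Set.of(...)`, the result is mutable and iterates in the order the elements were given.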

Christopher-Chianelli · 2026-03-10T19:23:51Z

(Java unfortunately makes using SequencedSet extremely annoying)

Christopher-Chianelli temporarily deployed to internal March 4, 2026 20:57 — with GitHub Actions Inactive

triceo reviewed Mar 5, 2026

Christopher-Chianelli mentioned this pull request Mar 5, 2026

feat: convert benchmarks to use contains Joiners TimefoldAI/timefold-solver-benchmarks#115

Open

Christopher-Chianelli added 2 commits March 9, 2026 13:33

chore: review comments

f9b64b5

Christopher-Chianelli force-pushed the feat/8 branch from db14e7f to f9b64b5 Compare March 9, 2026 18:12

Christopher-Chianelli temporarily deployed to internal March 9, 2026 18:12 — with GitHub Actions Inactive

Christopher-Chianelli requested a review from triceo March 9, 2026 18:13

triceo approved these changes Mar 10, 2026

triceo requested a review from Copilot March 10, 2026 06:29

triceo marked this pull request as ready for review March 10, 2026 06:29

Copilot started reviewing on behalf of triceo March 10, 2026 06:29

Copilot AI reviewed Mar 10, 2026

chore: review comments

2eb6880

Christopher-Chianelli temporarily deployed to internal March 10, 2026 18:44 — with GitHub Actions Inactive
