fix: score corruption caused by duplicate, implement randomIterator#2161
fix: score corruption caused by duplicate, implement randomIterator#2161Christopher-Chianelli wants to merge 3 commits intoTimefoldAI:mainfrom
Conversation
triceo
left a comment
There was a problem hiding this comment.
Adding some comments after first reading.
core/src/main/java/ai/timefold/solver/core/api/score/stream/Joiners.java
Outdated
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java
Show resolved
Hide resolved
|
One more thing that crossed my mind: the indexer for intersections is expensive, because it has to aggregate things from multiple collections and keep track of that. I'm wondering - would it make sense to create a specialized implementation for cases of only 1 collection? It's a question of whether this situation is likely to happen. |
- Move UnfinishedJoiners to Joiners - Score corruption was caused when a duplicate was encounter, causing the forEach iterator to skip the remaining elements in its current downstream iterator and proceed to the next downstream iterator instead. It now exhausts the current downstream iterator beforce proceeding to the next one. - Due to the existance of duplicates, the implemented randomIterator is not completely fair; to create a fair random iterator, all the elements must be put into a set first.
db14e7f to
f9b64b5
Compare
triceo
left a comment
There was a problem hiding this comment.
LGTM after comments are addressed.
(There are some valid ones in Sonar as well.)
core/src/main/java/ai/timefold/solver/core/api/score/stream/Joiners.java
Outdated
Show resolved
Hide resolved
| this.downstreamIteratorSupplierList = new ArrayList<>(indexKeyCollection.size()); | ||
| this.removedSet = new HashSet<>(); | ||
| this.workingRandom = workingRandom; | ||
| this.downstreamIndexerIteratorFunction = downstreamIndexerIteratorFunction; | ||
| for (var indexKey : indexKeyCollection) { | ||
| this.downstreamIteratorSupplierList.add(new DownstreamIteratorSupplier(indexKey)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Consider the main (and currently the only) use case of the random iterator, Neighborhoods.
Late Acceptance is a fast-stepping algorithm, therefore typically it will only select 1 move in a step. This means this iterator will be created once, its next() will be called once, and then it will be thrown away.
In this light, this constructor is still far too heavy. It creates one possibly large collection, and fills it entirely. It also instantiates another hash set. This is quite the overhead for something which will be happening all the time, and only be used once.
IMO this constructor needs to be as lazy as it can possibly be. This class needs to be optimized for the 1-time use path. And then possibly delegate to a different implementation which is optimized for the other path, where there are multiple selections.
There was a problem hiding this comment.
You do realize SequencedCollection cannot be accessed by index nor any way of randomly accessing its elements? The only way to randomly select an element from a SequencedCollection is to store it inside a list or another RandomAccess structure. If we always picked the first element, it is no longer a randomIterator.
There was a problem hiding this comment.
From this point of view, it matters not if it's a Collection or SequencedCollection; random access is not possible either way. Sequenced gives us more information without taking any away.
I'm probably missing something in your comment, because I don't see how it's relevant to this code.
There was a problem hiding this comment.
"In this light, this constructor is still far too heavy. It creates one possibly large collection, and fills it entirely. It also instantiates another hash set. This is quite the overhead for something which will be happening all the time, and only be used once."
i.e. You need to create one possible large collection and fill it regardless so you have random access to keys.
There was a problem hiding this comment.
But you don't need this one large collection. The logic is pretty much what it is now - pick one iterator, pick one random thing from it. If more values are needed, that's when the switch happens to the heavier logic.
That said, this may be a moot point once we address the comment on fair random selection and kill performance doing it.
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR fixes duplicate-handling issues in Bavet “containing/contained-in/containing-any-of” indexers (preventing skipped elements that can corrupt scoring), adds randomIterator support for ContainingAnyOfIndexer, and promotes the previously “unfinished” joiners into the public Joiners API.
Changes:
- Move
containing,containedIn, andcontainingAnyOfjoiners fromUnfinishedJoinersintoai.timefold.solver.core.api.score.stream.Joinersand removeUnfinishedJoiners(including JPMS export cleanup). - Fix duplicate iteration behavior in
ContainingAnyOfIndexerso encountering a duplicate does not prematurely advance to the next downstream iterator. - Implement and test
randomIteratorfor containing/contained-in/containing-any-of indexers (including new shared test helpers).
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| core/src/main/java/ai/timefold/solver/core/api/score/stream/Joiners.java | Adds public API joiners for containing/contained-in/containing-any-of across Bi/Tri/Quad/Penta. |
| core/src/main/java/ai/timefold/solver/core/impl/score/stream/UnfinishedJoiners.java | Removes the internal placeholder joiners class now that functionality is in Joiners. |
| core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexer.java | Fixes duplicate iteration logic and implements randomIterator with duplicate-aware removal. |
| core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingIndexer.java | Updates documentation/imports to reference Joiners. |
| core/src/main/java/ai/timefold/solver/core/impl/bavet/common/index/ContainedInIndexer.java | Updates documentation/imports to reference Joiners. |
| core/src/main/java/module-info.java | Removes export of ai.timefold.solver.core.impl.score.stream package (now empty). |
| core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/AbstractIndexerTest.java | Adds utilities to exercise randomIterator deterministically (seeded) and helper tuple insertion methods. |
| core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingIndexerTest.java | Switches to Joiners and adds randomIterator test coverage for containing. |
| core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainingAnyOfIndexerTest.java | Switches to Joiners, adds duplicate regression test, and adds randomIterator tests for containing-any-of. |
| core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainedInIndexerTest.java | Switches to Joiners and adds randomIterator tests for contained-in. |
| while (!downstreamIteratorSupplierList.isEmpty()) { | ||
| var remainingIteratorCount = downstreamIteratorSupplierList.size(); | ||
| var selectedIndex = workingRandom.nextInt(remainingIteratorCount); | ||
| currentIteratorSupplier = downstreamIteratorSupplierList.get(selectedIndex); |
There was a problem hiding this comment.
RandomIterator selects a downstream iterator uniformly via workingRandom.nextInt(downstreamIteratorSupplierList.size()), which biases selection toward tuples in smaller downstream iterators and violates the Indexer.randomIterator uniform-probability contract, so pick downstream iterators proportionally to remaining element counts (or pre-flatten distinct elements before sampling when duplicates exist).
There was a problem hiding this comment.
Random selection is hard. :-( I think there is some logic to this comment.
We cannot do this absolutely perfectly, the possibility of duplicates screws us here. But selecting the iterators proportionally makes sense; if something has 2x the size, it should be selected 2x as likely.
There was a problem hiding this comment.
But isn't size() expensive (or expensive enough to justify removing it from the constructor)?
There was a problem hiding this comment.
It is expensive. But:
- Maybe it needn't be. Perhaps the indexer can maintain a counter, instead of always calling
size()on the downstreams. It knows when it inserts and when it removes. Maybe it has no way of knowing when it removes a duplicate? - Correctness over performance; Copilot is right here, we skew the selection probability towards the smaller indexers.
It sucks though, because if we can't make size() perform reasonably, this really will be slow.
core/src/test/java/ai/timefold/solver/core/impl/bavet/common/index/ContainedInIndexerTest.java
Show resolved
Hide resolved
| Iterator<T> iterator() { | ||
| if (cachedDownstreamIterator != null) { | ||
| return cachedDownstreamIterator; | ||
| } | ||
| cachedDownstreamIterator = downstreamIndexerIteratorFunction.apply(downstreamIndexerMap.get(key)); | ||
| return cachedDownstreamIterator; |
There was a problem hiding this comment.
DownstreamIteratorSupplier.iterator() calls downstreamIndexerIteratorFunction.apply(downstreamIndexerMap.get(key)) without handling missing keys, which will throw a NullPointerException when the query key collection contains a key that has no downstream indexer (or when downstreamIndexerMap is empty), so skip absent keys or return an empty iterator instead.
|
|
I have a suggestion: since Neighborhoods apparently require Joiners from |
This is the bit I don't understand. (For example: we recently found non-reproducibility in one of the examples, because Jackson was deserializing JSON into a |
|
|
(Java unfortantly makes using |


Draft, since it probably need more tests, but the modified examples pass
FULL_ASSERT.