Skip to content

Weighted RandomSelector returns heavily biased results due to incorrect binarySearch usage #131

@twisti-dev

Description

@twisti-dev

Summary

RandomSelectorImpl.pick() uses Kotlin’s List.binarySearch with a comparison lambda that returns -1 for values < randomValue and 0 for values ≥ randomValue. This violates the comparator contract (it never returns a positive value for >), so the binary search is free to return any index where the lambda yields 0. In practice, it often returns a near-middle index, causing severe bias (e.g., “almost always the middle item”).

Affected code

// dev.slne.surf.surfapi.core.api.random.RandomSelectorImpl

override fun pick(randomGenerator: RandomGenerator): E {
    val totalWeight = cumulativeWeights.getDouble(cumulativeWeights.size - 1)
    val randomValue = randomGenerator.nextDouble(totalWeight)

    // BUG: comparator never returns > 0 for elements greater than randomValue
    val index = cumulativeWeights.binarySearch { if (it < randomValue) -1 else 0 }
        .let { if (it < 0) -(it + 1) else it }

    return elements[index]
}

Why this is a bug

Kotlin’s List<T>.binarySearch(comparison: (T) -> Int) expects the lambda to behave like a comparator relative to a fixed key: return negative for elements less than the key, zero for elements equal to the key, and positive for elements greater than the key. Our lambda returns 0 for both equal *and greater elements, which makes the ordering non-strict and unspecified. The implementation may therefore return an arbitrary “match” index among all elements ≥ randomValue—commonly near the middle—leading to biased selection.

This explains field behavior like “almost always gravel” in certain phases even though weights suggest otherwise.

Steps to reproduce

  1. Build a cumulative weight list, e.g. [50, 70, 100] (weights 50/20/30).
  2. Call pick() thousands of times.
  3. Observe that the second element is selected disproportionately often (bias towards a middle index), instead of ~50%, ~20%, ~30%.

(We validated this by swapping in a simple lower-bound implementation: the bias disappears and distributions match expectations.)

Expected behavior

Elements should be selected with probability proportional to their weights (i.e., index is the first cumulative ≥ randomValue—a proper lower bound).

Fix

Use a correct lower-bound search. Two safe options:

Option A: Manual lower-bound (recommended)

override fun pick(randomGenerator: RandomGenerator): E {
    val totalWeight = cumulativeWeights.getDouble(cumulativeWeights.size - 1)
    val r = randomGenerator.nextDouble(totalWeight)

    // lower_bound: first index with cumulativeWeights[idx] >= r
    var lo = 0
    var hi = cumulativeWeights.size - 1
    while (lo < hi) {
        val mid = (lo + hi) ushr 1
        if (r <= cumulativeWeights.getDouble(mid)) {
            hi = mid
        } else {
            lo = mid + 1
        }
    }
    return elements[lo]
}

Option B: Use proper comparator with binarySearch

If you keep the Kotlin helper, make sure the comparator distinguishes >:

val i = cumulativeWeights.binarySearch { it.compareTo(r) } // negative, zero, positive
val index = if (i < 0) -(i + 1) else i
return elements[index]

Note: cumulativeWeights is a FastUtil DoubleList. Ensure you’re calling an overload that actually uses the provided comparator; otherwise prefer Option A.

Additional notes

  • fromWeightedIterable and fromIterable correctly build cumulative sums; the bug is strictly in how the index is found.
  • nextDouble(totalWeight) generates r ∈ [0, totalWeight), so there is always an index with cumulative ≥ r—no extra clamping needed.
  • The FlowRandomSelectorImpl is unrelated to this bug.

Suggested tests

  1. Deterministic distribution test (seeded RNG):

    • Weights: [50.0, 20.0, 30.0]
    • Run 100k picks; assert frequencies within ±2% of expected.
  2. Edge cases:

    • Single element (always index 0).
    • Highly skewed weights (e.g., [1e-9, 1e9]) to ensure extremes behave.
    • Many elements (e.g., 10k items) to validate no off-by-one at boundaries.

Environment

  • Kotlin stdlib: binary search behavior per documentation (no guarantee which equal element is returned).
  • FastUtil DoubleList for cumulative weights.

Proposed change (diff)

diff --git a/core/api/random/RandomSelectorImpl.kt b/core/api/random/RandomSelectorImpl.kt
@@
     override fun pick(randomGenerator: RandomGenerator): E {
         val totalWeight = cumulativeWeights.getDouble(cumulativeWeights.size - 1)
-        val randomValue = randomGenerator.nextDouble(totalWeight)
-
-        val index = cumulativeWeights.binarySearch { if (it < randomValue) -1 else 0 }
-            .let { if (it < 0) -(it + 1) else it }
-
-        return elements[index]
+        val r = randomGenerator.nextDouble(totalWeight)
+
+        // lower_bound search for first cumulative >= r
+        var lo = 0
+        var hi = cumulativeWeights.size - 1
+        while (lo < hi) {
+            val mid = (lo + hi) ushr 1
+            if (r <= cumulativeWeights.getDouble(mid)) {
+                hi = mid
+            } else {
+                lo = mid + 1
+            }
+        }
+        return elements[lo]
     }

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions