Move evaluation speed should not count non-foraged moves in multi-threaded solving #2261

@ge0ffrey

What is move evaluation?

A) The scoring of a move
B) The scoring + foraging of a move

For single-threaded solving, A and B are equal: every scored move is also foraged.
For multi-threaded solving, there are "wasted moves": moves that get scored but never get foraged.
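
To make the two definitions concrete, here is a minimal Python sketch. The `MoveCounters` class, its field names and the sample numbers are all hypothetical and are not part of the solver's API. The sketch only assumes that a run can count how many moves were scored and how many of those were foraged.

```python
from dataclasses import dataclass


@dataclass
class MoveCounters:
    """Hypothetical counters collected over one solver run."""

    scored_moves: int   # moves whose score was calculated
    foraged_moves: int  # scored moves that also reached the forager
    seconds: float      # wall-clock time spent solving

    def speed_a(self) -> float:
        """Definition A: scored moves per second."""
        return self.scored_moves / self.seconds

    def speed_b(self) -> float:
        """Definition B: scored and foraged moves per second."""
        return self.foraged_moves / self.seconds

    def wasted_moves(self) -> int:
        """Moves that were scored but never foraged."""
        return self.scored_moves - self.foraged_moves


# Made-up runs: single-threaded wastes nothing, multi-threaded wastes some moves.
single = MoveCounters(scored_moves=100_000, foraged_moves=100_000, seconds=10.0)
multi = MoveCounters(scored_moves=250_000, foraged_moves=150_000, seconds=10.0)
```

With these made-up counters, `single.speed_a()` and `single.speed_b()` agree, while for `multi` the two definitions diverge by exactly the wasted moves per second.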

Why do users care about move evaluation speed?

  1. In constraint development, to assess whether adding a new constraint halves that number.
  2. When comparing two runs, to validate that the hardware comparison is apples to apples.
  3. When comparing two benchmarks, to see if it's "twice as fast", to get a linearly growing number. It's a proxy for performance.
  4. During a model review, to quickly assess if "the constraints are slow" (below 10'000).

The first three care about the relative difference only. The last one only cares about the order of magnitude of the absolute number.

Why is move speed used as a proxy for performance quality?

Because there's nothing better.

Ideally, it would use the score: better performance leads to a better score. And a better score is better.

But the score doesn't increase linearly. It even flatlines. So if the code becomes 500% faster after a lot of work, the score might only be 3% better. Or even the same. Looking at the score on that dataset would not value the impact of that work correctly. Score is a bad way to measure performance quality.

Should move evaluation speed count wasted moves in multi-threading?

Only if we define it as A) the scoring of a move.
But we can also define it as B).

No, because it corrupts "comparing two benchmarks" when the benchmarks use a different number of threads.

For example, according to A:

  • benchmark X has 2 threads with a speed of 20'000
  • benchmark Y has 4 threads with a speed of 25'000

This will lead to the conclusion that Y is better than X, which is false: Y has a worse score, because 10'000 of Y's moves are wasted.

According to B:

  • benchmark X has 2 threads with a speed of 20'000
  • benchmark Y has 4 threads with a speed of 15'000

This will lead to the right conclusion that X is better than Y.
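
The comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic: the `benchmarks` table and the `fastest` helper are hypothetical names, and the numbers are the ones from the example, where X's speed is the same under both definitions.

```python
# Speeds from the example, in moves per second.
# Keys are benchmark names; values are (threads, speed per A, speed per B).
benchmarks = {
    "X": (2, 20_000, 20_000),
    "Y": (4, 25_000, 15_000),
}


def fastest(definition: str) -> str:
    """Return the benchmark with the highest speed under definition 'A' or 'B'."""
    index = 1 if definition == "A" else 2
    return max(benchmarks, key=lambda name: benchmarks[name][index])


# Wasted moves are the gap between the two definitions.
wasted_y = benchmarks["Y"][1] - benchmarks["Y"][2]

print("Fastest according to A:", fastest("A"))
print("Fastest according to B:", fastest("B"))
print("Wasted moves for Y:", wasted_y)
```

Ranking by A picks Y and ranking by B picks X, so the two definitions disagree as soon as the thread counts differ.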

C) Should we not show both: scoring speed and scoring + foraging speed?

No. Users don't care about A) in any of the four cases above.
Only our solver engineers do. Of course, it can be an internal metric.

For users, the only question is "is it faster?" (as in "do I get more?")
The log line and platform UI should stick to one number ("move speed"), the one that matters. Less is more.


Labels: process/needs triage (requires initial assessment of validity, priority etc.)
