
Score type where a submission's score can depend on other submissions ("relative-to-best" scoring) #1492

@prandla

Description


To elaborate on the title a bit: we occasionally make optimization tasks where we don't know an optimal solution, and where we'd like to give 100 points to the best contestant, and proportionally less to everyone else.

The way we currently do this is not great: we hardcode the best score on each subtask into the checker (or as a hint in the "correct output" files that are passed to the checker), and whenever someone submits a better solution, we manually edit the checker and re-evaluate all submissions.

I looked through our archives to see what kinds of scoring rules we have used. It turns out that while the scoring is phrased quite differently in task statements (e.g. "points = 5 * 2^(your_result - best_result)"), almost all of them are "linearizable": the only dynamic adjustment needed is score = outcome / best_outcome, with everything else computed statically per submission. So for these, rerunning the entire evaluation whenever the best outcome changes would be quite wasteful.
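To make the "linearizable" claim concrete, here is a rough Python sketch (the function names are made up for illustration) showing how the example formula above splits into a static checker transform plus the single dynamic division by the best outcome:

```python
# Illustration only: 5 * 2^(r - b) == 5 * 2^r / 2^b, so the checker can emit
# the static part (2^r) and the score type only needs the division by the
# best outcome on that testcase.

def checker_outcome(your_result: float) -> float:
    # Static transform, independent of all other submissions.
    return 2.0 ** your_result

def dynamic_score(outcome: float, best_outcome: float, max_points: float = 5.0) -> float:
    # The only dynamic step: scale by the best outcome seen so far.
    return max_points * outcome / best_outcome

# dynamic_score(checker_outcome(r), checker_outcome(b)) == 5 * 2 ** (r - b)
```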

(There was one task where half of the points were for correctness and half for optimality. This could be implemented by making the dynamic scoring configurable per-testcase and having the score type implement a dependency between these pairs of testcases. There was also one task where the scoring was something like "best solution gets 100 points, worst one 5 points, the others are linear by their position on the leaderboard". I'm going to say that is out of scope; I don't even know how to implement it automatically in a secure way.)

So, I think my proposal UX-wise would be to add a boolean flag on each testcase, "use relative-to-best scoring", which scales the outcome returned by the checker relative to the highest outcome among official submissions for that testcase. Implementation-wise, ScoringService can keep track of the best outcome for each testcase and invalidate all scores for that testcase whenever a new best is found. This would allow all the linearizable tasks to be implemented very efficiently.
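A rough sketch of the bookkeeping I have in mind (not actual CMS code; the class and method names here are hypothetical):

```python
# Hypothetical tracker for the proposed flag: remembers the best outcome per
# testcase and reports when previously computed relative scores need to be
# invalidated. Only scores are recomputed; evaluations are reused as-is.
from collections import defaultdict

class RelativeBestTracker:
    def __init__(self):
        # Best raw outcome seen so far, keyed by (dataset_id, testcase_codename).
        self.best_outcome = defaultdict(float)

    def record_outcome(self, dataset_id, codename, outcome):
        """Return True if this is a new best outcome for the testcase,
        meaning all existing relative scores for it should be invalidated."""
        key = (dataset_id, codename)
        if outcome > self.best_outcome[key]:
            self.best_outcome[key] = outcome
            return True
        return False

    def relative_outcome(self, dataset_id, codename, outcome):
        # The dynamic adjustment: outcome / best_outcome (0 if nothing scored yet).
        best = self.best_outcome[(dataset_id, codename)]
        return outcome / best if best > 0 else 0.0
```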

However, for the separate correctness and optimality use case, this isn't actually ideal, as the evaluation for correctness checking and optimality checking would likely be mostly duplicated. This specific use case could be implemented more efficiently by letting the checker return two floats, one for the base score and one for the relative score. (The "solution must be fully correct in order to get optimality points" rule can then be implemented inside the checker, by returning 0 for the relative score if the base score isn't good enough.) But I'm wondering if this is too specific to be worth implementing in base CMS (and it would require a bit of internal rework to make returning multiple values from the checker work...).
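For illustration, a two-value checker for that use case could look roughly like this (purely hypothetical, since the current checker protocol returns a single outcome and this assumes the internal rework mentioned above):

```python
# Hypothetical checker body returning (base_outcome, relative_outcome).

def check(correct: bool, your_result: float) -> tuple[float, float]:
    # base_outcome: scored statically, covers the correctness half of the points.
    base_outcome = 1.0 if correct else 0.0
    # relative_outcome: later divided by the best relative outcome on this
    # testcase; zero unless the solution is fully correct, which encodes the
    # "must be fully correct to get optimality points" rule in the checker.
    relative_outcome = float(your_result) if correct else 0.0
    return base_outcome, relative_outcome
```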
