From dd63d01598fa218edaa71ee45363d009f28c0cf5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fredrik=20Niemel=C3=A4?= Date: Sun, 31 Aug 2025 23:17:59 +0400 Subject: [PATCH 1/6] secret and sample are not groups --- spec/2023-07-draft.md | 51 ++++++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 25 deletions(-) diff --git a/spec/2023-07-draft.md b/spec/2023-07-draft.md index af4983fc..24465e6e 100644 --- a/spec/2023-07-draft.md +++ b/spec/2023-07-draft.md @@ -446,7 +446,7 @@ Markdown statements also support `.svg`. - For problem statements provided as PDFs: the judge system will display the PDF verbatim; therefore any sample data must be included in the PDF. - The judge system is not required to reconcile sample data embedded in PDFs with the `sample` test data group nor to validate it in any other way. + The judge system is not required to reconcile sample data embedded in PDFs with the `sample` test data nor to validate it in any other way. ### LaTeX Environment and Supported Subset @@ -573,10 +573,10 @@ The format of `test_group.yaml` is as follows: Key | Type | Default | Comments ------------------------- | ----------------------------------- | ---------------------------------------------- | -------- -`max_score` | Integer or `unbounded` | 100 in `secret`, otherwise `unbounded` | The maximum possible score of the test data group. Must be a non-negative integer or `unbounded`. This key is only permitted for the `secret` group and its subgroups. -`score_aggregation` | `pass-fail`, `sum`, or `min` | `sum` in `secret`, otherwise `pass-fail` | How the score is determined based on the scores of the contained groups or test cases. See [Result Aggregation](#result-aggregation). This key is only permitted for the `secret` group and its subgroups. -`static_validation_score` | Integer or `pass-fail` | | The maximum score of the static validation test case, or `pass-fail`. See [Static Validator](#static-validator). 
-`require_pass` | String or sequence of strings | empty sequence | Other test data groups whose test cases a submission must pass in order to receive a score for this test group. See [Result Aggregation](#result-aggregation). This key is only permitted for the `secret` group and its subgroups. +`max_score` | Integer or `unbounded` | 100 in `secret`, otherwise `unbounded` | The maximum possible score of the test data group or `secret`. Must be a non-negative integer or `unbounded`. This key is not permitted in `sample`. +`score_aggregation` | `pass-fail`, `sum`, or `min` | `sum` in `secret`, otherwise `pass-fail` | How the score is determined based on the scores of the contained groups or test cases. See [Result Aggregation](#result-aggregation). This key is not permitted in `sample`. +`static_validation_score` | Integer or `pass-fail` | | The maximum score of the static validation test case, or `pass-fail`. See [Static Validator](#static-validator). +`require_pass` | String or sequence of strings | empty sequence | Other test data groups (or `sample`) whose test cases a submission must pass in order to receive a score for this test data group. See [Result Aggregation](#result-aggregation). This key is not permitted in `sample`. `args` | Sequence of strings | empty sequence | See [Test Case Configuration](#test-case-configuration). `input_validator_args` | Sequence of strings or map of strings to sequences of strings | empty sequence | See [Test Case Configuration](#test-case-configuration). `static_validator_args` | Sequence of strings | empty sequence | See [Static Validator](#static-validator). @@ -592,7 +592,7 @@ Every subdirectory of `data/secret/` is a test data group and may contain a `tes `data/secret` can only have test data groups *or* test cases, never both. That is, if there are any directories under `data/secret/` there must not be any `.in` files directly in `data/secret/` and vice versa. 
-The test groups themselves can contain directories, but not further groups. +The test data groups themselves can contain directories, but not further groups. This means that there are no `test_group.yaml` further down in the directory hierarchy. A directory must not have the same name as a test case in the same directory. @@ -720,7 +720,7 @@ Every submission is run on these test cases. Sample test cases do not contribute to the problem score for [scoring problems](#scoring-problems). If a `score.txt` file is produced on sample test cases on a scoring problem, it is not an error, but simply ignored. -`data/sample` must not contain test groups. +`data/sample` must not contain any test data groups. It may be missing (for problems with no samples) or empty. #### Samples Shown in the Problem Statement @@ -845,13 +845,13 @@ Every submission matched by the glob pattern must satisfy: The tooling should check the constraints for consistency, such as that two disjoint `permitted` sets are never applied to a single `(submission, testcase)` pair. -### Groups +### Test Data Groups The `permitted`, `required`, `score`, `message`, and `use_for_time_limit` requirements can also be given for only a subset of test cases, -by adding them under a key with the name of a test group (relative to `data/`). +by adding them under a key with the name of a test data group (relative to `data/`). In this case, the `permitted`, `required`, `message`, and `use_for_time_limit` keys only apply to the set of test cases (recursively) in the given group. -The `score` key puts a constraint on the aggregated score of a given test group, _not_ on the _set_ of test cases the group contains. +The `score` key puts a constraint on the aggregated score of a given test data group, _not_ on the _set_ of test cases the group contains. For example, the configuration below tests that the submission solves all cases in `group1`, but times out on at least one case in `group2`. 
```yaml @@ -867,8 +867,8 @@ solves_group_1.py: #### Glob patterns -Glob patterns can be used to apply restrictions to a subset of submissions. It is also possible to use glob patterns to put restrictions on a subset of test -cases and test groups, for example, when test groups are not used: +Glob patterns can be used to apply restrictions to a subset of submissions. +It is also possible to use glob patterns to put restrictions on a subset of test cases and test data groups, for example, when groups are not used: ```yaml time_limit_exceeded/solves_easy_cases.py: sample: @@ -883,9 +883,9 @@ This means that the submission must solve all samples and all easy cases, but must time out on at least one of the hard cases. Submission glob patterns are matched against all paths to files and directories of submissions inside and relative to the `submissions/` directory. -Test case glob patterns are matched against all paths of test groups and test cases relative to `data/`, +Test case glob patterns are matched against all paths of test data groups and test cases relative to `data/`, excluding the trailing `.in`. Wildcards (`*`) only match within a file name (i.e., do not match `/`). -A test case is matched by the glob pattern if either itself or any of its parent test groups is matched by it, +A test case is matched by the glob pattern if either itself or any of its parent test data groups is matched by it, and similarly a submission is matched if either itself or a parent directory is matched. Using `**` to match any number of directories and `[xyz]` to match only a subset of characters is not supported. @@ -977,7 +977,7 @@ Validation fails if any validator fails. An input validator program must be an application (executable or interpreted) capable of being invoked with a command line call. -All input validators provided will be run on every test data file using the arguments specified for the test data group they are part of. 
+All input validators provided will be run on every test data file using the arguments specified for the test data group (or `sample, or `secret`) they are part of. Validation fails if any validator fails. When invoked, the input validator will get the input file on stdin. @@ -1017,17 +1017,17 @@ A static validator may be provided under the `static_validator` directory, simil ### Static Validation Test Cases -Each test group may define a static validation test case. +Each test data group (or `sample, or `secret`) may define a static validation test case. It is an error to define static validation test cases without providing a static validator. -A static validation test case is defined within a group's `test_group.yaml` file by specifying the key `static_validation_score`. +A static validation test case is defined in `test_group.yaml` by specifying the key `static_validation_score`. If `static_validation_score` is specified as a non-negative integer, then it is the maximum score of the static validation test case (see [Scoring Problems](#scoring-problems) for details). -If it is specified as `pass-fail`, then the test group it is part of must have `pass-fail` aggregation, or the problem must be of type `pass-fail`. +If it is specified as `pass-fail`, then `score_aggregation` must be set to `pass-fail`, or the problem must be of type `pass-fail`. If `static_validator_args` is given, then it defines arguments passed to the static validator. It is an error to: - provide a static validator for `submit-answer` type problems, -- specify a `static_validation_score` in a test group with `pass-fail` aggregation or a problem that does not have the type `scoring`, +- specify a `static_validation_score` in a test data group with `pass-fail` aggregation or a problem that does not have the type `scoring`, - specify `static_validator_args` without specifying `static_validation_score`. 
### Invocation @@ -1340,10 +1340,10 @@ It is a judge error if: - an output or static validator produces a `score.txt` for a test case with bounded maximum score with a value that exceeds this maximum score; - an output or static validator produces a `score.txt` or `score_multiplier.txt` with invalid contents. -#### Scoring Test Groups +#### Scoring Test Data Groups The score of `secret` is determined by its groups or test cases (it can only have one or the other). -The score of a test group is determined by its test cases. +The score of a test data group is determined by its test cases. The score depends on the aggregation mode, which is either `pass-fail`, `sum`, or `min`. - If a group uses `pass-fail` aggregation, the group must have bounded maximum score. @@ -1353,13 +1353,13 @@ Otherwise the group score is 0. - If a group uses `sum` aggregation, the group score is the sum of the scores of its test cases or groups. - If a group uses `min` aggregation, then the group score is the minimum of these scores. -The submission score is the score of the `secret` group. +The submission score is the score of `secret`. -It is a judge error if the score of any group or subgroup exceeds its `max_score`. +It is a judge error if the score of `secret`, or any test data group exceeds its `max_score`. #### Required Dependent Groups -A group may specify that it depends on some other test data groups. +A test data group, or `secret` may specify that it depends on some other test data groups, or `sample`. Each required group must be either `sample` or have `pass-fail` aggregation. The dependent group will only be run if the group being depended on receives an accepted verdict for all test cases in the group. If the dependent group is not run, the group score is 0. @@ -1367,4 +1367,5 @@ If the dependent group is not run, the group score is 0. The paths of these required groups, relative to the `data` folder, are listed under the `require_pass` key. 
A path consists of zero or more directory names followed by a directory or file name, with a `/` character separating consecutive names. Each name must conform to the [general requirements](#general-requirements) on directory and file names and the specified test data group must exist. -The path of a group, relative to the `data/` folder, must come later lexicographically than the paths of all test cases and groups it depends on. +A group must not depend on a group that is lexicographically later than itself. +A group must not depend on a group that is hierarchically above itself. From c81e493bd677e18520448286c8d96cb181abd1b7 Mon Sep 17 00:00:00 2001 From: Etienne Vouga Date: Tue, 2 Sep 2025 01:11:09 +0400 Subject: [PATCH 2/6] Refactor test group scoring to account for secret not being a test group --- spec/2023-07-draft.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/spec/2023-07-draft.md b/spec/2023-07-draft.md index 24465e6e..b8f62c10 100644 --- a/spec/2023-07-draft.md +++ b/spec/2023-07-draft.md @@ -1342,7 +1342,6 @@ It is a judge error if: #### Scoring Test Data Groups -The score of `secret` is determined by its groups or test cases (it can only have one or the other). The score of a test data group is determined by its test cases. The score depends on the aggregation mode, which is either `pass-fail`, `sum`, or `min`. @@ -1350,12 +1349,16 @@ The score depends on the aggregation mode, which is either `pass-fail`, `sum`, o If the submission receives an accepted verdict for all test cases in the group, the score of the group is equal to its maximum possible score. Otherwise the group score is 0. -- If a group uses `sum` aggregation, the group score is the sum of the scores of its test cases or groups. +- If a group uses `sum` aggregation, the group score is the sum of the scores of its test cases. - If a group uses `min` aggregation, then the group score is the minimum of these scores.
+The score of `secret` is determined by its groups or test cases (it can only have one or the other). +- If `secret` uses `pass-fail` aggregation, then `secret` must have bounded maximum score. If the submission receives an accepted verdict for all test cases in `secret`, or for all test cases in its test data groups (which must also have `pass-fail` aggregation), then the score of `secret` is equal to its maximum possible score. Otherwise the score of `secret` is 0. +- If `secret` uses `sum` or `min` aggregation, then the score of `secret` is computed from the scores of its test cases or test data groups, analogously to test data group scoring above. + The submission score is the score of `secret`. -It is a judge error if the score of `secret`, or any test data group exceeds its `max_score`. +It is a judge error if the score of `secret` or any test data group exceeds its `max_score`. #### Required Dependent Groups From ac881931cead8855db043542cb7de71f11742af1 Mon Sep 17 00:00:00 2001 From: Etienne Vouga Date: Tue, 2 Sep 2025 01:13:49 +0400 Subject: [PATCH 3/6] rm stray commas --- spec/2023-07-draft.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/2023-07-draft.md b/spec/2023-07-draft.md index b8f62c10..499fb933 100644 --- a/spec/2023-07-draft.md +++ b/spec/2023-07-draft.md @@ -1362,7 +1362,7 @@ It is a judge error if the score of `secret` or any test data group exceeds its #### Required Dependent Groups -A test data group, or `secret` may specify that it depends on some other test data groups, or `sample`. +A test data group or `secret` may specify that it depends on some other test data groups or `sample`. Each required group must be either `sample` or have `pass-fail` aggregation. The dependent group will only be run if the group being depended on receives an accepted verdict for all test cases in the group. If the dependent group is not run, the group score is 0. 
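To sanity-check the scoring rules as restated in patch 2, here is a minimal Python sketch of how a judge might compute the score of `secret` and of a test data group. The `CaseResult` and `Node` types and the `aggregate_score` function are hypothetical names for illustration only; per-test-case scores are taken as given, `None` stands in for `unbounded`, and `min` aggregation is assumed to be applied only to non-empty groups.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class CaseResult:
    """Hypothetical per-test-case outcome: verdict plus (already computed) score."""
    accepted: bool
    score: float = 0


@dataclass
class Node:
    """Hypothetical model of a test data group, or of `secret` itself."""
    aggregation: str                 # "pass-fail", "sum", or "min"
    max_score: Optional[float]       # None models `unbounded`
    children: List[Union["Node", CaseResult]] = field(default_factory=list)


def all_accepted(item: Union[Node, CaseResult]) -> bool:
    """True if every test case at or below `item` received an accepted verdict."""
    if isinstance(item, CaseResult):
        return item.accepted
    return all(all_accepted(child) for child in item.children)


def aggregate_score(item: Union[Node, CaseResult]) -> float:
    if isinstance(item, CaseResult):
        return item.score
    if item.aggregation == "pass-fail":
        if item.max_score is None:
            raise ValueError("pass-fail aggregation requires a bounded max_score")
        # Test data groups under a pass-fail `secret` must also be pass-fail.
        if any(isinstance(c, Node) and c.aggregation != "pass-fail" for c in item.children):
            raise ValueError("groups under a pass-fail node must be pass-fail")
        result = item.max_score if all_accepted(item) else 0
    elif item.aggregation == "sum":
        result = sum(aggregate_score(child) for child in item.children)
    elif item.aggregation == "min":
        result = min(aggregate_score(child) for child in item.children)  # assumes >= 1 child
    else:
        raise ValueError(f"unknown aggregation {item.aggregation!r}")
    # Exceeding max_score is a judge error, not a submission error.
    if item.max_score is not None and result > item.max_score:
        raise RuntimeError("judge error: score exceeds max_score")
    return result
```

For example, a `sum`-aggregated `secret` containing a fully accepted 30-point `pass-fail` group and a 70-point `pass-fail` group with one rejected case would score 30 under this reading.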
From 2264619b1d25a1d04786848bb1a54933a13a3f01 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fredrik=20Niemel=C3=A4?= Date: Tue, 2 Sep 2025 11:55:14 +0400 Subject: [PATCH 4/6] Update spec/2023-07-draft.md Co-authored-by: Harry Zhang <75111093+hairez@users.noreply.github.com> --- spec/2023-07-draft.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/2023-07-draft.md b/spec/2023-07-draft.md index 499fb933..6d1eedaa 100644 --- a/spec/2023-07-draft.md +++ b/spec/2023-07-draft.md @@ -977,7 +977,7 @@ Validation fails if any validator fails. An input validator program must be an application (executable or interpreted) capable of being invoked with a command line call. -All input validators provided will be run on every test data file using the arguments specified for the test data group (or `sample, or `secret`) they are part of. +All input validators provided will be run on every test data file using the arguments specified for the test data group (or `sample`, or `secret`) they are part of. Validation fails if any validator fails. When invoked, the input validator will get the input file on stdin. From 07801a4b521b7b0f00f1db0308cf051d9cf66a03 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fredrik=20Niemel=C3=A4?= Date: Tue, 2 Sep 2025 11:55:33 +0400 Subject: [PATCH 5/6] Update spec/2023-07-draft.md Co-authored-by: Harry Zhang <75111093+hairez@users.noreply.github.com> --- spec/2023-07-draft.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/2023-07-draft.md b/spec/2023-07-draft.md index 6d1eedaa..75045928 100644 --- a/spec/2023-07-draft.md +++ b/spec/2023-07-draft.md @@ -1017,7 +1017,7 @@ A static validator may be provided under the `static_validator` directory, simil ### Static Validation Test Cases -Each test data group (or `sample, or `secret`) may define a static validation test case. +Each test data group (or `sample`, or `secret`) may define a static validation test case. 
It is an error to define static validation test cases without providing a static validator. A static validation test case is defined in `test_group.yaml` by specifying the key `static_validation_score`. From 7c4ebb785401a71af74039bc7d7d2b16d4b55d54 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fredrik=20Niemel=C3=A4?= Date: Tue, 2 Sep 2025 12:02:17 +0400 Subject: [PATCH 6/6] Update spec/2023-07-draft.md Co-authored-by: Harry Zhang <75111093+hairez@users.noreply.github.com> --- spec/2023-07-draft.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/2023-07-draft.md b/spec/2023-07-draft.md index 75045928..36e0a3da 100644 --- a/spec/2023-07-draft.md +++ b/spec/2023-07-draft.md @@ -1363,7 +1363,7 @@ It is a judge error if the score of `secret` or any test data group exceeds its #### Required Dependent Groups A test data group or `secret` may specify that it depends on some other test data groups or `sample`. -Each required group must be either `sample` or have `pass-fail` aggregation. +Each required group must have `pass-fail` aggregation. The dependent group will only be run if the group being depended on receives an accepted verdict for all test cases in the group. If the dependent group is not run, the group score is 0.
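The `require_pass` constraints spread across patches 1 and 6 can be checked mechanically: each required group must exist, must have `pass-fail` aggregation, must sort lexicographically before the dependent group (the ordering stated by the line patch 1 replaces), and must not be hierarchically above it. The sketch below is a hypothetical checker, not part of any existing tool. It assumes an index mapping paths relative to `data/` to their `score_aggregation`, with `sample` recorded as `pass-fail` (its default per the `test_group.yaml` table), and treats "lexicographic" as plain code-point string comparison.

```python
from typing import Dict, List, Union


def check_require_pass(
    group_path: str,
    require_pass: Union[str, List[str]],
    aggregation_by_path: Dict[str, str],
) -> List[str]:
    """Return the `require_pass` violations for one dependent group.

    `aggregation_by_path` is a hypothetical index from paths relative to
    `data/` (e.g. "sample", "secret", "secret/group1") to `score_aggregation`.
    """
    # `require_pass` may be a single string or a sequence of strings.
    required = [require_pass] if isinstance(require_pass, str) else list(require_pass)
    errors = []
    for req in required:
        if req not in aggregation_by_path:
            errors.append(f"{req}: required group does not exist")
            continue
        if aggregation_by_path[req] != "pass-fail":
            errors.append(f"{req}: required group must use pass-fail aggregation")
        # Assumption: "lexicographic" means plain code-point string order.
        if req > group_path:
            errors.append(f"{req}: sorts after dependent group {group_path}")
        # A required path that is a directory prefix of the dependent one is an ancestor.
        if group_path.startswith(req + "/"):
            errors.append(f"{req}: hierarchically above {group_path}")
    return errors
```

The explicit ancestor check matters because a parent path such as `secret` always sorts before `secret/group1`, so the ordering rule alone would not reject that dependency.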