mzuenni
Collaborator

@mzuenni mzuenni commented Oct 11, 2025

Addresses #460

This adds the new command bt check_testing_tool which uses the downloadable samples as well as the files found in the new directory data/testing_tool_test.

This is a bit hacky but should work in most cases?

  • it assumes that the testing tool is called testing_tool<.ext>
  • it assumes that the testing tool can be called with -f <test file> <submission run command>
  • it uses a man-in-the-middle script to get the exit code of the submission and of the testing tool
  • the MITM script only works if the working directory was not changed by the testing tool
  • it assumes that the testing tool uses a non-zero exit code if something goes wrong
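As a rough illustration of the assumptions above (not the code merged in this PR), a man-in-the-middle wrapper could be sketched like this. The names `run_and_record`, `build_testing_tool_command` and `exit_code_file` are made up for the example. If the exit-code file is given as a relative path, the wrapper only finds it when the testing tool has not changed the working directory, which is one way the restriction above can arise.

```python
import subprocess
import sys
from pathlib import Path


def run_and_record(exit_code_file: Path, command: list[str]) -> int:
    """Run `command` with inherited stdin/stdout and record its exit code.

    The testing tool talks to this wrapper as if it were the submission;
    the exit code is written to a file so the caller can inspect it later.
    """
    result = subprocess.run(command)
    exit_code_file.write_text(f"{result.returncode}\n")
    return result.returncode


def build_testing_tool_command(
    tool: list[str],
    test_file: Path,
    wrapper: Path,
    exit_code_file: Path,
    submission: list[str],
) -> list[str]:
    """Assemble `<tool> -f <test file> <submission run command>`.

    The submission run command is replaced by the (hypothetical) wrapper
    script, which in turn receives the real submission command.
    """
    return [
        *tool,
        "-f",
        str(test_file),
        sys.executable,
        str(wrapper),
        str(exit_code_file),
        *submission,
    ]


def main(argv: list[str]) -> int:
    """Entry point of the wrapper: argv[1] is the exit-code file, the rest is the command."""
    return run_and_record(Path(argv[1]), argv[2:])
```

A wrapper script built on this would end with `sys.exit(main(sys.argv))`, so that the testing tool still sees the submission's exit code.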

@paul-wild want to try this?

@paul-wild
Contributor

I tested this on three interactive problems, Slot Machine (WF'2025 I), Where Am I Now? (WF'2024 L) and Lateral Damage (NWERC'2023 L). It works pretty well already (in particular it was quite easy to rediscover the bugs in the testing tools for the former two), but there are a few things that could/should be improved:

  • The testing tool might take a different input format than the input validator (this is the case for the latter two examples), so the generate command should not try to validate the data in testing_tool_test.
  • I think that two very common use cases for testing tool tests are "all secret data" or "all secret data as modified by the following script" and it would be convenient to have a shorthand for that. In the former case you can just copy all generator lines from secret, but then bt generate will shout at you because of duplicated entries. Plus, if you do it manually you need to remember to update this when adding new secret data.
  • It would be nice if the name and arguments of the testing tool were not assumed to follow a fixed scheme. I could imagine some testing tools whose behaviour may depend on additional flags instead of just an input file. This point is also linked to the previous points; you could have a wrapper script that specifies how to feed a secret case into the testing tool.
  • When testing this with a buggy testing tool, a submission ran into an infinite loop. Maybe apply a timeout?
  • I don't know whether you care, but this new command makes the set of bt commands no longer prefix-free (because run also exists).
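For the timeout point, a minimal sketch of what I have in mind, assuming a POSIX system. `run_with_timeout` is a hypothetical helper, not something bt provides. The process is started in its own session so that the submission launched by the testing tool is killed together with the tool itself.

```python
import os
import signal
import subprocess
from typing import Optional


def run_with_timeout(command: list[str], timeout_seconds: float) -> Optional[int]:
    """Run `command`; return its exit code, or None if it had to be killed.

    POSIX only: start_new_session=True puts the child into a new process
    group whose id equals the child's pid, so killpg also reaches processes
    the child started itself (e.g. a submission stuck in an infinite loop).
    """
    proc = subprocess.Popen(command, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            # The process group vanished between the timeout and the kill.
            pass
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

Killing only the direct child would leave a looping submission running, which is why the whole process group is targeted here.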

I did not yet test this on any multipass problems.

@mzuenni
Collaborator Author

mzuenni commented Oct 12, 2025

The testing tool might take a different input format than the input validator (this is the case for the latter two examples), so the generate command should not try to validate the data in testing_tool_test.

I thought I had already disabled all validation... but maybe I only disabled answer validation... need to check. Fixed?

I think that two very common use cases for testing tool tests are "all secret data" or "all secret data as modified by the following script" and it would be convenient to have a shorthand for that.

If it's identical, you can use the include feature. For the second, I am unsure whether a shorthand for that is a good idea.

When testing this with a buggy testing tool, a submission ran into an infinite loop. Maybe apply a timeout?

I thought there was a timeout... need to check again. Fixed?

I don't know whether you care, but this new command makes the set of bt commands no longer prefix-free (because run also exists).

hmmm any name recommendation?
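To check a candidate name against the existing subcommands, something like the following could be used. This is only a sketch; `prefix_conflicts` is a made-up helper and the command list in the usage example is not the complete set of bt commands.

```python
def prefix_conflicts(names: list[str]) -> list[tuple[str, str]]:
    """Return all pairs (shorter, longer) where `shorter` is a proper prefix of `longer`.

    A set of command names is prefix-free exactly when this list is empty.
    """
    unique = sorted(set(names))
    return [
        (shorter, longer)
        for shorter in unique
        for longer in unique
        if shorter != longer and longer.startswith(shorter)
    ]


# Names mentioned in this thread; "run_hypothetical" is an invented example.
print(prefix_conflicts(["run", "test", "generate", "check_testing_tool"]))
print(prefix_conflicts(["run", "run_hypothetical"]))
```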

@mzuenni
Collaborator Author

mzuenni commented Oct 12, 2025

It would be nice if the name and arguments of the testing tool were not assumed to follow a fixed scheme. I could imagine some testing tools whose behaviour may depend on additional flags instead of just an input file.

I think you can already put it inside a directory and add a run script, not sure though.

@paul-wild
Contributor

After the most recent change, I now receive Error occurred during initialization of VM for any Java/Kotlin submissions. To be specific, this is what those submissions output, which of course causes the interaction with the testing tool to fail.

The input files in testing_tool_test now indeed are no longer validated, and a timeout is properly applied.

I think that two very common use cases for testing tool tests are "all secret data" or "all secret data as modified by the following script" and it would be convenient to have a shorthand for that.

If it's identical, you can use the include feature. For the second, I am unsure whether a shorthand for that is a good idea.

Ah yes, include works really well in the first case. I guess it's alright if the second case doesn't get a shorthand, but it is quite inconvenient to achieve this effect currently, unless I'm missing a cleaner method. I basically placed the transformed inputs inside a subdirectory of generators/ and used copy, but of course this results in a lot of clutter.

It would be nice if the name and arguments of the testing tool were not assumed to follow a fixed scheme. I could imagine some testing tools whose behaviour may depend on additional flags instead of just an input file.

I think you can already put it inside a directory and add a run script, not sure though.

But you wouldn't want that stuff in the attachments/ directory, right? So I'm not sure I understand this comment.

To give some examples for why I think more flexibility would be nice:

  • If you have more than one interactive/multipass problem in a contest, you might use <problemname>_testing_tool.py to differentiate them.
  • Existing testing tools might use different conventions. The -f flag is kind of redundant, so a testing tool might not actually have it. A testing tool might also not want to use input files, e.g. if the input is just a single integer, which you could then specify as a command-line argument instead.

Of course one can always rewrite testing tools to fit the given format, but if one was instead able to specify where the testing tool is located and how it should be run, then you could perhaps do something like this inside the testing_tool_test entry in the generators.yaml (or even for each testdata group inside of it):

# no -f needed, can specify path
testing_tool: attachments/slotmachine_testing_tool.py testcase.in {solution}

# chaining with an input transformation script
testing_tool: generators/transform-input.py < testcase.in > testcase.in.transformed; attachments/whereaminow_testing_tool.py testcase.in.transformed {solution}
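To make the idea concrete, here is a sketch of how such a `testing_tool:` template could be expanded. Both the key and the `{solution}` placeholder are only my proposal above, and `expand_testing_tool` is a hypothetical helper, not an existing bt function. The submission command is shell-quoted because the second example relies on shell features (redirection and `;`), so the expanded string would have to be run through a shell.

```python
import shlex


def expand_testing_tool(template: str, solution: list[str]) -> str:
    """Replace the {solution} placeholder with the shell-quoted submission command.

    The result is a single shell command line, intended for something like
    subprocess.run(expanded, shell=True).
    """
    return template.replace("{solution}", shlex.join(solution))


template = "attachments/slotmachine_testing_tool.py testcase.in {solution}"
print(expand_testing_tool(template, ["python3", "my sol.py"]))
```

Quoting with `shlex.join` keeps submission paths containing spaces intact once the line is handed to the shell.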

@paul-wild
Contributor

Though thinking about it a bit more, for newly developed interactive problems it should always be possible to choose the input format such that any valid input for the testing tool is also valid input for the interactor, even if the latter features additional behaviour such as adaptiveness or different strategies of playing a game.

@mzuenni
Collaborator Author

mzuenni commented Oct 13, 2025

After the most recent change, I now receive Error occurred during initialization of VM for any Java/Kotlin submissions.

Argh... that's because the JVM does not handle memory restrictions, and our exec call does not see the submission but only the testing_tool. Will be fixed.

@mzuenni
Collaborator Author

mzuenni commented Oct 13, 2025

for newly developed interactive problems it should always be possible to choose the input format such that any valid input for the testing tool is also valid input for the interactor

Yes, and I think the same is true for the -f flag. I would also argue that this makes usage easier for teams (if we always use the same arguments).

There is still the issue with the name... run is really commonly used, so a different prefix would be nice, but bt test also already exists...

Owner

@RagnarGrootKoerkamp RagnarGrootKoerkamp left a comment

Didn't have a close look at testing_tool.py:run but otherwise lgtm.

You should add some docs though, to explain precisely what is run when doing bt check_testing_tool and what the expected invocation of the testing tool is.

@mzuenni mzuenni merged commit 5eff3a0 into main Oct 20, 2025
6 checks passed
@mzuenni mzuenni deleted the testing-tool-test branch October 20, 2025 20:21
3 participants