Many of our SUTs no longer work, but it's hard to tell which ones. We should take the list of known SUT UIDs that modelbench prints when it is given a bad SUT name, save it to a file (known_uids below), and run each entry through something like this:
mkdir -p scratch; for s in $(cat known_uids); do modelbench benchmark general --sut "$s" -m 1 --evaluator private &> "scratch/${s}.log"; echo "$? $s"; done
Then we can either remove or fix anything that isn't working anymore.
Note that this requires a full secrets file.
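
If we want something reusable rather than a one-liner, here is a rough sketch of the same loop as a bash function that prints only the SUTs whose run exited non-zero. The function names (`check_sut`, `find_broken_suts`) and the optional third argument are made up for illustration; the modelbench invocation is copied from the command above and not otherwise verified. It assumes one UID per line and that UIDs are safe to use as file names.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of modelbench.

# Run one SUT through a minimal benchmark pass.
# The flags are taken verbatim from the one-liner in this issue.
check_sut() {
  modelbench benchmark general --sut "$1" -m 1 --evaluator private
}

# Usage: find_broken_suts UID_FILE LOG_DIR [CHECKER]
# Prints the UID of every SUT whose check exits non-zero.
# CHECKER defaults to check_sut; passing another function or command
# lets the loop be dry-run without modelbench or secrets.
find_broken_suts() {
  local uid_file=$1 log_dir=$2 checker=${3:-check_sut}
  local s
  mkdir -p "$log_dir"
  # The "|| [ -n "$s" ]" part handles a last line with no trailing newline.
  while IFS= read -r s || [ -n "$s" ]; do
    # Skip blank lines.
    [ -z "$s" ] && continue
    # Redirect stdin so the checker cannot swallow the rest of the UID list.
    if ! "$checker" "$s" > "$log_dir/$s.log" 2>&1 < /dev/null; then
      echo "$s"
    fi
  done < "$uid_file"
}
```

Something like `find_broken_suts known_uids scratch > broken_suts.txt` would then leave a per-SUT log in scratch/ and a list of candidates to remove or fix in broken_suts.txt. A non-zero exit only says the run failed, so the logs still need a look to separate dead SUTs from missing secrets.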