Find all the old SUTs that need pruning #1510

@wpietri

Many of our SUTs no longer work, but it's hard to tell which. When modelbench is given a bad SUT name, it prints the list of known SUT UIDs. We should save that list to a file (known_uids below) and run each entry through something like this:

mkdir -p scratch; for s in $(cat known_uids); do modelbench benchmark general --sut "$s" -m 1 --evaluator private &> "scratch/${s}.log"; echo "$? $s"; done

Then we can either remove or fix anything that isn't working anymore.

Note that this requires a full secrets file.
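
To decide what to remove or fix, the `exit-status uid` lines that the loop prints can be filtered down to the failing SUTs. This is a minimal sketch, not part of modelbench: it assumes the loop's output was redirected to a file (the name `results.txt` and the function name `failed_suts` are hypothetical) and that a non-zero exit status is a reliable sign of a broken SUT, which may not hold for every failure mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the UID of every SUT whose run exited non-zero.
# Input file: one "<exit status> <sut uid>" pair per line, as echoed by the loop.
failed_suts() {
  # $1 is the exit status field, $2 is the SUT UID.
  awk '$1 != 0 { print $2 }' "$1"
}
```

For example, `failed_suts results.txt > broken_uids` would leave one candidate UID per line, and the matching `scratch/<uid>.log` files can then be checked to see whether each SUT should be fixed or pruned.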

Labels: chore (Should do it, but it doesn't directly create value)
