There should be an easy way to "run smoke across many implementations and only see failures".
For --format json this likely just means ensuring there's a decent jq query on what we output.
But for "pretty" output we may need a CLI option, or at least need to make sure this is easy to do some other way.
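
To make the JSON case concrete, here is a rough sketch of the kind of filtering meant, written in Python rather than jq. The shape of the data is an assumption for illustration only: it supposes one report object per implementation with hypothetical "implementation", "success" and "errors" fields, which may not match what --format json actually emits.

```python
import json


def failures_only(reports):
    """Keep only the reports that did not succeed.

    Assumes each report is a dict with a hypothetical boolean "success"
    field; a report missing that field is treated as a failure so that
    it is not silently hidden.
    """
    return [report for report in reports if not report.get("success", False)]


# Hypothetical sample data, NOT the real output format of the smoke command.
sample = json.loads(
    """
    [
      {"implementation": "impl-a", "success": true},
      {"implementation": "impl-b", "success": false, "errors": ["crashed on startup"]},
      {"implementation": "impl-c", "success": true}
    ]
    """
)

for report in failures_only(sample):
    # Print one line per failing implementation.
    print(report["implementation"], "; ".join(report.get("errors", [])))
```

If the real output looks anything like this, the jq equivalent would be a one-line select on the same field, and a CLI option for "pretty" output could apply the same filter before rendering.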