This issue concerns resilient batch jobs and was first mentioned in #6304. If a batch job is run with -o exit-timeout=none and enough non-critical ranks are lost that the remaining nodes can no longer run any pending job, the instance hangs until it times out. This happens because Flux is currently designed on the assumption that down nodes may eventually return to service. However, returning a lost node to service is only supported when the instance is bootstrapped from a config file, so the assumption does not hold for batch jobs.
I'm not exactly sure how to address this, since the assumption that down nodes may return is currently fundamental to the design. Note that this cannot simply be handled by a special feasibility plugin at submission time, because jobs may already be pending when a node is lost.