* Fix the configuration launch scripts. Add a check for when the number of ranks per node combined with the number of GPUs per rank exceeds the total number of GPUs per node.
* Update tests to the new API.
* Fixed a bug in the scheduler classes where the host file was being
created after the job was run. This prevented the CI tests from
self-checking. Additionally, added checks in the CI to skip tests if
MPI wasn't detected.
-f"The combination of {procs_per_node} processes per node and {gpus_per_proc} GPUs per process exceeds the number of GPUs per node {system_params.gpus_per_node}"
+f"The combination of {procs_per_node} processes per node and {gpus_per_proc} GPUs per process exceeds the number of GPUs per node {system_params.gpus_per_node} - Job will not launch, please fix requested parameters"
 )

 # If the user requested a specific number of processes per node, honor that
@@ -88,8 +98,6 @@ def configure_launch(
 # Otherwise, if there is a valid set of system parameters, try to fill in
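The check described in the commit message can be sketched as standalone Python. This is a minimal illustration, not the project's actual code. `SystemParams` and `check_gpu_request` are hypothetical names introduced here, and raising `ValueError` is an assumption, since the hunk does not show how the message is reported. The sketch also assumes that the "combination" in the message means the product of `procs_per_node` and `gpus_per_proc`.

```python
from dataclasses import dataclass


@dataclass
class SystemParams:
    """Hypothetical stand-in for the project's system parameters object."""
    gpus_per_node: int


def check_gpu_request(procs_per_node: int, gpus_per_proc: int,
                      system_params: SystemParams) -> None:
    """Raise ValueError if the request needs more GPUs than one node has."""
    # Assumption: the "combination" is the product of the two values.
    requested = procs_per_node * gpus_per_proc
    if requested > system_params.gpus_per_node:
        raise ValueError(
            f"The combination of {procs_per_node} processes per node and "
            f"{gpus_per_proc} GPUs per process exceeds the number of GPUs "
            f"per node {system_params.gpus_per_node} - Job will not launch, "
            f"please fix requested parameters"
        )
```

Under these assumptions, a request for 4 processes per node with 1 GPU each passes on a 4-GPU node, while 4 processes with 2 GPUs each is rejected before any job is launched.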