Add CLI option for system parameters and support for Matrix system (#27)
* Added a system definition for Matrix and Vector.
* Fixed memory size
* Added a command-line argument for the desired number of GPUs per
process (task). For most AI codes this should be 1, which remains the
default, but it can now be changed.
Updated the FLUX and SLURM schedulers to set this field when
necessary. This also fixes an issue when running on shared
(non-exclusive) compute resources.
* Ran black.
* Added support for specifying system parameters as a set of
command-line arguments. These override any known or autodetected
system parameters.
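The GPUs-per-process bullet does not show the scheduler code itself. As a minimal sketch, assuming the job is launched with `srun` (Slurm) or `flux run` (Flux), a per-process GPU count could be turned into launcher flags as below. `launcher_gpu_args` is a hypothetical helper, not code from this commit; the flags it emits are the launchers' own GPUs-per-task options (`--gpus-per-task` for `srun`, `-g` for `flux run`). Requesting GPUs explicitly matters most on shared, non-exclusive nodes, where a task may otherwise not be allocated the GPUs it expects.

```python
from typing import List


def launcher_gpu_args(scheduler: str, gpus_per_proc: int) -> List[str]:
    """Hypothetical helper: launcher flags requesting gpus_per_proc GPUs per task."""
    if gpus_per_proc < 0:
        raise ValueError("gpus_per_proc must be non-negative")
    if gpus_per_proc == 0:
        # CPU-only run: request no GPUs at all.
        return []
    name = scheduler.lower()
    if name == "slurm":
        # srun/sbatch option for GPUs bound to each task.
        return [f"--gpus-per-task={gpus_per_proc}"]
    if name == "flux":
        # flux run/submit: -g is the short form of --gpus-per-task.
        return ["-g", str(gpus_per_proc)]
    raise ValueError(f"Unsupported scheduler: {scheduler}")


if __name__ == "__main__":
    # Default of one GPU per process under Slurm.
    print(["srun", *launcher_gpu_args("slurm", 1), "python", "train.py"])
```

Keeping the per-scheduler mapping in one place means the CLI value is validated once and each scheduler only decides how to spell the request.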
     help="Run locally (i.e., one process without a batch ""scheduler)",
 )

+# System
+group=parser.add_argument_group(
+    "System",
+    "Provide system parameters from the CLI -- overrides built-in system descriptions and autodetection",
+)
+group.add_argument(
+    "-p",
+    "--system-params",
+    dest="system_params",
+    nargs='+',
+    action=ParseKVAction,
+    help="Specifies some or all of the parameters of a system as a dictionary (note it will override any known or autodetected parameters): -p cores_per_node=<int> gpus_per_node=<int> gpu_arch=<str> mem_per_gpu=<float> numa_domains=<int> scheduler=<str>",
+    metavar="KEY1=VALUE1",
+)
+
 # Schedule
 group=parser.add_argument_group(
     "Schedule", "Arguments that determine when a job will run"
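The diff uses `ParseKVAction`, but its implementation is not part of this excerpt. Below is a minimal sketch of what such an `argparse.Action` could look like, assuming it turns each `KEY=VALUE` token into a dictionary entry and converts numeric-looking values to `int` or `float`. `_coerce`, `merge_system_params` and `build_parser` are hypothetical names; the merge follows the option's help text, in which CLI values override known or autodetected parameters key by key.

```python
import argparse
from typing import Any, Dict, Optional


def _coerce(value: str) -> Any:
    """Return value as int or float when it parses as one; otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


class ParseKVAction(argparse.Action):
    """Hypothetical stand-in: collect KEY=VALUE tokens into a dict on the namespace."""

    def __call__(self, parser, namespace, values, option_string=None):
        params: Dict[str, Any] = {}
        for item in values:
            key, sep, value = item.partition("=")
            if not sep or not key:
                # parser.error prints the usage message and exits with status 2.
                parser.error(f"{option_string}: expected KEY=VALUE, got {item!r}")
            params[key] = _coerce(value)
        setattr(namespace, self.dest, params)


def merge_system_params(
    detected: Dict[str, Any], cli: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return detected/built-in parameters with CLI values overriding per key."""
    merged = dict(detected)
    merged.update(cli or {})
    return merged


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    group = parser.add_argument_group("System", "Provide system parameters from the CLI")
    group.add_argument(
        "-p",
        "--system-params",
        dest="system_params",
        nargs="+",
        action=ParseKVAction,
        metavar="KEY1=VALUE1",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-p", "gpus_per_node=4", "gpu_arch=sm_80"])
    detected = {"cores_per_node": 64, "gpus_per_node": 8}
    print(merge_system_params(detected, args.system_params))
    # {'cores_per_node': 64, 'gpus_per_node': 4, 'gpu_arch': 'sm_80'}
```

Applying the CLI dictionary last is what gives it precedence: any key the user passes replaces the autodetected value, and keys the user omits keep their detected values.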
+        f"The combination of {procs_per_node} processes per node and {gpus_per_proc} GPUs per process exceeds the number of GPUs per node {system_params.gpus_per_node}"
+    )
+
 # If the user requested a specific number of processes per node, honor that
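The error message in this hunk guards against oversubscription: `procs_per_node * gpus_per_proc` must not exceed `gpus_per_node`. The following is a hypothetical sketch of that logic, assuming that when no explicit process count is requested the default is one process per `gpus_per_proc` GPUs. The function name `resolve_procs_per_node` and that default rule are assumptions, not code from this commit.

```python
from typing import Optional


def resolve_procs_per_node(
    gpus_per_node: int, gpus_per_proc: int = 1, requested: Optional[int] = None
) -> int:
    """Hypothetical sketch: choose processes per node without oversubscribing GPUs."""
    if requested is not None:
        # Honor an explicit request, but reject one that needs more GPUs than the node has.
        if requested * gpus_per_proc > gpus_per_node:
            raise ValueError(
                f"The combination of {requested} processes per node and "
                f"{gpus_per_proc} GPUs per process exceeds the number of GPUs "
                f"per node {gpus_per_node}"
            )
        return requested
    if gpus_per_proc <= 0:
        raise ValueError("gpus_per_proc must be positive to derive processes per node")
    # Assumed default: one process for every gpus_per_proc GPUs on the node.
    procs = gpus_per_node // gpus_per_proc
    if procs == 0:
        raise ValueError(
            f"{gpus_per_proc} GPUs per process exceeds the {gpus_per_node} GPUs per node"
        )
    return procs


if __name__ == "__main__":
    print(resolve_procs_per_node(8, gpus_per_proc=2))  # 4
```

Checking the explicit request before falling back to a derived default mirrors the comment in the hunk: a user-supplied count wins, as long as it fits on the node.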