
HPC-launcher launch Command Documentation

Overview

The launch command is a versatile tool for launching distributed jobs on HPC clusters or in cloud environments. It provides a unified interface across different schedulers (SLURM, LSF, Flux) and supports both interactive and batch job submission.

Synopsis

launch [options] command [args...]

Command Structure

launch [-h] [--verbose] [-N NODES] [-n PROCS_PER_NODE] [--gpus-per-proc GPUS_PER_PROC]
       [-q QUEUE] [-t TIME_LIMIT] [-g GPUS_AT_LEAST] [--gpumem-at-least GPUMEM_AT_LEAST]
       [--exclusive] [--local] [--comm-backend JOB_COMM_PROTOCOL]
       [-x KEY=VALUE [KEY=VALUE ...]] [--bg] [--batch-script BATCH_SCRIPT]
       [--scheduler {local,flux,slurm,lsf}]
       [-l [LAUNCH_DIR]] [-o OUTPUT_SCRIPT] [--setup-only] [--dry-run]
       [--account ACCOUNT] [--dependency DEPENDENCY] [-J JOB_NAME]
       [--reservation RESERVATION] [--save-hostlist]
       [-p KEY=VALUE [KEY=VALUE ...]] [--out OUT_LOG_FILE] [--err ERR_LOG_FILE]
       [--color-stderr] command [args...]

Positional Arguments

| Argument | Description |
|----------|-------------|
| command | Command to be executed |
| args | Arguments to the command that should be executed |

Optional Arguments

General Options

| Option | Short Form | Description |
|--------|------------|-------------|
| --help | -h | Show help message and exit |
| --verbose | -v | Run in verbose mode; also saves the hostlist as if --save-hostlist were set |

Job Size Options

These options determine the number of nodes, accelerators, and ranks for the job.

| Option | Short Form | Description | Notes |
|--------|------------|-------------|-------|
| --nodes | -N | Specifies the number of requested nodes | |
| --procs-per-node | -n | Specifies the number of requested processes per node | Mutually exclusive with -g |
| --gpus-per-proc | | Specifies the number of requested GPUs per process | Default: 1 |
| --queue | -q | Specifies the queue to use | |
| --time-limit | -t | Set a time limit for the job in minutes | |
| --gpus-at-least | -g | Specifies the total number of accelerators requested | Mutually exclusive with -n and -N |
| --gpumem-at-least | | Constraint on the accelerator memory needed (in GB) | System must be registered with the launcher |
| --exclusive | | Request exclusive access from the scheduler | |
| --local | | Run locally (one process, without a batch scheduler) | |
| --comm-backend | | Indicate the primary communication protocol | Options: MPI, *CCL (NCCL, RCCL) |
| --xargs | -x | Specify scheduler and launch arguments | Format: KEY=VALUE |

Notes on --xargs (see the examples after this list):

  • Will override any known key
  • Use the format: --xargs k1=v1 k2=v2 or --xargs k1=v1 --xargs k2=v2
  • A double dash (--) is needed if this is the last option before the command
  • Arguments with a leading tilde (~) will be removed if found
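
A minimal sketch of the accepted forms; the key names (k1, k2) and ./job are placeholders, and the tilde form is an assumption based on the removal behavior described above:

# Two equivalent ways to pass extra scheduler/launch arguments;
# the trailing -- separates --xargs from the command to run
launch -N 2 --xargs k1=v1 k2=v2 -- ./job
launch -N 2 --xargs k1=v1 --xargs k2=v2 -- ./job

# Assumed form: a tilde-prefixed key removes that argument if the
# launcher would otherwise pass it
launch -N 2 --xargs ~k1 -- ./job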

Schedule Options

Arguments that determine when a job will run.

| Option | Description | Notes |
|--------|-------------|-------|
| --bg | Run the job in the background | The launcher won't wait for the job to start; uses a timestamped directory by default |
| --batch-script | Launch a user-provided batch script | |
| --scheduler | Override the default batch scheduler | Options: None, local, LocalScheduler, flux, FluxScheduler, slurm, SlurmScheduler, lsf, LSFScheduler |

Script Options

Batch scheduler script parameters.

| Option | Short Form | Description | Notes |
|--------|------------|-------------|-------|
| --launch-dir | -l | Control launch directory creation | See detailed behavior below |
| --output-script | -o | Output job setup script file | Uses a temporary file if not specified |
| --setup-only | | Only write the job setup script, without scheduling it | |
| --dry-run | | Output results without side effects | |
| --account | | Specify the account/bank for the job | |
| --dependency | | Specify a scheduler dependency | |
| --job-name | -J | Specify the job name | |
| --reservation | | Add a reservation argument | Typically for DAT runs |
| --save-hostlist | | Write the hostlist to hpc_launcher_hostlist.txt | |

--launch-dir Behavior:

  • No argument: Creates a timestamped launch directory
  • With an argument: Creates a directory named [LAUNCH_DIR]
  • Argument = ".": Creates the launch script in the current directory
  • Not set + blocking job: Runs without creating files
  • Not set + non-blocking job: Creates the launch file and logs in the current directory
  • Note: A double dash (--) is needed if this is the last option before the command (see the sketch after this list)
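
A minimal sketch of the trailing-dash case, with ./my_job as a placeholder command:

# -l with no argument as the last option: -- keeps the command from
# being read as the launch directory name
launch -N 2 -l -- ./my_job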

System Options

Provide system parameters from the command line; these override built-in system descriptions and autodetection.

| Option | Short Form | Description | Format |
|--------|------------|-------------|--------|
| --system-params | -p | Specify system parameters | KEY=VALUE pairs |

System Parameter Examples:

-p cores_per_node=128 gpus_per_node=8 gpu_arch=ampere mem_per_gpu=80 numa_domains=4 scheduler=slurm

Available parameters:

  • cores_per_node: Integer value for CPU cores per node
  • gpus_per_node: Integer value for GPUs per node
  • gpu_arch: String value for GPU architecture
  • mem_per_gpu: Float value for memory per GPU
  • numa_domains: Integer value for NUMA domains
  • scheduler: String value for scheduler type

Note: A double dash (--) is needed if this is the last option before the command, as in the sketch below.
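
A minimal sketch, with ./custom_job as a placeholder command:

# -p as the last option: -- separates the KEY=VALUE pairs from the command
launch -p cores_per_node=64 gpus_per_node=4 -- ./custom_job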

Logging Options

Control output and error logging.

| Option | Description |
|--------|-------------|
| --out | Capture standard output to a log file (console only if not specified) |
| --err | Capture standard error to a log file (console only if not specified) |
| --color-stderr | Use terminal colors to display stderr in red (does not affect output files) |

Usage Examples

Basic Examples

# Simple single-node job
launch -N 1 -n 1 hostname

# Multi-node MPI job
launch -N 4 -n 2 ./mpi_application

# GPU job with 2 GPUs per process
launch -N 2 -n 2 --gpus-per-proc 2 ./gpu_application

Resource Specification

# Request specific number of GPUs total
launch -g 16 ./gpu_application

# Request allocation with at least 80GB GPU memory
launch --gpumem-at-least 80 ./memory_intensive_app

# Exclusive node access
launch -N 2 --exclusive ./exclusive_app

# Local execution without scheduler
launch -N 1 --local ./test_script.py

Job Scheduling

# Submit to specific queue with time limit
launch -q gpu_queue -t 120 -N 2 ./training_script.py

# Background job with custom name
launch --bg -J my_experiment -N 4 ./long_running_job

# Job with dependencies
launch --dependency afterok:12345 -N 1 ./dependent_job

# Use specific account/bank
launch --account project123 -N 2 ./billable_job

Communication Backend

# MPI backend
launch --comm-backend MPI -N 4 -n 4 ./mpi_app

# NCCL backend for GPU communication
launch --comm-backend NCCL -N 2 -n 2 --gpus-per-proc 2 ./gpu_training

Script and Directory Management

# Create timestamped launch directory
launch -l -N 2 ./my_job

# Use specific launch directory
launch -l experiment_001 -N 2 ./my_job

# Run in current directory
launch -l . -N 2 ./my_job

# Generate script without running
launch --setup-only -l -o job_script.sh -N 4 ./my_application

# Dry run to see what would be executed
launch --dry-run -N 4 -n 4 ./my_application

System Override Examples

# Override system detection
launch -p cores_per_node=64 gpus_per_node=4 -N 2 ./custom_job

# Multiple system parameters
launch -p gpu_arch=sm_90 mem_per_gpu=32 scheduler=slurm -N 2 ./gpu_job

Scheduler-Specific Arguments

# Multiple xargs
launch --xargs key1=val1 --xargs key2=val2 -N 2 ./job

Logging Configuration

# Capture output and error to files
launch -l --out output.log --err error.log -N 2 ./my_job

# Colored error output in terminal
launch --color-stderr -N 1 ./verbose_job

# Verbose mode with saved hostlist
launch -l --verbose --save-hostlist -N 4 ./debug_job

Complex Example

# Production job with all options
launch \
  --verbose \
  -N 8 \
  -n 4 \
  --gpus-per-proc 2 \
  -q production \
  -t 480 \
  --exclusive \
  --comm-backend NCCL \
  --bg \
  --scheduler slurm \
  -l production_run_001 \
  --account ml_project \
  -J "ResNet Training" \
  --save-hostlist \
  --out output.log \
  --err error.log \
  python train_resnet.py --epochs 100 --batch-size 256

Environment Variables

The launch command may set or use various environment variables depending on the scheduler and communication backend (see the probe sketched after this list):

  • For MPI jobs: Standard MPI environment variables
  • For GPU jobs: CUDA-related environment variables
  • For NCCL/RCCL: Communication library environment variables
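
One way to see what a launched job actually receives is to run a small probe; this is only a sketch, and which variables appear depends on the site, scheduler, and backend:

# Print scheduler-, GPU-, and NCCL-related variables visible inside the job
launch -N 1 -n 1 bash -c 'env | grep -E "^(SLURM_|LSB_|FLUX_|CUDA_VISIBLE_DEVICES|NCCL_)" || true'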

Exit Status

  • 0: Successful execution
  • Non-zero: Error occurred (specific codes depend on failure type)
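
Since the exit status follows the usual shell convention, it can be checked directly; hostname stands in for a real workload here:

# $? holds the exit status of the most recent launch invocation
launch -N 1 hostname
echo "launch exited with status $?"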

Tips and Best Practices

  1. Use --dry-run first to verify your command before submitting large jobs
  2. Use --verbose for debugging to see detailed information about job submission
  3. Save scripts with --setup-only to review and reuse job configurations
  4. Use timestamped directories (-l without argument) for experiment tracking
  5. Specify --account for proper resource accounting on shared systems
  6. Use --save-hostlist when you need to know which nodes were allocated
  7. Set appropriate time limits with -t to avoid jobs being killed prematurely
  8. Use --exclusive for performance-critical jobs to avoid interference

Scheduler Detection

The launcher automatically detects the available scheduler. Override it with --scheduler if needed (see the example after this list):

  • local: Run without a scheduler
  • slurm: SLURM workload manager
  • lsf: IBM Spectrum LSF
  • flux: Flux resource manager
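
A minimal sketch of overriding detection, with ./my_job as a placeholder command:

# Force the Flux scheduler even if another one would be autodetected
launch --scheduler flux -N 2 ./my_job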

Generated from launch -h output