
Commit 1cc2f41

readme and some minor print statement fixes
1 parent d670950 commit 1cc2f41


3 files changed (+176 additions, -39 deletions)


README.md

Lines changed: 171 additions & 34 deletions
````diff
@@ -1,69 +1,206 @@
-# python-batchtools
+python-batchtools
 
-## Set up your development environment
+## Overview
 
-### Install tools
+`python-batchtools` is a CLI for students and researchers to
+submit **GPU batch jobs** through **Kueue-managed GPU queues** on an
+OpenShift cluster. It provides an inexpensive and accessible way to use
+GPU hardware without reserving dedicated GPU nodes.
 
-1. Start by installing `uv`. Depending on your distribution, this may be as simple as:
+Users submit GPU jobs with a single command:
 
-```sh
-sudo dnf -y install uv
-```
+``` sh
+batchtools br "./cuda_program"
+```
+
+The CLI automatically:
+- Creates the batch job<br>
+- Submits it to the appropriate Kueue-managed LocalQueue<br>
+- Tracks job status<br>
+- Streams logs on completion<br>
+
+
+# For Users
 
-If you would like to run the latest version, you can install the command using `pipx`. First, install `pipx`:
+## Installation
 
-```
-sudo dnf -y install pipx
-```
+### Option 1: Use the provided container image (recommended)
 
-And then use `pipx` to install `uv`:
+### Option 2: Install from source
+
+``` sh
+git clone https://github.com/memalhot/python-batchtools.git
+cd python-batchtools
+pip install -e .
+```
 
-```
-pipx install uv
-```
 
-2. Next, install `pre-commit`. As with `uv`, you can install this using your system package manager:
+## Prerequisites
 
-```
-sudo dnf -y install pre-commit
-```
+1. A Kueue-enabled OpenShift cluster with LocalQueues named `v100-localqueue`, `a100-localqueue`, `h100-localqueue`, and `dummy-localqueue`<br>
+2. An OpenShift account<br>
+3. The Python OpenShift client:
+
+``` sh
+pip install openshift-client
+```
 
-Or you can install a possibly more recent version using `pipx`:
+# Usage Examples
 
-```
-pipx install pre-commit
-```
+For any command you can run:
+`batchtools <command> -h` or `batchtools <command> --help`
 
+## **1. Submit a Batch Job --- `br`**
+The `br` command submits batch jobs. It sends code intended to run on GPUs to a Kueue-managed queue, where the job waits its turn, runs, and writes its logs to `RUNDIR`; the job is then deleted to conserve cluster resources.
 
-### Activate pre-commit
+Here's how to use the `br` command:
 
-Activate `pre-commit` for your working copy of this repository by running:
+First, write a CUDA program and compile it.
+Then submit the compiled program to a GPU node:
 
+``` sh
+batchtools br "./cuda-code"
 ```
-pre-commit install
+
+Submit a program with arguments:
+
+``` sh
+batchtools br './simulate --steps 1000'
+```
+
+Specify GPU type:
+
+``` sh
+batchtools br --gpu v100 "./train_model"
+```
+
+Run without waiting for logs (for longer runs, similar to a more traditional batch system):
+
+``` sh
+batchtools br --no-wait "./cuda_program"
 ```
+***WARNING***
+If you run `br` with the `--no-wait` flag, the job is not cleaned up for you. You must delete it yourself by running `batchtools bd <job-name>` or `oc delete job <job-name>`.
+But don't worry: running with `--no-wait` prints a reminder to delete your jobs!
 
-This will configure `.git/hooks/pre-commit` to run the `pre-commit` tool every time you make a commit. Running these tests locally ensures that your code is clean and that tests are passing before you share your code with others. To manually run all the checks:
+And if you need help or want to see more flags:
 
+``` sh
+batchtools br --help
 ```
-pre-commit run --all-files
````
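When many runs need submitting, the `br` invocations shown in this README can be driven from Python. The sketch below assumes `batchtools` is installed and on `PATH` and that `--gpu` and `--no-wait` behave as described above; `build_br_command` and `submit` are hypothetical helpers, not part of python-batchtools.

```python
from __future__ import annotations

import shlex
import subprocess


def build_br_command(program: str, gpu: str | None = None, wait: bool = True) -> list[str]:
    """Hypothetical helper: assemble the argv for a `batchtools br` submission.

    `program` is the whole command line to run on the GPU node (for example
    "./simulate --steps 1000"). It is passed to br as ONE argument, exactly
    like the quoted strings in the shell examples.
    """
    argv = ["batchtools", "br"]
    if gpu is not None:
        argv += ["--gpu", gpu]  # e.g. "v100"; assumed to select the matching LocalQueue
    if not wait:
        argv.append("--no-wait")  # the job must later be removed with `batchtools bd <job-name>`
    argv.append(program)
    return argv


def submit(program: str, gpu: str | None = None, wait: bool = True) -> int:
    """Run the submission and return br's exit code (needs batchtools on PATH)."""
    return subprocess.run(build_br_command(program, gpu, wait), check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call submit(...) to actually run it.
    print(shlex.join(build_br_command("./simulate --steps 1000", gpu="v100", wait=False)))
    # prints: batchtools br --gpu v100 --no-wait './simulate --steps 1000'
```

Passing an argv list instead of using `shell=True` keeps the program string as a single argument, mirroring the quotes in `batchtools br './simulate --steps 1000'`.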
````diff
+
+
+## **2. List Jobs --- `bj`**
+
+List all jobs:
+
+``` sh
+batchtools bj
 ```
 
 
-### Install dependencies
+## **3. Delete Jobs --- `bd`**
 
-To install the project dependencies, run:
+Delete all jobs:
 
+``` sh
+batchtools bd
 ```
-uv sync --all-extras
+
+To delete specific jobs:
+
+``` sh
+batchtools bd job-a job-b
 ```
 
-### Run tests
 
-To run just the unit tests:
+## **4. List active GPU pods per node --- `bps`**
+
+``` sh
+batchtools bps
+```
+
+Output will be empty if all nodes are free.
+
+If some nodes are busy:
+```
+wrk-4: BUSY 3 project-1/project-stuff testing/other-stuff test/fraud-detection
+```
+
+To print the status of every node, including free ones, run:
 
+``` sh
+batchtools --verbose bps
 ```
+This produces output like:
+```
+ctl-0: FREE
+ctl-1: FREE
+ctl-2: FREE
+wrk-0: FREE
+wrk-1: FREE
+wrk-3: FREE
+wrk-4: BUSY 3 project-1/project-stuff testing/other-stuff test/fraud-detection
+wrk-5: FREE
+wrk-6: FREE
+wrk-7: FREE
+
+```
````
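The `bps` output is one `node: STATUS` line per node, so it is straightforward to post-process. This is a hypothetical parser written only against the sample output in the README; it assumes a `BUSY` line carries a count followed by `namespace/pod` entries, and the format may differ in other versions of the tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeStatus:
    node: str
    busy: bool
    # Number printed after BUSY; the README does not say whether it counts pods or GPUs.
    count: int = 0
    pods: list[str] = field(default_factory=list)  # "namespace/pod-name" entries


def parse_bps_line(line: str) -> NodeStatus:
    """Parse one line such as 'wrk-0: FREE' or 'wrk-4: BUSY 3 ns/pod ...'."""
    node, sep, rest = line.partition(":")
    fields = rest.split()
    if not sep or not fields:
        raise ValueError(f"unrecognised bps line: {line!r}")
    if fields[0] == "FREE":
        return NodeStatus(node=node.strip(), busy=False)
    if fields[0] == "BUSY" and len(fields) >= 2:
        return NodeStatus(node=node.strip(), busy=True, count=int(fields[1]), pods=fields[2:])
    raise ValueError(f"unrecognised bps line: {line!r}")


def busy_nodes(output: str) -> list[NodeStatus]:
    """Keep only the busy nodes from (possibly --verbose) bps output; blank lines are skipped."""
    statuses = [parse_bps_line(line) for line in output.splitlines() if line.strip()]
    return [status for status in statuses if status.busy]


if __name__ == "__main__":
    sample = "wrk-3: FREE\nwrk-4: BUSY 3 project-1/project-stuff testing/other-stuff test/fraud-detection\n"
    for status in busy_nodes(sample):
        print(status.node, status.count, status.pods)
```

Because the parser accepts both `FREE` and `BUSY` lines, the same function works on plain and `--verbose` output.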
````diff
+
+## **5. Show pod logs --- `bl`**
+
+``` sh
+batchtools bl
+```
+
+For a specific pod:
+``` sh
+batchtools bl pod-name
+```
+
+
+## **6. Show queue status --- `bq`**
+
+``` sh
+batchtools bq
+```
+
+Output will look like:
+```
+a100-clusterqueue admitted: 0 pending: 0 reserved: 0 GPUs: 0 BestEffortFIFO
+dummy-clusterqueue admitted: 0 pending: 0 reserved: 0 GPUs: 0 BestEffortFIFO
+h100-clusterqueue admitted: 0 pending: 0 reserved: 0 GPUs: 0 BestEffortFIFO
+v100-clusterqueue admitted: 0 pending: 0 reserved: 0 GPUs: 3 BestEffortFIFO
+```
````
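Each `bq` line pairs a ClusterQueue name with `admitted`, `pending`, `reserved`, and `GPUs` counters and a trailing queueing strategy (`BestEffortFIFO` in the sample). The regex-based sketch below is a hypothetical helper matched to that sample output only; it does not interpret what each counter means beyond what `bq` prints.

```python
from __future__ import annotations

import re

# Field layout copied from the sample bq output; "strategy" is the trailing
# token (BestEffortFIFO in the sample, a Kueue queueing strategy).
_BQ_LINE = re.compile(
    r"(?P<queue>\S+)\s+admitted:\s*(?P<admitted>\d+)\s+pending:\s*(?P<pending>\d+)"
    r"\s+reserved:\s*(?P<reserved>\d+)\s+GPUs:\s*(?P<gpus>\d+)\s+(?P<strategy>\S+)"
)


def parse_bq(output: str) -> dict[str, dict[str, int | str]]:
    """Map each ClusterQueue name to its counters, e.g. {'v100-clusterqueue': {'gpus': 3, ...}}."""
    queues: dict[str, dict[str, int | str]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        match = _BQ_LINE.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"unrecognised bq line: {line!r}")
        fields = match.groupdict()
        name = fields.pop("queue")
        strategy = fields.pop("strategy")
        # Remaining groups are all digit strings, so convert them to ints.
        queues[name] = {key: int(value) for key, value in fields.items()}
        queues[name]["strategy"] = strategy
    return queues


def queues_with_pending(output: str) -> list[str]:
    """Names of ClusterQueues whose `pending` counter is non-zero."""
    return [name for name, counters in parse_bq(output).items() if counters["pending"]]
```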
````diff
+
+# For Contributors
+
+## Tools
+
+Install uv:
+
+``` sh
+pipx install uv
+```
+
+Install pre-commit:
+
+``` sh
+pipx install pre-commit
+```
+
+Activate hooks:
+
+``` sh
+pre-commit install
+```
+
+## Running Tests
+
+``` sh
 uv run pytest
 ```
 
-This will generate a test coverage report in `htmlcov/index.html`.
+A coverage report is generated at:
+
+`htmlcov/index.html`
````

batchtools/br.py

Lines changed: 4 additions & 4 deletions
```diff
@@ -228,11 +228,11 @@ def run(args: argparse.Namespace):
         oc_delete("job", job_name)
     else:
         print(
-            f"User specified not to wait, or not to delete, so {job_name} must be deleted by user."
+            f"User specified not to wait, or not to delete, so {job_name} must be deleted by user.\n"
+            "You can do this by running:\n"
+            f" batchtools bd {job_name} OR\n"
+            f" oc delete job {job_name}"
         )
-        print("You can do this by running:")
-        print(f"bd {job_name} OR ")
-        print(f"oc delete job {job_name}")
 
 
 def get_pod_status(pod_name: str | None = None) -> str:
```
tests/test_bj.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -70,7 +70,7 @@ def test_lists_all_jobs_and_count(capsys):
     assert sel.calls.count("jobs") == 1
 
 
-def test_lists_jobs_ignoring_workloads_or_kueue_details(capsys):
+def test_lists_jobs(capsys):
     """
     Regression test: ensure that ListJobsCommand behavior is simple:
     it just lists whatever oc.selector("jobs").objects() returns,
```