Commit df21329

Add Phase 3.1: friendly errors, soup doctor, soup quickstart, UX polish (v0.3.1)
- Friendly error messages: wrap all commands in try/except, map known errors (CUDA OOM, missing deps, connection errors) to 2-3 line messages with fix hints
- Global --verbose flag for full tracebacks
- soup doctor: check system info, GPU, all dependency versions with fix suggestions
- soup quickstart: one-command demo (creates data + config + trains TinyLlama)
- Confirmation prompts before train/sweep (skip with --yes)
- 40 new tests (321 total), all passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8cd1916 commit df21329

14 files changed: +1010 -10 lines changed

CLAUDE.md

Lines changed: 12 additions & 1 deletion
@@ -71,6 +71,14 @@ soup train --config soup.yaml
 
 **DeepSpeed:** `utils/deepspeed.py` provides ZeRO Stage 2/3 config templates. `commands/train.py` supports `--deepspeed zero2|zero3|zero2_offload|<path>`. Trainers (SFT/DPO) pass `deepspeed` to HF TrainingArguments. Requires `pip install 'soup-cli[deepspeed]'`.
 
+**Error handling:** `utils/errors.py` maps known exceptions (CUDA OOM, missing deps, connection errors, validation errors) to friendly 2-3 line messages with fix suggestions. `cli.py` wraps all commands in a try/except and uses `--verbose` flag for full tracebacks.
+
+**Doctor:** `commands/doctor.py` checks system info, GPU availability, and all dependency versions. Reports missing/outdated packages with fix suggestions.
+
+**Quickstart:** `commands/quickstart.py` runs a complete demo — creates 20-example alpaca dataset, TinyLlama config, and trains a LoRA adapter. Supports `--dry-run` to create files only.
+
+**Confirmation prompts:** `commands/train.py` and `commands/sweep.py` ask for confirmation before starting. Skip with `--yes` / `-y`.
+
 ## Code Conventions
 
 - **Line length:** 100 chars (ruff enforced)
@@ -99,7 +107,7 @@ soup train --config soup.yaml
 
 ## Tests
 
-Test suite (~281 tests) lives in `tests/`:
+Test suite (~321 tests) lives in `tests/`:
 
 | File | Covers |
 |---|---|
@@ -128,3 +136,6 @@ Test suite (~281 tests) lives in `tests/`:
 | `test_sweep.py` | Sweep params parsing, combinations, nested config |
 | `test_diff.py` | Diff prompts collection, metrics, CLI |
 | `test_deepspeed.py` | DeepSpeed configs, multi-GPU detection, trainer integration |
+| `test_errors.py` | Friendly error messages, --verbose flag, error mapping |
+| `test_doctor.py` | `soup doctor` command, version checking, dependency table |
+| `test_quickstart.py` | `soup quickstart` demo, data/config creation, --dry-run |
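
The error-handling note in this file describes mapping known exceptions to short messages with fix hints. `utils/errors.py` is not shown in this part of the diff, so the following is only a minimal sketch of how such a mapping might look. The `FRIENDLY_RULES` table, the `friendly_message` helper, and all message wording are hypothetical; the rules match on exception type and message text using the standard library only.

```python
"""Hypothetical sketch of a friendly-error mapper (not the actual utils/errors.py)."""

import traceback
from typing import Callable, List, Tuple

# Each rule: (predicate on the exception, short title, fix hint).
# The rules and wording are illustrative assumptions.
Rule = Tuple[Callable[[BaseException], bool], str, str]

FRIENDLY_RULES: List[Rule] = [
    (
        lambda exc: "out of memory" in str(exc).lower(),
        "GPU ran out of memory.",
        "Try a smaller batch size or a smaller model.",
    ),
    (
        lambda exc: isinstance(exc, ImportError),
        "A required package is missing.",
        "Run `soup doctor` to see which dependency to install.",
    ),
    (
        lambda exc: isinstance(exc, ConnectionError),
        "Could not reach the remote server.",
        "Check your network connection and try again.",
    ),
]


def friendly_message(exc: BaseException, verbose: bool = False) -> str:
    """Return a short message for known errors, or a generic fallback."""
    if verbose:
        # Full traceback, as a --verbose style flag would request.
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    for matches, title, hint in FRIENDLY_RULES:
        if matches(exc):
            return f"Error: {title}\nFix: {hint}"
    return (
        f"Error: {type(exc).__name__}: {exc}\n"
        "Fix: re-run with --verbose for the full traceback."
    )
```

Keeping the rules in a plain list means a new known error can be supported by appending one tuple, and the first matching rule wins.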

README.md

Lines changed: 46 additions & 2 deletions
@@ -21,7 +21,7 @@
 <a href="https://pypi.org/project/soup-cli/"><img src="https://img.shields.io/pypi/v/soup-cli?color=blue" alt="PyPI"></a>
 <img src="https://img.shields.io/badge/python-3.9%2B-blue" alt="Python 3.9+">
 <img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License">
-<img src="https://img.shields.io/badge/tests-281%20passed-brightgreen" alt="Tests">
+<img src="https://img.shields.io/badge/tests-321%20passed-brightgreen" alt="Tests">
 <a href="https://github.com/MakazhanAlpamys/Soup/actions"><img src="https://github.com/MakazhanAlpamys/Soup/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
 </p>
 
@@ -324,6 +324,43 @@ soup train --config soup.yaml --deepspeed zero2_offload
 soup train --config soup.yaml --deepspeed ./my_ds_config.json
 ```
 
+## Quickstart Demo
+
+Run a complete demo in one command — creates sample data, config, and trains a tiny model:
+
+```bash
+# Full demo (creates data + config + trains TinyLlama)
+soup quickstart
+
+# Just create files without training
+soup quickstart --dry-run
+
+# Skip confirmation
+soup quickstart --yes
+```
+
+## Health Check
+
+Check your environment for compatibility issues:
+
+```bash
+soup doctor
+```
+
+Shows: Python version, GPU availability, all dependency versions, and fix suggestions.
+
+## Error Handling
+
+Soup shows friendly error messages by default (2-3 lines with a fix suggestion). For full tracebacks:
+
+```bash
+# Any command with --verbose
+soup train --config soup.yaml --verbose
+
+# Global flag works with all commands
+soup --verbose eval --model ./output --benchmarks mmlu
+```
+
 ## Data Formats
 
 Soup supports these formats (auto-detected):
@@ -429,6 +466,10 @@ soup eval --model ./output --benchmarks mmlu --run-id run_20260223_143052_a1b2
 | Hyperparameter sweep (grid/random) ||
 | Model comparison (diff) ||
 | Multi-GPU / DeepSpeed ||
+| Friendly error messages ||
+| Health check (soup doctor) ||
+| Quickstart demo ||
+| Confirmation prompts ||
 | Web dashboard | 🔜 |
 | Cloud mode (BYOG) | 🔜 |
 
@@ -459,7 +500,10 @@ soup sweep --config soup.yaml --param lr=... Hyperparameter search
 soup diff --model-a ./a --model-b ./b Compare two models
 soup data generate --prompt "..." --count 100 Generate synthetic data
 soup train --deepspeed zero2 Multi-GPU with DeepSpeed
+soup doctor Check environment & dependencies
+soup quickstart [--dry-run] Full demo: create data + config + train
 soup version Show version
+soup --verbose <command> Show full traceback on errors
 ```
 
 ## Requirements
@@ -478,7 +522,7 @@ pip install -e ".[dev]"
 # Lint
 ruff check soup_cli/ tests/
 
-# Run unit tests (fast, no GPU needed — 281 tests)
+# Run unit tests (fast, no GPU needed — 321 tests)
 pytest tests/ -v
 
 # Run smoke tests (downloads tiny model, runs real training)
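
The README changes mention confirmation prompts that `--yes` skips. The real prompts live in `commands/train.py` and `commands/sweep.py`, which this excerpt does not show. As a rough, stdlib-only sketch of the pattern (the `confirm_or_abort` name and the injectable `ask` parameter are assumptions for illustration, not the project's API):

```python
from typing import Callable


def confirm_or_abort(
    question: str,
    assume_yes: bool = False,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True to proceed, False to abort.

    `assume_yes` plays the role of a --yes / -y flag: when set, the user
    is never prompted. `ask` is injectable so the function can be tested
    without a terminal.
    """
    if assume_yes:
        return True
    answer = ask(f"{question} [y/N]: ").strip().lower()
    # Anything other than an explicit yes counts as "no".
    return answer in ("y", "yes")
```

Defaulting to "no" on empty input is a deliberate choice in this sketch: a stray Enter key should not start a long training run.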

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "soup-cli"
-version = "0.3.0"
+version = "0.3.1"
 description = "Fine-tune LLMs in one command. No SSH, no config hell."
 readme = "README.md"
 license = "MIT"
@@ -48,7 +48,7 @@ generate = ["httpx>=0.24.0"]
 deepspeed = ["deepspeed>=0.12.0"]
 
 [project.scripts]
-soup = "soup_cli.cli:app"
+soup = "soup_cli.cli:run"
 
 [project.urls]
 Homepage = "https://github.com/MakazhanAlpamys/Soup"

soup_cli/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 """Soup CLI — Fine-tune LLMs in one command."""
 
-__version__ = "0.3.0"
+__version__ = "0.3.1"

soup_cli/__main__.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 """Allow running as `python -m soup_cli`."""
 
-from soup_cli.cli import app
+from soup_cli.cli import run
 
-app()
+run()

soup_cli/cli.py

Lines changed: 44 additions & 2 deletions
@@ -1,5 +1,7 @@
 """Main CLI entry point — all commands registered here."""
 
+import sys
+
 import typer
 from rich.console import Console
 
@@ -19,9 +21,14 @@
     sweep,
     train,
 )
+from soup_cli.commands import doctor as doctor_cmd
+from soup_cli.commands import quickstart as quickstart_cmd
 
 console = Console()
 
+# Global verbose flag — set via callback, read by error handler
+_verbose = False
+
 app = typer.Typer(
     name="soup",
     help="Fine-tune LLMs in one command. No SSH, no config hell.",
@@ -45,6 +52,8 @@
 app.command()(serve.serve)
 app.command()(sweep.sweep)
 app.command(name="diff")(diff.diff)
+app.command()(doctor_cmd.doctor)
+app.command()(quickstart_cmd.quickstart)
 
 # Register data generate as a subcommand of data
 data.app.command(name="generate")(generate.generate)
@@ -57,6 +66,39 @@ def version():
 
 
 @app.callback(invoke_without_command=True)
-def main(ctx: typer.Context):
+def main(
+    ctx: typer.Context,
+    verbose: bool = typer.Option(
+        False,
+        "--verbose",
+        "-V",
+        help="Show full traceback on errors",
+    ),
+):
     """Soup — fine-tune LLMs in one command."""
-    pass
+    global _verbose
+    _verbose = verbose
+
+
+def run():
+    """Entry point with friendly error handling."""
+    try:
+        app()
+    except SystemExit:
+        raise
+    except typer.Exit:
+        raise
+    except KeyboardInterrupt:
+        console.print("\n[yellow]Interrupted.[/]")
+        sys.exit(130)
+    except Exception as exc:
+        from soup_cli.utils.errors import format_friendly_error
+
+        format_friendly_error(exc, verbose=_verbose)
+        sys.exit(1)
+
+
+# When invoked via `soup` entry point, use run() for error handling.
+# When invoked via `python -m soup_cli`, __main__.py calls run() directly.
+if __name__ == "__main__":
+    run()
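
The `run()` wrapper in this file relies on Python's exception hierarchy: `SystemExit` and `KeyboardInterrupt` derive from `BaseException` rather than `Exception`, so a final `except Exception` clause would not catch them even without the explicit clauses. Below is a small stdlib-only sketch of the same pattern. It returns exit codes instead of calling `sys.exit` so it can be exercised directly; `run_guarded` and its `report` parameter are hypothetical names, not part of soup-cli.

```python
import sys
from typing import Callable


def _to_stderr(message: str) -> None:
    """Default reporter: write the message to standard error."""
    print(message, file=sys.stderr)


def run_guarded(
    command: Callable[[], object],
    report: Callable[[str], None] = _to_stderr,
) -> int:
    """Run `command` and translate its outcome into a process exit code."""
    try:
        command()
    except SystemExit as exc:
        # Pass through whatever exit code the command asked for.
        code = exc.code
        if code is None:
            return 0
        return code if isinstance(code, int) else 1
    except KeyboardInterrupt:
        report("Interrupted.")
        return 130  # conventional 128 + SIGINT (2)
    except Exception as exc:  # everything else becomes a short message
        report(f"Error: {exc}")
        return 1
    return 0
```

A caller would typically end with `sys.exit(run_guarded(main))`, which keeps the exit-code policy in one place.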

soup_cli/commands/doctor.py

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
+"""soup doctor — check dependency compatibility and system health."""
+
+import platform
+import sys
+
+from rich.console import Console
+from rich.panel import Panel
+from rich.table import Table
+
+console = Console()
+
+# Dependencies to check: (import_name, package_name, min_version, required)
+DEPS = [
+    ("torch", "torch", "2.0.0", True),
+    ("transformers", "transformers", "4.36.0", True),
+    ("peft", "peft", "0.7.0", True),
+    ("trl", "trl", "0.7.0", True),
+    ("datasets", "datasets", "2.14.0", True),
+    ("bitsandbytes", "bitsandbytes", "0.41.0", True),
+    ("accelerate", "accelerate", "0.25.0", True),
+    ("pydantic", "pydantic", "2.0.0", True),
+    ("typer", "typer", "0.9.0", True),
+    ("rich", "rich", "13.0.0", True),
+    ("yaml", "pyyaml", "6.0", True),
+    ("plotext", "plotext", "5.2.0", True),
+    # Optional
+    ("fastapi", "fastapi", "0.104.0", False),
+    ("uvicorn", "uvicorn", "0.24.0", False),
+    ("datasketch", "datasketch", "1.6.0", False),
+    ("lm_eval", "lm-eval", "0.4.0", False),
+    ("wandb", "wandb", "0.15.0", False),
+    ("deepspeed", "deepspeed", "0.12.0", False),
+    ("httpx", "httpx", "0.24.0", False),
+]
+
+
+def doctor():
+    """Check system dependencies, GPU, and compatibility."""
+    console.print("[bold]Soup Doctor[/] — checking your environment...\n")
+
+    # System info
+    console.print(
+        Panel(
+            f"Python: [bold]{sys.version.split()[0]}[/]\n"
+            f"Platform: [bold]{platform.system()} {platform.release()}[/]\n"
+            f"Arch: [bold]{platform.machine()}[/]",
+            title="System",
+        )
+    )
+
+    # GPU check
+    _check_gpu()
+
+    # Dependencies table
+    table = Table(title="Dependencies")
+    table.add_column("Package", style="bold")
+    table.add_column("Required", justify="center")
+    table.add_column("Installed", justify="center")
+    table.add_column("Min Version")
+    table.add_column("Status")
+
+    issues = []
+
+    for import_name, pkg_name, min_ver, required in DEPS:
+        try:
+            mod = __import__(import_name)
+            version = getattr(mod, "__version__", getattr(mod, "VERSION", "?"))
+            version_str = str(version)
+
+            if _version_ok(version_str, min_ver):
+                status = "[green]OK[/]"
+            else:
+                status = f"[yellow]outdated (need >={min_ver})[/]"
+                issues.append(f"Upgrade {pkg_name}: pip install '{pkg_name}>={min_ver}'")
+
+            table.add_row(
+                pkg_name,
+                "yes" if required else "optional",
+                version_str,
+                f">={min_ver}",
+                status,
+            )
+        except ImportError:
+            if required:
+                status = "[red]MISSING[/]"
+                issues.append(f"Install {pkg_name}: pip install '{pkg_name}>={min_ver}'")
+            else:
+                status = "[dim]not installed[/]"
+
+            table.add_row(
+                pkg_name,
+                "yes" if required else "optional",
+                "—",
+                f">={min_ver}",
+                status,
+            )
+
+    console.print(table)
+
+    # Summary
+    if issues:
+        console.print(f"\n[yellow]Found {len(issues)} issue(s):[/]")
+        for issue in issues:
+            console.print(f" [red]>[/] {issue}")
+        console.print("\n[dim]Fix all: pip install -U " + " ".join(
+            f"'{pkg_name}>={min_ver}'"
+            for _, pkg_name, min_ver, required in DEPS
+            if required
+        ) + "[/]")
+    else:
+        console.print("\n[bold green]All checks passed![/] Your environment is ready.")
+
+
+def _check_gpu():
+    """Check GPU availability and display info."""
+    try:
+        import torch
+
+        if torch.cuda.is_available():
+            gpu_count = torch.cuda.device_count()
+            gpus = []
+            for idx in range(gpu_count):
+                name = torch.cuda.get_device_name(idx)
+                mem = torch.cuda.get_device_properties(idx)
+                total_gb = getattr(mem, "total_memory", getattr(mem, "total_mem", 0))
+                total_gb = total_gb / (1024 ** 3)
+                gpus.append(f" GPU {idx}: [bold]{name}[/] ({total_gb:.1f} GB)")
+            gpu_info = "\n".join(gpus)
+            cuda_ver = torch.version.cuda or "N/A"
+            console.print(
+                Panel(
+                    f"CUDA: [bold green]available[/] (v{cuda_ver})\n"
+                    f"GPUs: [bold]{gpu_count}[/]\n{gpu_info}",
+                    title="GPU",
+                )
+            )
+        elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
+            console.print(
+                Panel(
+                    "Backend: [bold green]MPS (Apple Silicon)[/]\n"
+                    "Status: [bold green]available[/]",
+                    title="GPU",
+                )
+            )
+        else:
+            console.print(
+                Panel(
+                    "Backend: [bold yellow]CPU only[/]\n"
+                    "Warning: Training will be slow without GPU.",
+                    title="GPU",
+                )
+            )
+    except ImportError:
+        console.print(
+            Panel(
+                "Backend: [red]unknown (torch not installed)[/]",
+                title="GPU",
+            )
+        )
+
+
+def _version_ok(installed: str, minimum: str) -> bool:
+    """Check if installed version meets minimum requirement."""
+    try:
+        inst_parts = [int(x) for x in installed.split(".")[:3]]
+        min_parts = [int(x) for x in minimum.split(".")[:3]]
+        # Pad to same length
+        while len(inst_parts) < 3:
+            inst_parts.append(0)
+        while len(min_parts) < 3:
+            min_parts.append(0)
+        return inst_parts >= min_parts
+    except (ValueError, AttributeError):
+        return True  # Can't parse, assume OK
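
`_version_ok` falls back to `True` whenever one of the first three components is not a plain integer. That happens with version strings carrying a suffix, such as `2.1.0+cu118`, so a build like `1.13.1+cu117` would be reported as OK against a `2.0.0` minimum. One possible refinement, shown only as a sketch, keeps the leading digits of each component before comparing. `leading_ints` and `version_at_least` are hypothetical helper names, and the approach deliberately ignores pre-release ordering.

```python
import re
from typing import List


def leading_ints(version: str, width: int = 3) -> List[int]:
    """Extract up to `width` numeric components, ignoring suffixes.

    Raises ValueError if a component has no leading digits.
    """
    parts: List[int] = []
    for chunk in version.split(".")[:width]:
        match = re.match(r"\d+", chunk)
        if match is None:
            raise ValueError(f"cannot parse version component {chunk!r}")
        parts.append(int(match.group()))
    # Pad short versions such as "6.0" so list comparison is well defined.
    parts.extend([0] * (width - len(parts)))
    return parts


def version_at_least(installed: str, minimum: str) -> bool:
    """Return True if `installed` >= `minimum`; unparseable input counts as OK."""
    try:
        return leading_ints(installed) >= leading_ints(minimum)
    except ValueError:
        return True  # same lenient fallback as the original helper
```

Where a third-party dependency is acceptable, a dedicated version-parsing library would handle pre-releases and local version labels more rigorously than this digit-prefix approach.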
