Commit a05ef6e

Unified Docker image with CI auto-build (#17)
* Unified Docker image with CI auto-build on merge to main

  - Simplify Dockerfile to a single image (no multi-target builds)
  - Use the RunPod base entrypoint (SSH + Jupyter); run experiments via SSH
  - Add .github/workflows/docker.yml: builds and pushes to Docker Hub on every merge to main, tagged as :latest and :sha
  - Use GitHub Actions cache for Docker layer reuse
  - Update CLAUDE.md to reflect the new workflow

  Requires DOCKERHUB_USERNAME and DOCKERHUB_TOKEN secrets in GitHub.

* Address review: fix broken COPYs, .dockerignore, GIT_TAG, workflow_dispatch

  - Remove COPY data/ (doesn't exist in repo)
  - Simplify .dockerignore to allow deploy/, docs/, cards/ into build context
  - Fix GIT_TAG: use git tag --points-at HEAD instead of github.ref_name
  - Add workflow_dispatch trigger for manual rebuilds
1 parent b06963e commit a05ef6e

4 files changed (+68, -69 lines)

.dockerignore

Lines changed: 2 additions & 10 deletions
@@ -1,21 +1,13 @@
 .git
 .venv
+.env
 __pycache__
 *.pyc
 *.pyo
 *.egg-info
 .mypy_cache
 engine/target
+*.so
 data/
 checkpoints/
 logs/
-deploy/
-!deploy/entrypoint-run.sh
-!deploy/entrypoint-extract.sh
-!deploy/entrypoint-lc0.sh
-!deploy/entrypoint-rosa-sweep.sh
-!deploy/entrypoint-lc0-selfplay.sh
-!deploy/entrypoint-lichess-parquet.sh
-*.so
-CLAUDE.md
-docs/
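To see what the simplified `.dockerignore` changes in practice, the Python sketch below approximates Docker's documented matching rules. Paths are matched relative to the context root, and `*` does not cross a `/`. A pattern that matches a parent directory excludes everything beneath it, a leading `!` re-includes a path, and the last matching line wins. This is an approximation, not Docker's implementation: `**`, escapes, and Go-specific character-class syntax are not handled. The helper `is_excluded` and the sample path `docs/example.md` are hypothetical.

```python
from fnmatch import fnmatchcase


def _segments(text: str) -> list[str]:
    # Leading and trailing slashes are not significant ("data/" == "data").
    return [s for s in text.strip("/").split("/") if s]


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Approximate whether `path` (relative to the context root) is left out."""
    parts = _segments(path)
    excluded = False
    for raw in patterns:
        if not raw or raw.startswith("#"):
            continue  # blank lines and comments are ignored
        negate = raw.startswith("!")
        pat = _segments(raw[1:] if negate else raw)
        if not pat or len(pat) > len(parts):
            continue
        # Compare segment by segment so '*' never crosses '/'. Matching only
        # the first len(pat) segments covers both the path itself and its
        # parent directories.
        if all(fnmatchcase(p, q) for p, q in zip(parts, pat)):
            excluded = not negate  # the last matching line wins
    return excluded


OLD_IGNORE = [
    ".git", ".venv", "__pycache__", "*.pyc", "*.pyo", "*.egg-info",
    ".mypy_cache", "engine/target", "data/", "checkpoints/", "logs/",
    "deploy/",
    "!deploy/entrypoint-run.sh", "!deploy/entrypoint-extract.sh",
    "!deploy/entrypoint-lc0.sh", "!deploy/entrypoint-rosa-sweep.sh",
    "!deploy/entrypoint-lc0-selfplay.sh", "!deploy/entrypoint-lichess-parquet.sh",
    "*.so", "CLAUDE.md", "docs/",
]
NEW_IGNORE = [
    ".git", ".venv", ".env", "__pycache__", "*.pyc", "*.pyo", "*.egg-info",
    ".mypy_cache", "engine/target", "*.so", "data/", "checkpoints/", "logs/",
]

for path in ["deploy/pod.sh", "deploy/entrypoint-run.sh", "docs/example.md", ".env"]:
    print(path, is_excluded(path, OLD_IGNORE), is_excluded(path, NEW_IGNORE))
```

The loop prints `True False` for `deploy/pod.sh` and `docs/example.md`, meaning they were excluded before and are included now. It prints `False False` for `deploy/entrypoint-run.sh` and `False True` for `.env`. Under these rules, entries without a `**/` prefix, such as `__pycache__` and `*.pyc`, match only at the top of the build context. Docker's `**/__pycache__` form is the one that matches at any depth.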

.github/workflows/docker.yml

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+name: Docker
+
+on:
+  push:
+    branches: [main]
+  workflow_dispatch:
+
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
+  IMAGE: thomasschweich/pawn
+
+jobs:
+  build-and-push:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Resolve git tag
+        id: meta
+        run: echo "tag=$(git tag --points-at HEAD | head -1)" >> "$GITHUB_OUTPUT"
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@v3
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Build and push
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          push: true
+          platforms: linux/amd64
+          tags: |
+            ${{ env.IMAGE }}:latest
+            ${{ env.IMAGE }}:${{ github.sha }}
+          build-args: |
+            GIT_HASH=${{ github.sha }}
+            GIT_TAG=${{ steps.meta.outputs.tag }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
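The metadata and build steps above depend on two inputs: the pushed commit SHA and any tags that point at it. The Python sketch below mirrors that logic for illustration only. CI does this in shell and YAML, and the function names are hypothetical. The sketch also shows why the review fix matters. On a push to `main`, `github.ref_name` is `main`, so the old approach would have baked `GIT_TAG=main` into the image. `git tag --points-at HEAD` instead yields an empty string for untagged commits. `fetch-depth: 0` is what makes tags visible to that command, because `actions/checkout` otherwise fetches a single commit without tags. Git lists tags alphabetically by default, so `head -1` picks the alphabetically first tag when several point at the same commit.

```python
import subprocess

IMAGE = "thomasschweich/pawn"  # from the workflow's env block


def resolve_git_tag(points_at_output: str) -> str:
    """Mimic `git tag --points-at HEAD | head -1`: first tag, or "" when untagged."""
    lines = points_at_output.splitlines()
    return lines[0].strip() if lines else ""


def current_git_tag(repo: str = ".") -> str:
    # Needs tags in the clone; in CI that is what `fetch-depth: 0` provides.
    out = subprocess.run(
        ["git", "tag", "--points-at", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return resolve_git_tag(out)


def image_tags(sha: str) -> list[str]:
    # Mirrors the `tags:` input of docker/build-push-action.
    return [f"{IMAGE}:latest", f"{IMAGE}:{sha}"]


def build_args(sha: str, git_tag: str) -> dict[str, str]:
    # Mirrors the `build-args:` input; GIT_TAG is empty for untagged commits.
    return {"GIT_HASH": sha, "GIT_TAG": git_tag}


# Hypothetical inputs: an untagged push to main and a tagged commit.
# (github.sha is the full 40-character SHA; a short placeholder is used here.)
print(image_tags("0123abcd"))
print(build_args("0123abcd", resolve_git_tag("")))
print(build_args("0123abcd", resolve_git_tag("v1.0\nv1.0-rc1\n")))
```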

CLAUDE.md

Lines changed: 6 additions & 14 deletions
@@ -221,27 +221,19 @@ from the trainer. Load via HF repo ID (e.g. `--checkpoint thomas-schweich/pawn-b
 
 ## RunPod Operations
 
-### Docker Build & Push
+### Docker Image
 
-```bash
-# Build runner target (auto-stop after training completes)
-docker build --platform linux/amd64 \
-  --build-arg GIT_HASH=$(git rev-parse HEAD) \
-  --build-arg GIT_TAG=$(git tag --points-at HEAD) \
-  --target runner \
-  -t thomasschweich/pawn:latest-runner .
+A single Docker image (`thomasschweich/pawn:latest`) is **automatically built and pushed to Docker Hub by CI** on every merge to main. No manual builds needed.
+
+The image is based on `runpod/pytorch` (CUDA + SSH + Jupyter) with all Python deps pre-installed. Code lives at `/opt/pawn` on pods. SSH in and run experiments directly.
 
-# Build interactive target (SSH + Jupyter, stays alive)
+To build locally (rarely needed):
+```bash
 docker build --platform linux/amd64 \
   --build-arg GIT_HASH=$(git rev-parse HEAD) \
-  --target interactive \
   -t thomasschweich/pawn:latest .
-
-docker push thomasschweich/pawn:latest-runner
 ```
 
-Code lives at `/opt/pawn` on pods (outside the `/workspace` volume mount).
-
 ### Pod Lifecycle
 
 Use `deploy/pod.sh` for all pod management. Requires `runpodctl` (`wget -qO- cli.runpod.net | sudo bash`).
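The documented local build passes `GIT_HASH` but not `GIT_TAG`, while CI passes both. If you want a local image built with the same arguments as CI, a small wrapper like the one below can assemble the command. `local_build_argv` and `build_from_checkout` are hypothetical helpers, not part of the repository. By default the wrapper only prints the command instead of running Docker. CI also pushes an immutable `:<sha>` tag next to `:latest`, which a pod template can reference to pin a specific commit.

```python
import shlex
import subprocess

IMAGE = "thomasschweich/pawn"


def local_build_argv(git_hash: str, git_tag: str = "", tag: str = "latest") -> list[str]:
    """argv for a local build that passes the same build args as CI."""
    return [
        "docker", "build", "--platform", "linux/amd64",
        "--build-arg", f"GIT_HASH={git_hash}",
        "--build-arg", f"GIT_TAG={git_tag}",
        "-t", f"{IMAGE}:{tag}",
        ".",
    ]


def _git(*args: str) -> str:
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout.strip()


def build_from_checkout(dry_run: bool = True) -> list[str]:
    # Run from the repository root, where the Dockerfile lives.
    sha = _git("rev-parse", "HEAD")
    tags = _git("tag", "--points-at", "HEAD").splitlines()
    argv = local_build_argv(sha, tags[0] if tags else "")
    if dry_run:
        print(shlex.join(argv))  # inspect the command before running it
    else:
        subprocess.run(argv, check=True)
    return argv
```

Returning an argv list rather than a shell string avoids quoting problems. `shlex.join` is only used to display the command.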

Dockerfile

Lines changed: 14 additions & 45 deletions
@@ -1,23 +1,13 @@
-# PAWN Container for Runpod
+# PAWN — single image for all RunPod workloads
 #
-# Uses runpod/base (CUDA + SSH + Jupyter, no PyTorch) with uv for
-# reproducible Python dependency management from the lockfile.
+# Built automatically by CI on merge to main and pushed to Docker Hub.
+# Uses the RunPod base image (CUDA + SSH + Jupyter) with uv for
+# reproducible Python dependency management.
 #
-# Build targets:
-# interactive (default) — SSH + Jupyter, stays alive
-# runner — runs a command then exits (pod auto-stops)
-# rosa-sweep — runs RoSA ablation sweeps then exits
+# Usage: create a RunPod template pointing at thomasschweich/pawn:latest,
+# SSH in, and run experiments. Code lives at /opt/pawn.
 #
-# Build:
-# docker build --platform linux/amd64 \
-# --build-arg GIT_HASH=$(git rev-parse HEAD) \
-# --build-arg GIT_TAG=$(git tag --points-at HEAD) \
-# [--target runner] \
-# -t pawn:<tag> .
-#
-# IMPORTANT: Always attach a Runpod network volume. Checkpoints use
-# atomic directory writes (tmp + rename) that require persistent disk.
-# Set HF_TOKEN as a pod env var for HuggingFace checkpoint push.
+# IMPORTANT: Always attach a network volume. Set HF_TOKEN as a pod env var.
 
 # ── Builder: compile Rust engine wheel ───────────────────────────────
 FROM python:3.12-slim AS builder
@@ -40,8 +30,8 @@ COPY scripts/ scripts/
 
 RUN cd engine && uv run --no-project --with maturin maturin build --release
 
-# ── Runtime base (shared by all targets) ─────────────────────────────
-FROM runpod/pytorch:1.0.3-cu1281-torch280-ubuntu2404 AS runtime-base
+# ── Runtime ──────────────────────────────────────────────────────────
+FROM runpod/pytorch:1.0.3-cu1281-torch280-ubuntu2404
 
 ENV PYTHONUNBUFFERED=1 \
     UV_LINK_MODE=copy
@@ -56,12 +46,12 @@ COPY pyproject.toml uv.lock ./
 COPY pawn/ pawn/
 COPY scripts/ scripts/
 COPY tests/ tests/
+COPY deploy/ deploy/
+COPY docs/ docs/
+COPY cards/ cards/
 
-# Install engine wheel first
+# Install engine wheel, then sync Python deps from lockfile
 COPY --from=builder /build/engine/target/wheels/*.whl /tmp/
-
-# Create venv with system packages (picks up pre-installed torch + CUDA)
-# and install remaining deps from lockfile
 RUN uv venv --system-site-packages && \
     uv sync --extra cu128 --no-dev --frozen --no-install-workspace && \
     uv pip install /tmp/*.whl && rm -rf /tmp/*.whl
@@ -80,25 +70,4 @@ RUN echo "export PYTHONPATH=/opt/pawn" >> /etc/environment && \
     echo 'export PATH="/opt/pawn/.venv/bin:$PATH"' >> /etc/environment && \
     cat /etc/environment >> /root/.bashrc
 
-# ── Runner — executes command then exits (pod auto-stops) ────────────
-FROM runtime-base AS runner
-COPY deploy/entrypoint-run.sh /entrypoint-run.sh
-RUN chmod +x /entrypoint-run.sh
-ENTRYPOINT ["/entrypoint-run.sh"]
-
-# ── RoSA sweep — runs all three ablation sweeps then exits ───────────
-FROM runtime-base AS rosa-sweep
-COPY deploy/entrypoint-rosa-sweep.sh /entrypoint-rosa-sweep.sh
-RUN chmod +x /entrypoint-rosa-sweep.sh
-ENTRYPOINT ["/entrypoint-rosa-sweep.sh"]
-
-# ── Lichess extract — downloads PGN, writes Parquet, pushes to HF ───
-FROM runtime-base AS lichess-extract
-RUN pip install --no-cache-dir zstandard
-COPY deploy/entrypoint-lichess-parquet.sh /entrypoint-lichess-parquet.sh
-RUN chmod +x /entrypoint-lichess-parquet.sh
-ENTRYPOINT ["/entrypoint-lichess-parquet.sh"]
-
-# ── Interactive (default) — SSH + Jupyter, stays alive ───────────────
-FROM runtime-base AS interactive
-# Inherits /start.sh entrypoint from Runpod base image
+# Inherits /start.sh entrypoint from RunPod base image (SSH + Jupyter)
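With the per-target stages gone, the runtime stage's main job is the venv setup. `uv venv --system-site-packages` creates the project venv (its `bin` directory, `/opt/pawn/.venv/bin`, is put on `PATH` above). According to the comment this commit removes, the goal is to pick up the torch and CUDA packages preinstalled in the `runpod/pytorch` base image and install the remaining deps from the lockfile. The standard library's `venv` module has the same switch. The sketch below uses it as a stand-in for uv, not what the Dockerfile runs, to show where the setting is recorded. `create_env` is a hypothetical helper.

```python
import pathlib
import tempfile
import venv


def create_env(path: pathlib.Path) -> dict[str, str]:
    """Create a venv that can see the base interpreter's site-packages; return pyvenv.cfg."""
    # Same switch as `uv venv --system-site-packages`, using the stdlib instead of uv.
    venv.create(path, system_site_packages=True, with_pip=False)
    cfg: dict[str, str] = {}
    for line in (path / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(pathlib.Path(tmp) / ".venv")
    # At startup, site.py reads this key to decide whether the base
    # installation's site-packages (e.g. a preinstalled torch) stay on sys.path.
    print(cfg["include-system-site-packages"])  # true
```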
