feat(docker): bundle whisper.cpp + tiny model for out-of-the-box transcription#9

Merged
jkyberneees merged 2 commits into main from feat/docker-bundle-whisper
Jun 6, 2026
@jkyberneees
Contributor

Summary

Makes voice transcription work out of the box in the Docker image. The transcribe tool and Telegram voice auto-transcription previously required the host to have whisper.cpp + a model installed (or a first-run download). Now the image ships everything: the whisper.cpp CLI, the ggml tiny model, and ffmpeg.

Changes

docker/Dockerfile

  • New whisper build stage compiles whisper-cli from source (same alpine base as runtime for musl ABI match; GGML_OPENMP=OFF so the only runtime shared lib needed is libstdc++) and fetches ggml-tiny.bin.
  • Model is overridable at build time: --build-arg WHISPER_MODEL=base|small|medium.
  • Runtime stage adds ffmpeg (OGG/Opus → WAV for whisper) + libstdc++, and copies whisper-cli onto PATH and the model to /usr/local/share/whisper/models/.

docker/config.restricted.json + docker/config.godmode.json

  • Add a transcription block: model: tiny, auto_transcribe: true, models_dir: /usr/local/share/whisper/models.
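
As an illustration of the shape of that block, here is a minimal Go sketch that decodes it. The JSON keys (`transcription`, `model`, `auto_transcribe`, `models_dir`) and values come from this PR; the Go type names, the `parseConfig` helper, and the exact nesting in the real config files are assumptions made for the example, not the project's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TranscriptionConfig mirrors the three keys named in this PR.
// The Go type and field names are hypothetical.
type TranscriptionConfig struct {
	Model          string `json:"model"`
	AutoTranscribe bool   `json:"auto_transcribe"`
	ModelsDir      string `json:"models_dir"`
}

// Config holds only the part of the config file relevant here.
type Config struct {
	Transcription TranscriptionConfig `json:"transcription"`
}

// parseConfig decodes a JSON config document into Config.
func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

// sampleConfig is an assumed rendering of the block described above.
const sampleConfig = `{
  "transcription": {
    "model": "tiny",
    "auto_transcribe": true,
    "models_dir": "/usr/local/share/whisper/models"
  }
}`

func main() {
	c, err := parseConfig([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c.Transcription)
}
```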

Docs

  • docker/README.md and docs/TELEGRAM.md document the bundled setup.

Why the model lives outside ~/.odek

The default model location is ~/.odek/whisper/models, but the Telegram compose profiles bind-mount ./.odek over /home/odek/.odek — which would shadow a model baked under ~/.odek, breaking exactly the profiles (voice notes) that need it. So the model is baked to a fixed image path (/usr/local/share/whisper/models) and the configs point transcription.models_dir there, keeping it visible across all four profiles.
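
The shadowing argument can be sketched in Go as follows. The paths and the `ggml-<model>.bin` file naming come from this PR; `modelPath` and `shadowedBy` are hypothetical helpers written for this illustration and are not taken from the odek source, whose real resolution logic may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// modelPath returns the model file for a given model name.
// An empty modelsDir falls back to the default under ~/.odek.
func modelPath(home, modelsDir, model string) string {
	if modelsDir == "" {
		modelsDir = filepath.Join(home, ".odek", "whisper", "models")
	}
	return filepath.Join(modelsDir, "ggml-"+model+".bin")
}

// shadowedBy reports whether path lies inside mountPoint, i.e. whether a
// bind mount over mountPoint would hide a file baked into the image there.
func shadowedBy(path, mountPoint string) bool {
	rel, err := filepath.Rel(mountPoint, path)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	const home = "/home/odek"
	mount := filepath.Join(home, ".odek") // target of the ./.odek bind mount

	def := modelPath(home, "", "tiny")
	baked := modelPath(home, "/usr/local/share/whisper/models", "tiny")

	fmt.Println(def, "shadowed:", shadowedBy(def, mount))
	fmt.Println(baked, "shadowed:", shadowedBy(baked, mount))
}
```

Under these assumptions, only the default location falls inside the bind-mounted directory, which is the reason the configs set `models_dir` explicitly.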

Verification

Built the image locally and confirmed, running as the non-root odek user:

  • whisper-cli is on PATH and executes (ldd → only libstdc++/libgcc_s + musl)
  • the tiny model is readable at the configured path
  • ffmpeg is present
  • full end-to-end: generated an OGG/Opus clip → ffmpeg to WAV → whisper-cli --output-json inference with the bundled model, JSON written ✅
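
The end-to-end check above could be reproduced with a small Go program along these lines. This is a sketch, not the script used for the PR: the file names are placeholders, the function names are hypothetical, and the ffmpeg options (16 kHz, mono, 16-bit PCM, the input format whisper.cpp expects) are an assumption about how the conversion was done. `-m`, `-f`, and `--output-json` are existing whisper-cli flags.

```go
package main

import (
	"fmt"
	"os/exec"
)

// ffmpegArgs builds the arguments that convert an OGG/Opus clip into a
// 16 kHz mono 16-bit PCM WAV file.
func ffmpegArgs(in, wav string) []string {
	return []string{"-y", "-i", in, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav}
}

// whisperArgs builds the whisper-cli arguments for JSON output.
func whisperArgs(model, wav string) []string {
	return []string{"-m", model, "-f", wav, "--output-json"}
}

// transcribe runs both steps in order and returns the first failure,
// including the tool's combined output for diagnosis.
func transcribe(in, wav, model string) error {
	if out, err := exec.Command("ffmpeg", ffmpegArgs(in, wav)...).CombinedOutput(); err != nil {
		return fmt.Errorf("ffmpeg: %w: %s", err, out)
	}
	if out, err := exec.Command("whisper-cli", whisperArgs(model, wav)...).CombinedOutput(); err != nil {
		return fmt.Errorf("whisper-cli: %w: %s", err, out)
	}
	return nil
}

func main() {
	// Print the commands only; call transcribe inside the image to run them.
	model := "/usr/local/share/whisper/models/ggml-tiny.bin"
	fmt.Println("ffmpeg", ffmpegArgs("clip.ogg", "clip.wav"))
	fmt.Println("whisper-cli", whisperArgs(model, "clip.wav"))
}
```

Building the argument lists separately from running them keeps the command construction checkable on a machine that has neither ffmpeg nor whisper-cli installed.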

Image size impact: ~78 MB model + ~2.7 MB binary + ffmpeg/libstdc++.

🤖 Generated with Claude Code

jkyberneees and others added 2 commits June 6, 2026 08:48
…scription

Ship the whisper.cpp CLI, the ggml `tiny` model, and ffmpeg in the image so
the `transcribe` tool and Telegram voice auto-transcription work with zero
setup — no host install, no first-run model download.

Dockerfile:
- New `whisper` build stage compiles whisper-cli (OpenMP off, so the only
  runtime shared lib is libstdc++) and fetches ggml-tiny.bin. The model is
  overridable via `--build-arg WHISPER_MODEL=base|small|medium`.
- Runtime stage installs ffmpeg + libstdc++ and copies whisper-cli onto PATH
  and the model to /usr/local/share/whisper/models/.

The model is baked OUTSIDE ~/.odek on purpose: the Telegram compose profiles
bind-mount ./.odek over /home/odek/.odek, which would shadow a model placed
under the default ~/.odek/whisper/models. A fixed image path keeps it visible
across all four profiles.

Config:
- config.restricted.json and config.godmode.json now set a transcription block
  (model: tiny, auto_transcribe: true, models_dir pointing at the baked path).

Docs: document bundled transcription in docker/README.md and docs/TELEGRAM.md.

Verified by building the image and running OGG->WAV->whisper inference with the
bundled model as the non-root odek user.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Clone a tagged release via a WHISPER_VERSION build arg instead of tracking
master, so the bundled whisper-cli is reproducible and can't drift/break in
CI on an upstream change. Verified with a no-cache build (git describe →
v1.8.6, whisper-cli + ggml-tiny.bin produced).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees
Contributor Author

Pre-merge verification pass ✅

Re-verified end to end before merge:

Build (reproducible)

  • No-cache build of the pinned whisper stage → git describe = v1.8.6; produced whisper-cli (2.7 MB) + ggml-tiny.bin (77 MB).
  • Pinned to a tagged release (WHISPER_VERSION=v1.8.6) instead of tracking master, so the build can't drift in CI.

Runtime (as non-root odek user)

  • whisper-cli is on PATH and executes; ldd shows only libstdc++ and libgcc_s plus musl (the two GCC runtime libraries are both covered by the libstdc++ apk package).
  • ffmpeg present; model readable at /usr/local/share/whisper/models/ggml-tiny.bin.
  • Full inference smoke test: OGG/Opus → ffmpeg → WAV → whisper-cli --output-json with the bundled model ✅

Config delivery (compose)

  • Web (godmode): config.json carries the transcription block (models_dir → baked path); model present; whisper resolves.
  • Telegram: ./.odek bind-mount is active and the model is not shadowed (it lives outside .odek) — confirming the design choice.

Unit tests

  • cmd/odek transcribe tests pass, incl. TestTranscribe_MockHappyPath (models_dir + model: tiny → resolves ggml-tiny.bin), validating the exact resolution path the shipped config relies on.

@jkyberneees jkyberneees merged commit d4d0bc5 into main Jun 6, 2026
6 checks passed