Skip to content

v1.3.0

Latest

Choose a tag to compare

@github-actions github-actions released this 07 Jun 19:02
903e453

Highlights

odek can now see. This release adds local image and video understanding through a new vision tool powered by MiniCPM-V 4.6 — a compact 1.3B multimodal model that runs entirely on your own machine via llama-mtmd-cli. No cloud API, no keys, no per-image cost. The model and runtime are baked straight into the container image, so it works out of the box: hand the agent an image and it describes what's there (including any visible text); hand it a video and odek samples frames and reasons over them together.

The Telegram bot gets the biggest day-to-day upgrade from this. Send it a photo and it now describes the image first, then answers — and if you add a caption like "what does this error say?" or "is this safe to eat?", that caption becomes your question and focuses what the model looks at. Two papercuts are gone too: photos no longer collide onto the same filename (so the bot stops saying "already processed" when you send a second image), and image text is wrapped as untrusted content so a screenshot can't smuggle instructions into the agent.

Under the hood, the container build was hardened so the vision binary links cleanly against the runtime image (built from source to avoid a glibc mismatch), keeping the zero-setup promise intact across amd64 and arm64.


What's Changed

  • feat(docs): hero "zero → telegram" carousel + doc-link refresh by @jkyberneees in #20
  • feat(vision): add vision tool using MiniCPM-V 4.6 (1.3B) via llama-mtmd-cli by @jkyberneees in #22
  • feat(telegram): unique photo filenames + caption-aware auto-vision by @jkyberneees in #23

Full Changelog: v1.2.0...v1.3.0