macOS menubar app + CLI for running local LLMs via mlx-lm on Apple Silicon.
Three targets share one repository:
- OllmlxCore — shared framework, no UI dependencies
- OllmlxApp — menubar app (AppKit + SwiftUI)
- ollmlx — CLI (ArgumentParser, no AppKit)
The daemon (ServerManager) spawns and monitors mlx_lm.server as a child process. The CLI and menubar app communicate with the daemon exclusively via HTTP on localhost:11435. The public OpenAI-compatible API is exposed on localhost:11434 via a transparent proxy. Ollama-compatible endpoints (/api/tags, /api/version, /api/chat, /api/generate) are also served on :11434 so clients like Open WebUI can connect natively.
Full implementation plan: ollmlx-implementation-plan.md
```bash
# Open in Xcode 15+
open ollmlx.xcodeproj

# Bootstrap Python venv (run once before testing)
bash Scripts/install_mlx_lm.sh

# Build CLI from terminal
swift build --target ollmlx

# Build all targets (SPM)
swift build

# Regenerate Xcode project after changing project.yml
xcodegen generate

# Release build with code signing
xcodebuild -scheme OllmlxApp -configuration Release \
  -destination "platform=macOS" \
  DEVELOPMENT_TEAM=M4RUJ7W6MP \
  build

# Build DMG for distribution
bash Scripts/build_dmg.sh
```

Work through phases in order. Do not start the next phase until all completion criteria for the current phase are checked off. Each phase is documented in full in ollmlx-implementation-plan.md.
- Core foundation — `OllmlxCore` compiles; `ServerManager` spawns/kills mlx_lm.server
- Control API daemon — `DaemonServer` serves all five `/control/*` endpoints on `:11435`
- Proxy server — `:11434` transparently forwards to mlx_lm.server's ephemeral port
- CLI — All nine commands work against a running daemon
- Menubar app — Full menubar UI backed by live `ServerManager` state
- Settings — All settings persist and take effect immediately
- Bootstrap and CLI installation — First-launch experience; CLI symlink to `/usr/local/bin`
- Polish and distribution — Codesigning, notarisation, Sparkle, DMG
| Field | Value |
|---|---|
| Bundle ID | com.darrylmorley.ollmlx |
| Team ID | M4RUJ7W6MP |
| Signing identity | Developer ID Application: Darryl Morley (M4RUJ7W6MP) |
| Sparkle feed | https://github.com/darrylmorley/ollmlx/releases/latest/download/appcast.xml |
| Sparkle EdDSA key | m/WL9PKIyMMY1Nx5dL9RzE3GqA+3FlR6OiWTC1IyCfA= |
Code signing is configured per build configuration in project.yml:
- Debug: `CODE_SIGN_STYLE: Automatic`, identity `-` (ad-hoc)
- Release: `CODE_SIGN_STYLE: Manual`, identity `Developer ID Application`

After editing project.yml, always run `xcodegen generate` to rebuild ollmlx.xcodeproj.
- `ServerManager` is `@MainActor` — all state mutations happen on the main actor
- `OllmlxConfig` uses raw `UserDefaults` (not `@AppStorage`) so it is safe to read from CLI contexts and background tasks
- CLI commands run in async contexts — use `Task { @MainActor in }` when touching `ServerManager` from a non-main-actor context
- Never access `ServerManager.shared` from background threads without `await`
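The main-actor hop these rules describe can be sketched as follows; `ManagerSketch` is a hypothetical stand-in for `ServerManager`, not the real type:

```swift
import Foundation

// Hypothetical stand-in for ServerManager: a @MainActor class whose
// state may only be mutated on the main actor.
@MainActor
final class ManagerSketch {
    static let shared = ManagerSketch()
    private(set) var state = "stopped"
    func start() { state = "running" }
}

// CLI commands run in async, non-main-actor contexts. Await the
// @MainActor members directly — the compiler inserts the hop, so no
// manual dispatch is needed.
func startFromCLI() async -> String {
    await ManagerSketch.shared.start()
    return await ManagerSketch.shared.state
}
```

Inside a synchronous context (e.g. an AppKit callback), the equivalent hop is `Task { @MainActor in ManagerSketch.shared.start() }`.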
- `OllmlxCore` must have ZERO AppKit or SwiftUI imports — it must compile as a CLI dependency
- The CLI target must have ZERO AppKit imports
- Shared types (models, errors, config) belong in `OllmlxCore`
- `@AppStorage` is banned in `OllmlxCore` — use `OllmlxConfig` instead
- Single instance: `applicationDidFinishLaunching` must check `NSRunningApplication.runningApplications(withBundleIdentifier:)` — if `count > 1`, activate the existing instance and terminate self
- Daemon auto-start: `DaemonServer` and `ProxyServer` are started automatically in `applicationDidFinishLaunching` via `Task.detached`
- Bootstrap detection: Check both `OllmlxConfig.pythonPath` AND the default venv path `~/.ollmlx/venv/bin/python` — if either exists on disk, skip bootstrap and set config
- Settings window: Always open as a standalone `NSWindow` via `AppDelegate` — never as a `.sheet()` on the NSPopover (sheets on popovers deadlock the entire app, and crash on macOS 26 Tahoe)
- Pull Model window: Same as Settings — always open as a standalone `NSWindow` via `AppDelegate` notification (`.openPullModel`), never as a `.sheet()` on the popover
- Model selector: The dropdown only updates a `@State` selection — it must never call `ServerManager.start()`. Starting/switching happens only when the user clicks the Start button. `MenuBarView` uses `.onChange(of: serverManager.state)` to sync the selected model when external clients trigger a model switch
- API key always stored in Keychain via `Keychain.swift` — never in UserDefaults
- `allowExternalConnections = true` requires a non-nil API key — enforce in `DaemonServer` before binding to `0.0.0.0`
- The proxy on `:11434` must validate the API key before forwarding, not after
- Always send SIGINT first, wait 5 seconds, then SIGKILL — never go straight to SIGKILL
- `waitForServer()` must check `process.isRunning` on every poll iteration — fast-fail immediately on process death, don't wait out the full 120-second timeout
- `allocateEphemeralPort()` must bind to port 0, read the assigned port, then close the socket before returning — never hardcode internal ports
- `ModelStore.pull()` must use `$VENV/bin/huggingface-cli` (with fallback to `$VENV/bin/hf`) — never `huggingface-cli` from PATH
- `install_mlx_lm.sh` installs `huggingface-hub` (without the `[cli]` extra) — `huggingface-hub >= 1.8.0` installs the CLI as `hf`, not `huggingface-cli`, so the bootstrap script checks for both and creates a symlink if needed
- `ServerManager.start()` must call `ModelStore.isModelCached()` before spawning — throw `ServerError.modelNotFound` if the model is not in the local cache
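The ephemeral-port rule above can be sketched in plain Darwin sockets (macOS only; this is an illustrative sketch, not the project's actual `allocateEphemeralPort()` implementation):

```swift
import Foundation
import Darwin

// Bind a TCP socket to port 0 so the kernel picks a free port, read the
// assigned port back, then close the socket before returning so that
// mlx_lm.server can bind it itself.
func allocateEphemeralPort() -> UInt16? {
    let fd = socket(AF_INET, SOCK_STREAM, 0)
    guard fd >= 0 else { return nil }
    defer { close(fd) }  // socket is closed before the port is returned

    var addr = sockaddr_in()
    addr.sin_family = sa_family_t(AF_INET)
    addr.sin_port = 0                 // port 0 = "kernel, pick one"
    addr.sin_addr.s_addr = INADDR_ANY

    let bound = withUnsafePointer(to: &addr) { ptr in
        ptr.withMemoryRebound(to: sockaddr.self, capacity: 1) { sa in
            bind(fd, sa, socklen_t(MemoryLayout<sockaddr_in>.size))
        }
    }
    guard bound == 0 else { return nil }

    // Read back the port the kernel actually assigned.
    var assigned = sockaddr_in()
    var len = socklen_t(MemoryLayout<sockaddr_in>.size)
    let named = withUnsafeMutablePointer(to: &assigned) { ptr in
        ptr.withMemoryRebound(to: sockaddr.self, capacity: 1) { sa in
            getsockname(fd, sa, &len)
        }
    }
    guard named == 0 else { return nil }
    return UInt16(bigEndian: assigned.sin_port)  // network to host byte order
}
```

Note the small race this design accepts: between `close(fd)` and the child binding the port, another process could grab it — which is why the port is allocated immediately before spawning.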
- `ProxyServer` must return `503 Service Unavailable` (not hang) when no upstream is set
- Streaming responses (`text/event-stream`) must be forwarded chunk-by-chunk — no full response buffering
- `setUpstream(port:)` must be atomic — use a lock or actor to prevent races between old and new upstream
- Model switches are serialized by `ModelSwitchCoordinator` (actor) — concurrent API requests must never trigger parallel stop/start cycles
- URLSession lifecycle: For streaming responses, `session.invalidateAndCancel()` must be called inside the `ResponseBody` closure (via `defer`), not in the outer function scope — the function returns the `Response` before the body closure executes, so a `defer` on the outer scope kills the session mid-stream
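The `defer` placement pitfall in the last rule can be reduced to plain closures (an illustration of the scoping behaviour, with `() -> Void` closures standing in for the proxy's response-body types):

```swift
import Foundation

// Records the order in which things happen.
var events: [String] = []

// WRONG shape: a defer on the outer scope fires when the function
// returns — i.e. BEFORE the returned body closure ever executes.
func makeBodyWrong() -> () -> Void {
    defer { events.append("cleanup") }   // fires at function return
    return { events.append("stream") }
}

// RIGHT shape: a defer inside the returned closure fires only after
// the streaming work in that closure has finished.
func makeBodyRight() -> () -> Void {
    return {
        defer { events.append("cleanup") }
        events.append("stream")
    }
}

events = []
makeBodyWrong()()
// events == ["cleanup", "stream"] — the session would be dead mid-stream

events = []
makeBodyRight()()
// events == ["stream", "cleanup"] — the session survives until the body ends
```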
- Ollama-compatible routes (`/api/tags`, `/api/version`, `/api/chat`, `/api/generate`) are registered on `ProxyServer` (before catch-all routes so they take priority)
- `/api/tags` reads from `ModelStore.refreshCached()` — does not need an upstream; `/api/version` is unauthenticated
- `/api/chat` and `/api/generate` translate Ollama request/response format to/from OpenAI `/v1/chat/completions` — both streaming (ndjson) and non-streaming
- `/api/chat` and `/api/generate` call `ensureModel()` before forwarding — if the requested model differs from the running model, it stops the current model and starts the requested one via `ServerManager`
- `ensureModel()` delegates to `ModelSwitchCoordinator` (actor), which serializes all switch operations: same-model requests coalesce onto the in-flight task; different-model requests cancel the current switch (last writer wins). A `while true` loop re-evaluates state after every `await` to handle actor reentrancy safely
- Do not use `MainActor.run` with async closures — it only accepts synchronous closures. Call `@MainActor` async methods directly with `await` from nonisolated contexts
- Ollama streaming responses use `application/x-ndjson` content type (one JSON object per line), not SSE
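The per-chunk streaming translation can be sketched like this. It is a simplified illustration, not the project's actual translator: field names follow the public OpenAI and Ollama wire formats, but Ollama's extra metadata fields (`created_at`, timing stats) and all error handling are trimmed:

```swift
import Foundation

// Translate one OpenAI SSE line ("data: {...}") into one Ollama-style
// /api/chat JSON object, to be emitted as a single ndjson line.
func ollamaChatLine(fromSSE line: String) -> String? {
    guard line.hasPrefix("data: ") else { return nil }
    let payload = String(line.dropFirst(6))
    if payload == "[DONE]" {
        return #"{"done":true}"#  // terminal Ollama chunk (metadata trimmed)
    }
    guard
        let data = payload.data(using: .utf8),
        let obj = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
        let model = obj["model"] as? String,
        let choices = obj["choices"] as? [[String: Any]],
        let delta = choices.first?["delta"] as? [String: Any]
    else { return nil }
    let content = delta["content"] as? String ?? ""
    let out: [String: Any] = [
        "model": model,
        "message": ["role": "assistant", "content": content],
        "done": false,
    ]
    guard let outData = try? JSONSerialization.data(withJSONObject: out),
          let json = String(data: outData, encoding: .utf8) else { return nil }
    return json  // caller appends "\n" to form one ndjson line
}
```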
- Use `OSLog` for structured daemon events
- Pipe `mlx_lm.server` stdout/stderr directly to the log file via `readabilityHandler`
- Rotate log at startup: if `server.log` exceeds 50 MB, rename to `server.log.1` and create a fresh file (one backup kept)
- Log directory: `~/.ollmlx/logs/` — always create with `withIntermediateDirectories: true`
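The rotation rule can be sketched with `FileManager` (a minimal sketch of the policy above, not the project's actual code; the 50 MB default and the one-backup rule come from the rules, the function name is illustrative):

```swift
import Foundation

// At startup: if server.log exceeds the limit, move it to server.log.1
// (replacing any older backup, so only one backup is ever kept) and
// start a fresh empty log.
func rotateLogIfNeeded(at logURL: URL, limit: UInt64 = 50 * 1024 * 1024) throws {
    let fm = FileManager.default
    // Always create the log directory with intermediate directories.
    try fm.createDirectory(at: logURL.deletingLastPathComponent(),
                           withIntermediateDirectories: true)
    let size = (try? fm.attributesOfItem(atPath: logURL.path))?[.size] as? UInt64 ?? 0
    if size > limit {
        let backup = logURL.appendingPathExtension("1")  // server.log.1
        try? fm.removeItem(at: backup)                   // drop the old backup
        try fm.moveItem(at: logURL, to: backup)
    }
    // Ensure a fresh (or existing) log file is present.
    if !fm.fileExists(atPath: logURL.path) {
        _ = fm.createFile(atPath: logURL.path, contents: nil)
    }
}
```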
- `refreshCached()` must only return models from `mlx-community` — filter HF cache directories to `models--mlx-community--*`
- `pull()` must not pass `--quiet` to `huggingface-cli download` — it suppresses the tqdm progress output needed for the progress bar
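The cache filter follows from the Hugging Face hub cache layout, where a repo `org/name` is stored as a directory `models--org--name`. A sketch of the name mapping (the function name is illustrative; the real `refreshCached()` also reads the directory listing from disk):

```swift
import Foundation

// Map HF cache directory names like
// "models--mlx-community--Llama-3.2-3B-Instruct-4bit" back to model ids,
// keeping mlx-community repos only.
func cachedModelIDs(fromCacheEntries entries: [String]) -> [String] {
    let prefix = "models--mlx-community--"
    return entries
        .filter { $0.hasPrefix(prefix) }
        .map { "mlx-community/" + String($0.dropFirst(prefix.count)) }
        .sorted()
}
```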
- macOS apps launched via Finder/LaunchServices do not inherit the user's shell PATH — `~/.local/bin`, `/opt/homebrew/bin` etc. are not available via `/usr/bin/env`
- When calling external tools like `uv` from the app, resolve the absolute path by checking known candidate locations (`~/.local/bin/uv`, `/usr/local/bin/uv`, `/opt/homebrew/bin/uv`)
- The venv Python path at `~/.ollmlx/venv/bin/python` is always an absolute path and works fine
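Candidate-based resolution can be sketched as follows (an illustrative helper, not the project's actual code; the default candidate list mirrors the locations named above):

```swift
import Foundation

// Resolve an external tool's absolute path by probing known install
// locations, since a Finder-launched app never sees the user's shell PATH.
func resolveTool(named name: String,
                 candidates: [String]? = nil) -> String? {
    let home = FileManager.default.homeDirectoryForCurrentUser.path
    let paths = candidates ?? [
        "\(home)/.local/bin/\(name)",   // uv's default install location
        "/usr/local/bin/\(name)",
        "/opt/homebrew/bin/\(name)",
    ]
    // First candidate that exists and is executable wins.
    return paths.first { FileManager.default.isExecutableFile(atPath: $0) }
}
```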
- Mirror Ollama's output format exactly (table headers, `\r` progress overwrite, `pull complete` message)
- Use `fflush(stdout)` after every `print(terminator: "")` in streaming contexts
- Exit non-zero with a clear error message if the daemon is not running (connection refused on `:11435`)
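The `\r`-overwrite plus flush pattern looks like this in practice (a sketch; the exact progress wording here is illustrative, not Ollama's verbatim output):

```swift
import Foundation

// Build one in-place progress line: leading \r returns the cursor to
// column 0 so the next write overwrites the previous one.
func progressLine(downloaded: Int64, total: Int64) -> String {
    let pct = total > 0 ? Int(downloaded * 100 / total) : 0
    return "\rpulling model... \(pct)%"
}

func printProgress(downloaded: Int64, total: Int64) {
    print(progressLine(downloaded: downloaded, total: total), terminator: "")
    // print(terminator: "") emits no newline, so stdout may buffer the
    // write indefinitely — flush explicitly or the overwrite never shows.
    fflush(stdout)
}
```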
- Swift files: PascalCase matching the primary type they contain
- One primary type per file
- CLI commands live in `Sources/ollmlx/Commands/` — one file per command
- No file should import both AppKit and UI-agnostic `OllmlxCore` types — keep the dependency direction clean
| Port | Purpose |
|---|---|
| 11434 | Public OpenAI-compatible + Ollama-compatible API (clients connect here) |
| 11435 | Internal daemon control API (CLI/app only) |
| Ephemeral | mlx_lm.server internal port, allocated dynamically |
```
~/.ollmlx/venv/bin/python           # stored in UserDefaults after bootstrap
~/.ollmlx/venv/bin/huggingface-cli  # HF CLI binary (may be a symlink to hf on huggingface-hub >=1.8.0)
~/.ollmlx/venv/bin/hf               # HF CLI binary (huggingface-hub >=1.8.0 installs this instead of huggingface-cli)
~/.ollmlx/logs/server.log           # current log
~/.ollmlx/logs/server.log.1         # previous log (rotated)
~/.cache/huggingface/hub/           # HF model cache (only models--mlx-community--* are listed)
~/.local/bin/uv                     # uv binary (used for mlx-lm version detection)
/usr/local/bin/ollmlx               # CLI symlink (created by app on first launch)
/usr/local/bin/ollama               # optional shim symlink
```
- Do not use `@AppStorage` anywhere in `OllmlxCore`
- Do not import AppKit in `OllmlxCore` or the CLI target
- Do not store the API key in UserDefaults — Keychain only
- Do not hardcode the internal `mlx_lm.server` port — always use ephemeral allocation
- Do not call `huggingface-cli` from PATH — always use the venv binary (`huggingface-cli` or `hf` fallback)
- Do not buffer streaming proxy responses — forward chunks immediately
- Do not put `defer { session.invalidateAndCancel() }` on the outer scope of proxy handlers that return streaming `ResponseBody` closures
- Do not use `.sheet()` to present views from an NSPopover — it deadlocks the app and crashes on macOS 26 Tahoe. Always use a standalone `NSWindow` via AppDelegate notifications instead
- Do not call `ServerManager.start()` from the model selector's `onChange` — only from the explicit Start button or the Ollama API model-switch logic
- Do not call `ServerManager.stop()`/`start()` directly from proxy route handlers — always go through `ModelSwitchCoordinator.ensureModel()` to prevent concurrent model-switch race conditions
- Do not use `/usr/bin/env` to find tools like `uv` from the macOS app — resolve absolute paths
- Do not send SIGKILL without trying SIGINT first and waiting 5 seconds
- Do not allow external connections without an API key being set
- Do not use Homebrew in the bootstrap script — use `uv` from astral.sh
- Do not move on to the next phase until all completion criteria for the current phase are met
- Do not implement multiple concurrent models — v1 supports one model at a time only