Transition packaging from generated Rust project builds to post-link binary payloads #194

@besaleli

Description

We currently generate a Rust project at build time and invoke Cargo to produce a model-specific encoderfile binary. This approach has become increasingly fragile and expensive:

  • Docker builds trigger nested Rust builds (Cargo-in-Cargo)
  • CI frequently OOMs, especially on ARM runners
  • Build behavior depends on crates.io publish timing and cache state
  • Version-coupled crates (encoderfile / encoderfile-core) are resolved at build time, leading to accidental mismatches
  • Build times and failure modes are hard to reason about and debug

In practice, we are re-compiling Rust code solely to embed model assets (weights, tokenizer, configs), not because the executable logic itself is changing.

Proposed change

Move from “generate Rust project + build” to “post-link binary packaging”.

Instead of rebuilding Rust code to embed assets, we will:

  1. Build the runtime binaries once, ahead of packaging
  2. Generate model assets separately
  3. Append those assets to the already-compiled binary (llamafile-style)
  4. Load and validate the embedded payload at runtime

This removes Cargo from the packaging path entirely.
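As a rough illustration of step 3, packaging can be little more than a streaming file concatenation. This is a sketch under assumptions: the payload format is still to be defined, so the trailer used here (an 8-byte little-endian payload length followed by an 8-byte marker `ENCFILE1`) is hypothetical, and `package` is an illustrative name, not an existing encoderfile API.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical end-of-file marker; the real payload format is not yet defined.
const MAGIC: &[u8; 8] = b"ENCFILE1";

/// Copies a pre-compiled runtime binary to `out_path`, then appends the model
/// payload and a 16-byte trailer: [payload length as u64 LE][MAGIC].
/// No Rust code is compiled here; this is plain byte concatenation.
pub fn package(runtime_path: &Path, payload_path: &Path, out_path: &Path) -> io::Result<()> {
    let mut out = File::create(out_path)?;
    // 1. The runtime binary, byte-for-byte unchanged.
    io::copy(&mut File::open(runtime_path)?, &mut out)?;
    // 2. The model payload (weights, tokenizer, config). io::copy streams it
    //    without loading it fully into memory and returns the byte count.
    let payload_len = io::copy(&mut File::open(payload_path)?, &mut out)?;
    // 3. Trailer so the runtime can locate the payload from the end of the file.
    out.write_all(&payload_len.to_le_bytes())?;
    out.write_all(MAGIC)?;
    out.flush()?;

    // Keep the packaged file executable on Unix-like systems.
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        std::fs::set_permissions(out_path, std::fs::Permissions::from_mode(0o755))?;
    }
    Ok(())
}
```

Because the runtime binary is never recompiled, the same CI-built binary can be reused for every model of that type; only the appended bytes differ.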

What this looks like

Before

Docker build
  → encoderfile build
      → generate Rust project
      → invoke Cargo
          → resolve dependencies
          → compile Rust
          → embed assets

After

CI build
  → cargo build (once, per model type)

Packaging
  → concat binary + payload

Runtime
  → read embedded payload
  → initialize model
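On the runtime side, reading the embedded payload only requires opening the running executable and reading backwards from its end. A minimal sketch, assuming a hypothetical 16-byte trailer ([payload length as u64 LE][8-byte marker `ENCFILE1`]) written at packaging time; `read_payload` and `load_embedded_payload` are illustrative names. It locates the binary with `std::env::current_exe` and is generic over `Read + Seek` so it can also be exercised against an in-memory buffer.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical trailer layout: [payload length: u64 LE][marker: 8 bytes].
const MAGIC: &[u8; 8] = b"ENCFILE1";
const TRAILER_LEN: u64 = 16;

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Locates and reads the payload that was appended after the runtime binary.
pub fn read_payload<R: Read + Seek>(r: &mut R) -> io::Result<Vec<u8>> {
    let file_len = r.seek(SeekFrom::End(0))?;
    if file_len < TRAILER_LEN {
        return Err(invalid("file too small to contain a trailer"));
    }
    // Read the fixed-size trailer from the very end of the file.
    r.seek(SeekFrom::Start(file_len - TRAILER_LEN))?;
    let mut trailer = [0u8; 16];
    r.read_exact(&mut trailer)?;
    if &trailer[8..] != &MAGIC[..] {
        return Err(invalid("no embedded payload found"));
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&trailer[..8]);
    let payload_len = u64::from_le_bytes(len_bytes);
    // Reject lengths that would point before the start of the file.
    if payload_len > file_len - TRAILER_LEN {
        return Err(invalid("payload length exceeds file size"));
    }
    // The payload sits immediately before the trailer.
    r.seek(SeekFrom::Start(file_len - TRAILER_LEN - payload_len))?;
    let mut payload = vec![0u8; payload_len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

/// Reads the payload embedded in the currently running executable.
pub fn load_embedded_payload() -> io::Result<Vec<u8>> {
    let mut exe = File::open(std::env::current_exe()?)?;
    read_payload(&mut exe)
}
```

Reading from the end means the runtime never has to parse its own executable headers; the returned bytes would then be decoded and validated before the model is initialized.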

Why this is better

  • Eliminates nested Rust builds and CI OOMs
  • Removes crates.io timing and cache dependency
  • Makes Docker builds deterministic and fast
  • Preserves strong compile-time typing and monomorphization
  • Aligns with one-binary-per-model-type architecture
  • Simplifies debugging and failure modes

Importantly, this does not require:

  • Cosmopolitan / universal binaries
  • C++ or linker tricks
  • include_bytes!, custom sections, or build.rs hacks

The OS loader already ignores trailing bytes in executables; we simply take advantage of that.

Scope / follow-ups

  • Define payload format (footer marker, length prefix, optional checksum)
  • Implement runtime payload loader
  • Update CI and Docker pipelines
  • Deprecate generated-project path and related macros

Non-goals

  • Supporting multiple model types in a single binary
  • Re-introducing runtime dispatch or dynamic model selection
  • Universal “run anywhere” binaries

This change trades compile-time asset embedding for runtime initialization, which is acceptable and significantly reduces operational complexity.

On Implementation

Note: Headless mode and backends are discussed here only for future planning. They are NOT in scope for this issue.

We’re standardizing on the following model going forward:

  1. Targets are (platform × backend) runtime binaries, installed explicitly:

    encoderfile target add arm64-unknown-linux-gnu --backend cuda

    These are downloaded from GitHub Releases and cached locally. No cross-compiling for users, no auto-selection.

  2. Embedded encoderfiles are deployment artifacts only.
    A .encoderfile contains a runtime binary plus an embedded protobuf payload. It is:

    • fully self-contained
    • immutable
    • not allowed to run headless
    • not allowed to load external weights/config at runtime
  3. Headless mode is only supported by pre-built runtime binaries.
    Headless execution (external weights/config/tokenizer) is explicitly disallowed for embedded encoderfiles and enforced at compile time via mutually exclusive features (embedded vs headless).

  4. Exactly one backend per runtime binary (CPU, CUDA, Metal, etc.).
    Backend choice is a build-time decision. There is no runtime backend switching and no multi-backend binaries.

This separation keeps deployment artifacts deterministic, avoids cross-compile pain for users, prevents accidental CUDA/Metal dependencies, and cleanly supports future environments (e.g. WASM) via headless runtimes.
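The compile-time rules above (embedded vs. headless, one backend per runtime binary) map naturally onto Cargo features guarded by `compile_error!`. A minimal sketch, assuming hypothetical feature names `embedded`, `headless`, `cpu`, `cuda` and `metal` declared in the crate's `Cargo.toml`; the actual feature names in encoderfile may differ.

```rust
// Embedded encoderfiles must never be able to run headless: reject builds
// that enable both modes.
#[cfg(all(feature = "embedded", feature = "headless"))]
compile_error!("features `embedded` and `headless` are mutually exclusive");

// One backend per runtime binary: reject any two backend features together.
#[cfg(any(
    all(feature = "cpu", feature = "cuda"),
    all(feature = "cpu", feature = "metal"),
    all(feature = "cuda", feature = "metal")
))]
compile_error!("enable only one backend feature: `cpu`, `cuda`, or `metal`");

/// Number of backend features enabled in this build (at most 1 once the guard passes).
/// A release build would additionally require this to be exactly 1.
pub const BACKEND_COUNT: usize = cfg!(feature = "cpu") as usize
    + cfg!(feature = "cuda") as usize
    + cfg!(feature = "metal") as usize;

/// Reports which mode this binary was compiled for.
pub fn mode() -> &'static str {
    if cfg!(feature = "embedded") {
        "embedded"
    } else if cfg!(feature = "headless") {
        "headless"
    } else {
        "unconfigured"
    }
}
```

With these guards, a build such as `cargo build --features embedded,headless` or `--features cuda,metal` fails at compile time instead of producing a binary that mixes modes or backends.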

Metadata

Labels: enhancement (New feature or request)