@besaleli besaleli commented Jan 8, 2026

Description

This PR introduces the encoderfile embedded binary format and packaging pipeline. Instead of embedding model artifacts via linker sections or include_bytes!, the runtime binary now has a self-describing payload appended at EOF, located via a fixed-size footer and protobuf manifest. Model artifacts (weights, tokenizer, config, transforms) are streamed from disk using seek-based IO and loaded lazily at runtime, with a single dynamic dispatch on model type followed by fully statically-typed execution. This design avoids platform-specific linker behavior, keeps builds fast, enables inspection and tooling, and cleanly separates compilation from packaging.
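For readers unfamiliar with the footer-at-EOF approach, here is a minimal sketch of how a payload can be appended to a binary and found again by seeking. The footer layout, the `MAGIC` value, and the names `append_payload` and `read_manifest` are all assumptions made for illustration, not the actual encoderfile format. The manifest is treated as opaque bytes rather than a decoded protobuf message.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

// Hypothetical footer layout (NOT the real encoderfile layout):
// [manifest_offset: u64 LE][manifest_len: u64 LE][magic: 8 bytes]
const MAGIC: &[u8; 8] = b"ENCFILE\0";
const FOOTER_LEN: u64 = 24;

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

// Decode a little-endian u64 from exactly 8 bytes.
fn le_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}

/// Append the manifest bytes and a fixed-size footer at the end of `out`.
pub fn append_payload<W: Write + Seek>(out: &mut W, manifest: &[u8]) -> io::Result<()> {
    let offset = out.seek(SeekFrom::End(0))?;
    out.write_all(manifest)?;
    out.write_all(&offset.to_le_bytes())?;
    out.write_all(&(manifest.len() as u64).to_le_bytes())?;
    out.write_all(MAGIC)?;
    Ok(())
}

/// Locate the manifest through the footer at EOF and read only those bytes.
pub fn read_manifest<R: Read + Seek>(src: &mut R) -> io::Result<Vec<u8>> {
    let end = src.seek(SeekFrom::End(0))?;
    if end < FOOTER_LEN {
        return Err(invalid("file too small to contain a footer"));
    }
    let payload_end = end - FOOTER_LEN;
    src.seek(SeekFrom::Start(payload_end))?;
    let mut footer = [0u8; FOOTER_LEN as usize];
    src.read_exact(&mut footer)?;
    if footer[16..] != MAGIC[..] {
        return Err(invalid("footer magic not found"));
    }
    let offset = le_u64(&footer[0..8]);
    let len = le_u64(&footer[8..16]);
    // Reject footers that point outside the region before the footer.
    match offset.checked_add(len) {
        Some(manifest_end) if manifest_end <= payload_end => {}
        _ => return Err(invalid("manifest range out of bounds")),
    }
    src.seek(SeekFrom::Start(offset))?;
    let mut manifest = vec![0u8; len as usize];
    src.read_exact(&mut manifest)?;
    Ok(manifest)
}
```

Because both functions are generic over `Read`/`Write` + `Seek`, the same code can be exercised against a `std::io::Cursor<Vec<u8>>` in tests and a `std::fs::File` in practice.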

Benefits

  • No more cargo-in-cargo
  • Lazy loading of large assets (e.g., model weights)

Why a manually specified file format instead of linker sections?

We intentionally avoid embedding artifacts via custom linker sections (.data, extern "C" symbols). While simpler initially, this approach is platform-specific, eagerly loads large blobs into memory, and provides no clear boundary for versioning or inspection. The appended encoderfile container instead offers predictable cross-platform behavior, lazy loading, and a stable, versioned format.
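To illustrate the lazy-loading point, here is a rough sketch of an artifact handle that touches the disk only on first access. `LazyArtifact` and its fields are hypothetical names, and the offset and length are assumed to come from a manifest entry. This is not the runtime's actual loader.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;
use std::sync::OnceLock;

/// Hypothetical handle to a byte range inside a packaged file.
pub struct LazyArtifact {
    path: PathBuf,
    offset: u64,
    len: u64,
    bytes: OnceLock<Vec<u8>>,
}

impl LazyArtifact {
    /// Record where the artifact lives; no IO happens here.
    pub fn new(path: impl Into<PathBuf>, offset: u64, len: u64) -> Self {
        LazyArtifact { path: path.into(), offset, len, bytes: OnceLock::new() }
    }

    /// True once the bytes have been read into memory.
    pub fn is_loaded(&self) -> bool {
        self.bytes.get().is_some()
    }

    /// Read the byte range on first call; later calls return the cached bytes.
    pub fn get(&self) -> io::Result<&[u8]> {
        if let Some(cached) = self.bytes.get() {
            return Ok(cached.as_slice());
        }
        let mut file = File::open(&self.path)?;
        file.seek(SeekFrom::Start(self.offset))?;
        let mut buf = vec![0u8; self.len as usize];
        file.read_exact(&mut buf)?;
        // If another thread initialized the cell first, its value is kept.
        Ok(self.bytes.get_or_init(|| buf).as_slice())
    }
}
```

With a handle like this, a command that never needs the weights never pays for reading them, whereas a blob placed in a linker section is mapped together with the rest of the binary.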

Testing

To test, run:

cargo build --bin encoderfile --release

Then run:

cargo run -- build -f test_config.yml --base-binary-path target/release/encoderfile

To run encoderfile:

chmod +x ./my-model-2.encoderfile
./my-model-2.encoderfile infer "hello"

Notes

  • Tokenizer config is now located in TokenizerService instead of Config
  • Base binaries are compiled from encoderfile-runtime
  • Base binary must be explicitly specified until #204 (Pull base binaries from GH releases for encoderfile builds) is closed. For testing, run cargo build --bin encoderfile --release, and use target/release/encoderfile-runtime.
  • Dockerfile now uses debian:bookworm-slim as a base image since we no longer need the Cargo toolchain.
  • Everything is explicitly encoded as little-endian. Big-endian is theoretically possible but not a priority given our prioritized architecture targets.

Out of Scope

@besaleli besaleli marked this pull request as ready for review January 8, 2026 21:32
@besaleli besaleli requested a review from javiermtorres January 8, 2026 21:32

@javiermtorres javiermtorres left a comment

LGTM

@besaleli besaleli merged commit 5166bf6 into main Jan 9, 2026
5 checks passed
@besaleli besaleli deleted the 194-transition-packaging-from-generated-rust-project-builds-to-post-link-binary-payloads branch January 9, 2026 09:43

Development

Successfully merging this pull request may close these issues.

Transition packaging from generated Rust project builds to post-link binary payloads

4 participants