Description
We currently generate a Rust project at build time and invoke Cargo to produce a model-specific encoderfile binary. This approach has become increasingly fragile and expensive:
- Docker builds trigger nested Rust builds (Cargo-in-Cargo)
- CI frequently OOMs, especially on ARM runners
- Build behavior depends on crates.io publish timing and cache state
- Version-coupled crates (`encoderfile`/`encoderfile-core`) are resolved at build time, leading to accidental mismatches
- Build times and failure modes are hard to reason about and debug
In practice, we are re-compiling Rust code solely to embed model assets (weights, tokenizer, configs), not because the executable logic itself is changing.
Proposed change
Move from “generate Rust project + build” to “post-link binary packaging”.
Instead of rebuilding Rust code to embed assets, we will:
- Build pre-compiled binaries
- Generate model assets separately
- Append those assets to the already-compiled binary (llamafile-style)
- Load and validate the embedded payload at runtime
This removes Cargo from the packaging path entirely.
What this looks like
Before
Docker build
→ encoderfile build
→ generate Rust project
→ invoke Cargo
→ resolve dependencies
→ compile Rust
→ embed assets
After
CI build
→ cargo build (once, per model type)
Packaging
→ concat binary + payload
Runtime
→ read embedded payload
→ initialize model
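The packaging step above can be sketched in plain Rust. The footer layout (payload, then a little-endian u64 length, then an 8-byte magic tag) and all names here are illustrative, pending the payload-format follow-up:

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;

/// Illustrative footer tag; the real marker is defined in a follow-up.
const MAGIC: &[u8; 8] = b"ENCFILE\0";

/// Copy a pre-compiled runtime binary and append the model payload:
/// [payload][payload_len: u64 LE][magic]
fn append_payload(binary: &str, payload: &str, out: &str) -> std::io::Result<()> {
    fs::copy(binary, out)?;
    let payload_bytes = fs::read(payload)?;
    let mut f = OpenOptions::new().append(true).open(out)?;
    // The OS loader ignores trailing bytes, so appending is safe.
    f.write_all(&payload_bytes)?;
    f.write_all(&(payload_bytes.len() as u64).to_le_bytes())?;
    f.write_all(MAGIC)?;
    Ok(())
}
```

Since this is pure byte concatenation, the packager needs no toolchain at all: no Cargo, no linker, no model-specific compilation.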
Why this is better
- Eliminates nested Rust builds and CI OOMs
- Removes crates.io timing and cache dependency
- Makes Docker builds deterministic and fast
- Preserves strong compile-time typing and monomorphization
- Aligns with one-binary-per-model-type architecture
- Simplifies debugging and failure modes
Importantly, this does not require:
- Cosmopolitan / universal binaries
- C++ or linker tricks
- `include_bytes!`, custom sections, or `build.rs` hacks
The OS loader already ignores trailing bytes in executables; we simply take advantage of that.
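To make this concrete, a minimal runtime loader might read its own trailing bytes as follows. It assumes the same illustrative footer layout (payload, little-endian u64 length, 8-byte magic), and it takes a path so it can be exercised in isolation; the real loader would pass `std::env::current_exe()`:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::path::Path;

/// Illustrative footer tag; must match whatever the packager appended.
const MAGIC: &[u8; 8] = b"ENCFILE\0";

/// Read an appended payload from `path` (the running executable in practice).
/// Returns Ok(None) for a plain binary with no payload footer.
fn read_embedded_payload(path: &Path) -> std::io::Result<Option<Vec<u8>>> {
    let mut f = File::open(path)?;
    let size = f.metadata()?.len();
    if size < 16 {
        return Ok(None);
    }
    // Footer = [payload_len: u64 LE][magic], located at the very end.
    let mut footer = [0u8; 16];
    f.seek(SeekFrom::End(-16))?;
    f.read_exact(&mut footer)?;
    if footer[8..] != *MAGIC {
        return Ok(None);
    }
    let len = u64::from_le_bytes(footer[..8].try_into().unwrap());
    if len > size - 16 {
        return Ok(None); // corrupt footer; treat as no payload
    }
    // The payload sits immediately before the footer.
    f.seek(SeekFrom::End(-16 - len as i64))?;
    let mut payload = vec![0u8; len as usize];
    f.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

Validation (checksum, version, protobuf decode) would happen after this raw read, per the payload-format follow-up.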
Scope / follow-ups
- Define payload format (footer marker, length prefix, optional checksum)
- Implement runtime payload loader
- Update CI and Docker pipelines
- Deprecate generated-project path and related macros
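For the payload-format follow-up, one possible shape is a footer carrying an integrity checksum. The sketch below uses an inline FNV-1a hash purely as a stand-in; a real implementation would more likely pick CRC32 or SHA-256, and the field layout is an assumption:

```rust
// Illustrative footer, read backwards from the end of the file:
// [payload][checksum: u64 LE][payload_len: u64 LE][magic: 8 bytes]

/// FNV-1a 64-bit hash, used here only as a placeholder checksum.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Validate an extracted payload against the checksum stored in the footer.
fn payload_is_valid(payload: &[u8], stored_checksum: u64) -> bool {
    fnv1a(payload) == stored_checksum
}
```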
Non-goals
- Supporting multiple model types in a single binary
- Re-introducing runtime dispatch or dynamic model selection
- Universal “run anywhere” binaries
This change trades compile-time asset embedding for runtime initialization, which is acceptable and significantly reduces operational complexity.
On Implementation
Note: headless mode and backends are brought up here for future planning. They are NOT in scope for this issue.
We’re standardizing on the following model going forward:
- Targets are (platform × backend) runtime binaries, installed explicitly: `encoderfile target add arm64-unknown-linux-gnu --backend cuda`
These are downloaded from GitHub Releases and cached locally. No cross-compiling for users, no auto-selection.
- Embedded encoderfiles are deployment artifacts only.
A `.encoderfile` contains a runtime binary + embedded protobuf payload and is:
- fully self-contained
- immutable
- not allowed to run headless
- does not load external weights/config at runtime
- Headless mode is only supported by pre-built runtime binaries.
Headless execution (external weights/config/tokenizer) is explicitly disallowed for embedded encoderfiles and enforced at compile time via mutually exclusive features (`embedded` vs `headless`).
- Exactly one backend per runtime binary (CPU, CUDA, Metal, etc.).
Backend choice is a build-time decision. There is no runtime backend switching and no multi-backend binaries.
This separation keeps deployment artifacts deterministic, avoids cross-compile pain for users, prevents accidental CUDA/Metal dependencies, and cleanly supports future environments (e.g. WASM) via headless runtimes.
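The compile-time mutual exclusion of `embedded` and `headless` mentioned above is commonly enforced in Rust with `compile_error!` guards; the exact feature wiring and messages below are assumptions, not the final implementation:

```rust
// In the runtime crate's lib.rs or main.rs. Both guards are evaluated
// against the enabled Cargo features at compile time.
#[cfg(all(feature = "embedded", feature = "headless"))]
compile_error!("features `embedded` and `headless` are mutually exclusive");

#[cfg(not(any(feature = "embedded", feature = "headless")))]
compile_error!("enable exactly one of `embedded` or `headless`");
```

With this in place, a misconfigured build fails immediately at `cargo build` rather than producing a binary with ambiguous behavior.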