Add multihost inference support for TPU #2759

Draft
ahmeda14960 wants to merge 1 commit into main from multihost-inference

Conversation

@ahmeda14960 (Contributor) commented Feb 12, 2026

The main document to look at is lib/levanter/CODEX_REFACTOR_KV.md: this is where I had Codex write the high-level plans that I would approve.

It would then write more detailed notes and logging to lib/levanter/CODEX_INFERNCE_MXX.md, where MXX is the milestone (M1, M2, etc.). Seems to work!

Sample command:

```
python infra/launch.py \
  --tpu_name \
  --tpu_type v5p-32 \
  --node_count 4 \
  --zone us-central1-a \
  -- python -m levanter.main.sample_lm_multihost \
  --config_path lib/levanter/config/sampler/sample_llama8b_multihost_real_128prompts_2048_m10_hostdp_wandb_v5p32.yaml
```
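The config name suggests 128 prompts split across the 4 hosts requested by `--node_count 4`. A minimal sketch of what host-level data parallelism with prompt sharding can look like (illustrative only, not Levanter's actual code; in a real multihost JAX job `host_id` would come from `jax.process_index()`):

```python
# Hypothetical helper: each host independently samples a contiguous
# shard of the global prompt list, so no cross-host communication is
# needed during generation.

def shard_prompts(prompts: list[str], num_hosts: int, host_id: int) -> list[str]:
    """Return the contiguous slice of prompts owned by this host."""
    if len(prompts) % num_hosts != 0:
        raise ValueError("prompt count must divide evenly across hosts")
    per_host = len(prompts) // num_hosts
    return prompts[host_id * per_host : (host_id + 1) * per_host]

prompts = [f"prompt {i}" for i in range(128)]
shard = shard_prompts(prompts, num_hosts=4, host_id=2)
# host 2 owns prompts 64..95
```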

Brings over the multihost inference work from fix/simpo-multihost-inference,
excluding all SimPO/preference dataset changes.

Core changes:
- Inference engine overhaul: page-based KV cache, multi-round generation,
  DCN axis handling, sync barriers
- JIT scheduler for paged attention with page allocation
- Multihost sampling entrypoint (sample_lm_multihost.py) with per-round
  reset, host-level data parallelism, prompt sharding, WandB logging
- Attention layer changes for paged KV cache and TPU RPA support
- KV cache restructuring for physical page management across hosts
- 180 sampler config YAMLs organized by milestone (m2-m11)
- Tests for engine, scheduler, page table, KV cache, and multihost
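To make the page-based KV cache idea above concrete, here is a small illustrative sketch of the bookkeeping such a design implies: a page table maps each sequence to fixed-size physical pages, so KV memory can be grown per token and returned to a free pool on per-round reset without reshaping arrays. All names here are hypothetical, not Levanter's actual API.

```python
# Illustrative page table for a paged KV cache (assumption: fixed-size
# pages drawn from a shared free pool, one page list per sequence).

class PageTable:
    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))  # stack of free physical pages
        self.pages: dict[int, list[int]] = {}     # seq_id -> physical page indices
        self.lengths: dict[int, int] = {}         # seq_id -> tokens written so far

    def append(self, seq_id: int, n_tokens: int = 1) -> None:
        """Reserve enough pages for seq_id to hold n_tokens more KV entries."""
        pages = self.pages.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0) + n_tokens
        while len(pages) * self.page_size < length:
            if not self.free_pages:
                raise RuntimeError("KV cache out of pages")
            pages.append(self.free_pages.pop())
        self.lengths[seq_id] = length

    def slot(self, seq_id: int, pos: int) -> tuple[int, int]:
        """Physical (page, offset) for logical token position pos."""
        return self.pages[seq_id][pos // self.page_size], pos % self.page_size

    def release(self, seq_id: int) -> None:
        """Per-round reset: return a finished sequence's pages to the pool."""
        self.free_pages.extend(self.pages.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

table = PageTable(num_pages=8, page_size=4)
table.append(seq_id=0, n_tokens=6)  # 6 tokens need 2 pages of size 4
page, off = table.slot(0, 5)        # token 5 lives in the second page
table.release(0)                    # all pages back in the pool
```

A JIT-compatible version would replace the Python dict/list bookkeeping with fixed-shape integer arrays, which is presumably what the JIT scheduler in this PR manages.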
