trinity_coordinator is an Elixir/Nx implementation of a TRINITY-style local
small-model router. The active project direction is to recreate the Python
Sakana/Qwen artifact application process in Elixir, prove it with rigorous
stage-level parity checks, and use the resulting Qwen coordinator to route real
LLM provider calls.
The current focus is parity and service foundation, not reproducing the original training experiment.
The original paper sources and the full supplemental Python submission have
been audited. The supplemental checkpoint metadata is now treated as the
compatibility source of truth for the imported artifact path: Qwen3-0.6B,
layer-26 SVF, seven agents, five turns, no-generation penultimate hidden-state
extraction, and the raw role order solver, thinker, verifier.
The active lane is:
- Load the same base `Qwen/Qwen3-0.6B` model.
- Consume the Sakana router vector and SVD/SVF components.
- Reconstruct adapted Qwen tensors in Elixir/Nx.
- Prove the Elixir path against Python with stage-level checks and explicit tolerances.
- Materialize reusable adapted artifacts.
- Run the adapted small local coordinator in front of real provider-backed LLM calls.
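The SVD/SVF reconstruction step can be illustrated with a small NumPy sketch. This is not the project's Nx implementation; the multiplicative `1 + z` scaling form and every name below are assumptions for illustration only:

```python
import numpy as np

def reconstruct_adapted(weight: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Rebuild an adapted weight by scaling singular values with SVF offsets.

    weight: original 2-D parameter tensor.
    z: per-singular-value offset vector, length min(weight.shape).
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    s_scaled = s * (1.0 + z)      # assumed multiplicative SVF scaling
    return (u * s_scaled) @ vt    # same as u @ diag(s_scaled) @ vt

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6)).astype(np.float32)
# Zero offsets must reproduce the original weight up to float error.
w_same = reconstruct_adapted(w, np.zeros(6, dtype=np.float32))
assert np.allclose(w, w_same, atol=1e-4)
```

The zero-offset round trip is a useful baseline check before comparing against Python-generated components.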
The earlier experiment-reproduction lane, including sep-CMA-ES training and benchmark harness scaffolding, has been removed from the active codebase. The remaining mainline is the parity and service path.
The current development checkout is a small local workspace, not a single
standalone repo. trinity_coordinator fetches most Hex/Git dependencies through
mix deps.get, but the live provider lane still expects several first-party
repositories to sit next to it on disk.
Create this sibling layout:

```
workspace/
  trinity_coordinator/
  agent_session_manager/
  gemini_cli_sdk/
  cli_subprocess_core/
  execution_plane/
```
Clone the repos from nshkrdotcom:

```bash
mkdir trinity-workspace
cd trinity-workspace
git clone git@github.com:nshkrdotcom/trinity_coordinator.git
git clone git@github.com:nshkrdotcom/agent_session_manager.git
git clone git@github.com:nshkrdotcom/gemini_cli_sdk.git
git clone git@github.com:nshkrdotcom/cli_subprocess_core.git
git clone git@github.com:nshkrdotcom/execution_plane.git
cd trinity_coordinator
mix deps.get
```

Use HTTPS URLs instead of SSH if your GitHub account is not configured for SSH:

```bash
git clone https://github.com/nshkrdotcom/trinity_coordinator.git
```

Dependency source selection is handled by
`build_support/dependency_sources.exs` and
`build_support/dependency_sources.config.exs`. It tries local sibling paths
first, then GitHub, then Hex where a Hex package is available. Use
`.dependency_sources.local.exs` for local overrides; dependency source
selection does not use environment variables.
Why the sibling repos are useful:
- `trinity_coordinator` uses local sibling checkouts of `../agent_session_manager`, `../gemini_cli_sdk`, and `../inference/apps/inference` when present.
- `agent_session_manager` and `gemini_cli_sdk` use `../cli_subprocess_core` when it is present.
- `cli_subprocess_core` uses packages inside `../execution_plane` when that workspace is present.
- Standalone clones fall back to the configured GitHub sources.
The base coordinator model is Qwen/Qwen3-0.6B. You do not copy it into this
repo manually. Bumblebee downloads it from Hugging Face into the normal local
model cache the first time a Qwen-backed task runs.
The Sakana-adapted runtime artifact is different: it is generated output and is not committed to git. A fresh clone will not contain:

```
priv/sakana_trinity/adapted_qwen3_0_6b_layer26/
```

That directory is about 624 MB in the current local build and should contain:

```
manifest.json
router_head.safetensors
checkpoints/*.safetensors
```
For normal onboarding, use a blessed artifact bundle and place it at the path above. After that, no Python setup is required for local route demos or mock orchestration.
If you need to rebuild the artifact instead of copying a bundle, use the export and import workflow in Sakana Artifacts And Export. That path loads Qwen and performs SVD/SVF work, so it is heavier than normal first-run setup.
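A quick sanity check that a copied bundle landed correctly can be scripted against the expected layout above. The helper below is a hypothetical convenience, not part of the repo:

```python
from pathlib import Path

def check_artifact_dir(root: str) -> list[str]:
    """Return the expected artifact pieces missing from the directory."""
    base = Path(root)
    missing = [name for name in ("manifest.json", "router_head.safetensors")
               if not (base / name).is_file()]
    # The checkpoints/ subdirectory must hold at least one safetensors shard.
    if not list(base.glob("checkpoints/*.safetensors")):
        missing.append("checkpoints/*.safetensors")
    return missing

# An empty directory reports all three expected pieces as missing.
```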
Working now:
- Qwen3-0.6B loads through Bumblebee on EXLA CUDA.
- The Sakana router vector is converted to safetensors.
- The vector split is understood:
  - first `9216` values: SVF scale offsets;
  - final `10240` values: router-head weights reshaped to `{10, 1024}`.
- The Elixir SVD/SVF code reconstructs adapted tensors.
- Python and Elixir parity scripts emit detailed JSON reports.
- Python emits a stage tensor bundle from safetensors readback.
- Elixir emits comparable semantic `torch_v` stage tensors with `--stage-dir`.
- The fast semantic loop can reuse Python's `stage.source_f32`, skip wrong layouts, and run the required reconstruction check through EXLA without loading Qwen for every debug run.
- `--strict-stage-tolerances` is the required functional correctness gate.
- Full Python semantic export imports into canonical checkpoint-directory Elixir artifacts with 9 target-verified tensors, 9,216 singular offsets, and router head shape `{10, 1024}`.
- The adapted coordinator smoke loads those canonical artifacts, patches Qwen, and routes a fixed transcript on CUDA with hidden `{1, 1024}`, logits `{1, 10}`, agent logits `{7}`, and role logits `{3}`.
- Fixed-transcript router trace parity passes for exact transcript, token ids, router-head hash, and argmax agent/role ids. Hidden/logit vectors are compared with declared alignment thresholds because Python currently runs this trace on CPU while Elixir runs Qwen through EXLA CUDA.
- The adapted runtime loop can route through fake providers with persisted JSONL traces. The safe smoke path dispatches Worker first, Verifier second, and terminates on verifier `ACCEPT`.
- Thinker suggestions, verifier-before-worker failure, max-turn latest-worker termination, and provider failure tracing are covered by focused tests.
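The vector split noted above (first `9216` values as SVF scale offsets, final `10240` values reshaped to the `{10, 1024}` router head) can be sketched in NumPy; the row-major reshape is an assumption:

```python
import numpy as np

SVF_OFFSET_COUNT = 9216          # per the documented split
ROUTER_HEAD_SHAPE = (10, 1024)   # 10 route logits over a 1024-dim hidden state

def split_router_vector(vec: np.ndarray):
    """Split the flat Sakana router vector into SVF offsets and the head."""
    assert vec.shape == (SVF_OFFSET_COUNT + 10 * 1024,)
    offsets = vec[:SVF_OFFSET_COUNT]
    head = vec[SVF_OFFSET_COUNT:].reshape(ROUTER_HEAD_SHAPE)  # assumed row-major
    return offsets, head

vec = np.arange(9216 + 10240, dtype=np.float32)
offsets, head = split_router_vector(vec)
assert offsets.shape == (9216,) and head.shape == (10, 1024)
```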
Current parity result:
- Original-submission `svd_weights.pt` generation succeeds and produces current Python safetensors readback hash `b4cab13f8a82ccaf49603356e658bc9b77f65b08a69678a7d053a2e4b3197c43`.
- Historical stored hash `600be6ab0f5a34325b9857182ccb5fce5971549a0ce8588cdacc992eda54014c` remains non-reproducible from that regenerated `.pt`.
- The bounded layer-26 all-selected replay checks 7 tensors, 70 stages, and 63 required stages with `failed_required=0`.
- Source tensors, offsets, scaled singular values, and `u_scaled` byte-match; required f32 reconstruction stages pass explicit tolerances.
- Final `bf16` byte matching remains aspirational and is reported separately.
- Canonical import validation passes with `status=complete`, `artifact_layout=checkpoint_directory`, `selected_tensor_count=9`, `selected_singular_value_count=9216`, `loaded_tensor_count=9`, and `target_verified_count=9`.
- Adapted coordinator validation passes against the promoted canonical artifact at `priv/sakana_trinity/adapted_qwen3_0_6b_layer26`; a representative fixed-route smoke selected `agent_id=4`, `role_id=0`, public role Worker.
- Router trace parity passes with exact token ids and head hash, exact `agent_id=4`/`role_id=0`, hidden cosine `0.99449`, and logits cosine `0.99743`.
Recent non-matching Elixir final hashes have included
`bf089ea0607c93ae69f92bf7b9fcf71dc2a2b53d231cfe307b8cd6f4ef6a85ae` and
`74dc61d765c95e80ca7298b6e97f29a4fd76e2ae4bfb348b2abbffcbc5e0dff8`.
The stage report, not the final Elixir hash alone, is the correctness verdict.
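The hashes above are hex digests of safetensors readback. As a generic sketch of the hashing side (whether the project digests whole files or individual tensor payloads is not specified here, so the whole-file form is an assumption):

```python
import hashlib

def file_sha256(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()

# SHA-256 of empty input is the well-known constant below.
assert hashlib.sha256(b"").hexdigest() == (
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
```

Comparing digests like these catches byte drift, but as the section notes, the stage report remains the actual correctness verdict.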
Read the guides in this order:
- Onboarding
- Current Direction And Planning
- System Architecture
- Recreating The Python Parity Process
- Stage Checks And Tolerances
- Sakana Artifacts And Export
- SVD Generation Runbook
- Service Buildout Plan
- Provider Service Hardening
- Operations And Quality Gates
- Troubleshooting
Runnable reviewer examples are in Examples.
Additional technical reference notes are included in HexDocs under
Reference Notes.
Private implementation notes may exist under docs/priv/ in internal
workspaces, but they are not required for fresh-clone onboarding.
- Elixir `~> 1.18`.
- NVIDIA driver visible to `nvidia-smi`.
- `XLA_TARGET=cuda12` for Qwen-backed coordinator runs.
- Internet access for the first Hugging Face download of `Qwen/Qwen3-0.6B`.
- The sibling first-party repos listed in Fresh Clone Setup.
- The generated adapted artifact directory at `priv/sakana_trinity/adapted_qwen3_0_6b_layer26/` for route demos.
- Python with PyTorch, Transformers, and safetensors only when rebuilding artifacts or running parity scripts.
- Gemini CLI authentication only when running live `gemini_cli_asm` provider demos.
Resolved core dependency lane:
- `nx 0.10.0`
- `exla 0.10.0`
- `axon 0.7.0`
- `bumblebee` pinned to `elixir-nx/bumblebee` `0fd8114cf5429af9236f100f3350986e9d823c02`
First confirm dependencies resolve and the non-provider test suite passes:

```bash
mix deps.get
XLA_TARGET=cuda12 mix test
```

Then run the primary safe demo. This is the first command to use when checking that the project works end to end without spending provider budget:
```bash
XLA_TARGET=cuda12 mix trinity.route.demo \
  --mock \
  --trace-out tmp/trinity_route_demo.jsonl
```

This loads the adapted Qwen coordinator from the default artifact path,
routes through the Sakana head, dispatches to deterministic mock providers,
persists a JSONL trace, and exits with TRINITY ROUTE DEMO: PASS when the
mock verifier accepts.
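The persisted trace is JSON Lines, one event object per line, so it is easy to inspect offline. A minimal reader sketch; the event field names below are invented for illustration and are not the project's actual schema:

```python
import json
import os
import tempfile

def read_jsonl(path: str) -> list[dict]:
    """Parse a JSON-Lines trace file into a list of event dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Round trip with hypothetical event records (field names are made up).
events = [{"event": "dispatch", "agent_id": 4},
          {"event": "verifier", "verdict": "ACCEPT"}]
path = os.path.join(tempfile.mkdtemp(), "trace.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for e in events:
        f.write(json.dumps(e) + "\n")
assert read_jsonl(path) == events
```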
If you only want to inspect the local router without provider dispatch, run:

```bash
XLA_TARGET=cuda12 mix run examples/local_coordinator_route.exs -- \
  --artifact-dir priv/sakana_trinity/adapted_qwen3_0_6b_layer26 \
  --prompt "Select a TRINITY role for this reasoning task."
```

That prints the artifact identity, token ids, hidden-vector shape, route logits, selected agent id, and selected role name.
For the direct adapted-coordinator shape smoke:

```bash
XLA_TARGET=cuda12 mix trinity.hitl.adapted
```

Run static checks before committing:

```bash
mix format --check-formatted
mix credo --strict
mix dialyzer
```

Build docs:

```bash
mix docs
```

Generate Python report, components, and stage tensors:
```bash
python3 priv/sakana_trinity/scripts/debug_sakana_parity_sample.py \
  --model-torch-dtype float32 \
  --out tmp/sakana_parity/python_sample_trace.json \
  --write-components-dir tmp/sakana_parity/python_components
```

Generate Elixir semantic report and stage tensors:
```bash
XLA_TARGET=cuda12 mix trinity.sakana.parity_sample \
  --semantic-only \
  --device-semantic-only \
  --preferred-layout-only \
  --source-from-python-stage \
  --components-dir tmp/sakana_parity/python_components \
  --python-report tmp/sakana_parity/python_sample_trace.json \
  --stage-dir tmp/sakana_parity/elixir_stages \
  --out tmp/sakana_parity/elixir_sample_trace.json
```

Those extra flags are the current recommended debug loop: skip native Nx SVD, skip the large host CPU matmul, skip known-wrong V-layout diagnostics, and reuse Python's stage source tensor instead of loading the full Qwen profile just to recover the sample source.
Run the required functional parity gate:

```bash
python3 priv/sakana_trinity/scripts/compare_sakana_parity_reports.py \
  --strict-stage-tolerances \
  tmp/sakana_parity/python_sample_trace.json \
  tmp/sakana_parity/elixir_sample_trace.json
```

Use final byte equality only as an explicit opt-in target:

```bash
python3 priv/sakana_trinity/scripts/compare_sakana_parity_reports.py \
  --strict-current-python \
  tmp/sakana_parity/python_sample_trace.json \
  tmp/sakana_parity/elixir_sample_trace.json
```

That exact-byte gate is expected to fail in the current state while functional stage parity passes.
After installing the canonical artifact directory, validate the live adapted Qwen coordinator directly:

```bash
XLA_TARGET=cuda12 mix trinity.hitl.adapted
```

This proves the runtime shape contract:

```
adapted Qwen vector shape: {1, 1024}
adapted route logits shape: {1, 10}
adapted agent logits shape: {7}
adapted role logits shape: {3}
```
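Numerically, that contract is one matmul plus two argmax slices. A NumPy sketch with random stand-in tensors, assuming the first seven logits score agents and the final three score roles, consistent with the seven-agent/three-role split described earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.standard_normal((1, 1024)).astype(np.float32)  # penultimate hidden state
head = rng.standard_normal((10, 1024)).astype(np.float32)   # imported router head

route_logits = hidden @ head.T        # {1, 10}
agent_logits = route_logits[0, :7]    # {7}: one score per TRINITY agent
role_logits = route_logits[0, 7:]     # {3}: solver / thinker / verifier order
agent_id = int(agent_logits.argmax())
role_id = int(role_logits.argmax())

assert route_logits.shape == (1, 10)
assert agent_logits.shape == (7,) and role_logits.shape == (3,)
```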
The live CUDA smoke proves the operational shape contract. The router trace below adds the side-by-side Python/Elixir semantic check for tokenization, hidden extraction, router-head weights, logits, and selected agent/role ids.
Generate the Python trace from the canonical artifact directory. On the current RTX 5060 Ti host, PyTorch 2.7.1 does not ship CUDA kernels for sm_120, so this trace is run on CPU:

```bash
uv run --python 3.11 \
  --with torch==2.7.1 \
  --with transformers==4.55.2 \
  --with accelerate==1.6.0 \
  --with safetensors \
  python priv/sakana_trinity/scripts/debug_sakana_router_trace.py \
  --artifact-dir priv/sakana_trinity/adapted_qwen3_0_6b_layer26 \
  --device cpu \
  --model-torch-dtype bfloat16 \
  --out tmp/sakana_parity/python_router_trace_bf16_cpu.json
```

Compare from Elixir:

```bash
XLA_TARGET=cuda12 mix trinity.sakana.router_trace \
  --artifact-dir priv/sakana_trinity/adapted_qwen3_0_6b_layer26 \
  --python-report tmp/sakana_parity/python_router_trace_bf16_cpu.json \
  --out tmp/sakana_parity/elixir_router_trace.json
```

Required exact checks: transcript hash, token ids, router-head hash, hidden/logit shapes, and argmax agent/role ids. Hidden/logit numeric payloads must pass declared cosine and relative-L2 alignment thresholds; max/mean absolute errors remain diagnostics.
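The alignment metrics can be sketched as below; the exact declared thresholds live in the parity scripts and are not reproduced here:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flat vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def relative_l2(a: np.ndarray, b: np.ndarray) -> float:
    """L2 distance normalized by the reference vector's norm."""
    return float(np.linalg.norm(a - b) / np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
assert abs(cosine(a, a) - 1.0) < 1e-12  # identical vectors align perfectly
assert relative_l2(a, a) == 0.0
```

These tolerate small device-dependent numeric drift (CPU PyTorch vs. EXLA CUDA) while still failing on structural mismatches.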
For the opt-in all-selected tensor gate, generate Python components and the source-oriented all-selected stage bundle with original SVD components:

```bash
python3 priv/sakana_trinity/scripts/debug_sakana_parity_sample.py \
  --model-torch-dtype float32 \
  --svd-weights path/to/svd_weights.pt \
  --all-selected-tensors \
  --out tmp/sakana_parity/python_sample_trace.json \
  --write-components-dir tmp/sakana_parity/python_components
```

Then replay a bounded layer-26 slice from those Python components:

```bash
XLA_TARGET=cuda12 mix trinity.sakana.parity_sample \
  --semantic-only \
  --device-semantic-only \
  --preferred-layout-only \
  --source-from-python-stage \
  --all-selected-tensors \
  --selected-source-filter 'model.layers.26.' \
  --components-dir tmp/sakana_parity/python_components \
  --python-report tmp/sakana_parity/python_sample_trace.json \
  --stage-dir tmp/sakana_parity/elixir_stages \
  --out tmp/sakana_parity/elixir_sample_trace.json
```

This path is deliberately explicit because it can materialize very large stage tensors for the embedding and LM-head matrices. Keep embedding/LM-head replay out of the monolithic EXLA command until the chunked large-tensor gate is in place. Without `--svd-weights`, the Python script still requires `--decompose-all-selected-if-missing`.
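The chunked large-tensor gate mentioned above amounts to comparing row slices instead of whole matrices. A generic sketch, with arbitrary chunk size and tolerance:

```python
import numpy as np

def chunked_allclose(a: np.ndarray, b: np.ndarray, rows: int = 1024,
                     atol: float = 1e-5) -> bool:
    """Compare two large 2-D tensors row-chunk by row-chunk.

    Only one chunk pair is resident at a time, so embedding/LM-head-sized
    matrices never need a full-matrix intermediate.
    """
    assert a.shape == b.shape
    for start in range(0, a.shape[0], rows):
        if not np.allclose(a[start:start + rows], b[start:start + rows], atol=atol):
            return False
    return True

x = np.ones((5000, 16), dtype=np.float32)
assert chunked_allclose(x, x.copy())
assert not chunked_allclose(x, x + 1.0)
```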
The intended service path is:
- Format and tokenize the transcript.
- Run the adapted local Qwen coordinator on CUDA.
- Extract the penultimate-token hidden state.
- Route through the imported Sakana head.
- Select agent and TRINITY role.
- Inject the selected role prompt.
- Dispatch to a configured LLM provider.
- Persist trace metadata for audit and debugging.
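Put together, the service path behaves roughly like the loop below: route each turn, dispatch, and stop on verifier acceptance. Every name here and the `ACCEPT` check shape are illustrative, not the project's API:

```python
def run_route_loop(route, dispatch, transcript, max_turns=5):
    """Minimal coordinator loop: route each turn, stop on verifier ACCEPT."""
    trace = []
    for turn in range(max_turns):
        agent_id, role = route(transcript)            # adapted coordinator + Sakana head
        reply = dispatch(agent_id, role, transcript)  # provider call (mock or live)
        trace.append({"turn": turn, "agent_id": agent_id,
                      "role": role, "reply": reply})
        transcript = transcript + [reply]
        if role == "Verifier" and reply == "ACCEPT":
            break
    return trace

# Worker first, Verifier second, terminating on ACCEPT after two turns,
# mirroring the safe smoke path described earlier.
roles = iter([(4, "Worker"), (2, "Verifier")])
trace = run_route_loop(lambda t: next(roles),
                       lambda a, r, t: "ACCEPT" if r == "Verifier" else "draft",
                       transcript=[])
assert len(trace) == 2 and trace[-1]["reply"] == "ACCEPT"
```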
Provider dispatch now enters the shared `:inference` boundary through
`TrinityCoordinator.AgentPool.Inference` for hosted, GeminiEx, and Agent Session
Manager specs. Live calls are still explicitly gated; tests verify routing and
provider-boundary behavior without pretending that external LLM calls happened.
Use `mix trinity.route.demo --mock` as the primary operator-facing command. It
exercises the local adapted router, role injection, provider boundary, verifier
termination, and JSONL trace persistence without external LLM calls:

```bash
XLA_TARGET=cuda12 mix trinity.route.demo \
  --mock \
  --trace-out tmp/trinity_route_demo.jsonl
```

`mix trinity.demo --mock` is kept as a compatibility wrapper and delegates to
the same route demo. Prefer `mix trinity.route.demo --mock` in new docs,
scripts, and smoke checks.
The next most useful commands are:

| Command | Purpose | Provider calls |
|---|---|---|
| `mix trinity.route.demo --mock` | Primary end-to-end smoke: adapted router, mock provider boundary, verifier termination, trace output | Mock only |
| `mix run examples/local_coordinator_route.exs -- ...` | Inspect tokenization, hidden vector, logits, selected agent, and selected role | None |
| `mix run examples/qwen_router_prompt_eval.exs` | Eval-style prompt suite that asserts expected Qwen router agent/role choices | None |
| `mix run examples/mock_orchestration_trace.exs -- ...` | Reviewer-friendly orchestration trace with printed mock turns | Mock only |
| `mix trinity.hitl.mock_loop` | HITL-style mock orchestrator check | Mock only |
| `mix trinity.hitl.adapted` | Adapted Qwen shape/logit smoke | None |
| `mix trinity.sakana.router_trace` | Python/Elixir fixed-transcript parity check | None |
All route/demo commands default to the promoted artifact directory
`priv/sakana_trinity/adapted_qwen3_0_6b_layer26`. Use `--artifact-dir ...` only
when testing a non-default artifact bundle.
Run the adapted mock-provider HITL loop when you want terse pass/fail output:

```bash
XLA_TARGET=cuda12 mix trinity.hitl.mock_loop \
  --trace-out tmp/trinity_mock_trace.jsonl
```

Run the reviewer-friendly mock orchestration trace when you want a readable summary of trace events:

```bash
XLA_TARGET=cuda12 mix run examples/mock_orchestration_trace.exs -- \
  --artifact-dir priv/sakana_trinity/adapted_qwen3_0_6b_layer26 \
  --prompt "Select a TRINITY role for this reasoning task." \
  --trace-out tmp/examples/mock_orchestration_trace.jsonl
```

Live provider mode is explicitly gated:
```bash
XLA_TARGET=cuda12 mix trinity.route.demo \
  --allow-live \
  --profile qwen_sakana_adapted \
  --provider-pool gemini_cli_asm \
  --max-turns 3 \
  --trace-out tmp/trinity_route_demo.jsonl
```

Without `--mock` or `--allow-live`, live provider demo mode fails before dispatch.
The built-in default live provider pool maps all seven agent ids to OpenAI
`gpt-4o-mini` specs. To use it, provide an OpenAI API key and explicitly enable
live mode:

```bash
XLA_TARGET=cuda12 mix trinity.route.demo \
  --allow-live \
  --openai-api-key "$OPENAI_API_KEY" \
  --profile qwen_sakana_adapted \
  --provider-pool default \
  --max-turns 3 \
  --trace-out tmp/trinity_route_demo_openai.jsonl
```

Governed runs do not read normal provider environment variables as authority. They must provide an explicit authority packet or the matching governed route-demo flags:
```bash
XLA_TARGET=cuda12 mix trinity.route.demo \
  --profile qwen_sakana_adapted \
  --governed-authority-ref auth-trinity-1 \
  --governed-workflow-ref workflow-trinity-1 \
  --governed-runtime-ref runtime-trinity-1 \
  --governed-provider-pool-ref pool-trinity-1 \
  --governed-credential-ref cred-trinity-1 \
  --governed-api-key "$TRINITY_DISPOSABLE_PROVIDER_KEY" \
  --governed-provider openai \
  --governed-model gpt-4o-mini \
  --trace-out tmp/trinity_route_demo_governed.jsonl
```

The governed path rejects direct provider-pool and credential options alongside the authority packet. Trace output records provider/model labels, opaque refs, hashes, and fixed redaction markers, not materialized secret values.
The built-in `gemini_cli_asm` pool routes all seven TRINITY agents through
`Inference.Adapters.ASM`, ASM's SDK lane, and `gemini_cli_sdk` using
`gemini-3.1-flash-lite-preview`. The Gemini CLI must be installed or reachable
through the SDK's `npx` fallback and authenticated in the runtime environment.
Operator-facing commands:

| Command | Use |
|---|---|
| `mix trinity.route.demo --mock` | Primary safe runtime demo. Use this first. |
| `mix trinity.route.demo --provider-pool ... --allow-live` | Gated live-provider runtime demo. |
| `mix trinity.demo --mock` | Compatibility wrapper around `mix trinity.route.demo --mock`. |
| `mix trinity.hitl.mock_loop` | Terse mock orchestrator loop with pass/fail output. |
| `mix trinity.hitl.adapted` | Adapted Qwen coordinator shape/logit check. |
| `mix trinity.hitl.gpu` | CUDA/EXLA visibility check. |
| `mix trinity.hitl.base_qwen` | Base Qwen CUDA hidden-state check. |
| `mix trinity.hitl.head_route` | Live hidden-state to Sakana-head routing check. |
| `mix trinity.hitl.vector` | Sakana router-vector split check. |
Artifact and parity commands:

| Command | Use |
|---|---|
| `mix trinity.sakana.import_python` | Import Python semantic Sakana artifacts into the canonical Elixir layout. |
| `mix trinity.sakana.export_adapted` | Export Sakana-adapted Qwen tensors and router head. |
| `mix trinity.sakana.parity_sample` | Emit Elixir SVD/SVF parity diagnostics. |
| `mix trinity.sakana.router_trace` | Emit and compare fixed-transcript router traces. |
| `mix trinity.sakana.large_tensor_chunks` | Replay embedding and LM-head Sakana stages in row chunks. |
The `examples/` directory contains runnable, no-provider reviewer diagnostics.

Local coordinator routing:

```bash
XLA_TARGET=cuda12 mix run examples/local_coordinator_route.exs -- \
  --artifact-dir priv/sakana_trinity/adapted_qwen3_0_6b_layer26 \
  --prompt "Select a TRINITY role for this reasoning task."
```

Qwen router prompt eval:

```bash
XLA_TARGET=cuda12 mix run examples/qwen_router_prompt_eval.exs
```

Mock orchestration trace:

```bash
XLA_TARGET=cuda12 mix run examples/mock_orchestration_trace.exs -- \
  --artifact-dir priv/sakana_trinity/adapted_qwen3_0_6b_layer26 \
  --prompt "Select a TRINITY role for this reasoning task." \
  --trace-out tmp/examples/mock_orchestration_trace.jsonl
```

These examples print the prompt, artifact identity, hidden/vector/logit shapes, selected agent/role ids, mock provider turns, and trace summaries.
Before committing changes that affect parity or runtime behavior, run:

```bash
mix format --check-formatted
python3 -m py_compile priv/sakana_trinity/scripts/*.py
XLA_TARGET=cuda12 mix test
mix credo --strict
mix dialyzer
mix docs
```

When parity code changes, also run:

```bash
python3 priv/sakana_trinity/scripts/compare_sakana_parity_reports.py \
  --strict-stage-tolerances \
  tmp/sakana_parity/python_sample_trace.json \
  tmp/sakana_parity/elixir_sample_trace.json
```

This repository is a research implementation inspired by TRINITY: An Evolved LLM Coordinator.[1] The paper motivates the hidden-state router, the Thinker/Worker/Verifier role split, the lightweight coordination head, and the preference for compact local coordination.
This package does not claim to reproduce the paper's reported scores. The active focus is a robust, inspectable Elixir implementation of the Qwen/Sakana coordinator path.
[1] Jinglue Xu, Qi Sun, Peter Schwendeman, Stefan Nielsen, Edoardo Cetin, and Yujin Tang. TRINITY: An Evolved LLM Coordinator. arXiv:2512.04695, 2026. https://arxiv.org/abs/2512.04695
This project is released under the MIT License.