README.md (+20 −4: 20 additions & 4 deletions)
@@ -5,8 +5,8 @@ This repository implements a Cursor-style tri-plane composed of an FP8 MoE train
 ## Architecture Overview
 
 - **Environment Fleet**: `envd/server.py` provides the gRPC tool surface (read/edit/search/lint/exec) and optional semantic search backed by Qdrant; Firecracker launch scripts in `scripts/firecracker/` create snapshot-based microVMs.
-- **Inference**: `inference/serve.py` bootstraps Ray actors (controller, samplers, env clients) to execute parallel tool plans with straggler mitigation and speculative rollouts.
-- **Trainer**: `trainer/` contains a PPO loop over a lightweight MoE transformer policy, reward shaping utilities, and data helpers suitable for integration with DeepSpeed/Megatron FP8 stacks.
+- **Inference**: `inference/serve.py` bootstraps Ray actors (controller, samplers, env clients) to execute parallel tool plans with straggler mitigation and speculative rollouts; samplers are pluggable (stub or an OpenAI-compatible vLLM backend) and rollouts persist to JSONL, S3, or ClickHouse.
+- **Trainer**: `trainer/` contains a PPO loop over a lightweight MoE transformer policy plus a DeepSpeed/TransformerEngine FP8 training stack for large-scale runs.
 
 ## Getting Started
@@ -31,6 +31,22 @@ This repository implements a Cursor-style tri-plane composed of an FP8 MoE train
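The straggler mitigation and speculative rollouts mentioned in the Inference bullet can be sketched with the standard library alone (the repository uses Ray actors; `run_tool`, its `delay` parameter, and the replica count here are hypothetical illustrations of the pattern, not the project's API):

```python
import concurrent.futures
import time


def run_tool(call_id: str, delay: float) -> str:
    """Stand-in for an env-client tool call; `delay` models a straggler."""
    time.sleep(delay)
    return f"result:{call_id}"


def speculative_call(pool, call_id: str, delays, k: int = 2) -> str:
    """Launch k redundant replicas of the same tool call, keep the first
    result to arrive, and cancel the stragglers (best-effort)."""
    futures = [pool.submit(run_tool, call_id, delays[i]) for i in range(k)]
    done, pending = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    for f in pending:
        f.cancel()  # threads already running will still finish quietly
    return next(iter(done)).result()


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    # The second replica is much faster, so its result wins the race.
    out = speculative_call(pool, "lint", delays=[0.5, 0.01])
print(out)  # result:lint
```

Both replicas compute the same answer, so racing them trades extra compute for latency, which is the usual speculative-execution bargain.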
0 commit comments
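A minimal sketch of the JSONL flavour of rollout persistence named in the diff (field names such as `prompt`, `actions`, and `reward` are illustrative, not the repository's actual schema; the S3 and ClickHouse sinks would share the same append/load interface):

```python
import json
import tempfile
from pathlib import Path


def append_rollout(path: Path, rollout: dict) -> None:
    """Append one rollout as a single JSON line (the JSONL sink)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(rollout, sort_keys=True) + "\n")


def load_rollouts(path: Path) -> list[dict]:
    """Read every persisted rollout back, e.g. for a training epoch."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


log = Path(tempfile.mkdtemp()) / "rollouts.jsonl"
append_rollout(log, {"prompt": "fix lint error", "actions": ["read", "edit"], "reward": 1.0})
append_rollout(log, {"prompt": "add a test", "actions": ["search"], "reward": 0.0})

batch = load_rollouts(log)
print(len(batch), batch[0]["reward"])  # 2 1.0
```

Append-only JSON lines keep writes atomic per rollout and make the log trivially streamable into bulk stores later.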