@@ -16,8 +16,8 @@ This file provides context for AI assistants (Claude, Cursor, Copilot) working o
 
 | Model | Status | Transfer Time | Notes |
 |-------|--------|---------------|-------|
-| DeepSeek-V3 (671B, FP8) | Working | 40-80s | 681GB across 8 GPUs |
-| Llama 3.3 70B | Working | ~5s | 140GB across 4-8 GPUs |
+| DeepSeek-V3 (671B, FP8) | Working | ~40s | 681GB across 8 GPUs @ ~112 Gbps |
+| Llama 3.3 70B | Working | ~5s | 140GB across 8 GPUs @ ~112 Gbps |
 
 ---
 
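The updated timings can be sanity-checked with a back-of-envelope calculation. This is a sketch under one assumption not stated in the diff: that ~112 Gbps is the aggregate effective throughput across all links, so total bytes divided by that rate approximates wall-clock time.

```python
def transfer_seconds(size_gb: float, throughput_gbps: float) -> float:
    """Estimate wall-clock transfer time for a weight set.

    Assumes throughput_gbps is the aggregate effective rate across
    all links (a modeling assumption, not a measured figure).
    """
    return size_gb * 8 / throughput_gbps  # GB -> Gb, then divide by Gbps

# 681 GB at ~112 Gbps aggregate -> roughly 49 s, the same ballpark
# as the ~40 s reported for DeepSeek-V3 in the table above.
print(round(transfer_seconds(681, 112)))
```

Real transfers overlap registration and per-layer processing, so the measured numbers can land below or above this naive estimate.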
@@ -125,7 +125,7 @@ UCX_LOG_LEVEL: "WARN" # DEBUG for troubleshooting
 
 ```
 modelexpress/
-├── CLAUDE.md                  # THIS FILE - AI assistant context
+├── CLAUDE.md                  # THIS FILE (project root) - AI assistant context
 ├── modelexpress_server/       # Rust gRPC server
 │   └── src/
 │       ├── main.rs
@@ -162,7 +162,7 @@ Contains custom vLLM model loaders:
 
 - **`MxSourceModelLoader`**: Loads weights from disk, registers with NIXL, publishes metadata
 - **`MxTargetModelLoader`**: Creates dummy weights, receives via RDMA, applies FP8 processing
-- **`SourceReadyCoordinator`**: Redis-based coordination for source-target synchronization
+- **`SourceReadyCoordinator`**: gRPC-based coordination for source-target synchronization (via MxClient)
 
 ```python
 class MxSourceModelLoader(DefaultModelLoader):
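The source-target handshake the coordinator performs can be sketched in plain Python. `GrpcCoordinator` and the `publish`/`fetch` method names below are illustrative stand-ins, not the actual MxClient API:

```python
import time


class GrpcCoordinator:
    """Sketch of gRPC-based source/target coordination.

    `client` stands in for a gRPC stub (e.g. an MxClient); the
    method names used here are hypothetical.
    """

    def __init__(self, client, model: str):
        self.client = client
        self.model = model

    def mark_source_ready(self, metadata: dict) -> None:
        # Source side: publish transfer metadata once all workers are registered.
        self.client.publish(self.model, metadata)

    def wait_for_source(self, timeout_s: float = 300.0, poll_s: float = 1.0) -> dict:
        # Target side: poll until the source has published, then return metadata.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            meta = self.client.fetch(self.model)
            if meta is not None:
                return meta
            time.sleep(poll_s)
        raise TimeoutError(f"source for {self.model} not ready after {timeout_s}s")
```

The point of the design change in this diff is that both sides talk to the same gRPC server rather than sharing a Redis instance directly.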
@@ -214,20 +214,20 @@ Rust gRPC service implementation:
 ### Building Docker Image
 
 ```bash
-cd /home/kavink/work/gitlab/modelexpress
+cd path/to/modelexpress
 
 # Build client image
 docker build -f examples/p2p_transfer_k8s/Dockerfile.client \
-  -t nvcr.io/nvidian/dynamo-dev/modelexpress-p2p-client:YOUR_TAG .
+  -t nvcr.io/nvidian/dynamo-dev/IMAGE_NAME:YOUR_TAG .
 
-docker push nvcr.io/nvidian/dynamo-dev/modelexpress-p2p-client:YOUR_TAG
+docker push nvcr.io/nvidian/dynamo-dev/IMAGE_NAME:YOUR_TAG
 ```
 
 ### Deploying to Kubernetes
 
 ```bash
 # Namespace
-NAMESPACE=kavin
+NAMESPACE=<your-namespace>
 
 # 1. Flush Redis (clear stale metadata)
 microk8s kubectl -n $NAMESPACE exec deploy/modelexpress-server -c redis -- redis-cli FLUSHALL
@@ -247,14 +247,14 @@ watch microk8s kubectl -n $NAMESPACE get pods -l 'app in (mx-source, mx-target)'
 
 ```bash
 # Stream logs
-microk8s kubectl -n kavin logs -f deploy/mx-source
-microk8s kubectl -n kavin logs -f deploy/mx-target
+kubectl -n $NAMESPACE logs -f deploy/mx-source
+kubectl -n $NAMESPACE logs -f deploy/mx-target
 
 # Check Redis state
-microk8s kubectl -n kavin exec deploy/modelexpress-server -c redis -- redis-cli KEYS '*'
+kubectl -n $NAMESPACE exec deploy/modelexpress-server -c redis -- redis-cli KEYS '*'
 
 # Test inference
-microk8s kubectl -n kavin exec deploy/mx-target -- curl -s http://localhost:8000/v1/completions \
+kubectl -n $NAMESPACE exec deploy/mx-target -- curl -s http://localhost:8000/v1/completions \
   -H "Content-Type: application/json" \
   -d '{"model": "deepseek-ai/DeepSeek-V3", "prompt": "Hello", "max_tokens": 10}'
 ```
@@ -268,8 +268,7 @@ microk8s kubectl -n kavin exec deploy/mx-target -- curl -s http://localhost:8000
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `MX_REGISTER_LOADERS` | `1` | Auto-register mx-source/mx-target loaders with vLLM |
-| `MX_SERVER_ADDRESS` | `modelexpress-server:8001` | gRPC server address |
-| `MX_REDIS_HOST` | `modelexpress-server` | Redis host for coordination |
+| `MODEL_EXPRESS_URL` | `localhost:8001` | gRPC server address (also reads `MX_SERVER_ADDRESS` for compat) |
 | `MX_CONTIGUOUS_REG` | `0` | Enable contiguous region registration (experimental) |
 | `MX_EXPECTED_WORKERS` | `8` | Number of GPU workers to wait for |
 | `MX_SYNC_PUBLISH` | `1` | Source: wait for all workers before publishing |
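The renamed address variable keeps backward compatibility. A minimal sketch of the resolution order implied by the table (the helper name is ours, not from the codebase):

```python
import os


def resolve_server_address(env=os.environ) -> str:
    """Resolve the gRPC server address.

    Prefers MODEL_EXPRESS_URL, falls back to the legacy
    MX_SERVER_ADDRESS, then to the documented default.
    """
    return (
        env.get("MODEL_EXPRESS_URL")
        or env.get("MX_SERVER_ADDRESS")  # compat with older deployments
        or "localhost:8001"
    )


print(resolve_server_address({}))  # -> localhost:8001
```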