fix: Update DeepSeek-V3 transfer time to ~15s (measured)

KavinKrishnan · cursoragent · KavinKrishnan · commit 9e1ef40b5706 · 2026-02-06T15:35:59.000-08:00
Co-authored-by: Cursor <cursoragent@cursor.com>
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -8,15 +8,15 @@ This file provides context for AI assistants (Claude, Cursor, Copilot) working o
 
 ### Key Value Proposition
 
-- **Speed**: Transfer 681GB (DeepSeek-V3) in ~40-80 seconds vs. ~25 minutes from NVMe storage
+- **Speed**: Transfer 681GB (DeepSeek-V3) in ~15 seconds vs. ~25 minutes from NVMe storage
 - **Efficiency**: GPU-to-GPU transfers via GPUDirect RDMA bypass CPU entirely (zero-copy)
 - **Scalability**: Coordinate transfers across multiple vLLM instances in a cluster
 
 ### Current Status
 
 | Model | Status | Transfer Time | Notes |
 |-------|--------|---------------|-------|
-| DeepSeek-V3 (671B, FP8) | Working | ~40s | 681GB across 8 GPUs @ ~112 Gbps |
+| DeepSeek-V3 (671B, FP8) | Working | ~15s | 681GB across 8 GPUs @ ~112 Gbps per link |
 | Llama 3.3 70B | Working | ~5s | 140GB across 8 GPUs @ ~112 Gbps |
 
 ---
@@ -379,7 +379,7 @@ DeepSeek-V3 takes ~40 minutes to fully warm up (loading + DeepGemm + CUDA graphs
 |--------|-------|
 | Model | DeepSeek-V3 (671B, FP8) |
 | Total Data | 681 GB (8 workers × 85 GB) |
-| Transfer Time | 40-80 seconds (baseline) |
+| Transfer Time | ~15 seconds (8 parallel RDMA streams @ 112 Gbps each) |
 | Per-Worker Speed | 60-112 Gbps |
 | Theoretical Max | 400 Gbps per NIC |
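
As a sanity check on the updated figures in this diff, the short sketch below converts the stated data size and transfer time into an average throughput. It assumes decimal units (1 GB = 8 Gb) and an even split across the 8 workers. The function names are illustrative and not part of the repository.

```python
# Back-of-the-envelope check of the transfer figures quoted in the diff.
# Assumptions (not stated in the commit): decimal gigabytes (1 GB = 8 Gb)
# and data split evenly across all workers.

TOTAL_GB = 681          # total data, from the table
WORKERS = 8             # parallel RDMA streams, from the table
MEASURED_SECONDS = 15   # updated transfer time, from the commit
LINK_GBPS = 112         # quoted per-link rate, from the table


def effective_gbps(size_gb: float, seconds: float) -> float:
    """Average throughput in Gbps for moving size_gb in the given time."""
    return size_gb * 8 / seconds


def ideal_seconds(size_gb: float, streams: int, gbps_per_stream: float) -> float:
    """Transfer time if every stream sustained gbps_per_stream throughout."""
    return size_gb * 8 / (streams * gbps_per_stream)


aggregate = effective_gbps(TOTAL_GB, MEASURED_SECONDS)
per_worker = aggregate / WORKERS
ideal = ideal_seconds(TOTAL_GB, WORKERS, LINK_GBPS)

print(f"aggregate average:  {aggregate:.1f} Gbps")
print(f"per-worker average: {per_worker:.1f} Gbps")
print(f"ideal time at {LINK_GBPS} Gbps per stream: {ideal:.1f} s")
```

Under these assumptions, ~15 s works out to an average of roughly 45 Gbps per worker. If all 8 streams sustained 112 Gbps for the whole transfer, it would take about 6 s. One possible reading is that the measured ~15 s includes setup or other non-transfer overhead, or that the peak rate is not sustained for the full transfer. This is an inference from the arithmetic, not something the commit states.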