Commit 9e1ef40
fix: Update DeepSeek-V3 transfer time to ~15s (measured)
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 49f6bad commit 9e1ef40

1 file changed: +3 -3 lines changed

CLAUDE.md

Lines changed: 3 additions & 3 deletions
@@ -8,15 +8,15 @@ This file provides context for AI assistants (Claude, Cursor, Copilot) working o
 
 ### Key Value Proposition
 
-- **Speed**: Transfer 681GB (DeepSeek-V3) in ~40-80 seconds vs. ~25 minutes from NVMe storage
+- **Speed**: Transfer 681GB (DeepSeek-V3) in ~15 seconds vs. ~25 minutes from NVMe storage
 - **Efficiency**: GPU-to-GPU transfers via GPUDirect RDMA bypass CPU entirely (zero-copy)
 - **Scalability**: Coordinate transfers across multiple vLLM instances in a cluster
 
 ### Current Status
 
 | Model | Status | Transfer Time | Notes |
 |-------|--------|---------------|-------|
-| DeepSeek-V3 (671B, FP8) | Working | ~40s | 681GB across 8 GPUs @ ~112 Gbps |
+| DeepSeek-V3 (671B, FP8) | Working | ~15s | 681GB across 8 GPUs @ ~112 Gbps per link |
 | Llama 3.3 70B | Working | ~5s | 140GB across 8 GPUs @ ~112 Gbps |
 
 ---
@@ -379,7 +379,7 @@ DeepSeek-V3 takes ~40 minutes to fully warm up (loading + DeepGemm + CUDA graphs
 |--------|-------|
 | Model | DeepSeek-V3 (671B, FP8) |
 | Total Data | 681 GB (8 workers × 85 GB) |
-| Transfer Time | 40-80 seconds (baseline) |
+| Transfer Time | ~15 seconds (8 parallel RDMA streams @ 112 Gbps each) |
 | Per-Worker Speed | 60-112 Gbps |
 | Theoretical Max | 400 Gbps per NIC |
 
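
The figures in the changed rows can be sanity-checked with simple arithmetic. The sketch below is only a back-of-envelope check, not part of the project: it assumes decimal units (1 GB = 10^9 bytes, 1 Gbps = 10^9 bits/s) and that all 8 streams run fully in parallel, and the function names are hypothetical.

```python
def ideal_transfer_seconds(total_gb: float, streams: int, gbps_per_stream: float) -> float:
    """Lower bound on transfer time if every stream sustains its quoted rate."""
    total_gigabits = total_gb * 8  # bytes -> bits (decimal units assumed)
    return total_gigabits / (streams * gbps_per_stream)


def effective_gbps_per_stream(total_gb: float, streams: int, seconds: float) -> float:
    """Average per-stream throughput implied by a measured wall-clock time."""
    return total_gb * 8 / seconds / streams


if __name__ == "__main__":
    # Values taken from the diff: 681 GB, 8 streams, 112 Gbps each, ~15 s measured.
    print(f"ideal time: {ideal_transfer_seconds(681, 8, 112):.1f} s")
    print(f"implied per-stream rate at 15 s: {effective_gbps_per_stream(681, 8, 15):.1f} Gbps")
```

Under these assumptions the ideal time at 112 Gbps per stream is about 6 seconds, and a measured ~15 seconds implies an average of roughly 45 Gbps per stream over the whole transfer. The difference presumably reflects setup cost or streams running below peak for part of the transfer; the sketch does not measure either.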
