</div>

## Instantly Run Colossal-AI on Enterprise-Grade GPUs

Skip the setup. Access a powerful, pre-configured Colossal-AI environment on [**HPC-AI Cloud**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai).

Train your models and scale your AI workloads in one click!

* **NVIDIA Blackwell B200s**: Experience the next generation of AI performance ([See Benchmarks](https://hpc-ai.com/blog/b200)). Now available on cloud from **$2.47/hr**.
* **Cost-Effective H200 Cluster**: Get premier performance with on-demand rental from just **$1.99/hr**.

[**Get Started Now & Claim Your Free Credits →**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai)

<div align="center">
<a href="https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai">
  <img src="https://github.com/hpcaitech/public_assets/blob/main/colossalai/img/2-3.png" width="850" />
</a>
</div>

### Colossal-AI Benchmark

To see how these performance gains translate to real-world applications, we conducted a large language model training benchmark using Colossal-AI on Llama-like models. The tests were run on 8-GPU and 16-GPU configurations for the 7B and 70B models, respectively.

| GPU  | # GPUs | Model Size | Parallelism        | Batch Size per DP | Seq Len | Throughput   | TFLOPS/GPU | Peak Mem (MiB) |
| :--: | :----: | :--------: | :----------------: | :---------------: | :-----: | :----------: | :--------: | :------------: |
| H200 | 8      | 7B         | zero2(dp8)         | 36                | 4096    | 17.13 samp/s | 534.18     | 119040.02      |
| H200 | 16     | 70B        | zero2              | 48                | 4096    | 3.27 samp/s  | 469.10     | 150032.23      |
| B200 | 8      | 7B         | zero1(dp2)+tp2+pp4 | 128               | 4096    | 25.83 samp/s | 805.69     | 100119.77      |
| B200 | 16     | 70B        | zero1(dp2)+tp2+pp4 | 128               | 4096    | 5.66 samp/s  | 811.79     | 100072.02      |

The results from the Colossal-AI benchmark provide the most practical insight. For the 7B model on 8 GPUs, the **B200 achieved roughly 50% higher throughput** (25.83 vs. 17.13 samp/s) and a comparable gain in TFLOPS per GPU. For the 70B model on 16 GPUs, the B200 again demonstrated a clear advantage, with **over 70% higher throughput and TFLOPS per GPU** (5.66 vs. 3.27 samp/s). These numbers show that the B200's performance gains translate directly into faster training times for large-scale models. A minimal configuration sketch for this kind of parallel layout is shown below.
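For reference, here is a minimal sketch of how a layout like `zero1(dp2)+tp2+pp4` on 16 GPUs can be expressed with Colossal-AI's `Booster` API and `HybridParallelPlugin`. This is an illustration under stated assumptions, not the benchmark script itself: `MyLlamaLikeModel`, the optimizer settings, and the micro-batch size are placeholders, and the exact `launch_from_torch` arguments vary by Colossal-AI version.

```python
# Minimal sketch (not the benchmark script): a zero1(dp2)+tp2+pp4 layout
# on 16 GPUs using Colossal-AI's Booster API.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()  # picks up rank/world size from the launcher

# 16 GPUs total = tp_size (2) * pp_size (4) * data-parallel size (2)
plugin = HybridParallelPlugin(
    tp_size=2,           # tensor parallelism
    pp_size=4,           # pipeline parallelism
    zero_stage=1,        # ZeRO-1 across the remaining data-parallel ranks
    microbatch_size=1,   # pipeline micro-batch size (placeholder value)
    precision="bf16",
)
booster = Booster(plugin=plugin)

model = MyLlamaLikeModel()  # placeholder for a Llama-like model class
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# Booster shards and wraps everything according to the plugin's strategy.
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
```

The pure `zero2(dp8)` rows would instead use `LowLevelZeroPlugin(stage=2)` from the same module. Note that with `pp_size > 1`, the training step runs through `booster.execute_pipeline(...)` rather than a plain forward/backward loop, and the script is typically launched with `colossalai run --nproc_per_node <gpus> train.py` or `torchrun`.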

## Latest News
* [2025/02] [DeepSeek 671B Fine-Tuning Guide Revealed—Unlock the Upgraded DeepSeek Suite with One Click, AI Players Ecstatic!](https://company.hpc-ai.com/blog/shocking-release-deepseek-671b-fine-tuning-guide-revealed-unlock-the-upgraded-deepseek-suite-with-one-click-ai-players-ecstatic)