
Commit 4647ec2

[inference] release (#5747)
* [inference] release
* [inference] release
* [inference] release
* [inference] release
* [inference] release
* [inference] release
* [inference] release
1 parent df67476 commit 4647ec2

3 files changed: 39 additions & 44 deletions
README.md

Lines changed: 16 additions & 21 deletions
@@ -25,6 +25,7 @@
 </div>
 
 ## Latest News
+* [2024/05] [Large AI Models Inference Speed Doubled, Colossal-Inference Open Source Release](https://hpc-ai.com/blog/colossal-inference)
 * [2024/04] [Open-Sora Unveils Major Upgrade: Embracing Open Source with Single-Shot 16-Second Video Generation and 720p Resolution](https://hpc-ai.com/blog/open-soras-comprehensive-upgrade-unveiled-embracing-16-second-video-generation-and-720p-resolution-in-open-source)
 * [2024/04] [Most cost-effective solutions for inference, fine-tuning and pretraining, tailored to LLaMA3 series](https://hpc-ai.com/blog/most-cost-effective-solutions-for-inference-fine-tuning-and-pretraining-tailored-to-llama3-series)
 * [2024/03] [314 Billion Parameter Grok-1 Inference Accelerated by 3.8x, Efficient and Easy-to-Use PyTorch+HuggingFace version is Here](https://hpc-ai.com/blog/314-billion-parameter-grok-1-inference-accelerated-by-3.8x-efficient-and-easy-to-use-pytorchhuggingface-version-is-here)
@@ -75,11 +76,9 @@
 <li>
 <a href="#Inference">Inference</a>
 <ul>
+<li><a href="#Colossal-Inference">Colossal-Inference: Large AI Models Inference Speed Doubled</a></li>
 <li><a href="#Grok-1">Grok-1: 314B model of PyTorch + HuggingFace Inference</a></li>
 <li><a href="#SwiftInfer">SwiftInfer: Breaks the Length Limit of LLM for Multi-Round Conversations with 46% Acceleration</a></li>
-<li><a href="#GPT-3-Inference">GPT-3</a></li>
-<li><a href="#OPT-Serving">OPT-175B Online Serving for Text Generation</a></li>
-<li><a href="#BLOOM-Inference">176B BLOOM</a></li>
 </ul>
 </li>
 <li>
@@ -377,6 +376,19 @@ Please visit our [documentation](https://www.colossalai.org/) and [examples](htt
 
 
 ## Inference
+### Colossal-Inference
+<p align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-1.png" width=1000/>
+</p>
+
+<p align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-2.png" width=1000/>
+</p>
+
+- Large AI models inference speed doubled, compared to the offline inference performance of vLLM in some cases.
+[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/colossalai/inference)
+[[blog]](https://hpc-ai.com/blog/colossal-inference)
+
 ### Grok-1
 <p id="Grok-1" align="center">
 <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/grok-1-inference.jpg" width=600/>
@@ -389,30 +401,13 @@ Please visit our [documentation](https://www.colossalai.org/) and [examples](htt
 [[HuggingFace Grok-1 PyTorch model weights]](https://huggingface.co/hpcai-tech/grok-1)
 [[ModelScope Grok-1 PyTorch model weights]](https://www.modelscope.cn/models/colossalai/grok-1-pytorch/summary)
 
+### SwiftInfer
 <p id="SwiftInfer" align="center">
 <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/SwiftInfer.jpg" width=800/>
 </p>
 
 - [SwiftInfer](https://github.com/hpcaitech/SwiftInfer): Inference performance improved by 46%, open source solution breaks the length limit of LLM for multi-round conversations
 
-<p id="GPT-3-Inference" align="center">
-<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
-</p>
-
-- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware
-
-<p id="OPT-Serving" align="center">
-<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20serving.png" width=600/>
-</p>
-
-- [OPT Serving](https://colossalai.org/docs/advanced_tutorials/opt_service): Try 175-billion-parameter OPT online services
-
-<p id="BLOOM-Inference" align="center">
-<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20Inference.PNG" width=800/>
-</p>
-
-- [BLOOM](https://github.com/hpcaitech/EnergonAI/tree/main/examples/bloom): Reduce hardware deployment costs of 176-billion-parameter BLOOM by more than 10 times.
-
 <p align="right">(<a href="#top">back to top</a>)</p>
 
 ## Installation

colossalai/inference/README.md

Lines changed: 8 additions & 1 deletion
@@ -18,8 +18,15 @@
 
 
 ## 📌 Introduction
-ColossalAI-Inference is a module which offers acceleration to the inference execution of Transformers models, especially LLMs. In ColossalAI-Inference, we leverage high-performance kernels, KV cache, paged attention, continous batching and other techniques to accelerate the inference of LLMs. We also provide simple and unified APIs for the sake of user-friendliness.
+ColossalAI-Inference is a module which offers acceleration to the inference execution of Transformers models, especially LLMs. In ColossalAI-Inference, we leverage high-performance kernels, KV cache, paged attention, continous batching and other techniques to accelerate the inference of LLMs. We also provide simple and unified APIs for the sake of user-friendliness. [[blog]](https://hpc-ai.com/blog/colossal-inference)
 
+<p align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-1.png" width=1000/>
+</p>
+
+<p align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-2.png" width=1000/>
+</p>
 
 ## 🕹 Usage
 
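To make the introduction added above concrete, here is a minimal sketch of what driving such a unified inference API from Python could look like. This is an illustration only, not the module's confirmed interface: the `InferenceConfig`/`InferenceEngine` names, the import path, and every parameter below are assumptions drawn from the techniques the paragraph names; the Usage section of `colossalai/inference/README.md` and the linked blog post document the actual API.

```python
# Hypothetical usage sketch of a unified LLM-inference API, for illustration.
# `InferenceConfig`/`InferenceEngine` and all parameters are assumed names,
# not the confirmed ColossalAI-Inference interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

from colossalai.inference import InferenceConfig, InferenceEngine  # assumed import path

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# The techniques named in the introduction map naturally onto config knobs:
config = InferenceConfig(
    max_batch_size=8,    # upper bound for continuous batching
    max_input_len=1024,  # longest accepted prompt, in tokens
    max_output_len=256,  # generation budget per request
    block_size=16,       # paged-attention KV-cache block size (tokens per block)
    dtype="fp16",        # run the fused high-performance kernels in half precision
)

# One engine object hides kernel selection, KV-cache paging, and batch
# scheduling behind a single generate() call.
engine = InferenceEngine(model, tokenizer, config)
outputs = engine.generate(prompts=["Briefly introduce Colossal-Inference."])
print(outputs[0])
```

The point of the sketch is the shape of the "simple and unified API" the introduction promises: the caller supplies a model, a tokenizer, and a config, and the engine owns the batching and cache-management machinery.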
docs/README-zh-Hans.md

Lines changed: 15 additions & 22 deletions
@@ -24,6 +24,7 @@
 </div>
 
 ## News
+* [2024/05] [Large AI Models Inference Speed Doubled, Colossal-Inference Open Source Release](https://hpc-ai.com/blog/colossal-inference)
 * [2024/04] [Open-Sora Unveils Major Upgrade: Embracing Open Source with Single-Shot 16-Second Video Generation and 720p Resolution](https://hpc-ai.com/blog/open-soras-comprehensive-upgrade-unveiled-embracing-16-second-video-generation-and-720p-resolution-in-open-source)
 * [2024/04] [Most cost-effective solutions for inference, fine-tuning and pretraining, tailored to LLaMA3 series](https://hpc-ai.com/blog/most-cost-effective-solutions-for-inference-fine-tuning-and-pretraining-tailored-to-llama3-series)
 * [2024/03] [314 Billion Parameter Grok-1 Inference Accelerated by 3.8x, Efficient and Easy-to-Use PyTorch+HuggingFace version is Here](https://hpc-ai.com/blog/314-billion-parameter-grok-1-inference-accelerated-by-3.8x-efficient-and-easy-to-use-pytorchhuggingface-version-is-here)
@@ -74,11 +75,9 @@
 <li>
 <a href="#推理">Inference</a>
 <ul>
+<li><a href="#Colossal-Inference">Colossal-Inference: Large AI Model Inference Speed Doubled</a></li>
 <li><a href="#Grok-1">Grok-1: 314-Billion-Parameter PyTorch + HuggingFace Inference</a></li>
 <li><a href="#SwiftInfer">SwiftInfer: Breaks the Length Limit of LLM Multi-Round Conversations, with 46% Inference Acceleration</a></li>
-<li><a href="#GPT-3-Inference">GPT-3</a></li>
-<li><a href="#OPT-Serving">175-Billion-Parameter OPT Online Serving</a></li>
-<li><a href="#BLOOM-Inference">176-Billion-Parameter BLOOM</a></li>
 </ul>
 </li>
 <li>
@@ -370,6 +369,19 @@ Colossal-AI provides you with a series of parallel components. Our goal is to make your
 
 
 ## Inference
+### Colossal-Inference
+<p align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-1.png" width=1000/>
+</p>
+
+<p align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-2.png" width=1000/>
+</p>
+
+- Large AI model inference speed nearly doubled in some cases, compared with the offline inference performance of vLLM
+[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/colossalai/inference)
+[[blog]](https://hpc-ai.com/blog/colossal-inference)
+
 ### Grok-1
 <p id="Grok-1" align="center">
 <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/grok-1-inference.jpg" width=600/>
@@ -388,25 +400,6 @@ Colossal-AI provides you with a series of parallel components. Our goal is to make your
 
 - [SwiftInfer](https://github.com/hpcaitech/SwiftInfer): An open-source solution that breaks the LLM length limit for multi-round conversations, improving inference performance by 46%
 
-
-<p id="GPT-3-Inference" align="center">
-<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
-</p>
-
-- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware
-
-<p id="OPT-Serving" align="center">
-<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20serving.png" width=600/>
-</p>
-
-- [OPT Serving](https://colossalai.org/docs/advanced_tutorials/opt_service): Try the 175-billion-parameter OPT online inference service
-
-<p id="BLOOM-Inference" align="center">
-<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20Inference.PNG" width=800/>
-</p>
-
-- [BLOOM](https://github.com/hpcaitech/EnergonAI/tree/main/examples/bloom): Reduce the deployment and inference cost of the 176-billion-parameter BLOOM model by more than 10x
-
 <p align="right">(<a href="#top">back to top</a>)</p>
 
 ## Installation
