Commit 56b1f6d

Merge remote-tracking branch 'paddlenlp/develop' into dev_20241029_add_memory_count
2 parents 59cea97 + fb60645 commit 56b1f6d

683 files changed: +82913 -6996 lines changed

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 14 additions & 0 deletions
@@ -1,4 +1,18 @@
 <!-- Demo: https://github.com/PaddlePaddle/PaddleNLP/pull/26 -->
+#### Before submitting
+
+- [ ] Lint code. If there are lint issues, please format the code first.
+
+```shell
+# Install and register `pre-commit` in the project folder
+pip install pre-commit && pre-commit install
+
+# Process previous code files separately
+pre-commit run --file XXXX.py
+```
+
+- [ ] Add test cases into `tests` folder. If there are codecov issues, please add tests cases first.
+
 ### PR types
 <!-- One of [ New features | Bug fixes | Function optimization | Performance optimization | Breaking changes | Others ] -->
 

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -129,3 +129,9 @@ FETCH_HEAD
 csrc/third_party/
 dataset/
 output/
+
+# gen codes
+autogen/
+
+# cutlass kernel
+!csrc/gpu/cutlass_kernels/gemm/collective/builders

.gitmodules

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+[submodule "csrc/third_party/cutlass"]
+path = csrc/third_party/cutlass
+url = https://github.com/NVIDIA/cutlass.git
+[submodule "csrc/third_party/nlohmann_json"]
+path = csrc/third_party/nlohmann_json
+url = https://github.com/nlohmann/json.git
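
Note on the new `.gitmodules`: cutlass and nlohmann_json are now declared as git submodules under `csrc/third_party/`, so a fresh clone would normally need `git submodule update --init --recursive` before the custom operators are built. The sketch below is an illustration only and is not part of the commit. It parses the six lines shown in this hunk with Python's standard `configparser` to list each submodule's path and URL. The helper name `list_submodules` and the constant `GITMODULES_TEXT` are hypothetical.

```python
import configparser

# The .gitmodules content exactly as shown in the hunk above.
GITMODULES_TEXT = """
[submodule "csrc/third_party/cutlass"]
path = csrc/third_party/cutlass
url = https://github.com/NVIDIA/cutlass.git
[submodule "csrc/third_party/nlohmann_json"]
path = csrc/third_party/nlohmann_json
url = https://github.com/nlohmann/json.git
"""


def list_submodules(text):
    """Return (path, url) pairs for every submodule section in the text.

    Hypothetical helper: .gitmodules uses git-config syntax, which
    configparser can read for simple files like this one.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return [(parser[name]["path"], parser[name]["url"]) for name in parser.sections()]


for path, url in list_submodules(GITMODULES_TEXT):
    print(f"{path} <- {url}")
```

For anything beyond a quick inspection, `git config --file .gitmodules --list` reads the same information with git itself.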

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-exclude: 'slm/model_zoo/gpt-3'
+exclude: 'slm/model_zoo/gpt-3;csrc/third_party'
 repos:
 # For Python files
 - repo: https://github.com/psf/black.git
@@ -61,4 +61,4 @@ repos:
 entry: python scripts/codestyle/check_dead_links.py
 language: python
 files: \.(md|markdown|rst)$
-pass_filenames: true
+pass_filenames: true
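
Review note on the new `exclude` value: pre-commit documents `exclude` as a Python regular expression that is searched against each file path. Assuming that behaviour, `;` is an ordinary character rather than an alternation operator, so the new pattern may match neither directory. The sketch below compares the committed pattern with a `|` variant. The `|` variant and the sample paths are assumptions for illustration, not content from the commit.

```python
import re

# Pattern exactly as it appears in the new `exclude` value.
semicolon_pattern = re.compile(r"slm/model_zoo/gpt-3;csrc/third_party")

# Hypothetical alternative that uses regex alternation instead of ';'.
pipe_pattern = re.compile(r"slm/model_zoo/gpt-3|csrc/third_party")

# Hypothetical file paths, chosen only to exercise both directories.
paths = [
    "slm/model_zoo/gpt-3/run.py",
    "csrc/third_party/cutlass/README.md",
    "paddlenlp/utils/log.py",
]


def excluded(pattern, candidates):
    """Return the candidate paths that the pattern would exclude."""
    return [p for p in candidates if pattern.search(p)]


print(excluded(semicolon_pattern, paths))  # no path contains the literal ';' string
print(excluded(pipe_pattern, paths))       # both third-party style paths match
```

If the intent was to skip both directories, the `|` form (or a verbose multi-line regex) is the usual way to express that in a pre-commit config.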

README.md

Lines changed: 73 additions & 42 deletions
Large diffs are not rendered by default.

csrc/README.md

Lines changed: 4 additions & 21 deletions
@@ -10,32 +10,15 @@ pip install -r requirements.txt
 
 ## Compiling the CUDA operators
 
-Generate the FP8 cutlass operators
-```shell
-python utils/auto_gen_fp8_fp8_gemm_fused_kernels.py
-
-python utils/auto_gen_fp8_fp8_dual_gemm_fused_kernels.py
-```
-
-Compile
-```shell
-python setup_cuda.py install
-```
-
-### Manually installing the Cutlass library
-1. Visit the Cutlass repository: [NVIDIA/cutlass](https://github.com/NVIDIA/cutlass)
-
-2. Fetch the code:
-git clone -b v3.5.0 --single-branch https://github.com/NVIDIA/cutlass.git
-
-3. Place the downloaded `cutlass` directory at `csrc/third_party/cutlass`
-
-4. Recompile the CUDA operators
 ```shell
 python setup_cuda.py install
 ```
 
 ### FP8 GEMM auto-tuning
+
+Make sure the `cutlass` library is installed, then run the following command to auto-tune.
+- For GPUs with architecture 89, the CUDA version must be at least 12.4
+- For GPUs with architecture 90, the CUDA version must be at least 12.0
 ```shell
 sh tune_fp8_gemm.sh
 ```
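
The two CUDA version requirements added to the README can be written as a small lookup. This is a minimal sketch, not code from the repository: the function name `cuda_ok_for_fp8_tuning` is hypothetical, and treating architectures the README does not list as unsupported is an assumption made here.

```python
# Minimum CUDA versions stated in the README text above, keyed by GPU architecture.
MIN_CUDA_FOR_ARCH = {
    89: (12, 4),
    90: (12, 0),
}


def cuda_ok_for_fp8_tuning(arch, cuda_version):
    """Return True if cuda_version meets the stated minimum for arch.

    arch is the architecture number (for example 89 or 90) and
    cuda_version is a (major, minor) tuple. Architectures not listed
    in the README return False; that fallback is an assumption.
    """
    minimum = MIN_CUDA_FOR_ARCH.get(arch)
    if minimum is None:
        return False
    return tuple(cuda_version) >= minimum


print(cuda_ok_for_fp8_tuning(89, (12, 3)))
print(cuda_ok_for_fp8_tuning(90, (12, 0)))
```

Tuple comparison keeps the check correct across major versions, so a CUDA 13.x toolkit would also pass for either architecture.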

csrc/cpu/src/stop_generation_multi_ends.cc

Lines changed: 1 addition & 12 deletions
@@ -15,20 +15,9 @@
 #include <stdlib.h>
 #include <string.h>
 
-#include "paddle/extension.h"
+#include "helper.h"
 #include <stdio.h>
 
-
-bool is_in_end(const int64_t id, const int64_t* end_ids, int length) {
-    bool flag = false;
-    for (int i = 0; i < length; i++) {
-        if (id == end_ids[i]) {
-            return true;
-        }
-    }
-    return flag;
-}
-
 void set_value_by_flags(const bool* stop_flags,
                         const int64_t* end_ids,
                         int64_t* topk_ids,
