Commit 6305721

authored
[Faster Transformer] Collect faster transformer codes (#528)
* collect faster transformer codes
* update from c++ 11 to 14
1 parent dc8c8f2 commit 6305721

26 files changed (+49 / -20 lines)

paddlenlp/ops/CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ find_package(CUDA 10.1 REQUIRED)
 
 INCLUDE(ExternalProject)
 
-set(CXX_STD "11" CACHE STRING "C++ standard")
+set(CXX_STD "14" CACHE STRING "C++ standard")
 
 option(ON_INFER "Compile with inference. " OFF)
 option(WITH_GPU "Compile with GPU/CPU, default use CPU." ON)
@@ -223,4 +223,4 @@ if(ON_INFER AND WITH_GPT)
 )
 endif()
 
-add_subdirectory(src)
+add_subdirectory(faster_transformer)

paddlenlp/ops/README.md

Lines changed: 14 additions & 12 deletions
@@ -4,9 +4,11 @@
 
 ```text
 .
-├── sample/                    # usage sample for Transformer machine translation (beam search)
-├── src/                       # custom OP C++ CUDA code
-└── transformer/               # Python API wrapper scripts
+├── faster_transformer/        # Faster Transformer subdirectory, based on custom ops
+│   ├── sample/                # usage samples based on Faster Transformer
+│   ├── src/                   # custom OP C++ CUDA code
+│   └── transformer/           # Python API wrapper scripts
+└── patches                    # custom patch code for the third-party libraries used by the custom ops
 ```
 
 ## Environment requirements
@@ -95,7 +97,7 @@ transformer = FasterTransformer(
     use_fp16_decoding=args.use_fp16_decoding)
 ```
 
-More detailed examples are available in `./sample/decoding_sample.py` and `./sample/encoder_decoding_sample.py`.
+More detailed examples are available in `./faster_transformer/sample/decoding_sample.py` and `./sample/encoder_decoding_sample.py`.
 
 #### Run Transformer decoding on PaddlePaddle
 
@@ -105,7 +107,7 @@ transformer = FasterTransformer(
 export CUDA_VISIBLE_DEVICES=0
 export FLAGS_fraction_of_gpu_memory_to_use=0.1
 ./build/third-party/build/bin/decoding_gemm 32 4 8 64 30000 32 512 0
-python sample/decoding_sample.py --config ./sample/config/decoding.sample.yaml --decoding_lib ./build/lib/libdecoding_op.so
+python ./faster_transformer/sample/decoding_sample.py --config ./faster_transformer/sample/config/decoding.sample.yaml --decoding_lib ./build/lib/libdecoding_op.so
 ```
 
 Run the decoding-only test on PaddlePaddle (float16):
@@ -115,7 +117,7 @@ python sample/decoding_sample.py --config ./sample/config/decoding.sample.yaml -
 export CUDA_VISIBLE_DEVICES=0
 export FLAGS_fraction_of_gpu_memory_to_use=0.1
 ./build/third-party/build/bin/decoding_gemm 32 4 8 64 30000 32 512 1
-python sample/decoding_sample.py --config ./sample/config/decoding.sample.yaml --decoding_lib ./build/lib/libdecoding_op.so --use_fp16_decoding
+python ./faster_transformer/sample/decoding_sample.py --config ./faster_transformer/sample/config/decoding.sample.yaml --decoding_lib ./build/lib/libdecoding_op.so --use_fp16_decoding
 ```
 
 The meaning of the different `decoding_gemm` arguments is explained in the [FasterTransformer documentation](https://github.com/NVIDIA/FasterTransformer/tree/v3.1#execute-the-decoderdecoding-demos)
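To make the two invocations above easier to read, here is a hedged annotation of the positional arguments to `decoding_gemm`. The argument names and order are taken from the FasterTransformer v3.1 README linked above, not from this commit, so verify them against that documentation before relying on them:

``` python
# A hedged annotation of the decoding_gemm call used above. The argument
# order follows the FasterTransformer v3.1 README; confirm against the
# linked documentation.
import subprocess

gemm_args = {
    "batch_size": 32,
    "beam_width": 4,
    "head_number": 8,
    "size_per_head": 64,        # 8 heads x 64 = d_model 512
    "vocab_size": 30000,
    "seq_len": 32,
    "memory_hidden_units": 512,
    "is_fp16": 0,               # 0 = float32, 1 = float16
}

# Equivalent to: ./build/third-party/build/bin/decoding_gemm 32 4 8 64 30000 32 512 0
subprocess.run(
    ["./build/third-party/build/bin/decoding_gemm",
     *(str(v) for v in gemm_args.values())],
    check=True)
```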
@@ -151,15 +153,15 @@ gpt = FasterGPT(
 
 Currently, the GPT-2 example only supports a `batch size` of `1`, or batches whose input sequences all have the same length. Moreover, only top-k sampling and top-p sampling are supported; beam search is not.
 
-More detailed examples are available in `./sample/gpt_sample.py`.
+More detailed examples are available in `./faster_transformer/sample/gpt_sample.py`.
 
 #### Run GPT-2 decoding on PaddlePaddle
 
 Run the decoding-only test on PaddlePaddle (float32):
 
 ``` sh
 export CUDA_VISIBLE_DEVICES=0
-python sample/gpt_sample.py --model_name_or_path gpt2-medium-en --decoding_lib ./build/lib/libdecoding_op.so --batch_size 1 --topk 4 --topp 0.0 --max_out_len 32 --start_token "<|endoftext|>" --end_token "<|endoftext|>" --temperature 1.0
+python ./faster_transformer/sample/gpt_sample.py --model_name_or_path gpt2-medium-en --decoding_lib ./build/lib/libdecoding_op.so --batch_size 1 --topk 4 --topp 0.0 --max_out_len 32 --start_token "<|endoftext|>" --end_token "<|endoftext|>" --temperature 1.0
 ```
 
 The meaning of each option is as follows:
@@ -204,7 +206,7 @@ cd PaddleNLP/paddlenlp/ops/
 ``` sh
 mkdir build
 cd build/
-cmake .. -DSM=xx -DCMAKE_BUILD_TYPE=Release -DPADDLE_LIB=/path/to/paddle_inference_lib/ -DDEMO=./demo/transformer_e2e.cc -DWITH_STATIC_LIB=OFF -DON_INFER=ON -DWITH_MKL=ON
+cmake .. -DSM=xx -DCMAKE_BUILD_TYPE=Release -DPADDLE_LIB=/path/to/paddle_inference_lib/ -DDEMO=./faster_transformer/src/demo/transformer_e2e.cc -DWITH_STATIC_LIB=OFF -DON_INFER=ON -DWITH_MKL=ON
 make -j
 cd ../
 ```
@@ -224,7 +226,7 @@ cd ../
 └── threadpool/
 └── version.txt
 ```
-* `-DDEMO` specifies the location of the demo used with the inference library. For example, specify -DDEMO=./demo/transformer_e2e.cc or -DDEMO=./demo/gpt.cc.
+* `-DDEMO` specifies the location of the demo used with the inference library. For example, specify -DDEMO=./faster_transformer/src/demo/transformer_e2e.cc or -DDEMO=./faster_transformer/src/demo/gpt.cc.
 * `-DWITH_GPT`: to build the inference-library executable for GPT, add `-DWITH_GPT=ON`.
 * **When using the custom ops with the inference library, be sure to enable the `-DON_INFER=ON` option; otherwise the inference-library executable will not be produced.**
 

@@ -253,10 +255,10 @@ cd bin/
253255

254256
#### 执行 GPT decoding on PaddlePaddle
255257

256-
如果需要使用 Paddle Inference 预测库针对 GPT 进行预测,首先,需要导出预测模型,可以通过 `sample/gpt_export_model_sample.py` 脚本获取预测库用模型,执行方式如下所示:
258+
如果需要使用 Paddle Inference 预测库针对 GPT 进行预测,首先,需要导出预测模型,可以通过 `./faster_transformer/sample/gpt_export_model_sample.py` 脚本获取预测库用模型,执行方式如下所示:
257259

258260
``` sh
259-
python sample/gpt_export_model_sample.py --model_name_or_path gpt2-medium-en --decoding_lib ./build/lib/libdecoding_op.so --batch_size 1 --topk 4 --topp 0.0 --max_out_len 32 --start_token "<|endoftext|>" --end_token "<|endoftext|>" --temperature 1.0 --inference_model_dir ./infer_model/
261+
python ./faster_transformer/sample/gpt_export_model_sample.py --model_name_or_path gpt2-medium-en --decoding_lib ./build/lib/libdecoding_op.so --topk 4 --topp 0.0 --max_out_len 32 --start_token "<|endoftext|>" --end_token "<|endoftext|>" --temperature 1.0 --inference_model_dir ./infer_model/
260262
```
261263

262264
各个选项的意义与上文的 `gpt_sample.py` 的选项相同。额外新增一个 `--inference_model_dir` 选项用于指定保存的模型文件、词表等文件。若是使用的模型是 gpt2-medium-en,保存之后,`./infer_model/` 目录下组织的结构如下:
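The directory listing itself lies outside this hunk. For context, below is a hedged sketch of loading the exported model with the Paddle Inference Python API; the `.pdmodel`/`.pdiparams` file names are assumptions rather than something this diff shows:

``` python
# A hedged sketch of loading the exported GPT model with Paddle Inference.
# The file names under ./infer_model/ are assumptions; check what the
# export script actually writes.
import paddle.inference as paddle_infer

config = paddle_infer.Config("./infer_model/gpt.pdmodel",
                             "./infer_model/gpt.pdiparams")
config.enable_use_gpu(100, 0)  # 100 MB initial GPU memory pool on device 0
predictor = paddle_infer.create_predictor(config)
```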

paddlenlp/ops/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -12,8 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from .transformer.decoding import *
-from .transformer.faster_transformer import *
+from .faster_transformer.transformer.decoding import *
+from .faster_transformer.transformer.faster_transformer import *
 from .einsum import *
 from .distributed import *
 from . import optimizer
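Because the wildcard imports above re-export the same symbols from the new subpackage, the public `paddlenlp.ops` namespace is unchanged by the move. A minimal sketch, assuming `FasterTransformer` is among the re-exported symbols (the README above uses it):

``` python
# A minimal sketch: the public import path is unchanged by the relocation;
# FasterTransformer is assumed to be exported by
# paddlenlp.ops.faster_transformer.transformer.faster_transformer.
from paddlenlp.ops import FasterTransformer

# The defining module now lives under the new subpackage.
print(FasterTransformer.__module__)
# expected: paddlenlp.ops.faster_transformer.transformer.faster_transformer
```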
paddlenlp/ops/faster_transformer/CMakeLists.txt

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+add_subdirectory(src)
6 files renamed without changes.
