Commit ea28142

Author: gongenlei

Update codegen doc (#3193)

* update doc
* update cod
* update perf
1 parent 54df619 commit ea28142

File tree

5 files changed: +133, -147 lines changed


examples/code_generation/codegen/README.md

Lines changed: 114 additions & 136 deletions
@@ -5,23 +5,20 @@
 - [Introduction](#简介)
 - [Features](#特色)
 - [Demo](#效果展示)
-- [Out-of-the-Box Usage](#开箱即用)
-- [Single and Batch Prediction](#支持单条批量预测)
-- [Configurable Parameters](#可配置参数说明)
-- [Custom Training](#训练定制)
+- [Github Copilot Plugin Setup](#GithubCopilot插件配置)
 - [Environment Dependencies](#环境依赖)
 - [Code Structure](#代码结构说明)
-- [Data Preparation](#数据准备)
-- [Creating a Dataset from Local Files](#从本地文件创建数据集)
-- [Github Copilot Plugin Setup](#GithubCopilot插件配置)
-- [Plugin Environment Dependencies](#插件环境依赖)
 - [Starting the Service](#启动服务)
-- [Configuration Parameters](#配置参数说明)
+- [Configuration Parameters](#配置参数说明)
 - [Testing the Service](#测试服务)
 - [Configuring the Plugin](#配置插件)
 - [Notes](#注意事项)
+- [Custom Training](#训练定制)
+- [Data Preparation](#数据准备)
+- [Creating a Dataset from Local Files](#从本地文件创建数据集)
+- [Model Training](#模型训练)
 - [TaskFlow Usage](#TaskFlow调用)
-- [Usage Examples](#使用案例)
+- [More Usage Examples](#更多使用案例)
 - [Model List](#模型列表)
 - [References](#references)

@@ -41,13 +38,47 @@
 
 
 ## Demo
+- Solving an algorithm problem: compute the length of the longest substring without repeating characters
+```python
+from paddlenlp import Taskflow
 
-## Custom Training
+prompt = "def lengthOfLongestSubstring(self, s: str) -> int:"
+codegen = Taskflow("code_generation", model="Salesforce/codegen-2B-mono", decode_strategy="greedy_search", repetition_penalty=1.0)
+print(codegen(prompt))
+```
+The output is:
+```python
+if not s:
+    return 0
+
+start = 0
+end = 0
+max_len = 0
+
+while end < len(s):
+    if s[end] not in s[start:end]:
+        max_len = max(max_len, end - start + 1)
+        end += 1
+    else:
+        start += 1
+
+return max_len
+```
+<p align="center">
+<img src="https://user-images.githubusercontent.com/24390500/182512164-946d959c-57b1-49e6-b9a5-be47281d1ee2.png"/> <br />
+</p>
+
+
+## Github Copilot Plugin Setup
+
+**Taking the VS Code plugin as an example**
 
 ### Environment Dependencies
 - PaddleNLP >= 2.4.0
 - PaddlePaddle >= 2.3.1
 
+Other dependencies: `pip install -r requirements.txt`
+
 ### Code Structure
 
 The main code structure of this project is described below:
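The demo above drives generation through `Taskflow`, which also accepts a list of prompts for batch prediction (the feature the old table of contents listed as "Single and Batch Prediction"). A minimal sketch under that assumption, reusing the demo's model and decoding settings; the second prompt is purely illustrative:

```python
from paddlenlp import Taskflow

# Batch prediction: pass a list of prompts instead of a single string.
codegen = Taskflow("code_generation",
                   model="Salesforce/codegen-2B-mono",
                   decode_strategy="greedy_search",
                   repetition_penalty=1.0)
prompts = [
    "def lengthOfLongestSubstring(self, s: str) -> int:",
    "def isPalindrome(self, s: str) -> bool:",  # hypothetical second prompt
]
print(codegen(prompts))  # one completion per prompt
```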
@@ -61,6 +92,77 @@ codegen/
 └── README.md # documentation
 ```
 
+### Starting the Service
+
+```python
+python codegen_server.py
+```
+
+##### Configuration Parameters
+Configure the following parameters in codegen_server.py:
+- `model_name_or_path`: model name, defaults to "Salesforce/codegen-2B-mono"
+- `device`: device to run on, defaults to "gpu"
+- `temperature`: decoding parameter temperature, defaults to 0.5
+- `top_k`: decoding parameter top_k, defaults to 10
+- `top_p`: decoding parameter top_p, defaults to 1.0
+- `repetition_penalty`: repetition penalty for decoding, defaults to 1.0
+- `min_length`: minimum generated length, defaults to 0
+- `max_length`: maximum generated length, defaults to 16
+- `decode_strategy`: decoding strategy, defaults to "sampling"
+- `load_state_as_np`: load model weights as numpy arrays to save GPU memory, defaults to True
+- `use_faster`: whether to use FasterGeneration to speed up inference, defaults to True
+- `use_fp16_decoding`: whether to run inference in fp16 to save GPU memory and speed up inference, defaults to True
+
+### Testing the Service
+```python
+import openai
+openai.api_key = 'dummy'
+openai.api_base = 'http://127.0.0.1:8978'
+result = openai.Completion.create(
+    engine='codegen', prompt='def hello', max_tokens=16, temperature=0.1)
+print(result)
+'''
+<OpenAIObject text_completion id=cmpl-dmhoeHmcw9DJ4NeqOJDQVKv3iivJ0 at 0x7fe7a81d42c0> JSON: {
+  "id": "cmpl-dmhoeHmcw9DJ4NeqOJDQVKv3iivJ0",
+  "choices": [
+    {
+      "text": "_world():\n    print(\"Hello World!\")\n\n\n#",
+      "index": 0,
+      "finish_reason": "stop",
+      "logprobs": null,
+    }
+  ],
+  "usage": {
+    "completion_tokens": null,
+    "prompt_tokens": null,
+    "total_tokens": null
+  }
+}
+'''
+```
+**Note**: if you are accessing the server from another machine, replace `127.0.0.1` with the server's public IP.
+
+
+### Configuring the Plugin
+Open the user settings ([settings.json](https://code.visualstudio.com/docs/getstarted/settings#_settings-file-locations)) and add the following configuration:
+```json
+"github.copilot.advanced": {
+    "debug.overrideEngine": "codegen",
+    "debug.testOverrideProxyUrl": "http://127.0.0.1:8978",
+    "debug.overrideProxyUrl": "http://127.0.0.1:8978"
+},
+```
+Now you can happily use it 😊.
+
+
+#### Notes
+- If you use FasterGeneration, set `use_faster=True` in [codegen_server.py](#配置参数说明); the first inference involves compilation and takes some time. See [here](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/ops/README.md#%E4%BD%BF%E7%94%A8%E7%8E%AF%E5%A2%83%E8%AF%B4%E6%98%8E) for FasterGeneration's environment requirements.
+- To serve your own trained model, set `model_name_or_path` in [codegen_server.py](#配置参数说明) to the local model path.
+- If you are accessing the server from another machine, replace the `127.0.0.1` above with the server's public IP.
+
+
+## Custom Training
+
 ### Data Preparation
 
 #### Creating a Dataset from Local Files
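The parameter list added above mirrors the `DefaultConfig` class in codegen_server.py (one field of which is visible in that file's diff further down). A minimal sketch of the class with the documented defaults; field order and any extra fields in the real file may differ:

```python
class DefaultConfig:
    # Defaults as documented in the README's parameter list.
    model_name_or_path = "Salesforce/codegen-2B-mono"
    device = "gpu"
    temperature = 0.5
    top_k = 10
    top_p = 1.0
    repetition_penalty = 1.0
    min_length = 0
    max_length = 16
    decode_strategy = "sampling"
    load_state_as_np = True   # load weights as numpy arrays to save GPU memory
    use_faster = True         # FasterGeneration for faster inference
    use_fp16_decoding = True  # fp16 decoding to save memory and time
```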
@@ -137,135 +239,11 @@ python -m paddle.distributed.launch --gpus 0,1 run_clm.py \
 
 **NOTE:** To resume model training, just set `model_name_or_path` to the directory of the local model.
 
-## Github Copilot Plugin Setup
-Taking the VS Code plugin as an example
-### Plugin Environment Dependencies
-- PaddleNLP >= 2.4.0
-- PaddlePaddle >= 2.3.1
-
-Other dependencies: `pip install -r requirements.txt`
-
-
-### Starting the Service
-
-```python
-python codegen_server.py
-```
-
-##### Configuration Parameters
-Configure the following parameters in codegen_server.py:
-- `model_name_or_path`: model name, defaults to "Salesforce/codegen-2B-mono"
-- `device`: device to run on, defaults to "gpu"
-- `temperature`: decoding parameter temperature, defaults to 0.5
-- `top_k`: decoding parameter top_k, defaults to 10
-- `top_p`: decoding parameter top_p, defaults to 1.0
-- `repetition_penalty`: repetition penalty for decoding, defaults to 1.0
-- `min_length`: minimum generated length, defaults to 0
-- `max_length`: maximum generated length, defaults to 16
-- `decode_strategy`: decoding strategy, defaults to "sampling"
-- `load_state_as_np`: load model weights as numpy arrays to save GPU memory, defaults to True
-- `use_faster`: whether to use FasterGeneration to speed up inference, defaults to True
-- `use_fp16_decoding`: whether to run inference in fp16 to save GPU memory and speed up inference, defaults to True
-
-### Testing the Service
-`pip install --upgrade openai`
-
-```python
-import openai
-openai.api_key = 'dummy'
-openai.api_base = 'http://127.0.0.1:8000/v1'
-result = openai.Completion.create(
-    engine='codegen', prompt='def hello', max_tokens=16, temperature=0.1)
-print(result)
-'''
-<OpenAIObject text_completion id=cmpl-dmhoeHmcw9DJ4NeqOJDQVKv3iivJ0 at 0x7fe7a81d42c0> JSON: {
-  "id": "cmpl-dmhoeHmcw9DJ4NeqOJDQVKv3iivJ0",
-  "choices": [
-    {
-      "text": "_world():\n    print(\"Hello World!\")\n\n\n#",
-      "index": 0,
-      "finish_reason": "stop",
-      "logprobs": null,
-    }
-  ],
-  "usage": {
-    "completion_tokens": null,
-    "prompt_tokens": null,
-    "total_tokens": null
-  }
-}
-'''
-
-```
-
-### Configuring the Plugin
-Open the user settings ([settings.json](https://code.visualstudio.com/docs/getstarted/settings#_settings-file-locations)) and add the following configuration:
-```json
-"github.copilot.advanced": {
-    "debug.overrideEngine": "codegen",
-    "debug.testOverrideProxyUrl": "http://127.0.0.1:8978",
-    "debug.overrideProxyUrl": "http://127.0.0.1:8978"
-},
-```
-
-Now you can happily use it 😊.
-#### Notes
-- If you use FasterGeneration, set `use_faster=True` in [codegen_server.py](#配置参数说明); the first inference involves compilation and takes some time. See [here](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/ops/README.md#%E4%BD%BF%E7%94%A8%E7%8E%AF%E5%A2%83%E8%AF%B4%E6%98%8E) for FasterGeneration's environment requirements.
-- To serve your own trained model, set `model_name_or_path` in [codegen_server.py](#配置参数说明) to the local model path.
 
 ## TaskFlow Usage
 See the [TaskFlow documentation](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md)
 
-## Usage Examples
-- Solving an algorithm problem: compute the length of the longest substring without repeating characters
-```python
-import re
-import paddle
-from paddlenlp.transformers import CodeGenTokenizer, CodeGenForCausalLM
-
-# The supported models are shown in the following table
-model_name = 'Salesforce/codegen-2B-mono'
-# Init tokenizer
-tokenizer = CodeGenTokenizer.from_pretrained(model_name)
-# Init model
-model = CodeGenForCausalLM.from_pretrained(model_name)
-
-prompt = "def lengthOfLongestSubstring(self, s: str) -> int:"
-inputs = tokenizer([prompt])
-inputs = {k: paddle.to_tensor(v) for (k, v) in inputs.items()}
-# Generate
-output, score = model.generate(inputs['input_ids'],
-                               max_length=256,
-                               decode_strategy='greedy_search')
-# Decode the result
-print(
-    re.split(
-        "\nclass|\ndef|\n#|\n@|\nprint|\nif",
-        tokenizer.decode(output[0],
-                         skip_special_tokens=True,
-                         spaces_between_special_tokens=False))[0].rstrip())
-```
-The output is:
-```python
-if not s:
-    return 0
-
-start = 0
-end = 0
-max_len = 0
-
-while end < len(s):
-    if s[end] not in s[start:end]:
-        max_len = max(max_len, end - start + 1)
-        end += 1
-    else:
-        start += 1
-
-return max_len
-```
-<p align="center">
-<img src="https://user-images.githubusercontent.com/24390500/182512164-946d959c-57b1-49e6-b9a5-be47281d1ee2.png"/> <br />
-</p>
+## More Usage Examples
 
 - Generating code from comments / feature descriptions
 
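The renamed "More Usage Examples" section now opens with generating code from comments or feature descriptions. A hedged sketch of what such a prompt looks like with the Taskflow API from the demo; the prompt and model choice here are illustrative, not taken from the README:

```python
from paddlenlp import Taskflow

codegen = Taskflow("code_generation", model="Salesforce/codegen-350M-mono")

# A natural-language comment plus a signature as the prompt;
# the model completes the function body.
prompt = "# Return the sum of squares of a list of numbers\ndef sum_of_squares(nums):"
print(codegen(prompt))
```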
examples/code_generation/codegen/codegen_server.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 
 
 class DefaultConfig:
-    model_name_or_path = "Salesforce/codegen-2B-mono"
+    model_name_or_path = "Salesforce/codegen-350M-mono"
     device = "gpu"
     temperature = 0.5
     top_k = 10
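The default checkpoint drops from the 2B-parameter to the 350M-parameter mono model, which is much lighter to download and serve. For reference, a sketch of how these config fields typically map onto PaddleNLP's `generate` API (parameter names come from the README's list; the server's actual request handling is not shown in this diff):

```python
import paddle
from paddlenlp.transformers import CodeGenForCausalLM, CodeGenTokenizer

# Load the configured checkpoint, mirroring DefaultConfig's fields.
model_name = "Salesforce/codegen-350M-mono"
tokenizer = CodeGenTokenizer.from_pretrained(model_name)
model = CodeGenForCausalLM.from_pretrained(model_name, load_state_as_np=True)
model.eval()

inputs = {k: paddle.to_tensor(v) for k, v in tokenizer(["def hello"]).items()}
output, _ = model.generate(inputs["input_ids"],
                           min_length=0,
                           max_length=16,
                           decode_strategy="sampling",
                           temperature=0.5,
                           top_k=10,
                           top_p=1.0,
                           repetition_penalty=1.0,
                           use_faster=True,
                           use_fp16_decoding=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```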

examples/code_generation/codegen/requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -2,4 +2,5 @@ fastapi==0.79.0
 pydantic==1.9.1
 python-dotenv==0.20.0
 sse_starlette==0.10.3
-uvicorn==0.17.6
+uvicorn==0.17.6
+openai==0.8.0

faster_generation/perf/codegen_perf.py

Lines changed: 10 additions & 6 deletions
@@ -32,14 +32,14 @@ def query_by_id(gpu_id=2):
 
 
 def perf_pd(args):
-    start_mem = query_by_id()
+    start_mem = query_by_id(args.gpu_id)
     place = "gpu"
     place = paddle.set_device(place)
     tokenizer = CodeGenTokenizer.from_pretrained(args.model_name_or_path)
     model = CodeGenForCausalLM.from_pretrained(args.model_name_or_path,
                                                load_state_as_np=True)
     model.eval()
-    load_mem = query_by_id()
+    load_mem = query_by_id(args.gpu_id)
 
     input_ids_np = [
         np.random.choice(list(tokenizer.decoder.keys())[:-1], args.input_len)
@@ -63,7 +63,7 @@ def perf_pd(args):
         top_p=args.top_p,
         use_faster=args.use_faster,
         use_fp16_decoding=args.use_fp16_decoding)
-    generate_mem = query_by_id()
+    generate_mem = query_by_id(args.gpu_id)
     paddle.device.cuda.synchronize(place)
     pd_cost = (time.perf_counter() - start) / (num_loop -
                                                num_loop // 2) * 1000
@@ -73,13 +73,13 @@ def perf_pd(args):
 def perf_hf(args):
     import torch
     from transformers import CodeGenTokenizer as hf_tokenizer, CodeGenForCausalLM as hf_codegen
-    start_mem = query_by_id()
+    start_mem = query_by_id(args.gpu_id)
     device = torch.device("cuda")
     tokenizer = hf_tokenizer.from_pretrained(args.model_name_or_path)
     model = hf_codegen.from_pretrained(args.model_name_or_path)
     model.to(device)
     model.eval()
-    load_mem = query_by_id()
+    load_mem = query_by_id(args.gpu_id)
 
     input_ids_np = [
         np.random.choice(list(tokenizer.decoder.keys()), args.input_len)
@@ -101,7 +101,7 @@ def perf_hf(args):
         min_length=args.generate_len + input_ids.shape[-1],
         top_k=args.top_k,
         top_p=args.top_p)
-    generate_mem = query_by_id()
+    generate_mem = query_by_id(args.gpu_id)
     torch.cuda.synchronize()
     hf_cost = (time.perf_counter() - start) / (num_loop -
                                                num_loop // 2) * 1000
@@ -148,6 +148,10 @@ def parse_args():
                         default=20,
                         type=int,
                         help="Length of output . ")
+    parser.add_argument("--gpu_id",
+                        default=2,
+                        type=int,
+                        help="The id of GPU . ")
     parser.add_argument(
         '--use_faster',
         action='store_true',
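`query_by_id` now receives the GPU index from the new `--gpu_id` flag instead of always using its hard-coded default of 2, so memory is sampled on the GPU the benchmark actually runs on. The helper's body lies outside this diff; a plausible implementation consistent with its call sites (used-memory snapshots before/after model load and generation), assuming it shells out to nvidia-smi:

```python
import subprocess

def query_by_id(gpu_id=2):
    """Return used memory (MiB) of one GPU, queried via nvidia-smi.

    A guess at the helper's body; the real implementation is not
    part of this diff.
    """
    out = subprocess.check_output([
        "nvidia-smi",
        f"--id={gpu_id}",
        "--query-gpu=memory.used",
        "--format=csv,noheader,nounits",
    ])
    return int(out.decode().strip())
```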
