Commit a16c4bc

Add PPMiniLM class (#1512)

* add ppminilm class
* update copyright
* add ppminilm tokenizer to __init__
* update ppminilm
* remove useless comments, remove ernie
* remove useless readme, remove ernie

1 parent 806ff6f commit a16c4bc

File tree: 17 files changed, +726 -58 lines changed


examples/model_compression/pp-minilm/README.md

Lines changed: 6 additions & 6 deletions
@@ -104,16 +104,16 @@ The PP-MiniLM compression scheme starts from task-agnostic knowledge distillation (Task-a
 
 ## Importing PP-MiniLM
 
-PP-MiniLM is a 6-layer ERNIE model produced by task-agnostic distillation with `roberta-wwm-ext-large` as the teacher model (i.e., a small Chinese pre-trained model with 6 Transformer encoder layers and a hidden size of 768); its accuracy on 7 CLUE classification tasks exceeds BERT<sub>base</sub>, TinyBERT<sub>6</sub>, UER-py RoBERTa L6-H768, and RBT6.
+PP-MiniLM is a small pre-trained model with 6 Transformer encoder layers and a hidden size of 768, produced by task-agnostic distillation with `roberta-wwm-ext-large` as the teacher model; its accuracy on 7 CLUE classification tasks exceeds BERT<sub>base</sub>, TinyBERT<sub>6</sub>, UER-py RoBERTa L6-H768, and RBT6.
 
 You can import PP-MiniLM as follows:
 
 ```python
 
-from paddlenlp.transformers import ErnieModel, ErnieForSequenceClassification
+from paddlenlp.transformers import PPMiniLMModel, PPMiniLMForSequenceClassification
 
-model = ErnieModel.from_pretrained('ppminilm-6l-768h')
-model = ErnieForSequenceClassification.from_pretrained('ppminilm-6l-768h') # for classification tasks
+model = PPMiniLMModel.from_pretrained('ppminilm-6l-768h')
+model = PPMiniLMForSequenceClassification.from_pretrained('ppminilm-6l-768h') # for classification tasks
 ```
 
 PP-MiniLM is a 6-layer pre-trained model. After loading it with `from_pretrained`, you can fine-tune it on your own dataset. The sections below describe how to fine-tune the loaded PP-MiniLM on downstream-task data, further compress it, and deploy it for inference.
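
The paragraph above says the imported model can be fine-tuned directly. As a rough illustration, here is a minimal single-example forward pass with the new classes; it assumes the standard PaddleNLP tokenizer/model interface, and the input sentence and `num_classes=2` are hypothetical:

```python
import paddle
from paddlenlp.transformers import PPMiniLMForSequenceClassification, PPMiniLMTokenizer

# Load the 6-layer, hidden-size-768 PP-MiniLM checkpoint named in the README.
tokenizer = PPMiniLMTokenizer.from_pretrained("ppminilm-6l-768h")
model = PPMiniLMForSequenceClassification.from_pretrained(
    "ppminilm-6l-768h", num_classes=2)  # num_classes depends on the downstream task

# Tokenize one (hypothetical) Chinese sentence and run a forward pass.
encoded = tokenizer("这家餐厅的服务很好")
input_ids = paddle.to_tensor([encoded["input_ids"]])
token_type_ids = paddle.to_tensor([encoded["token_type_ids"]])

logits = model(input_ids, token_type_ids)  # shape: [1, num_classes]
```

The `run_clue.sh` / `run_clue.py` scripts further below wrap these same classes in a full CLUE fine-tuning loop.
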
@@ -193,7 +193,7 @@ sh run_clue.sh CLUEWSC2020 1e-4 32 50 128 0 ppminilm-6l-768h
 Assuming the fine-tuned model to be exported is located at `ppminilm-6l-768h/models/CLUEWSC2020/1e-4_32`, run the command below to export the dynamic-graph model as a static-graph model that can be used for deployment:
 
 ```shell
-python export_model.py --model_type ernie --model_path ppminilm-6l-768h/models/CLUEWSC2020/1e-4_32 --output_path fine_tuned_infer_model/float
+python export_model.py --model_type ppminilm --model_path ppminilm-6l-768h/models/CLUEWSC2020/1e-4_32 --output_path fine_tuned_infer_model/float
 cd ..
 ```

@@ -221,7 +221,7 @@ cd ..
 cd pruning
 export FT_MODELS=../finetuning/ppminilm-6l-768h/models/CLUEWSC2020/1e-4_32
 
-sh prune.sh CLUEWSC2020 5e-5 16 50 128 0 ${FT_MODELS} 0.75
+sh prune.sh CLUEWSC2020 1e-4 32 50 128 0 ${FT_MODELS} 0.75
 ```
 The parameters, in order, are: the CLUE task name, learning rate, batch size, number of epochs, maximum sequence length, GPU id, path of the student model, and the list of width ratios after pruning. After the script finishes, the model is saved under `pruned_models/CLUEWSC2020/0.75/best_model/`.

examples/model_compression/pp-minilm/data.py

Lines changed: 2 additions & 2 deletions
@@ -14,11 +14,11 @@
 import numpy as np
 
 from paddle.metric import Metric, Accuracy
-from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer
+from paddlenlp.transformers import PPMiniLMForSequenceClassification, PPMiniLMTokenizer
 from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer
 
 MODEL_CLASSES = {
-    "ernie": (ErnieForSequenceClassification, ErnieTokenizer),
+    "ppminilm": (PPMiniLMForSequenceClassification, PPMiniLMTokenizer),
     "bert": (BertForSequenceClassification, BertTokenizer)
 }
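
`MODEL_CLASSES` is how the example scripts resolve the `--model_type` flag to a concrete model/tokenizer pair. A minimal sketch of that lookup, with an illustrative checkpoint name and `num_classes` value:

```python
from paddlenlp.transformers import PPMiniLMForSequenceClassification, PPMiniLMTokenizer

MODEL_CLASSES = {
    "ppminilm": (PPMiniLMForSequenceClassification, PPMiniLMTokenizer),
}

model_type = "ppminilm"  # the value passed via --model_type in run_clue.sh / prune.sh
model_class, tokenizer_class = MODEL_CLASSES[model_type]

tokenizer = tokenizer_class.from_pretrained("ppminilm-6l-768h")
model = model_class.from_pretrained("ppminilm-6l-768h", num_classes=2)
```
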

examples/model_compression/pp-minilm/finetuning/run_clue.py

Lines changed: 0 additions & 2 deletions
@@ -30,8 +30,6 @@
 
 from paddlenlp.datasets import load_dataset
 from paddlenlp.data import Stack, Tuple, Pad, Dict
-from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer, BertModel
-from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer
 from paddlenlp.transformers import LinearDecayWithWarmup
 
 sys.path.append("../")

examples/model_compression/pp-minilm/finetuning/run_clue.sh

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ export CUDA_VISIBLE_DEVICES=$6
 export MODEL_PATH=$7
 
 python -u ./run_clue.py \
-    --model_type ernie \
+    --model_type ppminilm \
     --model_name_or_path ${MODEL_PATH} \
    --task_name ${TASK_NAME} \
    --max_seq_length ${MAX_SEQ_LEN} \

examples/model_compression/pp-minilm/inference/infer.py

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ def parse_args():
         ", ".join(METRIC_CLASSES.keys()), )
     parser.add_argument(
         "--model_type",
-        default='ernie',
+        default='ppminilm',
         type=str,
         help="Model type selected in the list: " +
         ", ".join(MODEL_CLASSES.keys()), )

examples/model_compression/pp-minilm/pruning/export.sh

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 
 MODEL_PATH=$1
 TASK_NAME=$2
-python export_model.py --model_type ernie \
+python export_model.py --model_type ppminilm \
     --model_name_or_path ${MODEL_PATH}/${TASK_NAME}/0.75/best_model \
     --sub_model_output_dir ${MODEL_PATH}/${TASK_NAME}/0.75/sub/ \
     --static_sub_model ${MODEL_PATH}/${TASK_NAME}/0.75/sub_static/float \

examples/model_compression/pp-minilm/pruning/export_all.sh

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ MODEL_PATH=pruned_models
 for TASK_NAME in AFQMC TNEWS IFLYTEK CMNLI OCNLI CLUEWSC2020 CSL
 
 do
-    python export_model.py --model_type ernie \
+    python export_model.py --model_type ppminilm \
         --model_name_or_path ${MODEL_PATH}/${TASK_NAME}/0.75/best_model \
         --sub_model_output_dir ${MODEL_PATH}/${TASK_NAME}/0.75/sub/ \
         --static_sub_model ${MODEL_PATH}/${TASK_NAME}/0.75/sub_static/float \

examples/model_compression/pp-minilm/pruning/export_model.py

Lines changed: 10 additions & 8 deletions
@@ -15,6 +15,7 @@
 import argparse
 import logging
 import os
+import sys
 import math
 import random
 import time
@@ -26,20 +27,21 @@
 import paddle.nn as nn
 import paddle.nn.functional as F
 
-from paddlenlp.transformers import ErnieModel, ErnieForSequenceClassification, ErnieTokenizer
+from paddlenlp.transformers import PPMiniLMModel
 from paddlenlp.utils.log import logger
 from paddleslim.nas.ofa import OFA, utils
 from paddleslim.nas.ofa.convert_super import Convert, supernet
 from paddleslim.nas.ofa.layers import BaseBlock
 
-MODEL_CLASSES = {"ernie": (ErnieForSequenceClassification, ErnieTokenizer), }
+sys.path.append("../")
+from data import MODEL_CLASSES
 
 
-def ernie_forward(self,
-                  input_ids,
-                  token_type_ids=None,
-                  position_ids=None,
-                  attention_mask=None):
+def ppminilm_forward(self,
+                     input_ids,
+                     token_type_ids=None,
+                     position_ids=None,
+                     attention_mask=None):
     wtype = self.pooler.dense.fn.weight.dtype if hasattr(
         self.pooler.dense, 'fn') else self.pooler.dense.weight.dtype
     if attention_mask is None:
@@ -52,7 +54,7 @@ def ernie_forward(self,
     return encoded_layer, pooled_output
 
 
-ErnieModel.forward = ernie_forward
+PPMiniLMModel.forward = ppminilm_forward
 
 
 def parse_args():
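
export_model.py rebinds `PPMiniLMModel.forward` so the backbone returns `(encoded_layer, pooled_output)` in the form the OFA export path expects. A generic, self-contained sketch of that monkey-patching pattern, using toy classes rather than the real PPMiniLM implementation:

```python
class Backbone:
    """Stand-in for a model class whose forward we want to adapt."""

    def forward(self, input_ids, attention_mask=None):
        return input_ids


def patched_forward(self, input_ids, attention_mask=None):
    # Fill in a default mask when none is given, then return the pair of
    # outputs that the downstream tooling expects.
    if attention_mask is None:
        attention_mask = [1] * len(input_ids)
    return input_ids, attention_mask


# Rebinding at the class level means every existing and future instance
# picks up the new behavior, which is how the diff patches PPMiniLMModel.
Backbone.forward = patched_forward

model = Backbone()
print(model.forward([101, 2023, 102]))  # ([101, 2023, 102], [1, 1, 1])
```
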

examples/model_compression/pp-minilm/pruning/prune.py

Lines changed: 16 additions & 15 deletions
@@ -31,7 +31,7 @@
 from paddlenlp.datasets import load_dataset
 from paddlenlp.transformers import LinearDecayWithWarmup
 from paddlenlp.utils.log import logger
-from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer, ErnieModel
+from paddlenlp.transformers import PPMiniLMModel
 
 from paddleslim.nas.ofa import OFA, DistillConfig, utils
 from paddleslim.nas.ofa.utils import nlp_utils
@@ -194,11 +194,11 @@ def evaluate(model, metric, data_loader, width_mult, student=False):
 
 
 ### monkey patch for bert forward to accept [attention_mask, head_mask] as attention_mask
-def ernie_forward(self,
-                  input_ids,
-                  token_type_ids=None,
-                  position_ids=None,
-                  attention_mask=[None, None]):
+def ppminilm_forward(self,
+                     input_ids,
+                     token_type_ids=None,
+                     position_ids=None,
+                     attention_mask=[None, None]):
     wtype = self.pooler.dense.fn.weight.dtype if hasattr(
         self.pooler.dense, 'fn') else self.pooler.dense.weight.dtype
     if attention_mask[0] is None:
@@ -211,7 +211,7 @@ def ernie_forward(self,
     return encoded_layer, pooled_output
 
 
-ErnieModel.forward = ernie_forward
+PPMiniLMModel.forward = ppminilm_forward
 
 
 ### reorder weights according head importance and neuron importance
@@ -220,14 +220,15 @@ def reorder_neuron_head(model, head_importance, neuron_importance):
     for layer, current_importance in enumerate(neuron_importance):
         # reorder heads
         idx = paddle.argsort(head_importance[layer], descending=True)
-        nlp_utils.reorder_head(model.ernie.encoder.layers[layer].self_attn, idx)
+        nlp_utils.reorder_head(model.ppminilm.encoder.layers[layer].self_attn,
+                               idx)
         # reorder neurons
         idx = paddle.argsort(
             paddle.to_tensor(current_importance), descending=True)
         nlp_utils.reorder_neuron(
-            model.ernie.encoder.layers[layer].linear1.fn, idx, dim=1)
+            model.ppminilm.encoder.layers[layer].linear1.fn, idx, dim=1)
         nlp_utils.reorder_neuron(
-            model.ernie.encoder.layers[layer].linear2.fn, idx, dim=0)
+            model.ppminilm.encoder.layers[layer].linear2.fn, idx, dim=0)
 
 
 def soft_cross_entropy(inp, target):
@@ -305,9 +306,9 @@ def do_train(args):
         args.model_name_or_path, num_classes=num_labels)
 
     # Step4: Config about distillation.
-    mapping_layers = ['ernie.embeddings']
-    for idx in range(model.ernie.config['num_hidden_layers']):
-        mapping_layers.append('ernie.encoder.layers.{}'.format(idx))
+    mapping_layers = ['ppminilm.embeddings']
+    for idx in range(model.ppminilm.config['num_hidden_layers']):
+        mapping_layers.append('ppminilm.encoder.layers.{}'.format(idx))
 
     default_distill_config = {
         'lambda_distill': 0.1,
@@ -333,8 +334,8 @@ def do_train(args):
         ofa_model.model,
         dev_data_loader,
         loss_fct=criterion,
-        num_layers=model.ernie.config['num_hidden_layers'],
-        num_heads=model.ernie.config['num_attention_heads'])
+        num_layers=model.ppminilm.config['num_hidden_layers'],
+        num_heads=model.ppminilm.config['num_attention_heads'])
     reorder_neuron_head(ofa_model.model, head_importance, neuron_importance)
 
     if paddle.distributed.get_world_size() > 1:
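
The prune.py changes all follow from the backbone being exposed as `model.ppminilm` instead of `model.ernie` on the new class. A short sketch, assuming the attribute layout shown in the diff, of how the distillation `mapping_layers` list is built from that attribute:

```python
from paddlenlp.transformers import PPMiniLMForSequenceClassification

model = PPMiniLMForSequenceClassification.from_pretrained("ppminilm-6l-768h")

# Attribute names mirror the diff: ppminilm.embeddings, ppminilm.encoder.layers.N
num_layers = model.ppminilm.config["num_hidden_layers"]  # 6 for ppminilm-6l-768h
mapping_layers = ["ppminilm.embeddings"] + [
    "ppminilm.encoder.layers.{}".format(idx) for idx in range(num_layers)
]
print(mapping_layers)
```
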

examples/model_compression/pp-minilm/pruning/prune.sh

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ export CUDA_VISIBLE_DEVICES=$6
 export STUDENT_DIR=$7
 export WIDTH_LIST=$8
 
-python -u ./prune.py --model_type ernie \
+python -u ./prune.py --model_type ppminilm \
     --model_name_or_path ${STUDENT_DIR} \
     --task_name $TASK_NAME --max_seq_length ${SEQ_LEN} \
     --batch_size ${BATCH_SIZE} \

0 commit comments
