Commit 7393397

Authored by tianxin
Merge pull request PaddlePaddle#1105 from leeyy2020/few_shot_rdrop
add drop for few-shot learning
2 parents 928c34a + 0327121 commit 7393397

7 files changed: 58 additions and 10 deletions

examples/few_shot/README.md

Lines changed: 6 additions & 3 deletions
@@ -11,9 +11,12 @@ Few-Shot Learning aims to study how to learn from a small number of supervised training samples

 | Algorithm | Pretrained model | Score | eprstmt | bustm | ocnli | csldcp | tnews | cluewsc | iflytek | csl | chid |
 | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ---------- |
-| P-tuning | ERNIE1.0 | 55.70 | 83.28 | 63.43 | 35.36 | 60.54 | 50.02 | 54.51 | 50.14 | 54.93 | 41.16 |
-| EFL | ERNIE1.0 | 54.47 | 84.10 | 60.10 | 35.12 | 56.61 | 56.57 | 53.59 | 46.37 | 61.21 | 36.56 |
-| PET | ERNIE1.0 | 56.63 | 86.88 | 61.90 | 36.90 | 61.10 | 56.51 | 55.02 | 50.31 | 59.72 | 41.35 |
+| P-tuning | ERNIE-1.0 | 55.70 | 83.28 | 63.43 | 35.36 | 60.54 | 50.02 | 54.51 | 50.14 | 54.93 | 41.16 |
+| P-tuning+R-Drop | ERNIE-1.0 | 56.23 | 83.11 | 64.56 | 35.71 | 61.88 | 57.51 | 54 | 52 | 56.3 | 41 |
+| EFL | ERNIE-1.0 | 54.47 | 84.10 | 60.10 | 35.12 | 56.61 | 56.57 | 53.59 | 46.37 | 61.21 | 36.56 |
+| EFL+R-Drop | ERNIE-1.0 | 56.94 | 87 | 62.75 | 37.54 | 53.98 | 56.77 | 56.87 | 48.54 | 62.19 | 46.85 |
+| PET | ERNIE-1.0 | 56.63 | 86.88 | 61.90 | 36.90 | 61.10 | 56.51 | 55.02 | 50.31 | 59.72 | 41.35 |
+| PET+R-Drop | ERNIE-1.0 | 57.37 | 87.54 | 63.66 | 36.46 | 62.5 | 58.91 | 56.25 | 53.46 | 57.22 | 40.31 |

 ## Models
 - [P-tuning](./p-tuning)
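
The effect of R-Drop on the overall `Score` column can be read off the table with a little arithmetic. The snippet below is only an illustrative sketch: the numbers are copied from the table rows above, while the `scores` dictionary and the `rdrop_gains` helper are hypothetical names introduced here and are not part of the commit.

```python
# Overall FewCLUE scores copied from the table above (ERNIE-1.0 backbone).
# Each entry is (score without R-Drop, score with R-Drop).
scores = {
    "P-tuning": (55.70, 56.23),
    "EFL": (54.47, 56.94),
    "PET": (56.63, 57.37),
}


def rdrop_gains(table):
    """Return the Score gain from adding R-Drop, rounded to two decimals."""
    return {
        name: round(with_rdrop - baseline, 2)
        for name, (baseline, with_rdrop) in table.items()
    }


gains = rdrop_gains(scores)
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```

All three gains are positive for the overall score, but the per-task columns are mixed: for example, EFL on `csldcp` goes from 56.61 to 53.98 with R-Drop.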

examples/few_shot/efl/README.md

Lines changed: 3 additions & 1 deletion
@@ -39,14 +39,16 @@ python -u -m paddle.distributed.launch --gpus "0" \
     --batch_size 32 \
     --learning_rate 5E-5 \
     --epochs 10 \
-    --max_seq_length 512
+    --max_seq_length 512 \
+    --rdrop_coef 0 \
 ```
 Parameter descriptions
 - `task_name`: name of the dataset in FewCLUE
 - `negative_num`: number of negative samples to draw. For multi-class tasks, the number of negative samples strongly affects performance. Valid values are in the range [1, class_num - 1]
 - `device`: whether to train on cpu or gpu
 - `save_dir`: directory in which the model is saved
 - `max_seq_length`: maximum length to which the text is truncated
+- `rdrop_coef`: weight coefficient of the R-Drop loss term; defaults to 0, and R-Drop is not used when it is 0

 After each training epoch, the model is evaluated on the validation set, and predictions on the test set are written to the prediction result file.

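The `rdrop_coef` flag described above acts as both a weight and an on/off switch. A minimal standalone sketch of that behaviour with `argparse` follows; `build_parser` and `use_rdrop` are hypothetical helper names used only for illustration, and the real scripts define many more arguments.

```python
import argparse


def build_parser():
    """Build a parser that only knows the --rdrop_coef flag (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--rdrop_coef",
        default=0.0,
        type=float,
        help="Weight of the R-Drop KL term; R-Drop is enabled when > 0")
    return parser


def use_rdrop(args):
    """R-Drop is active only for a strictly positive coefficient."""
    return args.rdrop_coef > 0


default_args = build_parser().parse_args([])
explicit_zero = build_parser().parse_args(["--rdrop_coef", "0"])
enabled_args = build_parser().parse_args(["--rdrop_coef", "1.0"])

print(use_rdrop(default_args), use_rdrop(explicit_zero), use_rdrop(enabled_args))
```

Passing `--rdrop_coef 0`, as the README command does, therefore behaves exactly like omitting the flag.
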
examples/few_shot/efl/train.py

Lines changed: 15 additions & 2 deletions
@@ -106,6 +106,12 @@ def parse_args():
         type=int,
         default=100000,
         help="Interval steps to save checkpoint")
+    parser.add_argument(
+        "--rdrop_coef",
+        default=0.0,
+        type=float,
+        help="The coefficient of the KL-divergence loss from the R-Drop paper (see https://arxiv.org/abs/2106.14448); R-Drop is enabled when rdrop_coef > 0")
+
     return parser.parse_args()


@@ -210,7 +216,7 @@ def do_train():
         apply_decay_param_fun=lambda x: x in decay_params)

     criterion = paddle.nn.loss.CrossEntropyLoss()
-
+    rdrop_loss = ppnlp.losses.RDropLoss()
     global_step = 0
     tic_train = time.time()
     for epoch in range(1, args.epochs + 1):
@@ -222,7 +228,14 @@ def do_train():
             prediction_scores = model(
                 input_ids=src_ids, token_type_ids=token_type_ids)

-            loss = criterion(prediction_scores, labels)
+            if args.rdrop_coef > 0:
+                prediction_scores_2 = model(
+                    input_ids=src_ids, token_type_ids=token_type_ids)
+                ce_loss = (criterion(prediction_scores, labels) + criterion(prediction_scores_2, labels)) * 0.5
+                kl_loss = rdrop_loss(prediction_scores, prediction_scores_2)
+                loss = ce_loss + kl_loss * args.rdrop_coef
+            else:
+                loss = criterion(prediction_scores, labels)

             global_step += 1
             if global_step % 10 == 0 and rank == 0:
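
The combined objective in the hunk above, `ce_loss + kl_loss * args.rdrop_coef`, can be illustrated for a single example without Paddle. This is a sketch under stated assumptions, not the `ppnlp.losses.RDropLoss` implementation: it assumes the KL term is the average of the two KL directions between the softmax outputs of the two passes, as the R-Drop paper describes, and it ignores batching and reduction. All function names below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the gold label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def rdrop_objective(logits_1, logits_2, label, rdrop_coef):
    """Averaged cross-entropy plus weighted symmetric KL (assumed form)."""
    ce_loss = 0.5 * (cross_entropy(logits_1, label) + cross_entropy(logits_2, label))
    p, q = softmax(logits_1), softmax(logits_2)
    kl_loss = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return ce_loss + rdrop_coef * kl_loss


# Two hypothetical forward passes of the same input under different dropout masks.
pass_1 = [2.0, 0.5, -1.0]
pass_2 = [1.5, 0.7, -0.8]
print(rdrop_objective(pass_1, pass_2, label=0, rdrop_coef=1.0))
```

When the two passes agree exactly, the KL term vanishes and the objective reduces to plain cross-entropy, which matches the `else` branch taken when `rdrop_coef` is 0.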

examples/few_shot/p-tuning/README.md

Lines changed: 3 additions & 1 deletion
@@ -36,14 +36,16 @@ python -u -m paddle.distributed.launch --gpus "0" \
     --batch_size 32 \
     --learning_rate 5E-5 \
     --epochs 10 \
-    --max_seq_length 512
+    --max_seq_length 512 \
+    --rdrop_coef 0 \
 ```
 Parameter descriptions
 - `task_name`: name of the dataset in FewCLUE
 - `p_embedding_num`: number of P-embeddings
 - `device`: whether to train on cpu or gpu
 - `save_dir`: directory in which the model is saved
 - `max_seq_length`: maximum length to which the text is truncated
+- `rdrop_coef`: weight coefficient of the R-Drop loss term; defaults to 0, and R-Drop is not used when it is 0

 After each training epoch, the model is evaluated on the validation and test sets.

examples/few_shot/p-tuning/ptuning.py

Lines changed: 12 additions & 1 deletion
@@ -51,6 +51,7 @@
 parser.add_argument("--seed", type=int, default=1000, help="random seed for initialization")
 parser.add_argument('--device', choices=['cpu', 'gpu'], default="gpu", help="Select which device to train model, defaults to gpu.")
 parser.add_argument('--save_steps', type=int, default=10000, help="Interval steps to save checkpoint")
+parser.add_argument("--rdrop_coef", default=0.0, type=float, help="The coefficient of the KL-divergence loss from the R-Drop paper (see https://arxiv.org/abs/2106.14448); R-Drop is enabled when rdrop_coef > 0")

 args = parser.parse_args()
 # yapf: enable
@@ -153,6 +154,7 @@ def do_train():
         print("warmup from:{}".format(args.init_from_ckpt))

     mlm_loss_fn = ErnieMLMCriterion()
+    rdrop_loss = ppnlp.losses.RDropLoss()

     num_training_steps = len(train_data_loader) * args.epochs

@@ -187,7 +189,16 @@ def do_train():
                 token_type_ids=token_type_ids,
                 masked_positions=masked_positions)

-            loss = mlm_loss_fn(prediction_scores, masked_lm_labels)
+            if args.rdrop_coef > 0:
+                prediction_scores_2 = model(
+                    input_ids=src_ids,
+                    token_type_ids=token_type_ids,
+                    masked_positions=masked_positions)
+                ce_loss = (mlm_loss_fn(prediction_scores, masked_lm_labels) + mlm_loss_fn(prediction_scores_2, masked_lm_labels)) * 0.5
+                kl_loss = rdrop_loss(prediction_scores, prediction_scores_2)
+                loss = ce_loss + kl_loss * args.rdrop_coef
+            else:
+                loss = mlm_loss_fn(prediction_scores, masked_lm_labels)

             global_step += 1
             if global_step % 10 == 0 and rank == 0:
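
The second `model(...)` call in the hunk above receives exactly the same inputs as the first, so the two outputs differ only because dropout samples a fresh random mask on each training-mode forward pass. The toy sketch below illustrates that point with plain Python; `dropout` here is a hypothetical stand-in written for this example, not Paddle's layer.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Training mode: repeated passes over identical input give varying outputs.
train_passes = {tuple(dropout(features, 0.5, rng)) for _ in range(50)}

# Evaluation mode: dropout is a no-op, so two passes are identical.
eval_pass_1 = dropout(features, 0.5, rng, training=False)
eval_pass_2 = dropout(features, 0.5, rng, training=False)

print(len(train_passes) > 1, eval_pass_1 == eval_pass_2)
```

This is why the KL term is only meaningful during training: in evaluation mode the two passes would coincide and the term would be zero.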

examples/few_shot/pet/README.md

Lines changed: 2 additions & 0 deletions
@@ -39,13 +39,15 @@ python -u -m paddle.distributed.launch --gpus "0" \
     --epochs 10 \
     --max_seq_length 512 \
     --language_model "ernie-1.0" \
+    --rdrop_coef 0 \
 ```
 Parameter descriptions
 - `task_name`: name of the dataset in FewCLUE
 - `device`: whether to train on cpu or gpu
 - `pattern_id`: the cloze pattern to use
 - `save_dir`: directory in which the model is saved
 - `max_seq_length`: maximum length to which the text is truncated
+- `rdrop_coef`: weight coefficient of the R-Drop loss term; defaults to 0, and R-Drop is not used when it is 0

 After each training epoch, the model is evaluated on the validation set

examples/few_shot/pet/pet.py

Lines changed: 17 additions & 2 deletions
@@ -167,7 +167,7 @@ def do_train(args):
         print("warmup from:{}".format(args.init_from_ckpt))

     mlm_loss_fn = ErnieMLMCriterion()
-    cross_loss_fn = paddle.nn.CrossEntropyLoss()
+    rdrop_loss = ppnlp.losses.RDropLoss()
     max_test_acc = 0.0
     global_step = 0
     tic_train = time.time()
@@ -195,7 +195,16 @@ def do_train(args):
                 input_ids=src_ids,
                 token_type_ids=token_type_ids,
                 masked_positions=new_masked_positions)
-            loss = mlm_loss_fn(prediction_scores, masked_lm_labels)
+            if args.rdrop_coef > 0:
+                prediction_scores_2 = model(
+                    input_ids=src_ids,
+                    token_type_ids=token_type_ids,
+                    masked_positions=new_masked_positions)
+                ce_loss = (mlm_loss_fn(prediction_scores, masked_lm_labels) + mlm_loss_fn(prediction_scores_2, masked_lm_labels)) * 0.5
+                kl_loss = rdrop_loss(prediction_scores, prediction_scores_2)
+                loss = ce_loss + kl_loss * args.rdrop_coef
+            else:
+                loss = mlm_loss_fn(prediction_scores, masked_lm_labels)

             global_step += 1
             if global_step % 10 == 0 and rank == 0:
@@ -307,5 +316,11 @@ def do_train(args):
         default='ernie-1.0',
         choices=['ernie-1.0'],
         help="Language model")
+    parser.add_argument(
+        "--rdrop_coef",
+        default=0.0,
+        type=float,
+        help="The coefficient of the KL-divergence loss from the R-Drop paper (see https://arxiv.org/abs/2106.14448); R-Drop is enabled when rdrop_coef > 0")
+
     args = parser.parse_args()
     do_train(args)
