
Commit 53f181d

add distill example in tutorial, and update augmentation condition (#619)
1 parent ee346c3 commit 53f181d

File tree

2 files changed: +9 −6 lines


education/day12.md

Lines changed: 1 addition & 1 deletion
@@ -99,4 +99,4 @@ from paddlenlp.transformers import ErnieGramForSequenceClassification, ErnieGram
 teacher = ErnieGramForSequenceClassification.from_pretrained("./tmp/ChnSentiCorp/best_model")
 ```
 
-The distillation process is the same as in the AI Studio tutorial, so it will not be repeated here; just follow the same steps as the tutorial.
+The distillation process is the same as in the AI Studio tutorial, so it will not be repeated here; just follow the same steps as the tutorial. In addition, this repo also provides a [BERT-to-Bi-LSTM distillation](../examples/model_compression/distill_lstm) example for reference.
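The linked distill_lstm example distills a fine-tuned teacher into a small Bi-LSTM student by fitting the teacher's output logits as soft labels. Below is a minimal sketch of such a distillation objective, assuming both models emit class logits; the function name `distillation_loss`, the `alpha` weight, and the MSE-on-logits choice follow the common task-specific distillation recipe and are illustrative, not the repo's exact implementation.

```python
import paddle
import paddle.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Illustrative distillation objective: fit the teacher's logits
    (soft labels) while also fitting the gold labels (hard labels)."""
    # Soft-label term: regress the student's logits onto the teacher's.
    soft_loss = F.mse_loss(student_logits, teacher_logits)
    # Hard-label term: ordinary cross entropy against the dataset labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # alpha balances the two terms; 0.5 is an arbitrary illustrative value.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```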

examples/model_compression/distill_lstm/data.py

Lines changed: 8 additions & 5 deletions
@@ -147,6 +147,8 @@ def apply_data_augmentation_for_cn(data,
     new_data = []
 
     for example in data:
+        if not example['text']:
+            continue
         text_tokenized = list(jieba.cut(example['text']))
         lstm_tokens = text_tokenized
         bert_tokens = tokenizer.tokenize(example['text'])
@@ -170,11 +172,12 @@ def apply_data_augmentation_for_cn(data,
                 p_ng, ngram_range)
             lstm_tokens, bert_tokens = flatten(lstm_tokens), flatten(
                 bert_tokens)
-        new_data.append({
-            "lstm_tokens": lstm_tokens,
-            "bert_tokens": bert_tokens,
-            "label": example['label']
-        })
+        if lstm_tokens and bert_tokens:
+            new_data.append({
+                "lstm_tokens": lstm_tokens,
+                "bert_tokens": bert_tokens,
+                "label": example['label']
+            })
     return new_data
 
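The two guards added above keep empty records out of the augmented dataset: an example with empty `text` is skipped before tokenization, and an augmented example is appended only when both token lists are non-empty. The standalone sketch below shows the same filtering behaviour; the name `filter_augmented_examples` and the whitespace split (standing in for jieba and the BERT tokenizer) are illustrative.

```python
def filter_augmented_examples(data):
    """Illustrative stand-in for the guarded loop in
    apply_data_augmentation_for_cn: skip empty inputs and drop
    examples whose token lists come out empty."""
    new_data = []
    for example in data:
        if not example['text']:        # empty text: nothing to tokenize
            continue
        lstm_tokens = example['text'].split()  # real code uses jieba.cut
        bert_tokens = example['text'].split()  # real code uses tokenizer.tokenize
        if lstm_tokens and bert_tokens:        # keep only non-empty results
            new_data.append({
                "lstm_tokens": lstm_tokens,
                "bert_tokens": bert_tokens,
                "label": example['label'],
            })
    return new_data

# Examples with empty text no longer produce empty training records.
print(filter_augmented_examples([{"text": "", "label": 0},
                                 {"text": "good movie", "label": 1}]))
```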
