
Commit 014db90 (1 parent: 57baf48)

8-bits precision quantized distilbert

File tree

3 files changed (+13 lines, -4 lines)

README.md

Lines changed: 4 additions & 2 deletions

````diff
@@ -15,8 +15,9 @@ It provides 48 passages from the dataset for users to choose from.
 ![demo gif](media/distilbert_qa.gif "Demo running offline on a Samsung Galaxy S8")
 
 > Available models:
-> * "original" converted DistilBERT (266MB)
-> * FP16 post-training-quantized DistilBERT (67MB)
+> * "original" converted DistilBERT (254MB)
+> * FP16 post-training-quantized DistilBERT (131MB)
+> * "hybrid" (8-bits precision weights) post-training-quantized DistilBERT (64MB)
 
 ### Coming soon: GPT-2, quantization... and much more!
 
@@ -81,6 +82,7 @@ To choose which model to use in the app:
 ```java
 "https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-distilled-squad-384.tflite": "model.tflite", // <- "original" converted DistilBERT (default)
 // "https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-distilled-squad-384-fp16.tflite": "model.tflite", // <- fp16 quantized version of DistilBERT
+// "https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-distilled-squad-384-8bits.tflite": "model.tflite", // <- hybrid quantized version of DistilBERT
 ```
 
 ## Models generation
````
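Since all three variants are selected the same way (by swapping which URL is uncommented), a converted model can be smoke-tested in Python before wiring it into the app. A minimal sketch, assuming the `Interpreter` from the full `tensorflow` pip package (needed because these models rely on `SELECT_TF_OPS`); the exact input tensor names and order are assumptions about the converted model, so inspect `get_input_details()` on your own copy first:

```python
import numpy as np
import tensorflow as tf

# Load whichever variant was downloaded as model.tflite.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print([d["name"] for d in input_details])  # inspect the real input names/shapes

# Feed dummy token ids shaped like a 384-token input (the "-384" in the file names).
for d in input_details:
    interpreter.set_tensor(d["index"], np.zeros(d["shape"], dtype=d["dtype"]))
interpreter.invoke()

start_logits = interpreter.get_tensor(output_details[0]["index"])
print(start_logits.shape)
```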

app/download.gradle

Lines changed: 2 additions & 1 deletion

```diff
@@ -3,7 +3,8 @@ apply plugin: 'de.undercouch.download'
 task downloadLiteModel {
     def downloadFiles = [
         'https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-distilled-squad-384.tflite': 'model.tflite',
-        // 'https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-distilled-squad-384-fp16.tflite': 'model.tflite', // FP16 version
+        // 'https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-distilled-squad-384-fp16.tflite': 'model.tflite', // FP16 quantization version
+        // 'https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-distilled-squad-384-8bits.tflite': 'model.tflite', // hybrid quantization version
     ]
     downloadFiles.each { key, value ->
         download {
```
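The Gradle task simply maps remote URLs to a local `model.tflite`; outside the Android build, the same fetch can be reproduced in a few lines of Python (a convenience sketch, not part of the repo):

```python
import urllib.request

# Fetch the hybrid 8-bit model; swap the suffix for the other variants.
URL = ("https://s3.amazonaws.com/models.huggingface.co/bert/"
       "distilbert-base-uncased-distilled-squad-384-8bits.tflite")
urllib.request.urlretrieve(URL, "model.tflite")
print("saved model.tflite")
```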

models_generation/distilbert.py

Lines changed: 7 additions & 1 deletion

```diff
@@ -14,7 +14,13 @@
 # For normal conversion:
 converter.target_spec.supported_ops = [tf.lite.OpsSet.SELECT_TF_OPS]
 
-# For FP16 conversion:
+# For conversion with FP16 quantization:
+# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
+# converter.target_spec.supported_types = [tf.float16]
+# converter.optimizations = [tf.lite.Optimize.DEFAULT]
+# converter.experimental_new_converter = True
+
+# For conversion with hybrid quantization:
 # converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
 # converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
 # converter.experimental_new_converter = True
```
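For context, here is how the hybrid path slots into a complete conversion. This is a runnable sketch using a tiny stand-in Keras model so it works without the DistilBERT export; in the repo's script, the `converter` is built from the exported DistilBERT model instead:

```python
import tensorflow as tf

# Tiny stand-in for the exported DistilBERT model (illustration only).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(384,)),
    tf.keras.layers.Dense(2),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Hybrid quantization flags from the diff above: TFLite builtins with a
# fallback to select TF ops, 8-bit weight storage, and the new converter.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.experimental_new_converter = True

tflite_model = converter.convert()
with open("model-8bits.tflite", "wb") as f:
    f.write(tflite_model)
```

With hybrid quantization, weights are stored as 8-bit integers and dequantized to float at inference time, which is consistent with the README's roughly 4x size drop (254MB to 64MB) versus about 2x for the FP16 variant (131MB).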
