
Commit e2f3b09

add fixed machine-translation-nar-{ru-en} models (#3127)
* add fixed machine-translation-nar-{ru-en} models
* add machine-translation-nar-{en-ru} models to device_support.md
* update machine-translation-nar-{} descriptions
* update machine-translation-nar-{en-ru} AC configs
* Update models/intel/machine-translation-nar-en-ru-0002/README.md

Co-authored-by: Vladimir Dudnik <[email protected]>
1 parent 031cfd7 commit e2f3b09

File tree

14 files changed: +92128 additions, -4 deletions

models/intel/device_support.md

Lines changed: 6 additions & 4 deletions

```diff
@@ -56,10 +56,12 @@
 | instance-segmentation-person-0007 | YES | | |
 | landmarks-regression-retail-0009 | YES | YES | YES |
 | license-plate-recognition-barrier-0001 | YES | YES | YES |
-| machine-translation-nar-de-en-0002 | YES | YES | |
-| machine-translation-nar-en-de-0002 | YES | YES | |
-| machine-translation-nar-en-ru-0001 | YES | YES | |
-| machine-translation-nar-ru-en-0001 | YES | YES | |
+| machine-translation-nar-de-en-0002 | YES | | |
+| machine-translation-nar-en-de-0002 | YES | | |
+| machine-translation-nar-en-ru-0001 | YES | | |
+| machine-translation-nar-ru-en-0001 | YES | | |
+| machine-translation-nar-en-ru-0002 | YES | | |
+| machine-translation-nar-ru-en-0002 | YES | | |
 | noise-suppression-denseunet-ll-0001 | YES | | |
 | noise-suppression-poconetlike-0001 | YES | | |
 | pedestrian-and-vehicle-detector-adas-0001 | YES | YES | YES |
```

models/intel/index.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -285,6 +285,8 @@ Deep Learning compressed models
 |------------|---------------------|-----------|
 | [machine-translation-nar-en-ru-0001](./machine-translation-nar-en-ru-0001/README.md) | 23.17 | 69.29 |
 | [machine-translation-nar-ru-en-0001](./machine-translation-nar-ru-en-0001/README.md) | 23.17 | 69.29 |
+| [machine-translation-nar-en-ru-0002](./machine-translation-nar-en-ru-0002/README.md) | 23.17 | 69.29 |
+| [machine-translation-nar-ru-en-0002](./machine-translation-nar-ru-en-0002/README.md) | 23.17 | 69.29 |
 | [machine-translation-nar-en-de-0002](./machine-translation-nar-en-de-0002/README.md) | 23.19 | 77.47 |
 | [machine-translation-nar-de-en-0002](./machine-translation-nar-de-en-0002/README.md) | 23.19 | 77.47 |
```

models/intel/machine-translation-nar-en-ru-0002/README.md

Lines changed: 42 additions & 0 deletions (new file)

```markdown
# machine-translation-nar-en-ru-0002

## Use Case and High-Level Description

This is an English-Russian machine translation model based on the non-autoregressive Transformer topology.

Tokenization is performed with the SentencePieceBPETokenizer (see the demo code for implementation details); the tokenizer files are included in the `tokenizer_src` and `tokenizer_tgt` folders.

## Specification

| Metric            | Value     |
|-------------------|-----------|
| GOps              | 23.17     |
| MParams           | 69.29     |
| Source framework  | PyTorch\* |

## Accuracy

The quality metrics were calculated on the wmt19-ru-en dataset ("test" split in lower case).

| Metric | Value  |
|--------|--------|
| BLEU   | 22.7 % |

## Input

name: `tokens`

shape: `1, 192`

description: sequence of tokens (integer values) representing the tokenized sentence.
The sequence structure is as follows (`<s>`, `</s>` and `<pad>` should be replaced by the corresponding token IDs as specified by the dictionary):
`<s>` + *tokenized sentence* + `</s>` + (`<pad>` tokens to pad to the maximum sequence length of 192)

## Output

name: `pred`

shape: `1, 192`

description: sequence of tokens (integer values) representing the tokenized translation.
The sequence structure is as follows (`<s>`, `</s>` and `<pad>` should be replaced by the corresponding token IDs as specified by the dictionary):
`<s>` + *tokenized translation* + `</s>` + (`<pad>` tokens to pad to the maximum sequence length of 192)

## Legal Information

[*] Other names and brands may be claimed as the property of others.
```
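The fixed-shape input described above can be assembled with a few lines of code. The sketch below is illustrative only and is not part of the model package: `build_input` is a hypothetical helper, and the `bos_id`/`eos_id`/`pad_id` values are placeholders that in practice must come from the dictionary shipped with the tokenizer.

```python
MAX_LEN = 192  # fixed sequence length from the model specification

def build_input(sentence_token_ids, bos_id, eos_id, pad_id, max_len=MAX_LEN):
    """Wrap a tokenized sentence as <s> + tokens + </s>, padded to max_len."""
    seq = [bos_id] + list(sentence_token_ids) + [eos_id]
    if len(seq) > max_len:
        raise ValueError("sentence too long for the fixed input shape")
    return seq + [pad_id] * (max_len - len(seq))

# Placeholder token IDs; real values are defined by the tokenizer dictionary.
tokens = build_input([101, 7, 42], bos_id=0, eos_id=2, pad_id=1)
# The resulting list has length 192 and is fed to the model as shape (1, 192).
```

The same layout applies in reverse to the `pred` output: everything between the `<s>` and `</s>` token IDs is the translation, and trailing `<pad>` tokens are discarded.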
Lines changed: 24 additions & 0 deletions (new file)

```yaml
models:
  - name: machine-translation-nar-en-ru-0002
    launchers:
      - framework: dlsdk
        adapter:
          type: narnmt
          vocabulary_file: tokenizer_tgt/vocab.json
          merges_file: tokenizer_tgt/merges.txt
          output_name: pred
        inputs:
          - name: "tokens"
            value: 'tokens'
            type: INPUT

    datasets:
      - name: WMT_en_ru
        postprocessing:
          - type: to_lower_case
          - type: remove_repeats

        metrics:
          - type: bleu
            smooth: True
            reference: 0.227
```
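The `remove_repeats` postprocessing step in this config addresses a known artifact of non-autoregressive decoders, which generate all positions in parallel and can emit the same token several times in a row. A minimal sketch of such deduplication (an illustration of the idea, not the Accuracy Checker's actual implementation):

```python
def remove_repeats(tokens):
    """Collapse consecutive duplicate tokens in a decoded sequence."""
    out = []
    for t in tokens:
        if not out or out[-1] != t:  # keep only the first of each run
            out.append(t)
    return out

print(remove_repeats(["the", "the", "cat", "sat", "sat", "down"]))
# → ['the', 'cat', 'sat', 'down']
```

Applying this before BLEU scoring (together with `to_lower_case`) keeps the metric comparable with the lower-cased reference translations.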
