Commit 5df25a5

Anna Grebneva authored
Added ability to use custom labels in conll_ner converter (#3089)
1 parent e600c51 commit 5df25a5

2 files changed: +12 -2 lines changed

tools/accuracy_checker/openvino/tools/accuracy_checker/annotation_converters/README.md

Lines changed: 6 additions & 1 deletion
@@ -707,12 +707,17 @@ The main difference between this converter and `super_resolution` in data organi
 * `see_in_the_dark` - converts See-in-the-Dark dataset described in the [paper](https://cchen156.github.io/paper/18CVPR_SID.pdf) to `ImageProcessingAnnotation`.
   * `annotation_file` - path to image pairs file in txt format.
 * `conll_ner` - converts CONLL 2003 dataset for Named Entity Recognition to `BERTNamedEntityRecognitionAnnotation`.
-  * `annotation_file` - annotation file in txt forma
+  * `annotation_file` - annotation file in txt format.
   * `vocab_file` - vocab file for word piece tokenization.
   * `lower_case` - converts all tokens to lower case during tokenization (Optional, default `False`).
   * `max_length` - maximal input sequence length (Optional, default 128).
   * `pad_input` - allow padding for input sequence if input less that `max_length` (Optional, default `True`).
   * `include_special_token_lables` - allow extension original dataset labels with special token labels (`[CLS'`, `[SEP]`]) (Optional, default `False`).
+  * `labels_file` - path to file with custom labels in json format (Optional).
+    Example of labels_file content:
+    ```json
+    {"labels": ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]}
+    ```
 * `tacotron2_data_converter` - converts input data for custom tacotron2 pipeline.
   * `annotation_file` - tsv file with location input data and reference.
 * `noise_suppression_dataset` - converts dataset for audio denoising to `NoiseSuppressionAnnotation`
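
To try the new `labels_file` option, the sketch below writes a file in the layout from the README example and reads it back. It uses the standard `json` module as a stand-in for the accuracy checker's `read_json` helper, whose implementation is not shown in this diff. `write_labels_file` and `read_labels_file` are hypothetical names introduced only for illustration, and the validation step is an addition that the commit itself does not perform.

```python
import json
import tempfile
from pathlib import Path


def write_labels_file(path, labels):
    """Write labels in the {"labels": [...]} layout from the README example."""
    Path(path).write_text(json.dumps({"labels": list(labels)}), encoding="utf-8")


def read_labels_file(path):
    """Stand-in (assumption) for read_json(labels_file)['labels'] in the converter."""
    with open(path, encoding="utf-8") as handle:
        content = json.load(handle)
    labels = content["labels"]
    # Extra check, not part of the commit: reject anything but a list of strings.
    if not isinstance(labels, list) or not all(isinstance(item, str) for item in labels):
        raise ValueError("'labels' must be a list of strings")
    return labels


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "labels.json"
    write_labels_file(target, ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"])
    print(read_labels_file(target))
```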

tools/accuracy_checker/openvino/tools/accuracy_checker/annotation_converters/conll_ner.py

Lines changed: 6 additions & 1 deletion
@@ -18,6 +18,7 @@
 from ._nlp_common import WordPieceTokenizer
 from ..config import BoolField, PathField, NumberField
 from ..representation import BERTNamedEntityRecognitionAnnotation
+from ..utils import read_json
 
 
 class CONLLDatasetConverter(FileBasedAnnotationConverter):
@@ -36,7 +37,8 @@ def parameters(cls):
             ),
             'include_special_token_labels': BoolField(
                 optional=True, default=False, description='Should special tokens be included to labels or not'
-            )
+            ),
+            'labels_file': PathField(optional=True, description='Path to file with custom labels in json format')
         })
         return params
 
@@ -47,6 +49,9 @@ def configure(self):
             self.get_value_from_config('vocab_file'),
             lower_case=self.get_value_from_config('lower_case'), max_len=self.get_value_from_config('max_len')
         )
+        labels_file = self.get_value_from_config('labels_file')
+        if labels_file:
+            self.label_list = read_json(labels_file)['labels']
         if self.include_spec:
             self.label_list.extend(['[CLS]', '[SEP]'])
         self.pad = self.get_value_from_config('pad_input')
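
The `configure` change above amounts to a small precedence rule: a custom list from `labels_file` replaces the converter's built-in labels, and the special tokens are appended afterwards if requested. Below is a minimal sketch of that ordering, assuming the labels are already loaded. `resolve_label_list` and `DEFAULT_LABELS` are hypothetical names, and the default list is a placeholder rather than the converter's actual value.

```python
# Placeholder default (assumption): the converter's real built-in list is not shown in the diff.
DEFAULT_LABELS = ["O", "B-PER", "I-PER"]


def resolve_label_list(default_labels, custom_labels=None, include_special_tokens=False):
    """Mirror the order of operations in the diff: override first, then extend."""
    # Copy so that neither input list is mutated by the extend() below.
    labels = list(custom_labels) if custom_labels else list(default_labels)
    if include_special_tokens:
        labels.extend(["[CLS]", "[SEP]"])
    return labels


if __name__ == "__main__":
    print(resolve_label_list(DEFAULT_LABELS))
    print(resolve_label_list(DEFAULT_LABELS, ["O", "B-DRUG", "I-DRUG"], include_special_tokens=True))
```

Because the special tokens are appended after the override, a custom labels file that already lists `[CLS]` and `[SEP]` would end up with duplicates when `include_special_token_labels` is enabled.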
