Commit 1785945

Update dzner.json
1 parent 6c2a6db commit 1785945

1 file changed: +3 -2 lines changed

datasets/dzner.json

Lines changed: 3 additions & 2 deletions
@@ -13,7 +13,8 @@
 ],
 "Form": "text",
 "Collection Style": [
-"crawling"
+"crawling",
+"human annotation"
 ],
 "Description": "The DzNER dataset is designed for Named Entity Recognition (NER) in the Algerian dialect, a significantly low-resource language in NLP research. This dataset contains over 21,000 manually annotated sentences (220,000+ tokens) derived from Algerian Facebook pages and YouTube channels. DzNER focuses on labeling three types of entities: Person, Location, and Organization.",
 "Volume": 220000.0,
@@ -47,4 +48,4 @@
 ],
 "Abstract": "Named Entity Recognition (NER) is a natural language processing (NLP) task that involves assigning labels like Person, Location, and Organization to words in text. While there is a good amount of annotated data available for NER in English and other European languages, this is not the case for Arabic and its dialects. The goal of the paper is to introduce DzNER, an Algerian dataset for NER that consists of more than 21,000 manually annotated sentences (over 220,000 tokens) from Algerian Facebook pages and YouTube channels, with a focus on three prominent classes. In this study, we provide a detailed analysis of the NER tag-set used in the dataset and show that it has a good balance of quantity, diversity, and coverage of different domains. For the proof of resource-effectiveness, we also demonstrate the effectiveness of the dataset by using various language models for the sequence labeling task of NER and comparing the results to existing datasets. According to our research and knowledge, currently no available dataset meets the standards of both variability and volume as well as DzNER. We hope that this dataset and the accompanying code and models will be useful for further research on NLP for Algerian dialect and fill the gap of low resources.",
 "Added By": "Abdelhalim Hafedh Dahou"
-}
+}
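
The first hunk appends "human annotation" to the "Collection Style" list. As a rough illustration only, the same change could be applied programmatically along the lines below. The helper name `add_collection_style` is hypothetical and is not part of this repository, and the `before` record is a minimal stand-in containing only fields visible in the diff, not the full contents of datasets/dzner.json.

```python
import json


def add_collection_style(record: dict, style: str) -> dict:
    """Return a copy of `record` with `style` appended to "Collection Style".

    Hypothetical helper: the value is appended only if it is not already
    present, so applying it twice gives the same result as applying it once.
    """
    updated = dict(record)
    styles = list(updated.get("Collection Style", []))
    if style not in styles:
        styles.append(style)
    updated["Collection Style"] = styles
    return updated


# Stand-in record limited to fields shown in the diff (assumption: the real
# file has more keys, which this sketch would leave untouched).
before = {"Form": "text", "Collection Style": ["crawling"], "Volume": 220000.0}
after = add_collection_style(before, "human annotation")

print(json.dumps(after["Collection Style"]))  # ["crawling", "human annotation"]
```

The helper copies the record instead of mutating it, so the pre-commit state stays available for comparison.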
