Open
Description
Collate the available Hindi-language datasets and merge them into a single aggregated dataset.
Original datasets being used:
https://github.com/TharinduDR/DeepOffense/tree/master/examples/marathi/data
https://github.com/l3cube-pune/MarathiNLP/tree/main/L3Cube-MahaHate
https://hasocfire.github.io/hasoc/2023/dataset.html
Target dataset format:
| Column | Description | Format |
|---|---|---|
| UID | Unique identifier that traces each row back to its original dataset split and serves as the index of the aggregated dataset. | `<language_code><train/test/val>_<index_number>` |
| text | The text content used as input to the classifier. | UTF-8 encoded text |
| label_yn | Binary label indicating whether the text is classified as hate or non-hate in the respective source dataset. | `1` = hate, `0` = non-hate |
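
A minimal sketch of how rows from one source split could be mapped into the target format above. It is illustrative only: the helper names (`make_uid`, `to_target_rows`, `write_csv`), the language code `"hi"`, the zero-based index, and the source label values `"HOF"` / `"NOT"` are all assumptions, and each source dataset will need its own label mapping.

```python
import csv
import io

TARGET_COLUMNS = ["UID", "text", "label_yn"]


def make_uid(language_code, split, index):
    """Build a UID following <language_code><train/test/val>_<index_number>."""
    if split not in {"train", "test", "val"}:
        raise ValueError(f"unknown split: {split!r}")
    return f"{language_code}{split}_{index}"


def to_target_rows(records, language_code, split, hate_labels):
    """Convert (text, source_label) pairs into target-format rows.

    hate_labels is the set of source label values that count as hate (1);
    every other value is mapped to non-hate (0). This mapping is an assumption.
    """
    rows = []
    for index, (text, source_label) in enumerate(records):  # index starts at 0 (assumed)
        rows.append({
            "UID": make_uid(language_code, split, index),
            "text": text,
            "label_yn": 1 if source_label in hate_labels else 0,
        })
    return rows


def write_csv(rows):
    """Serialize target-format rows to CSV text with the three target columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TARGET_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder records standing in for one source split (not real data).
sample = [("example sentence one", "HOF"), ("example sentence two", "NOT")]
rows = to_target_rows(sample, "hi", "train", hate_labels={"HOF"})
csv_text = write_csv(rows)
```

Rows produced this way for each source dataset and split can then be concatenated into the aggregated dataset. If indices restart for every source, UIDs may collide, so a running index across sources (or a source-specific offset) may be needed.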