Collate Datasets for Marathi and Aggregate them  #4

@sarathsomana

Description

Collate the datasets for the Marathi language and create an aggregated dataset.

Original datasets being used:

https://github.com/TharinduDR/DeepOffense/tree/master/examples/marathi/data
https://github.com/l3cube-pune/MarathiNLP/tree/main/L3Cube-MahaHate
https://hasocfire.github.io/hasoc/2023/dataset.html

Target Dataset format

| Column | Description | Format |
| --- | --- | --- |
| UID | Unique identifier that traces each row back to its origin and serves as the index of the dataset. | `<language_code><train/test/val>_<index_number>` |
| text | The text content used by the classifier. | UTF-8 encoded text |
| label_yn | Binary label indicating whether the text is classified as hate or non-hate in the respective source dataset. | 1 = hate, 0 = non-hate |
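
A minimal sketch of how the target format could be produced with pandas. It is not the project's actual pipeline. The column names `source_text` and `source_label`, the label value `"offensive"`, and the helpers `normalize` and `aggregate` are hypothetical placeholders, since each source dataset uses its own schema. The language code `mr` is assumed from ISO 639-1, and the UID is built literally as `<language_code><train/test/val>_<index_number>`, with the index counted per split.

```python
import pandas as pd

LANG_CODE = "mr"  # assumption: ISO 639-1 code for Marathi


def normalize(df, text_col, label_col, hate_values, split):
    """Map one source split onto the shared text / label_yn columns.

    text_col, label_col and hate_values depend on the source dataset and
    must be supplied by the caller (hypothetical names in the usage below).
    """
    return pd.DataFrame({
        "split": split,  # kept temporarily so UIDs can be numbered per split
        "text": df[text_col].astype(str).to_numpy(),
        "label_yn": df[label_col].isin(list(hate_values)).astype(int).to_numpy(),
    })


def aggregate(parts, lang=LANG_CODE):
    """Concatenate normalized parts and assign a UID per row."""
    combined = pd.concat(parts, ignore_index=True)
    # Running index within each split, so UIDs stay unique across sources.
    idx = combined.groupby("split").cumcount()
    combined.insert(0, "UID", lang + combined["split"] + "_" + idx.astype(str))
    return combined.drop(columns="split").set_index("UID")


# Usage with two made-up source frames (placeholder rows, not real data).
source_a = pd.DataFrame({"source_text": ["a", "b"],
                         "source_label": ["offensive", "not"]})
source_b = pd.DataFrame({"source_text": ["c"],
                         "source_label": ["not"]})

result = aggregate([
    normalize(source_a, "source_text", "source_label", {"offensive"}, "train"),
    normalize(source_b, "source_text", "source_label", {"offensive"}, "train"),
])
# Write with explicit UTF-8 encoding, e.g.:
# result.to_csv("marathi_aggregated.csv", encoding="utf-8")
```

Numbering the index within each split, rather than within each source, keeps UIDs unique after concatenation. If tracing a row back to a specific source dataset is required, the UID scheme would need an extra component, which the format above does not specify.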
