This repository contains all new resources we created for our NAACL 2022 paper "Identifying Implicitly Abusive Remarks about Identity Groups using a Linguistically Informed Approach" by Michael Wiegand, Elisabeth Eder and Josef Ruppenhofer.

The supplementary data for this research includes 4 directories and 2 files whose contents are briefly described below:

file supplementary_notes.pdf

An additional document giving details of the experiments in our paper that could not be included there for lack of space.

file data_sheet.pdf

A data sheet summarizing important information about the data contained in this repository.

directory LabelledSentences

This directory contains the annotation of the English sentences (sentences.english.csv) and German sentences (sentences.german.csv) extracted from Twitter. These two files represent the central data of this research. Each file includes the annotation of both the main task (column "LABEL") and all component tasks (the names of the respective columns should be self-explanatory).
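As a quick way to inspect these files, the following minimal sketch reads one of them and counts the values in the "LABEL" column. It is only an illustration: the helper names `load_rows` and `count_labels` are hypothetical, and the sketch assumes a comma-separated, UTF-8 encoded file with a header row, which should be checked against the actual files.

```python
import csv
from collections import Counter
from typing import Iterable


def load_rows(path: str, delimiter: str = ",") -> list[dict]:
    """Read a CSV file with a header row into a list of dicts.

    Assumption: the file is UTF-8 encoded and comma-separated;
    pass a different `delimiter` if that turns out not to be the case.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def count_labels(rows: Iterable[dict], column: str = "LABEL") -> Counter:
    """Count how often each value occurs in the given column."""
    return Counter(row[column] for row in rows)
```

Calling `count_labels(load_rows("LabelledSentences/sentences.english.csv"))` from the repository root would then give the label distribution of the main task; passing a different `column` name does the same for a component task.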

directory LexiconsForPerpetratorsOrNonConformistViews

This directory contains the lexicon files we created for the detection of perpetrators (i.e. perpetrator-evoking verbs) and non-conformist views (i.e. fine-grained sentiment of the agent towards the patient). The directory includes both the lexicons manually created via crowdsourcing and the extensions that have been built by training a supervised classifier on these manually-compiled lexicons. For the perpetrator-evoking verbs, we also included a file with the invented sentences that the crowdworkers produced.

directory Guidelines

This directory contains the annotation guidelines for building the datasets as presented to the crowdworkers. Note that the terminology used in the guidelines may differ slightly from that in the paper, since the crowdworkers were not trained linguists. As they were not familiar with several technical terms, we replaced some of these with more common terms (provided that they are nearly synonymous). For example, instead of "agent" and "patient", we refer to "subject" and "object", which, in the context of our annotation, amount to the same thing.

directory Code

A re-implementation of the linguistically informed classifier. The original code used some software which is not publicly available. In the re-implemented version, such software has been replaced by publicly available components. Overall, the performance of this version is comparable with the version used in the experiments of the paper.

attribution

This data set is published under Creative Commons Attribution 4.0.

contact information

Please direct any questions that you have about this software to Michael Wiegand at University of Klagenfurt.

Michael Wiegand email: Michael.Wiegand@aau.at
