Hate Speech Detection Model

This code implements a classification system that detects hate speech in Spanish-language tweets using RoBERTuito, a RoBERTa-based language model pre-trained on Spanish tweets.

Model Architecture

The model is based on pysentimiento/robertuito-base-uncased with the following modifications:

A dense classification layer is added on top of the base model.
It takes input IDs and attention masks as inputs.
It outputs a binary classification (hate vs. non-hate).

Datasets

Pre-training Dataset: Cardiff NLP multilingual tweet sentiment dataset (Spanish part).

Converted to binary classification:
Negative tweets (original label 0) → Hate (1).
Positive tweets (original label 2) → Non-hate (0).
Neutral tweets (original label 1) → Non-hate (0).
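The label conversion above can be sketched as a simple lookup. The names `SENTIMENT_TO_HATE` and `to_binary_label` are illustrative and do not come from the repository.

```python
# Mapping from the original 3-class sentiment labels to the binary labels
# described above: negative (0) -> hate (1), neutral (1) and positive (2) -> non-hate (0).
SENTIMENT_TO_HATE = {0: 1, 1: 0, 2: 0}


def to_binary_label(sentiment_label):
    """Convert one sentiment label (0, 1 or 2) to a binary hate label (0 or 1)."""
    if sentiment_label not in SENTIMENT_TO_HATE:
        raise ValueError(f"unexpected sentiment label: {sentiment_label}")
    return SENTIMENT_TO_HATE[sentiment_label]


binary_labels = [to_binary_label(label) for label in [0, 1, 2, 0]]
```

With the `datasets` library, a function like this could be applied to every example through `Dataset.map`.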

HATEMEDIA Dataset: Custom hate speech dataset.

Binary classification:
Hate (1).
Non-hate (0).

Training Process

Pre-training

Batch size: 16
Epochs: 5
Learning rate: 2e-5 with 10% warmup steps
Early stopping with patience=2
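As a rough illustration of these settings, the sketch below computes a learning rate with 10% warmup and applies early stopping with a patience of 2. Several details are assumptions, since the README does not state them: the linear decay after warmup, the use of validation loss as the monitored quantity, and the dataset size of 16,000 examples used to derive the step count. The function and class names are hypothetical.

```python
import math


def schedule_lengths(num_examples, batch_size, epochs, warmup_fraction=0.10):
    """Return (total_steps, warmup_steps) for a training run of the given size."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return total_steps, round(total_steps * warmup_fraction)


def learning_rate(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Linear warmup to peak_lr, then linear decay to zero (the decay is an assumption)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


class EarlyStopping:
    """Signal a stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical run: 16,000 examples, batch size 16, 5 epochs.
total_steps, warmup_steps = schedule_lengths(16_000, batch_size=16, epochs=5)
peak = learning_rate(warmup_steps, total_steps, warmup_steps)
```

In Keras, `tf.keras.callbacks.EarlyStopping(patience=2)` provides the same stopping behavior.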

Fine-tuning

Batch size: 128
Epochs: 5
Learning rate: 2e-5 with 10% warmup steps
Early stopping with patience=2
Custom metrics:
Recall for non-hate class
Precision for hate class
F1-score (weighted)
AUC-PR
Recall at precision=0.9 (non-hate)
Precision at recall=0.9 (hate)
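The two threshold-based metrics at the end of this list can be sketched in plain Python. This is an illustration of the definitions rather than the project's own code: it sweeps every distinct score as a decision threshold and keeps the best value that satisfies the constraint. The sample labels and probabilities are invented. For the non-hate class, the class of interest is flipped and `1 - p` is used as the score.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score used as a threshold."""
    total_positive = sum(y_true)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because the threshold is one of the scores
        recall = tp / total_positive if total_positive else 0.0
        points.append((threshold, precision, recall))
    return points


def recall_at_precision(y_true, scores, min_precision=0.9):
    """Highest recall among thresholds whose precision is at least min_precision."""
    candidates = [r for _, p, r in precision_recall_points(y_true, scores) if p >= min_precision]
    return max(candidates, default=0.0)


def precision_at_recall(y_true, scores, min_recall=0.9):
    """Highest precision among thresholds whose recall is at least min_recall."""
    candidates = [p for _, p, r in precision_recall_points(y_true, scores) if r >= min_recall]
    return max(candidates, default=0.0)


# Invented example: true labels (1 = hate) and predicted hate probabilities.
y_true = [1, 1, 0, 1, 0, 0]
hate_prob = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]

hate_precision_at_recall = precision_at_recall(y_true, hate_prob, 0.9)

# Non-hate class: flip the labels and score each example with 1 - p.
non_hate_true = [1 - y for y in y_true]
non_hate_prob = [1 - p for p in hate_prob]
non_hate_recall_at_precision = recall_at_precision(non_hate_true, non_hate_prob, 0.9)
```

Keras ships comparable built-in metrics, `tf.keras.metrics.RecallAtPrecision` and `tf.keras.metrics.PrecisionAtRecall`, which approximate the same quantities over a fixed grid of thresholds.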

Evaluation Metrics

The model is evaluated using:

Macro recall, precision, and F1-score
One-vs-Rest AUC
Accuracy
Per-class metrics
Confusion matrix
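A dependency-free sketch of how most of these quantities relate to one another is given below. It is meant as an illustration, with invented labels and predictions. In practice scikit-learn's `confusion_matrix` and `classification_report` compute the same values.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix):
    """Precision, recall and F1 for each class, derived from the confusion matrix."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[row][k] for row in range(n))  # column total
        actual = sum(matrix[k])                               # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def macro_average(metrics, key):
    """Unweighted mean of one metric over all classes."""
    return sum(m[key] for m in metrics) / len(metrics)


def accuracy(matrix):
    """Share of examples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


# Invented example with 0 = non-hate and 1 = hate.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

matrix = confusion_matrix(y_true, y_pred)
metrics = per_class_metrics(matrix)
macro_recall = macro_average(metrics, "recall")
```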

Requirements

The following Python packages are required (see requirements.txt for the full list):

TensorFlow
Transformers
scikit-learn
pandas
datasets
matplotlib
seaborn

Usage

The model expects input data with the following specifications:

Data Format:

CSV file or Pandas DataFrame
Mandatory column: text (string)
Optional column: label (integer, 0 or 1), used for evaluation when available
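A minimal way to check that a CSV file matches this format is sketched below using only the standard library. The function name `load_examples` and the sample rows are invented for illustration. With pandas, `pandas.read_csv` followed by the same column checks would serve the same purpose.

```python
import csv
import io


def load_examples(csv_file):
    """Read rows from an open CSV file, enforcing the text/label format described above."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or "text" not in reader.fieldnames:
        raise ValueError("input must contain a 'text' column")
    has_label = "label" in reader.fieldnames
    examples = []
    for row in reader:
        example = {"text": row["text"]}
        if has_label:
            label = int(row["label"])
            if label not in (0, 1):
                raise ValueError(f"label must be 0 or 1, got {label}")
            example["label"] = label
        examples.append(example)
    return examples


# Invented sample data; a real call would pass open("data.csv", newline="", encoding="utf-8").
sample = io.StringIO("text,label\nhola mundo,0\notro tuit,1\n")
examples = load_examples(sample)
```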

Text Preprocessing:

Text will be automatically converted to lowercase during processing
Maximum length: 128 tokens (longer texts will be truncated)
Special characters, URLs, and emojis must remain in the text (the tokenizer handles these)
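The toy encoder below illustrates what lowercasing, truncation and padding mean for the model inputs (input IDs and attention mask). It is deliberately simplified: whitespace splitting and the tiny `vocab` dictionary are stand-ins for the real subword tokenizer, and `PAD_ID` and `UNK_ID` are hypothetical values.

```python
MAX_LEN = 128
PAD_ID, UNK_ID = 0, 1  # hypothetical special-token IDs for this toy example


def encode(text, vocab, max_len=MAX_LEN):
    """Lowercase, split on whitespace (stand-in tokenizer), truncate, then pad."""
    tokens = text.lower().split()[:max_len]        # longer texts are truncated
    ids = [vocab.get(token, UNK_ID) for token in tokens]
    padding = max_len - len(ids)
    return {
        "input_ids": ids + [PAD_ID] * padding,
        "attention_mask": [1] * len(ids) + [0] * padding,  # 1 = real token, 0 = padding
    }


# URLs and other special tokens are left in the text and simply encoded.
vocab = {"hola": 2, "mundo": 3}
encoded = encode("Hola MUNDO https://t.co/x", vocab, max_len=5)
```

With the Transformers library, the equivalent step would typically load the tokenizer with `AutoTokenizer.from_pretrained("pysentimiento/robertuito-base-uncased")` and call it with `truncation=True` and `max_length=128`.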

Label Encoding:

0 = No hateful content (including neutral/positive content)
1 = Hate speech
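Turning a model score into one of these two labels could look like the sketch below. It assumes the model emits a single hate probability and that 0.5 is the decision threshold. Neither detail is stated in this README, and the names are illustrative.

```python
LABEL_NAMES = {0: "non-hate", 1: "hate"}


def decode(hate_probability, threshold=0.5):
    """Map a hate probability to (label, name); the 0.5 threshold is an assumption."""
    label = 1 if hate_probability >= threshold else 0
    return label, LABEL_NAMES[label]


predictions = [decode(p) for p in [0.83, 0.12]]
```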

The process of creating this algorithm is explained in the technical report: Blanco-Valencia, X., De Gregorio-Vicente, O., Ruiz Iniesta, A., & Said-Hung, E. (2025). Algoritmos de detección de odio/no odio, tipo e intensidad – Hatemedia V.2.0 (Version 2). Hatemedia Project. https://doi.org/10.5281/zenodo.16996080

Authors:

Elias Said-Hung
Julio Montero-Díaz
Oscar De Gregorio
Almudena Ruiz
Xiomara Blanco
Juan José Cubillas
Daniel Pérez Palau

Funded by: MCIN/AEI/10.13039/501100011033

How to cite: Said-Hung, E., Montero-Diaz, J., De Gregorio Vicente, O., Ruiz-Iniesta, A., Blanco Valencia, X., José Cubillas, J., and Pérez Palau, D. (2023), “Algorithm for classifying hate expressions in Spanish”, figshare. https://doi.org/10.6084/m9.figshare.24574906.

More information:

https://www.hatemedia.es/ or contact: [email protected]
This algorithm is related to the algorithm for classifying hate expressions by intensity in Spanish, also developed by the authors: https://github.com/esaidh266/Algorithm-for-classifying-hate-expressions-by-intensities-in-Spanish
This algorithm is related to the algorithm for classifying hate expressions by type in Spanish, also developed by the authors: https://github.com/esaidh266/Algorithm-for-classifying-hate-expressions-by-type-in-Spanish
