Skip to content

nshrhm/llm-emotion-attribution-multilingual

Multilingual LLM Emotion Attribution Study


This repository contains the public data and code package for a preregistered study of multilingual emotion attribution by large language models (LLMs). It is prepared for reviewers and readers who want to inspect the study design, reproduce the reported analyses, and verify the repair-aware official dataset.

The associated manuscript is available through the journal submission or publication site. Manuscript source files, review-response materials, and local submission artifacts are intentionally excluded from this public repository.

Study Overview

The study examines how six LLMs assign four emotion scores to literary texts under three full language conditions. A language condition changes the instruction language, persona description, title, author name, and text content together.

  • Study ID: models6_multilingual_2026-03-06
  • Design: 6 models × 3 languages × 3 texts × 4 personas × 3 trials
  • Total API calls: 648 (6 × 3 × 3 × 4 × 3)
  • Languages: Japanese (ja), English (en), Traditional Chinese (zh)
  • Emotions: interesting, surprise, sadness, anger
  • Main model: OLS with HC3 robust standard errors, fitted separately for each emotion
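
The per-emotion OLS with HC3 robust standard errors can be sketched with statsmodels as follows. This is a minimal illustration on synthetic data: the column names (score, model, lang, emotion) and the formula are assumptions for demonstration, not the repository's actual schema or preregistered specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; column names are illustrative assumptions,
# not the schema of the repository's parsed CSV files.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "score": rng.normal(50, 10, n),
    "model": rng.choice(["m1", "m2", "m3"], n),
    "lang": rng.choice(["ja", "en", "zh"], n),
    "emotion": rng.choice(["interesting", "surprise", "sadness", "anger"], n),
})

# One OLS fit per emotion; cov_type="HC3" requests heteroskedasticity-
# consistent (HC3) standard errors in place of the classical ones.
fits = {}
for emo, sub in df.groupby("emotion"):
    fits[emo] = smf.ols("score ~ C(model) + C(lang)", data=sub).fit(cov_type="HC3")
    print(emo, fits[emo].bse.round(2).to_dict())
```

The actual analysis is produced by the `emotion_runner analyze` command shown below; this sketch only illustrates the estimator named in the design.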

The Japanese source texts are in the public domain. The English and Traditional Chinese versions used in this study were machine-translated by the authors for experimental purposes and are included to support reproducibility.

Official Dataset

Use the repaired combined dataset for the main analysis:

  • outputs/models6_multilingual_2026-03-06_combined01/raw_public.jsonl
  • outputs/models6_multilingual_2026-03-06_combined01/parsed_wide.csv
  • outputs/models6_multilingual_2026-03-06_combined01/parsed_long.csv
  • outputs/models6_multilingual_2026-03-06_combined01/repair_failed_job_keys.txt

Quality status:

  • 648/648 unique job keys
  • 648/648 successful API requests
  • 648/648 parse-success rows
  • all 18 model × language cells satisfy the preregistered quality threshold
  • provenance: 631 records from the main run and 17 records from the failure-only patch run

raw_public.jsonl is a redacted public raw log. It preserves job metadata, request prompts, raw assistant text, parse results, and basic usage metadata, but removes provider response IDs, provider-side response bodies, fine-grained cost details, and unnecessary transport metadata. The parsed CSV files preserve their original schema for analysis compatibility, but the response_id values are replaced with REDACTED.
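
As a quick integrity check, the JSON Lines log can be counted and its job keys deduplicated. A minimal sketch, assuming each record carries a job_key field (the field name is an assumption; adjust it to the actual record schema):

```python
import json

def check_raw_log(lines):
    """Count JSON Lines records and their distinct job keys.

    The field name "job_key" is an assumption; adjust it to match
    the actual schema of raw_public.jsonl.
    """
    keys = [json.loads(line).get("job_key") for line in lines if line.strip()]
    return len(keys), len(set(keys))

# Usage against the official dataset (both counts should be 648):
# with open("outputs/models6_multilingual_2026-03-06_combined01/raw_public.jsonl") as f:
#     total, unique = check_raw_log(f)
```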

Repository Map

  • docs/REVIEWER_GUIDE.md: shortest reviewer-oriented verification path
  • docs/DATA_AND_REPRODUCIBILITY.md: dataset layout and reproduction workflow
  • docs/preregistration/: preregistration and repair amendment
  • emotion_runner/: experiment runner, parser, analysis, and asset generation code
  • outputs/models6_multilingual_2026-03-06_combined01/: official public dataset and analysis outputs
  • prompt_template.md: fixed multilingual prompt and stimulus specification
  • models.yaml: preregistered model list
  • tests/: unit tests for prompt generation, parsing, repair workflow, and analysis helpers

Reviewer Quick Start

Create a Python environment and install the analysis dependencies:

```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Run the tests:

```shell
pytest -q
```

Recreate the analysis outputs from the official combined dataset:

```shell
python -m emotion_runner analyze \
  --input-wide outputs/models6_multilingual_2026-03-06_combined01/parsed_wide.csv \
  --outdir outputs/models6_multilingual_2026-03-06_combined01/analysis \
  --study-id models6_multilingual_2026-03-06
```

The repository also includes the runner needed to reproduce or extend the API collection workflow. Re-running the full experiment requires an OpenRouter API key and may produce different outputs, because hosted LLMs can change over time. Do not overwrite the official combined01 dataset; write any rerun to a new output directory.

License

Source code is released under the MIT License. Datasets, documentation, prompt/stimulus specifications, analysis outputs, and machine-translated texts are released under CC BY 4.0 unless otherwise noted.

Citation

If you use this repository, please cite the associated paper and this repository release. A CITATION.cff file is included for software citation metadata.
