Can Language Models Handle a Non-Gregorian Calendar? The Case of the Japanese wareki

This repository contains code and data for the paper:

Can Language Models Handle a Non-Gregorian Calendar? The Case of the Japanese wareki

Our study presents a systematic evaluation of how language models handle the Japanese calendar, a representative non-Gregorian system. We designed three tasks requiring both temporal knowledge and reasoning: Calendar Conversion, Japanese Calendar arithmetic, Birth year recall. We find that while some models succeed in basic calendar conversions, even Japanese-centric LMs and advanced models such as GPT-4o and DeepSeek V3 struggle with Japanese-calendar arithmetic and birth year recall. The findings reveal a strong bias toward the Gregorian calendar and highlight the need for culturally aware temporal reasoning in language models.

🎉This paper has been accepted to AACL-IJCNLP 2025🎉

Requirements

If you use uv, you can set up the environment as follows. Please run the experiments inside Non-Gregorian_Calendar directory:

cd Non-Gregorian_Calendar
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

Models

We evaluated five Japanese-cenric LMs four English-centric LMs, and two state-of-the-art models.

Japanese-centric LMs
- llm-jp-3-13b
- sarashina2-13b
- Swallow-13b
- Swallow-MS-7b
- Llama-3-Swallow-8B
English-centric LMs
- Llama-2-7b
- Llama-2-13b
- Mistral-7B
- Llama-3.1-8B

Dataset

We constructed datasets for three tasks (Calendar Conversion, Japanese Calendar Arithmetic, and Birth Year Recall) covering Japanese five most recent eras: Meiji, Taisho, Showa, Heisei, and Reiwa. For details, please refer to the data directory.

Calendar conversion

You can obtain the results by running the following command.

python src/calendar_conversion/plot_calendar_conversion.py \
  --base_dir out/calendar_conversion/ \
  --output_dir out/calendar_conversion/plot/ \
  --models llm-jp-3-13b sarashina2-13b Swallow-13b-hf Swallow-MS-7b-v0.1 Llama-3-Swallow-8B-v0.1 Llama-2-7b-hf Llama-2-13b-hf Mistral-7B-v0.1 Llama-3.1-8B gpt-4o deepseek-chat

Japanese calendar arithmetic

You can obtain the results by running the following command.

python src/japanese_calendar_arithmetic/plot_japanese_calendar_arithmetic.py \
  --base_dir out/japanese_calendar_arithmetic/ \
  --output_dir out/japanese_calendar_arithmetic//plot/ \
  --task_type after \
  --models llm-jp-3-13b sarashina2-13b Swallow-13b-hf Swallow-MS-7b-v0.1 Llama-3-Swallow-8B-v0.1 Llama-2-7b-hf Llama-2-13b-hf Mistral-7B-v0.1 Llama-3.1-8B gpt-4o deepseek-chat

Birth year recall

You can obtain the results by running the following command.

python src/birth_year_recall/plot_birth_year_recall.py \
  --base_dir out/birth_year_recall/summary/ \
  --output_dir out/birth_year_recall/plot/ \
  --models llm-jp-3-13b sarashina2-13b Swallow-13b-hf Swallow-MS-7b-v0.1 Llama-3-Swallow-8B-v0.1 Llama-2-7b-hf Llama-2-13b-hf Mistral-7B-v0.1 Llama-3.1-8B gpt-4o deepseek-chat

Discussion

We analyzed why the models failed, particularly in Japanese Calendar Arithmetic and Birth Year Recall, from the perspectives of estimated corpus frequency (by using infini-gram[1]) and typical error patterns.

Estimated Corpus frequency

You can obtain the results of error analysis of estimated corpus frequency by running the following command.

Japanese Calendar arithmetic

python src/discussion/plot_corpus_estimated_analysis_japanese_calendar_arithmetic.py \
  --corpus_dir data/infini-gram/ \
  --results_file out/japanese_calendar_arithmetic/llm-jp-3-13b/after10/ja-prompt/results_summary.json \
  --output /home/m_sasaki/Non-Gregorian_Calendar/out/discussion/estimates_corpus_freqency/japanese_calendar_arithmetic/

Birth year recall

python src/discussion/plot_corpus_estimated_analysis_birth_year_recall.py \
  --corpus_dir data/infini-gram/ \
  --results_dir out/birth_year_recall/summary/llm-jp-3-13b/birthyear_wareki/japrompt/ja-ja-name/ \
  --output out/discussion/estimates_corpus_freqency/birth_year_recall/

Typical Errors

Japanese Calendar arithmetic

python src/discussion/plot_typical_error_japanese_calendar_arithmetic.py \
  --base_dir out/Japanese_calendar_arithmetic/ \
  --output_dir out/discussion/typical_error/japanese_calendar_arithmetic/ \
  --models sarashina2-13b Llama-3-Swallow-8B-v0.1 Mistral-7B-v0.1

Birth year recall

python src/discussion/plot_typical_error_birth_year_recall.py \
  --base_dir out/birth_year_recall/summary/ \
  --output_dir out/discussion/typical_error/birth_year_recall/ \
  --models llm-jp-3-13b sarashina2-13b Swallow-13b-hf Swallow-MS-7b-v0.1 Llama-3-Swallow-8B-v0.1

Reference

[1] Liu+ Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens (2024)

@inproceedings{
Liu2024InfiniGram,
title={Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens},
author={Jiacheng Liu and Sewon Min and Luke Zettlemoyer and Yejin Choi and Hannaneh Hajishirzi},
booktitle={First Conference on Language Modeling},
year={2024},
url={https://openreview.net/forum?id=u2vAyMeLMm}
}

Citation

This is the version that has been published on arXiv. It will be updated soon.

@misc{sasaki2025languagemodelshandlenongregorian,
      title={Can Language Models Handle a Non-Gregorian Calendar?}, 
      author={Mutsumi Sasaki and Go Kamoda and Ryosuke Takahashi and Kosuke Sato and Kentaro Inui and Keisuke Sakaguchi and Benjamin Heinzerling},
      year={2025},
      eprint={2509.04432},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.04432}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
data		data
out		out
src		src
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Can Language Models Handle a Non-Gregorian Calendar? The Case of the Japanese wareki

Requirements

Models

Dataset

Calendar conversion

Japanese calendar arithmetic

Birth year recall

Discussion

Estimated Corpus frequency

Typical Errors

Reference

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Can Language Models Handle a Non-Gregorian Calendar? The Case of the Japanese wareki

Requirements

Models

Dataset

Calendar conversion

Japanese calendar arithmetic

Birth year recall

Discussion

Estimated Corpus frequency

Typical Errors

Reference

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages