IA import dropped Unicode characters #11637

@cdrini

Description

Problem

This import dropped Unicode characters such as é and è: https://openlibrary.org/recentchanges/2025/12/30/edit-book/160483814. See the subjects in particular. The same subjects appear correctly on Internet Archive.

Reproducing the bug

  1. Locally, import https://openlibrary.org/import/preview?source=ia:worldalmanacbook1991mark
  • Expected behavior: subjects include the Unicode characters as they appear on Internet Archive
  • Actual behavior: Unicode characters are dropped from the subjects
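
A quick way to confirm the symptom when comparing a subject on Internet Archive with the imported one is sketched below. This is only an illustration: `non_ascii_chars` and `dropped_chars` are hypothetical helpers, not part of the Open Library codebase, and the sample string is made up rather than taken from the record above. The lossy ASCII round trip is used only to simulate the symptom; it is not a claim about where the import pipeline actually loses the characters.

```python
import unicodedata


def non_ascii_chars(text: str) -> list[str]:
    """Return the non-ASCII characters of text, in order, after NFC normalization."""
    return [ch for ch in unicodedata.normalize("NFC", text) if ord(ch) > 127]


def dropped_chars(source: str, imported: str) -> list[str]:
    """Return non-ASCII characters present in source but missing from imported."""
    remaining = non_ascii_chars(imported)
    dropped = []
    for ch in non_ascii_chars(source):
        if ch in remaining:
            # Matched one occurrence in the imported text; consume it.
            remaining.remove(ch)
        else:
            dropped.append(ch)
    return dropped


# Made-up subject string (assumption), written with an escape to avoid
# ambiguity about the normalization form of the literal.
source_subject = "G\u00e9ographie"

# Simulate the symptom: non-ASCII characters silently discarded.
imported_subject = source_subject.encode("ascii", errors="ignore").decode("ascii")

print(dropped_chars(source_subject, imported_subject))
```

A check like this could be turned into a regression test for the import once the cause is found, by feeding it the subjects from the source record and from the import preview.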

Context

  • Browser (Chrome, Safari, Firefox, etc):
  • OS (Windows, Mac, etc):
  • Logged in (Y/N):
  • Environment (prod, dev, local): prod

Breakdown

Requirements Checklist

  • [ ]

Related files

Stakeholders


Instructions for Contributors

  • Please run these commands to ensure your repository is up to date before creating a new branch to work on this issue and each time after pushing code to Github, because the pre-commit bot may add commits to your PRs upstream.

Metadata

Assignees

Labels

  • Affects: Data (Issues that affect book/author metadata or user/account data. [managed])
  • Lead: @mekarpeles (Issues overseen by Mek (Staff: Program Lead) [managed])
  • Needs: Breakdown (This big issue needs a checklist or subissues to describe a breakdown of work. [managed])
  • Priority: 3 (Issues that we can consider at our leisure. [managed])
  • Theme: MARC records
  • Type: Bug (Something isn't working. [managed])
  • metadata
