Description
I love the new titlecase plugin! However, using it at the moment converts things to title case no matter the language. This leads to some bizarre results for, e.g., German, French, or Japanese/Roman (yes, really) text:
- For German, AFAIK, there is no title casing at all. Normal capitalization applies (plus, probably, the first word of the title is capitalized, as at the start of a sentence).
- For French (not a native speaker), there seems to be some rule: https://www.reddit.com/r/French/comments/po1exv/how_does_title_case_in_french_work/
- I encountered an unfortunate edge case with text that mixes Japanese and Roman numerals (this is via mbsync, so some other things are changed as well):
川井憲次 - 攻殻機動隊 superb music high resolution USB - 謡Ⅲ-Reincarnation
albumtype: compilation -> album
albumtypes: compilation; album; soundtrack -> album; compilation; soundtrack
title: 謡Ⅲ-Reincarnation -> 謡ⅲ-Reincarnation
(this is https://musicbrainz.org/release/5eecdc57-32dd-4d07-8ee5-043ed051276a)
This occurs even though I have Roman numerals special-cased with preserve and replace (happens with just preserve as well). I think that's because there is no space between the Japanese symbol and the numerals, so it treats the whole thing as a word:
...
preserve:
- "Ⅰ"
- "Ⅱ"
- "Ⅲ"
- "Ⅳ"
- "Ⅴ"
replace:
- "ⅰ": "Ⅰ"
- "ⅱ": "Ⅱ"
- "ⅲ": "Ⅲ"
- "ⅳ": "Ⅳ"
- "ⅴ": "Ⅴ"
- "ⅵ": "Ⅵ"
- "ⅶ": "Ⅶ"
- "ⅷ": "Ⅷ"
...
Proposed solution
I think it'd be nice to use langdetect or something similar to detect the (most likely) language of a title (or of an album, though I've seen albums with mixed-language tracks).
The plugin could then get a whitelist option to select which languages it applies to (English only, by default).
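To make the idea concrete, here is a minimal sketch of the whitelist check. It is not the plugin's actual code: the function name `transform_if_language_allowed` and its parameters are hypothetical, and the detector and the casing function are passed in as callables so the sketch does not depend on any particular library.

```python
from typing import Callable, Collection


def transform_if_language_allowed(
    text: str,
    detect: Callable[[str], str],
    transform: Callable[[str], str],
    allowed: Collection[str] = ("en",),
) -> str:
    """Apply `transform` only when `detect` reports an allowed language code.

    All names here are hypothetical; this is an illustration of the proposal,
    not the titlecase plugin's implementation.
    """
    try:
        language = detect(text)
    except Exception:
        # Detection can fail on very short or symbol-only titles.
        # In that case, leave the text untouched rather than guess.
        return text
    return transform(text) if language in allowed else text
```

With langdetect installed, `detect` could be `langdetect.detect`, which returns codes such as "en" or "de", and `transform` could be whatever title-casing function the plugin already uses. Note that language detection on short strings like track titles tends to be unreliable, so falling back to "leave unchanged" on errors seems like the safer default.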
Objective
- Make the titlecase plugin operate on English titles only.
Goals
- Leave non-English titles unchanged by the titlecase plugin :)
Non-goals
- Long-term it might be cool to extend this all to other languages somehow, but not for now.
EDIT: For the Japanese/Roman non-ASCII cases above, it could also be an option to exempt some Unicode blocks entirely.
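A rough sketch of what such an exemption could look like, using only the standard library. The block list, the function names, and the word-level granularity are all assumptions made for illustration; the plugin may split words differently. The sketch also shows why the problem arises in the first place: the Roman numeral characters in the Number Forms block have lowercase counterparts, so any lowercasing step changes them.

```python
from typing import Callable

# Hypothetical exemption list: (first, last) code points of Unicode blocks.
EXEMPT_BLOCKS = [
    (0x2150, 0x218F),  # Number Forms (includes Ⅰ..Ⅻ and ⅰ..ⅻ)
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def in_exempt_block(ch: str) -> bool:
    """Return True if the character falls inside any exempt block."""
    code_point = ord(ch)
    return any(first <= code_point <= last for first, last in EXEMPT_BLOCKS)


def transform_unless_exempt(text: str, transform: Callable[[str], str]) -> str:
    """Apply `transform` per space-separated word, skipping exempt words.

    A word is skipped entirely if it contains at least one exempt character.
    """
    words = text.split(" ")
    return " ".join(
        word if any(in_exempt_block(ch) for ch in word) else transform(word)
        for word in words
    )


# The Roman numeral three has a lowercase form, which is the root of the issue.
assert "Ⅲ".lower() == "ⅲ"
# The mixed word is left alone because 謡 and Ⅲ are both in exempt blocks.
assert transform_unless_exempt("謡Ⅲ-Reincarnation", str.capitalize) == "謡Ⅲ-Reincarnation"
```

One consequence of skipping whole words is that the Latin part of a mixed word ("Reincarnation" here) is not title-cased either. A finer-grained variant could split on script boundaries instead, at the cost of more complexity.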