fix: improve AniList alternative name handling and reading generation#18
fix: improve AniList alternative name handling and reading generation#18
Conversation
- Parse "Romanized (Japanese)" format in alt names: extract Japanese part as the dictionary term and use the romanized part for reading generation (e.g. "Aoki Umi (碧井海)" → term "碧井海" with reading "あおきうみ") - Filter out non-Japanese aliases to prevent English terms like "Ruri's Mother" from getting dictionary entries - Strip internal whitespace from generated readings to fix issues like "えらいな の はは" → "えらいなのはは" - Add contains_japanese() helper to kana module for detecting Japanese text https://claude.ai/code/session_01NRsURdxvKrse2dQTBW7LZ4
- Add `alternativeSpoiler` to the AniList GraphQL character query - Add `spoiler_aliases` field to Character model - Parse spoiler alternatives from API response, filtering nulls/empties - Merge spoiler aliases (deduplicated union) during character merging - Include spoiler alias entries in dictionary only when user has enabled the existing `show_spoilers` setting - VNDB characters default to empty spoiler_aliases (no equivalent field) https://claude.ai/code/session_01NRsURdxvKrse2dQTBW7LZ4
There was a problem hiding this comment.
Pull request overview
This PR improves character alias ingestion and reading generation when building Yomitan dictionaries, primarily by enhancing AniList alternative-name parsing (including spoiler-marked aliases) and normalizing kana readings produced from romaji.
Changes:
- Add
spoiler_aliasesto theCharactermodel, parse AniListalternativeSpoiler, and gate spoiler alias entries behindshow_spoilers. - Improve alias term generation by parsing
Romanized (Japanese)formats and filtering out non-Japanese aliases from becoming dictionary terms. - Normalize generated readings by stripping internal whitespace, and add
kana::contains_japanese()to support alias filtering.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
yomitan-dict-builder/src/anilist_client.rs |
Fetches/parses AniList alternativeSpoiler into spoiler_aliases and adds tests. |
yomitan-dict-builder/src/models.rs |
Extends Character with spoiler_aliases (serde default). |
yomitan-dict-builder/src/vndb_client.rs |
Sets spoiler_aliases to empty for VNDB characters. |
yomitan-dict-builder/src/kana.rs |
Adds contains_japanese() helper for Japanese-script detection. |
yomitan-dict-builder/src/name_parser.rs |
Strips internal whitespace from readings and adds tests covering the behavior. |
yomitan-dict-builder/src/anilist_name_test_data.rs |
Updates expected readings to reflect whitespace stripping. |
yomitan-dict-builder/src/dict_builder.rs |
Parses Romanized (Japanese) aliases, filters non-Japanese aliases, integrates spoiler aliases, and adds tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if !other.spoiler_aliases.is_empty() { | ||
| let existing: HashSet<String> = base.spoiler_aliases.iter().cloned().collect(); | ||
| for alias in &other.spoiler_aliases { | ||
| if !alias.is_empty() && !existing.contains(alias) { | ||
| base.spoiler_aliases.push(alias.clone()); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
merge_character_fields claims spoiler aliases are deduplicated, but the existing HashSet is never updated as new aliases are pushed. If other.spoiler_aliases contains duplicates (or repeats within the loop), duplicates can still be appended to base.spoiler_aliases. Consider making existing mutable and inserting into it when pushing (e.g., if existing.insert(alias.clone()) { ... }).
| // Include spoiler aliases when the user has enabled spoilers | ||
| let all_aliases: Box<dyn Iterator<Item = &String>> = if self.settings.show_spoilers { | ||
| Box::new(char.aliases.iter().chain(char.spoiler_aliases.iter())) | ||
| } else { | ||
| Box::new(char.aliases.iter()) | ||
| }; |
There was a problem hiding this comment.
Building all_aliases as Box<dyn Iterator<...>> adds a heap allocation and dynamic dispatch on a hot path. This can be avoided by using a concrete iterator (e.g., chaining char.aliases.iter() with char.spoiler_aliases.iter() or std::iter::empty() based on the flag) so the compiler can optimize it.
- Make HashSet mutable in alias/spoiler_alias merging so duplicates within `other` are properly caught during the loop - Replace Box<dyn Iterator> with slice reference + chain for spoiler alias iteration, avoiding unnecessary heap allocation on hot path https://claude.ai/code/session_01NRsURdxvKrse2dQTBW7LZ4
as the dictionary term and use the romanized part for reading generation
(e.g. "Aoki Umi (碧井海)" → term "碧井海" with reading "あおきうみ")
Mother" from getting dictionary entries
"えらいな の はは" → "えらいなのはは"
https://claude.ai/code/session_01NRsURdxvKrse2dQTBW7LZ4