agents.md — Yomitan Character Dictionary Builder

Project Overview

Rust (Axum) web service that generates Yomitan-compatible character name dictionaries from VNDB and AniList APIs. Given a username or media ID, it fetches characters, parses Japanese names into hiragana readings, builds rich popup cards, and packages everything into a Yomitan ZIP.

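No auth or user accounts. No primary database; SQLite is used only for the image and media caches (image_cache.rs, media_cache.rs). Two external API dependencies (VNDB REST, AniList GraphQL).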

Tech Stack

Language: Rust (edition 2021)
Web framework: Axum 0.7
Async runtime: Tokio
HTTP client: reqwest 0.12
ZIP: zip crate v2 (requires Cursor<Vec<u8>> for in-memory writes)
Serialization: serde / serde_json
Other: base64, regex, rand, uuid, tower-http (CORS + static files)

Repository Layout

yomitan-dict-builder/
├── Cargo.toml
├── Dockerfile / docker-compose.yml
├── README.md
├── src/
│   ├── main.rs              # Axum routes + orchestration logic
│   ├── models.rs            # Shared types: Character, CharacterData, CharacterTrait, UserMediaEntry
│   ├── vndb_client.rs       # VNDB REST API client (user resolution, VN title, character fetch, image download)
│   ├── anilist_client.rs    # AniList GraphQL client (user list, character fetch, image download)
│   ├── kana.rs              # Low-level kana utilities: kanji detection, romaji→hiragana, katakana↔hiragana, syllable boundary handling
│   ├── name_parser.rs       # Name handling: unified split/reading API with optional romaji hints, honorific suffixes data
│   ├── content_builder.rs   # Yomitan structured content JSON (character popup cards), spoiler stripping, DictSettings struct
│   ├── image_handler.rs     # Base64 data URI → raw bytes + file extension detection
│   ├── image_cache.rs       # SQLite-backed image cache (blob storage, hit counts, eviction)
│   ├── media_cache.rs       # SQLite-backed media/character data cache with TTL expiry
│   ├── anilist_name_test_data.rs # Bulk integration tests for name splitting/reading with real AniList character data
│   └── dict_builder.rs      # ZIP assembly: index.json, tag_bank, term_banks (chunked at 10k), img/ folder
├── static/
│   ├── index.html           # Frontend HTML (two-column layout with interactive Yomitan preview card)
│   ├── style.css            # Frontend styles (Yomitan card mimicry, responsive layout)
│   └── app.js               # Frontend logic (settings toggles, SSE progress, form handling)
└── tests/
    └── integration_tests.rs # HTTP endpoint tests (require running server)

Related docs (in docs/):

docs/plans/plan.md — Exhaustive implementation plan with full API examples, romaji lookup tables, structured content format, test expectations. Read this for any deep implementation questions.
docs/agents_read_me.md — Guide for agents porting this code to other languages/frameworks. Not relevant when working on the Rust codebase itself.

Build & Run

# From yomitan-dict-builder/
cargo build --release
cargo run --release          # Serves on http://localhost:3000

# Docker
docker compose up -d         # Serves on http://localhost:9721

# Tests (549+ unit tests inline, integration tests need running server)
cargo test

Module Dependency Graph

models.rs          ← everything depends on this
    ↓
vndb_client.rs     ← uses models, reqwest, base64
anilist_client.rs  ← uses models, reqwest, base64
    ↓
kana.rs            ← standalone (no external deps beyond std)
    ↓
name_parser.rs     ← uses kana (unified name splitting + reading generation with optional hints, honorific data)
    ↓
content_builder.rs ← uses models, name_parser
image_handler.rs   ← uses base64
image_cache.rs     ← SQLite-backed (tokio, sha2 hashing)
media_cache.rs     ← SQLite-backed (tokio, serde_json, TTL-based expiry)
    ↓
dict_builder.rs    ← uses models, kana, name_parser, content_builder, image_handler, zip
    ↓
main.rs            ← orchestrates everything via Axum routes

Key Data Flow

Username/Media ID
  → vndb_client / anilist_client: fetch character list (paginated, rate-limited)
  → vndb_client / anilist_client: download portrait images → base64 data URIs
  → name_parser: parse Japanese names → hiragana readings
  → content_builder: build structured content JSON cards
  → dict_builder: generate term entries (base + honorifics + aliases), deduplicate, assemble ZIP
  → HTTP response: application/zip
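
The name-parsing step in the flow above depends on kana conversion. As a rough illustration only (the function name is an assumption and this is not the actual kana.rs code), katakana can be mapped to hiragana by shifting code points, since the two Unicode blocks are laid out in parallel:

```rust
/// Convert katakana to hiragana by shifting code points.
/// Hypothetical helper for illustration; the real kana.rs API may differ.
pub fn katakana_to_hiragana(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            // ァ (U+30A1) ..= ヶ (U+30F6) sit exactly 0x60 above ぁ (U+3041) ..= ゖ (U+3096).
            '\u{30A1}'..='\u{30F6}' => char::from_u32(c as u32 - 0x60).unwrap_or(c),
            // Kanji, the long-vowel mark ー, spaces, and everything else pass through.
            _ => c,
        })
        .collect()
}
```

Characters outside the shifted range, including the prolonged sound mark ー (U+30FC), are left unchanged in this sketch.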

API Endpoints

Endpoint	Description
`GET /`	Static frontend
`GET /api/user-lists?vndb_user=X&anilist_user=Y`	Preview user's in-progress media
`GET /api/generate-stream?vndb_user=X&...`	SSE progress + download token
`GET /api/download?token=UUID`	Download ZIP by token (single-use, 5min expiry)
`GET /api/yomitan-dict?source=vndb&id=v17`	Direct ZIP generation (blocks until done)
`GET /api/yomitan-dict?vndb_user=X&anilist_user=Y`	Username-based ZIP generation
`GET /api/yomitan-index?...`	Lightweight index.json metadata (for Yomitan update checks)
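
The single-use, time-limited behavior of the download token can be sketched as a small in-memory store. This is a minimal sketch under assumptions: the `TokenStore` type and its methods are hypothetical, and the real handler may store and expire tokens differently.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory store for single-use download tokens.
pub struct TokenStore {
    ttl: Duration,
    // token -> (creation time, finished ZIP bytes)
    entries: HashMap<String, (Instant, Vec<u8>)>,
}

impl TokenStore {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Register a finished ZIP under a token (a UUID string in the service).
    pub fn insert(&mut self, token: &str, zip: Vec<u8>, now: Instant) {
        self.entries.insert(token.to_string(), (now, zip));
    }

    /// Remove and return the ZIP for `token`.
    /// Returns None if the token is unknown, already used, or older than the TTL.
    pub fn take(&mut self, token: &str, now: Instant) -> Option<Vec<u8>> {
        // `remove` makes the token single-use even when it turns out to be expired.
        let (created, zip) = self.entries.remove(token)?;
        if now.duration_since(created) > self.ttl {
            return None;
        }
        Some(zip)
    }
}
```

With `TokenStore::new(Duration::from_secs(300))`, a token works once within five minutes; a second `take` for the same token returns `None`. Passing `now` explicitly keeps the expiry logic testable without sleeping.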

Query Parameters

Parameter	Type	Default	Description
`vndb_user`	string	—	VNDB username or profile URL
`anilist_user`	string	—	AniList username
`source`	string	—	`"vndb"` or `"anilist"` (single-media mode)
`id`	string	—	Media ID, e.g. `v17` or `9253` (single-media mode)
`media_type`	string	`"ANIME"`	`"ANIME"` or `"MANGA"` (AniList only)
`honorifics`	bool	`true`	Generate honorific suffix entries (さん, ちゃん, 先生, etc.)
`image`	bool	`true`	Include character portrait images in the dictionary
`tag`	bool	`true`	Include role badges (Main Character, etc.)
`description`	bool	`true`	Include character descriptions
`traits`	bool	`true`	Include character traits/information
`spoilers`	bool	`true`	Include spoiler content in descriptions and traits

URL-as-Settings Pattern

All dictionary options (usernames, source, media type, and the spoiler and content toggles) are encoded as query parameters in the URL. This is intentional: the URLs themselves act as persistent settings for Yomitan's update mechanism:

User imports a dictionary via the index URL, e.g. http://host/api/yomitan-index?vndb_user=foo&spoilers=false
Yomitan stores that full index URL internally
On update check, Yomitan re-fetches the index URL → the generate_index handler reconstructs a downloadUrl with all the same query params baked in
Yomitan downloads the fresh ZIP from that URL → dictionary is regenerated with the original settings

This means changing a setting (e.g. toggling spoilers) requires re-importing with a new URL. There is no server-side state or user accounts — the URL IS the configuration. When adding new options, they must be added to the DictQuery struct and threaded through generate_index's downloadUrl construction so they survive the update cycle.
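
To make the update cycle concrete, here is a minimal sketch of rebuilding a download URL that carries the same parameters as the index URL. The function names, the flattened `params` slice standing in for DictQuery fields, and the choice of `/api/yomitan-dict` as the download path are assumptions for illustration; the real generate_index handler may construct the URL differently.

```rust
/// Percent-encode a query component. RFC 3986 unreserved characters pass through;
/// every other byte of the UTF-8 encoding becomes %XX.
fn encode_component(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Rebuild the ZIP download URL from the parameters the index URL carried.
/// `params` is a hypothetical flattened view of the DictQuery fields that were set.
pub fn build_download_url(base_url: &str, params: &[(&str, &str)]) -> String {
    let query: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("{}={}", encode_component(k), encode_component(v)))
        .collect();
    format!(
        "{}/api/yomitan-dict?{}",
        base_url.trim_end_matches('/'),
        query.join("&")
    )
}
```

Any new option that is missing from the parameter list silently reverts to its default on the next Yomitan update, which is why every DictQuery field needs to be threaded through this step.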

Critical Implementation Details

Things that are easy to break and hard to debug:

Name parsing API: name_parser.rs exposes two public functions — split_japanese_name_with_hints() and generate_name_readings(). Both accept optional first_name_hint (given) and last_name_hint (family) romaji hints. When hints are None (VNDB path), they delegate to internal helpers that use VNDB's positional romaji mapping where romanized_parts[0] → family reading, romanized_parts[1] → given reading (Western→Japanese order swap — looks wrong but is correct). When hints are provided (AniList path), they use the hints to split spaceless native names and generate per-part readings directly. The old split_japanese_name() and generate_mixed_name_readings() still exist as private internal helpers but are not part of the public API.
Image flow: Images must be downloaded and base64-encoded BEFORE passing characters to dict_builder. The builder extracts raw bytes from the base64 data URI and writes them as binary files in the ZIP's img/ folder.
ZIP writer needs Seek: Use std::io::Cursor<Vec<u8>>, not bare Vec<u8>.
Term bank chunking: Max 10,000 entries per term_bank_N.json file.
Entry deduplication: All term entries are deduplicated via HashSet<String> on the term+reading key. When a family name also appears as an alias, only one entry is kept.
Characters without name_original are logged and skipped: No Japanese name = no dictionary entries. A warn! log is emitted with the character's ID and romanized name.
Rate limits: 200ms delay between paginated VNDB requests (API limit: 200 requests per 5 minutes). 300ms delay between AniList requests (API limit: 90 requests per minute).
Spoiler stripping: VNDB uses [spoiler]...[/spoiler], AniList uses ~!...!~. Both must be handled.
Revision field: Must be random on every generation (triggers Yomitan update detection). Not deterministic.
VNDB user input parsing: Users paste URLs like https://vndb.org/u306587. Must extract user ID from URL before API calls. See vndb_client.rs::parse_user_input().
Port is configurable: Via PORT env var, defaults to 3000. BASE_URL env var controls auto-update URLs and defaults to http://127.0.0.1:{PORT}.
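
As an illustration of the spoiler-stripping item above, the following dependency-free sketch removes both marker styles. The function names are hypothetical, and the actual content_builder.rs implementation may differ (for example, it could use the regex crate or handle case variations, which this sketch does not).

```rust
/// Remove every `open ... close` span from `text`. Nested markers are not handled.
fn strip_delimited(text: &str, open: &str, close: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find(open) {
        let after_open = &rest[start + open.len()..];
        match after_open.find(close) {
            Some(end) => {
                // Keep the text before the marker, skip the marked span entirely.
                out.push_str(&rest[..start]);
                rest = &after_open[end + close.len()..];
            }
            // Unclosed marker: keep the remainder untouched (a choice made for this sketch).
            None => break,
        }
    }
    out.push_str(rest);
    out
}

/// Strip VNDB-style `[spoiler]...[/spoiler]` and AniList-style `~!...!~` spans.
pub fn strip_spoilers(text: &str) -> String {
    let without_vndb = strip_delimited(text, "[spoiler]", "[/spoiler]");
    strip_delimited(&without_vndb, "~!", "!~")
}
```

Running both passes unconditionally means a description from either source is handled without the caller needing to know where it came from.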

Honorific Suffixes (257 entries across 14 categories)

Categories: Respectful/Formal, Casual/Friendly, Academic/Educational, Corporate/Business, Government/Political, Military/Law Enforcement, Religious/Spiritual, Medical, Martial Arts/Traditional, Family/Kinship, Historical/Feudal, Fantasy/Fictional, Otaku/Internet/Modern Slang, and kana-form duplicates.

Applied to: family name, given name, combined name, original (with space), and each alias.
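
A minimal sketch of how suffix variants might be produced for one name form. The `Honorific` tuple shape, the sample table, and the function name are assumptions for illustration; the real HONORIFICS constant holds 257 entries and may use a different layout.

```rust
/// Hypothetical shape for one honorific: (suffix as written, hiragana reading).
type Honorific = (&'static str, &'static str);

/// Three sample entries only; not the project's actual table.
const SAMPLE_HONORIFICS: &[Honorific] = &[
    ("さん", "さん"),
    ("ちゃん", "ちゃん"),
    ("先生", "せんせい"),
];

/// Produce (term, reading) pairs for one base name combined with each suffix.
pub fn with_honorifics(
    term: &str,
    reading: &str,
    suffixes: &[Honorific],
) -> Vec<(String, String)> {
    suffixes
        .iter()
        .map(|(suffix, suffix_reading)| {
            (
                format!("{}{}", term, suffix),
                format!("{}{}", reading, suffix_reading),
            )
        })
        .collect()
}
```

Calling this once per name form (family, given, combined, original, and each alias) yields the full set of honorific entries, which is why deduplication downstream matters.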

Testing Strategy

Unit tests are inline in each module (#[cfg(test)] blocks). Run with cargo test.
Integration tests in tests/integration_tests.rs require a running server instance.
Key test areas: romaji→hiragana conversion, name splitting, spoiler stripping, birthday formatting, structured content shape, entry deduplication.

Common Tasks

Adding a new API source: Create a new client module following the pattern of vndb_client.rs / anilist_client.rs. Must produce CharacterData and a title string. Wire it into main.rs orchestration.

Changing the popup card layout: Edit content_builder.rs. The structured content format is Yomitan-specific JSON using HTML-like tags. See docs/plans/plan.md section 8 for the full spec.

Adding new honorifics: Edit the HONORIFICS constant in name_parser.rs and update dict_builder.rs if the generation logic needs changes.

Modifying term entry generation: Edit dict_builder.rs::add_character(). This is where base names, honorific variants, and alias entries are created.
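
When touching entry generation, two invariants described earlier need to hold: entries stay unique on the term+reading key, and no term bank exceeds 10,000 entries. The sketch below shows one way to express both. The `TermEntry` struct and function names are hypothetical, and numbering banks from 1 is an assumption based on the term_bank_N.json naming.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified term entry; the real entries carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct TermEntry {
    pub term: String,
    pub reading: String,
}

/// Drop entries whose term+reading pair was already seen, keeping the first occurrence.
pub fn dedup_entries(entries: Vec<TermEntry>) -> Vec<TermEntry> {
    let mut seen: HashSet<String> = HashSet::new();
    entries
        .into_iter()
        // `insert` returns false for a key that is already present.
        // A tab separates term and reading so "ab"+"c" and "a"+"bc" stay distinct.
        .filter(|e| seen.insert(format!("{}\t{}", e.term, e.reading)))
        .collect()
}

/// Split entries into banks of at most `max_per_bank` entries each.
/// Panics if `max_per_bank` is zero (a property of `slice::chunks`).
pub fn chunk_into_banks(
    entries: &[TermEntry],
    max_per_bank: usize,
) -> Vec<(String, &[TermEntry])> {
    entries
        .chunks(max_per_bank)
        .enumerate()
        .map(|(i, chunk)| (format!("term_bank_{}.json", i + 1), chunk))
        .collect()
}
```

Deduplicating before chunking keeps bank sizes accurate; doing it the other way around could leave banks under-filled or let duplicates survive across bank boundaries.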

Yomitan Structured Content — Allowed HTML Tags & CSS Properties

Source of truth: Yomitan source code at github.com/yomidevs/yomitan (master branch), specifically:

ext/data/schemas/dictionary-term-bank-v3-schema.json (JSON Schema)
types/ext/structured-content.d.ts (TypeScript types)
ext/js/display/structured-content-generator.js (rendering engine)

All schemas use "additionalProperties": false, meaning ONLY the properties listed below are accepted. Anything else is silently dropped or rejected.

Allowed HTML Tags (Exhaustive)

Tag	Category	Supports `style`?	Supports `content` (children)?	Notes
`br`	Empty	No	No	Line break only. Supports `data`.
`ruby`	Unstyled container	No	Yes	Ruby annotation base.
`rt`	Unstyled container	No	Yes	Ruby annotation text.
`rp`	Unstyled container	No	Yes	Ruby fallback parenthesis.
`table`	Unstyled container	No	Yes	Wrapped in a `div.gloss-sc-table-container` at render time.
`thead`	Unstyled container	No	Yes	Table head.
`tbody`	Unstyled container	No	Yes	Table body.
`tfoot`	Unstyled container	No	Yes	Table foot.
`tr`	Unstyled container	No	Yes	Table row.
`td`	Table cell	Yes	Yes	Also supports `colSpan`, `rowSpan`.
`th`	Table cell	Yes	Yes	Also supports `colSpan`, `rowSpan`.
`span`	Styled container	Yes	Yes	Inline container. Also supports `title`.
`div`	Styled container	Yes	Yes	Block container. Also supports `title`.
`ol`	Styled container	Yes	Yes	Ordered list. Also supports `title`.
`ul`	Styled container	Yes	Yes	Unordered list. Also supports `title`.
`li`	Styled container	Yes	Yes	List item. Also supports `title`.
`details`	Styled container	Yes	Yes	Collapsible section. Also supports `title`, `open` (boolean).
`summary`	Styled container	Yes	Yes	Summary for `details`. Also supports `title`.
`img`	Image	No (has own props)	No	Requires `path`. See image properties below.
`a`	Link	No	Yes	Requires `href`. URLs starting with `?` are internal dictionary links. External links must match `^(?:https?:

That's it. No <p>, no <h1>–<h6>, no <b>, no <i>, no <em>, no <strong>, no <u>, no <s>, no <sub>, no <sup>, no <pre>, no <code>, no <blockquote>, no <hr>, no <input>, no <button>, no <form>, no <video>, no <audio>, no <canvas>, no <iframe>, no <script>, no <style>.

Common Attributes (All Elements)

Attribute	Type	Description
`tag`	string (required)	The HTML tag name.
`data`	`{[key: string]: string}`	Custom `data-sc*` attributes added to the DOM element.
`lang`	string	Language code (RFC 5646). Sets `lang` attribute on the element.
`content`	string, Element, or Content[]	Child content. Not supported on `br` or `img`.

Allowed CSS Properties in `style` Object (Exhaustive)

Only the styled containers (span, div, ol, ul, li, details, summary) and table cells (td, th) accept a style object. The following properties are the ONLY ones recognized:

Property	Type	Allowed Values
`fontStyle`	string	`"normal"`, `"italic"`
`fontWeight`	string	`"normal"`, `"bold"`
`fontSize`	string	Any CSS font-size string (e.g. `"1.2em"`, `"small"`)
`color`	string	Any CSS color string
`background`	string	Any CSS background shorthand string
`backgroundColor`	string	Any CSS color string
`textDecorationLine`	string or string[]	`"none"`, `"underline"`, `"overline"`, `"line-through"` (or array of the non-none values)
`textDecorationStyle`	string	`"solid"`, `"double"`, `"dotted"`, `"dashed"`, `"wavy"`
`textDecorationColor`	string	Any CSS color string
`borderColor`	string	Any CSS color string
`borderStyle`	string	Any CSS border-style string
`borderRadius`	string	Any CSS border-radius string
`borderWidth`	string	Any CSS border-width string
`clipPath`	string	Any CSS clip-path string
`verticalAlign`	string	`"baseline"`, `"sub"`, `"super"`, `"text-top"`, `"text-bottom"`, `"middle"`, `"top"`, `"bottom"`
`textAlign`	string	`"start"`, `"end"`, `"left"`, `"right"`, `"center"`, `"justify"`, `"justify-all"`, `"match-parent"`
`textEmphasis`	string	Any CSS text-emphasis shorthand string
`textShadow`	string	Any CSS text-shadow string
`margin`	string	Any CSS margin shorthand string
`marginTop`	number or string	Number → converted to `em`. String → used as-is.
`marginLeft`	number or string	Number → converted to `em`. String → used as-is.
`marginRight`	number or string	Number → converted to `em`. String → used as-is.
`marginBottom`	number or string	Number → converted to `em`. String → used as-is.
`padding`	string	Any CSS padding shorthand string
`paddingTop`	string	Any CSS length string
`paddingLeft`	string	Any CSS length string
`paddingRight`	string	Any CSS length string
`paddingBottom`	string	Any CSS length string
`wordBreak`	string	`"normal"`, `"break-all"`, `"keep-all"`
`whiteSpace`	string	Any CSS white-space string
`cursor`	string	Any CSS cursor string
`listStyleType`	string	Any CSS list-style-type string

No display, no position, no float, no width/height (use image props for images), no overflow, no opacity, no transform, no transition, no animation, no z-index, no flex/grid properties, no visibility, no box-shadow, no outline, no max-width/min-width, no font-family, no line-height, no letter-spacing.
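
Because unsupported properties may be dropped without any visible error, a guard in tests can catch them early. The following is a minimal sketch: the constant mirrors the table above, while the function name and the idea of checking keys before serialization are assumptions, not existing project code.

```rust
/// Style properties listed in the table above.
const ALLOWED_STYLE_PROPERTIES: &[&str] = &[
    "fontStyle", "fontWeight", "fontSize", "color", "background", "backgroundColor",
    "textDecorationLine", "textDecorationStyle", "textDecorationColor",
    "borderColor", "borderStyle", "borderRadius", "borderWidth", "clipPath",
    "verticalAlign", "textAlign", "textEmphasis", "textShadow",
    "margin", "marginTop", "marginLeft", "marginRight", "marginBottom",
    "padding", "paddingTop", "paddingLeft", "paddingRight", "paddingBottom",
    "wordBreak", "whiteSpace", "cursor", "listStyleType",
];

/// Return the style keys that are not in the allow-list, in input order.
pub fn unsupported_style_properties<'a>(keys: &[&'a str]) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|k| !ALLOWED_STYLE_PROPERTIES.contains(k))
        .collect()
}
```

An empty result means every key is on the list; anything returned (for example `display` or `lineHeight`) should be removed or replaced before the card is emitted.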

Image Element Properties (`tag: "img"`)

Property	Type	Description
`path`	string (required)	Path to image file in the ZIP archive.
`width`	number	Preferred width (minimum 0).
`height`	number	Preferred height (minimum 0).
`title`	string	Hover text.
`alt`	string	Alt text.
`description`	string	Description of the image.
`pixelated`	boolean	Pixelated rendering at larger sizes. Default `false`.
`imageRendering`	string	`"auto"`, `"pixelated"`, `"crisp-edges"`. Supersedes `pixelated`.
`appearance`	string	`"auto"`, `"monochrome"`. Monochrome masks opaque parts with text color.
`background`	boolean	Show background color behind image. Default `true`.
`collapsed`	boolean	Image collapsed by default. Default `false`.
`collapsible`	boolean	Image can be collapsed. Default `true`.
`verticalAlign`	string	Same enum as style verticalAlign.
`border`	string	CSS border shorthand.
`borderRadius`	string	CSS border-radius.
`sizeUnits`	string	`"px"` or `"em"`.

Link Element Properties (`tag: "a"`)

Property	Type	Description
`href`	string (required)	URL. Must match `^(?:https?:
`content`	Content	Child content for the link text.
`lang`	string	Language code (RFC 5646).

Workarounds for Missing Tags

Since <b>, <i>, <em>, <strong>, <u>, <s>, <sub>, <sup> are not available:

Desired Effect	Workaround
Bold	`{"tag": "span", "style": {"fontWeight": "bold"}, "content": "text"}`
Italic	`{"tag": "span", "style": {"fontStyle": "italic"}, "content": "text"}`
Underline	`{"tag": "span", "style": {"textDecorationLine": "underline"}, "content": "text"}`
Strikethrough	`{"tag": "span", "style": {"textDecorationLine": "line-through"}, "content": "text"}`
Subscript	`{"tag": "span", "style": {"verticalAlign": "sub", "fontSize": "smaller"}, "content": "text"}`
Superscript	`{"tag": "span", "style": {"verticalAlign": "super", "fontSize": "smaller"}, "content": "text"}`
Heading-like	`{"tag": "span", "style": {"fontWeight": "bold", "fontSize": "1.2em"}, "content": "text"}`
Paragraph spacing	`{"tag": "div", "style": {"marginBottom": 0.5}, "content": "text"}`
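
The workarounds above can be wrapped in small helpers so that card-building code does not repeat the JSON shapes. This is a dependency-free sketch with hypothetical function names; since the project already depends on serde_json, the real content_builder.rs would more naturally build these values with that crate.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a span with a single string-valued style property.
pub fn styled_span(property: &str, value: &str, text: &str) -> String {
    format!(
        r#"{{"tag":"span","style":{{"{}":"{}"}},"content":"{}"}}"#,
        json_escape(property),
        json_escape(value),
        json_escape(text)
    )
}

/// Replacement for the unavailable <b> tag.
pub fn bold(text: &str) -> String {
    styled_span("fontWeight", "bold", text)
}

/// Replacement for the unavailable <i> tag.
pub fn italic(text: &str) -> String {
    styled_span("fontStyle", "italic", text)
}

/// Replacement for the unavailable <u> tag.
pub fn underline(text: &str) -> String {
    styled_span("textDecorationLine", "underline", text)
}
```

For example, `bold("text")` produces the same object as the Bold row in the table, serialized without whitespace.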
