feat(accent): reading override layer + streaming MarkAccent endpoint by torrid-fish · Pull Request #47 · sessatakuma/API-tools

torrid-fish · 2026-05-20T12:44:52Z

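Purpose

Yahoo Furigana and OJAD get context-sensitive readings wrong for dates, weekdays, durations, and ages. This PR adds a regex-based override layer to correct them, upgrades /api/MarkAccent/ with a streaming variant, and merges the furigana endpoint into the accent package so both are maintained in one place.

Base = refactor/accent-package (#52). Once #52 merges, the base will be changed to main.

Rebase update (2026-05-21)

Rebased onto PR #52 (refactor/accent-package). The changes that were previously piled into api/accent_marker.py have been redistributed across the new package layout: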

regex override engine → api/accent/reading_overrides.py
streaming endpoint (/MarkAccent/stream/) → api/accent/routes.py + helper in api/accent/pipeline.py
DP aligner upgrade (greedy → Needleman-Wunsch with rendaku fold) → api/accent/align.py
URL / non-JP preprocessing + sentence splitting → api/accent/pipeline.py

3 commits on top of refactor/accent-package:

801aff4 feat(accent): add /MarkAccent/stream/ NDJSON endpoint + dev helpers
27d9da9 feat(accent): regex reading-override layer + URL/non-JP preprocessing
d659feb refactor(accent): replace greedy aligner with Needleman-Wunsch DP + rendaku fold

Previous tip b87ebe9 reachable via reflog if needed. Old base (feat/docker-compose) replaced with refactor/accent-package.

Verified locally:

ruff format --check, ruff check, mypy all pass on 9 source files
3月5日(土) → date override fires (いつか), weekday bracket override fires (ど)
/MarkAccent/stream/ returns NDJSON, one object per line

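The original description below still reflects the design intent (algorithms, rule tables, streaming chunking strategy); for file paths, refer to the new layout above.

Approach / implementation notes

1. Reading override layer (now `api/accent/reading_overrides.py`)

Yahoo Furigana has no contextual judgment for dates and weekdays (5日 → にち, (土) → つち), and OJAD often misaligns numeric tokens. This adds a regex override layer that runs twice: once after Yahoo and once after OJAD alignment.

Data structures: FuriganaOverride(pattern, replacements, description) + ReplacementToken(furigana, surface, accent). The engine is fully generic and domain-agnostic.
Matching algorithm: _collect_matches collects every hit, sorts by (start, -length), and drops overlaps. At the same start position the longer match wins, so N日間 (3-4 chars) naturally takes precedence over N日 (2-3 chars).
Boundary check: a match must fall on Yahoo token boundaries. If it does not align, the engine logs a warning and skips it, so Yahoo's original token list is never corrupted.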
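A minimal sketch of the matching and boundary steps described above, not the code in this PR. The field types, the integer `accent` encoding, and the un-prefixed names `collect_matches` and `on_token_boundaries` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementToken:
    furigana: str
    surface: str
    accent: int  # assumed encoding; the real field type is not shown in the PR


@dataclass(frozen=True)
class FuriganaOverride:
    pattern: re.Pattern
    replacements: tuple
    description: str


def collect_matches(text, overrides):
    """Return non-overlapping (start, end, override) hits; longer wins at a tie."""
    hits = [
        (m.start(), m.end(), ov)
        for ov in overrides
        for m in ov.pattern.finditer(text)
    ]
    # Sort by start, then by descending length so the longer match comes first.
    hits.sort(key=lambda h: (h[0], -(h[1] - h[0])))
    kept = []
    last_end = 0
    for start, end, ov in hits:
        if start >= last_end:  # anything overlapping an accepted hit is dropped
            kept.append((start, end, ov))
            last_end = end
    return kept


def on_token_boundaries(start, end, token_surfaces):
    """True if [start, end) begins and ends exactly on token edges."""
    edges = {0}
    pos = 0
    for surface in token_surfaces:
        pos += len(surface)
        edges.add(pos)
    return start in edges and end in edges
```

With a `5日` rule and a `5日間` rule both registered, the text `5日間` yields only the duration hit: the longer match sorts first at the same start, and the shorter one then overlaps it and is discarded.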

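Rules covered:

Dates 1-31日 (_date_overrides): 1-10日, 14日, 20日, and 24日 get their correct irregular readings (ついたち, ふつか, …, はつか); 11-31日 get fixed regular readings, which works around Yahoo returning no furigana for numeric tokens.
Weekdays (月)-(日) (_day_of_week_overrides): a single weekday kanji inside brackets.
Durations 1-31日間 (_duration_overrides): prevents Yahoo from splitting 1日間 into [1, 日間], which leaves the surface as 1にちかん. 7日間 uses しちにちかん (the modern preference), and 1日間 uses いちにちかん (it must not collide with ついたち).
Ages 20歳/才 (_age_overrides): はたち (atamadaka, head-high accent).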

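2. Numeric variant helper

_int_to_kanji(n) + _numeric_pattern(n) expand an integer into its three written forms (arabic, full-width, kanji), so adding an N-prefixed rule only needs (n, reading) instead of a hand-written (?:kanji|full-width|half-width) alternation. This also honors feedback_japanese_text_variants: every JP regex covers all three variants.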
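A sketch of how such a helper pair could look, assuming only 1..99 needs to be covered. The un-prefixed names and the negative lookbehind (which stops a rule for 1 from firing inside 11 or 二十一) are assumptions and may differ from the actual implementation.

```python
import re

_KANJI_DIGITS = "〇一二三四五六七八九"
_TO_FULL_WIDTH = str.maketrans("0123456789", "０１２３４５６７８９")
# Assumed guard: do not match when the numeral is preceded by another numeral.
_NOT_AFTER_DIGIT = "(?<![0-9０-９〇一二三四五六七八九十])"


def int_to_kanji(n: int) -> str:
    """Kanji numeral for 1 <= n <= 99 (enough for day-of-month and age rules)."""
    if not 1 <= n <= 99:
        raise ValueError("sketch only covers 1..99")
    tens, ones = divmod(n, 10)
    out = ""
    if tens:
        # 10 is written 十, not 一十; 20 is 二十, and so on.
        out += ("" if tens == 1 else _KANJI_DIGITS[tens]) + "十"
    if ones:
        out += _KANJI_DIGITS[ones]
    return out


def numeric_pattern(n: int) -> str:
    """Regex fragment matching n in arabic, full-width, or kanji form."""
    arabic = str(n)
    variants = [arabic, arabic.translate(_TO_FULL_WIDTH), int_to_kanji(n)]
    return _NOT_AFTER_DIGIT + "(?:" + "|".join(re.escape(v) for v in variants) + ")"


# A rule is then written from (n, reading) alone:
fifth_of_month = re.compile(numeric_pattern(5) + "日")
```

Here `fifth_of_month` matches `5日`, `５日`, and `五日` with one pattern, which is the point of generating the alternation from the integer.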

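3. Alignment algorithm upgrade (now `api/accent/align.py`)

A Needleman-Wunsch DP align_accent replaces the original greedy alignment, fixing the problem where a single misalignment in a long paragraph cascades into every subsequent token.
Rendaku-tolerant: the DP cost allows voiced/unvoiced swaps such as が↔か and だ↔た.
Long-vowel words are no longer skipped: the old alignment skipped words containing long vowels, such as 「動画」 and 「映像」; they now align correctly.
Furigana overrides are applied before OJAD alignment, so the alignment step sees already-corrected tokens.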
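The per-token kana cost can be illustrated with a small weighted edit distance over voicing-folded strings. This is a sketch only: the 0.4 and 1.0 costs come from the commit message, but folding via Unicode decomposition is a stand-in for the PR's explicit voicing-fold table, and the outer DP over (yahoo_token, ojad_entry) pairs is not shown.

```python
import unicodedata

SUB_COST = 0.4    # substitution cost, as stated in the commit message
INDEL_COST = 1.0  # insertion/deletion cost, as stated in the commit message

_VOICING_MARKS = "\u3099\u309a"  # combining dakuten and handakuten


def fold_voicing(kana: str) -> str:
    """Strip (han)dakuten so が→か, だ→た, ぷ/ぶ→ふ compare as equal."""
    decomposed = unicodedata.normalize("NFD", kana)
    stripped = "".join(ch for ch in decomposed if ch not in _VOICING_MARKS)
    return unicodedata.normalize("NFC", stripped)


def folded_distance(a: str, b: str) -> float:
    """Weighted edit distance between two readings after voicing fold."""
    a, b = fold_voicing(a), fold_voicing(b)
    prev = [j * INDEL_COST for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * INDEL_COST]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if ca == cb else SUB_COST)
            delete = prev[j] + INDEL_COST
            insert = cur[j - 1] + INDEL_COST
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]
```

Under this cost, ふんかん and ぷんかん are at distance 0, while dropping a mora costs a full 1.0, so equal-length spans with a substitution are preferred over shorter spans with a deletion.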

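4. Streaming endpoint `/api/MarkAccent/stream/`

Input is split into chunks on \n and on full-width sentence terminators (。/！/？); each chunk calls Yahoo + OJAD independently.
The response is NDJSON, one line per chunk: {"chunk": N, "subchunk": M, "status": ..., "result": [...], "error": ...}.
Non-Japanese chunks and chunks that contain only a URL or only punctuation are skipped without calling the external APIs.
A single failing chunk does not abort the stream; that chunk is returned with status=500 and the stream continues.
Long-paragraph stability fix: the DP table size and the subchunk split points both have corresponding safety bounds.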
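A synchronous sketch of the chunking and per-chunk NDJSON framing described above. `mark_accent` is a hypothetical callable standing in for the Yahoo + OJAD pipeline, and `status=200` on success and an empty `result` on failure are assumptions; the skip rules for non-Japanese chunks are left out for brevity.

```python
import json
import re

# Zero-width split point right after each full-width terminator,
# so the terminator stays attached to its sentence.
_SENTENCE_END = re.compile(r"(?<=[。！？])")


def split_chunks(text):
    """Yield (chunk_idx, subchunk_idx, sentence) for every non-empty sentence."""
    for line_idx, line in enumerate(text.split("\n")):
        sentences = [s for s in _SENTENCE_END.split(line) if s.strip()]
        for sub_idx, sentence in enumerate(sentences):
            yield line_idx, sub_idx, sentence


def to_ndjson_lines(text, mark_accent):
    """Yield one JSON line per chunk; a failing chunk does not stop the stream."""
    for chunk, subchunk, sentence in split_chunks(text):
        record = {"chunk": chunk, "subchunk": subchunk}
        try:
            record.update(status=200, result=mark_accent(sentence), error=None)
        except Exception as exc:  # report the failure for this chunk and keep going
            record.update(status=500, result=[], error=str(exc))
        yield json.dumps(record, ensure_ascii=False) + "\n"
```

A client can then parse the body line by line with `json.loads` and use `chunk` / `subchunk` to place each result back at its position in the document.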

5. Merging furigana into accent

Removes the old api/furigana_marker.py and the /api/MarkFurigana/ route. All internal callers that need furigana now go through _fetch_yahoo_raw + apply_furigana_overrides (now under api/accent/).
_fetch_yahoo_raw now returns list[WordResult] | _YahooFetchError (a frozen dataclass sentinel), so an outer try/except no longer swallows a 408 timeout into an opaque 500.
The override module moves from config/furigana_overrides.py to api/accent/reading_overrides.py (git mv, history preserved); config/ should hold only real config, not transformation logic.
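The sentinel-return pattern mentioned for _fetch_yahoo_raw might look roughly like the sketch below. The fields on the error dataclass, the `http_post` callable, and the response envelope shape are all assumptions for illustration; only the idea of returning a frozen dataclass instead of raising comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YahooFetchError:
    status: int   # assumed field: HTTP-style status to surface to the client
    message: str  # assumed field: human-readable reason


def fetch_yahoo_raw(text, http_post):
    """Return a list of words, or a YahooFetchError describing what went wrong."""
    try:
        return http_post(text)  # hypothetical transport returning a word list
    except TimeoutError:
        # Keep the specific status instead of letting a caller's broad
        # try/except turn it into a generic 500.
        return YahooFetchError(408, "Yahoo request timed out")


def to_envelope(outcome):
    """Map the fetch outcome onto a response envelope."""
    if isinstance(outcome, YahooFetchError):
        return {"status": outcome.status, "result": [], "error": outcome.message}
    return {"status": 200, "result": outcome, "error": None}
```

Because the error travels as a value, the caller decides the status code with a plain `isinstance` check and no exception handling is needed at the route level.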

6. Development tools

test.sh: local API smoke test; STREAM=1 ./test.sh exercises the streaming endpoint, with per-line (surface|furigana|accent_marking_type) output.
.gitignore: excludes data/ and output/ so test fixtures stay out of git.

Diff scope

3 commits ahead of refactor/accent-package (#52). Files mainly touched:

api/accent/align.py (DP align + rendaku fold)
api/accent/reading_overrides.py (new file; renamed from config/furigana_overrides.py and heavily expanded)
api/accent/pipeline.py (streaming chunk orchestrator, URL/non-JP preprocess)
api/accent/routes.py (/MarkAccent/stream/ route)
test.sh (new)

Related

Implements Rule-based post accent correction #45: the override layer is the rule-based correction mechanism that issue describes, applied on both the Yahoo furigana and OJAD accent paths.
Stacked on refactor(accent): split into api/accent/ package (no behavior change) #52 (refactor/accent-package).
Same direction as fix: let model can output numeric's furigana #44 (numeric token furigana); this PR uses the override layer to provide a broader deterministic fallback.

Notes

Draft status: the streaming endpoint's stability fix and the override table may still be iterated on (for example, duration signals such as 日後/日前/日中 are deliberately left unhandled for now and will be added when real cases appear).
apply_*_overrides already covers both the furigana and accent paths, so adding a rule does not require touching the main pipeline flow.

🤖 Generated with Claude Code

…endaku fold

The greedy aligner had two failure modes that cascaded across whole sentences: a numeric anchor that over-consumed when Yahoo and OJAD disagreed on phrase boundary, and a +1 fallback path that turned a single mismatch into type-0 fallback for every downstream token.

Replaces it with a global DP over (yahoo_token, ojad_entry) pairs: each Yahoo token consumes k ∈ [0, K_MAX] contiguous OJAD entries, with per-token cost computed via shape (punct/numeric/kana) and edit distance over rendaku-folded strings for kana tokens. Sub cost (0.4) is lower than ins/del (1.0) so the DP prefers same-length spans with substitutions over shorter spans with deletions; this fixes the case where OJAD's `う` from `等→とう` leaked onto the next token.

Adds a voicing-fold table so Yahoo's dictionary-form readings (ふんかん) align against OJAD's pronounced readings with rendaku (ぷんかん). All comparisons run under this fold; ぱ/ば/ぷ/ぶ all alias to は/ふ.

Refs #47.

Add api/accent/reading_overrides.py, a context-blind correction layer sitting between Yahoo Furigana and OJAD alignment. Each override is a regex on the concatenated surface text plus the replacement tokens that should appear instead.

Covers:

- 曜日 brackets: (月)/（月）→ げつ, (土) → ど, etc. for all 7 weekdays.
- All 31 day-of-month readings: 1日 → ついたち (atamadaka), 5日 → いつか, 14日 → じゅうよっか, 20日 → はつか, etc.
- N日間 durations 1-31: 1日間 → いちにちかん (NOT ついたちかん since the 1st-of-month reading is impossible for a duration), 7日間 → しちにちかん (modern technical writing preference over なのかかん).
- 20歳 / 二十歳 / 20才 → はたち (the only irregular age reading).

Patterns accept arabic / full-width / kanji numeral variants of the same N so `3月5日(土)` / `３月５日（土）` / `三月五日（土）` all trigger the same overrides.

Order-of-overrides matters: duration list precedes date list so `N日間` wins over `N日` at the same start (longer match breaks ties in _collect_matches).

apply_furigana_overrides runs BEFORE align_accent so merged spans like `5日→いつか` reach OJAD as a single token whose furigana matches OJAD's phrase reading (the numeric-anchor logic in align_accent otherwise cascade-fails because numeric tokens lack any Yahoo furigana). apply_accent_overrides runs AFTER align to re-stamp both furigana and accent on the same matched spans, so the response is consistent.

Adds URL preprocessing: each https?:// URL is swapped for the placeholder "URLPLACEHOLDER" before the pipeline runs (Yahoo fragments URLs across several alphabet tokens, and OJAD's phrasing scraper produces noise for Latin punctuation runs; both drag alignment off-rail). Placeholders are walked back to the originals in order after alignment. URL body stops at whitespace, any Japanese char, or `,()<>[]"'` so embedded URLs strip cleanly.

Adds a non-Japanese short-circuit: if (after URL stripping) the chunk contains no hiragana / katakana / CJK ideograph, skip Yahoo + OJAD entirely and echo the chunk back as a single token. Lets pure-URL / pure-English lines stream through cheaply.

Also adds stream_accent_chunks() to pipeline.py as a helper used by the streaming endpoint added in the next commit. Splits the input on \n then on full-width sentence terminators (。！？．), because long paragraphs degrade OJAD's phrasing predictor and parallelising across sentences caps the latency. In-flight work is bounded by a semaphore (concurrency=4) because OJAD's u-tokyo backend falls over with 30+ parallel scrapes.

main.py docstring updated to reflect /MarkAccent/stream/.

Refs #47.
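A sketch of the URL placeholder swap and the non-Japanese short-circuit from this commit message. The exact character ranges are assumptions (in particular, treating CJK and full-width punctuation as URL terminators), and the function names are hypothetical.

```python
import re

PLACEHOLDER = "URLPLACEHOLDER"
_JP_SCRIPT = "\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff"  # kana + CJK ideographs
_JP_PUNCT = "\u3000-\u303f\uff00-\uffef"                # assumed: also ends a URL

# URL body stops at whitespace, Japanese characters, or ,()<>[]"'
_URL = re.compile(r"https?://[^\s,()<>\[\]\"'" + _JP_SCRIPT + _JP_PUNCT + "]+")
_HAS_JP = re.compile("[" + _JP_SCRIPT + "]")


def strip_urls(text):
    """Replace every URL with the placeholder; return (new_text, urls_in_order)."""
    urls = _URL.findall(text)
    return _URL.sub(PLACEHOLDER, text), urls


def restore_urls(surfaces, urls):
    """Put the original URLs back into token surfaces, left to right."""
    remaining = iter(urls)
    return [re.sub(PLACEHOLDER, lambda _m: next(remaining), s) for s in surfaces]


def has_japanese(text):
    """True if the text contains hiragana, katakana, or a CJK ideograph."""
    return _HAS_JP.search(text) is not None


def preprocess(chunk):
    """Return (stripped_text, urls, skip_external_apis)."""
    stripped, urls = strip_urls(chunk)
    return stripped, urls, not has_japanese(stripped)
```

For a chunk such as `https://example.com hello`, `preprocess` reports that the external APIs can be skipped, and the caller can echo the original chunk back as a single token.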

Add a streaming variant of /MarkAccent/ that processes the input as a sequence of (line, sentence) chunks and emits one NDJSON object per chunk in input order. Each line carries `{"chunk": line_idx, "subchunk": sub_idx, ...AccentResponse}` so clients can render output incrementally while keeping document position.

Underlying chunk-fanout and concurrency limiting live in pipeline.stream_accent_chunks; the route is a thin StreamingResponse wrapper.

Streaming benefits compound: OJAD's phrasing predictor degrades on long inputs (a single misaligned mora cascades across the paragraph), so per-sentence chunks both stay short enough for OJAD to handle and fan out under the bounded semaphore.

Also adds test.sh, a small bash smoke-test helper that POSTs a sample text to either /MarkAccent/ or /MarkFurigana/ and pretty-prints the per-moji (surface|furigana|accent_marking_type) rows. STREAM=1 switches to the streaming endpoint, ENDPOINT= picks which router. Useful while iterating on overrides; not wired into CI.

.gitignore adds data/ and output/ for ad-hoc test fixtures we don't want committed.

Refs #47.
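The "fan out under a bounded semaphore, emit in input order" behaviour can be sketched with asyncio as below. The function name `stream_in_order` and the `worker` coroutine are hypothetical, per-chunk error handling is omitted, and the real stream_accent_chunks may be structured differently; only the concurrency limit of 4 comes from the commit messages.

```python
import asyncio


async def stream_in_order(chunks, worker, concurrency=4):
    """Run worker(chunk) with bounded parallelism, yielding results in input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(chunk):
        # At most `concurrency` workers are inside this block at once.
        async with semaphore:
            return await worker(chunk)

    # Start every chunk immediately; the semaphore does the throttling.
    tasks = [asyncio.create_task(guarded(chunk)) for chunk in chunks]
    for task in tasks:
        # Awaiting in submission order keeps output order equal to input order,
        # even when a later chunk finishes first.
        yield await task
```

A route could wrap this async generator, serialising each yielded result to one NDJSON line, so that early sentences reach the client while later ones are still in flight.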

torrid-fish · 2026-05-21T06:42:47Z

Rebased onto PR #52 (refactor/accent-package). Previous monolithic api/accent_marker.py changes redistributed across the new package layout:

regex override engine → api/accent/reading_overrides.py
streaming endpoint (/MarkAccent/stream/) → api/accent/routes.py + helper in api/accent/pipeline.py
DP aligner upgrade (greedy → Needleman-Wunsch with rendaku fold) → api/accent/align.py
URL/non-JP preprocessing + sentence splitting → api/accent/pipeline.py

3 commits on top of refactor/accent-package:

801aff4 feat(accent): add /MarkAccent/stream/ NDJSON endpoint + dev helpers
27d9da9 feat(accent): regex reading-override layer + URL/non-JP preprocessing
d659feb refactor(accent): replace greedy aligner with Needleman-Wunsch DP + rendaku fold

Previous tip b87ebe9 reachable via reflog if needed. Old base (feat/docker-compose) replaced with refactor/accent-package.

Verified locally:

ruff format --check, ruff check, mypy all pass on 9 source files
3月5日(土) → date override fires (いつか), weekday bracket override fires (ど)
/MarkAccent/stream/ returns NDJSON, one object per line

PR body still describes the old monolithic structure — will refresh if you want.

Tighten the no-behavior-change refactor based on Copilot's review on PR #52.

Active findings:

- pipeline.py: rename unused `ojad_surface` to `_ojad_surface` so F841 catches it if Ruff's unused-binding rule ever lands; the OJAD echo string isn't consumed here.
- furigana.py: wrap `response.json()` in try/except. The docstring promised malformed payloads would surface via the FuriganaResponse envelope, but an invalid Content-Type / non-JSON body would have raised through. Catch ValueError and return a 500 envelope.
- ojad.py: switch `raise e` -> bare `raise` and `logger.error(f"...")` -> `logger.exception(...)` to preserve the original traceback.
- models.py: "describe" -> "describes" (x3 occurrences); "givent" -> "given".

Low-confidence findings also addressed:

- align.py module docstring used to claim `punctuation_marks`, `skip_marks`, and `clean_query` are consumed by alignment. They aren't; they're carried over from the pre-refactor module for the downstream PR #47 to use. Reword to reflect that.
- clean_query docstring overclaimed punctuation stripping; it only filters ASCII letters. Reword + rename the local comprehension var from `chr` to `char` to stop shadowing the builtin.

Verified:

    uv run ruff check api/accent/ main.py           # all passed
    uv run ruff format --check api/accent/ main.py  # 8/8 formatted
    uv run mypy api/accent/ main.py                 # 8 files, no issues
    POST /api/MarkAccent/ + /MarkFurigana/ routes still register

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two cleanups from Copilot's follow-up review on the fix commit:

- README mermaid diagram labels `align_accent` as "DP alignment", but the surrounding prose (lines 99/107) correctly calls it "single-pass greedy". DP is a future PR (#47). Rename the node to "Greedy alignment" so the diagram matches the implementation.
- `align.py:122` had a comment typo "regard as punchutation" → "regarded as punctuation".

No code changes; docstring / comment / diagram only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-27T10:52:23Z

🛡️ PR Quality Check Summary

✅ PR Title: Passed (Length: 68/75, Format: OK). feat(accent): reading override layer + streaming MarkAccent endpoint
✅ Branch Name: Follows naming convention (feat/reading-overrides)
❌ Commit Messages: 1 of 3 commit(s) failed validation
✅ Conflicts: No merge conflict markers found
✅ Python Quality: All checks passed.

📋 Detailed commit validation report

Expected format: `type(scope): description` (max 75 chars)
Valid types: build|chore|ci|docs|feat|fix|hotfix|perf|refactor|revert|style|test

Failed commits:
- [`c84bc65`] `refactor(accent): replace greedy aligner with Needleman-Wunsch DP + rendaku fold`
  ↳ Title is too long (is **80** chars, max is **75**)

⚠️ Please fix the failing checks (❌) before merging.

torrid-fish · 2026-05-27T11:09:34Z

Closing in favour of #51, which now targets main directly. The reading-override layer, Needleman-Wunsch DP aligner, and /MarkAccent/stream/ endpoint from this branch are all included in #51 as the foundation of the local fugashi + UniDic migration — the Yahoo-backed intermediate is a stepping stone and won't ship independently, so it's folded into the single migration PR rather than merged separately.

torrid-fish force-pushed the feat/reading-overrides branch from b87ebe9 to 801aff4 Compare May 21, 2026 06:41

torrid-fish changed the base branch from feat/docker-compose to refactor/accent-package May 21, 2026 06:42

Base automatically changed from refactor/accent-package to main May 27, 2026 10:40

torrid-fish added 3 commits May 27, 2026 18:49

torrid-fish force-pushed the feat/reading-overrides branch from 801aff4 to c9b321d Compare May 27, 2026 10:51

torrid-fish closed this May 27, 2026

torrid-fish mentioned this pull request May 27, 2026

feat(accent): local UniDic + POS-driven patches #53

Draft
