refactor(accent): split into api/accent/ package (no behavior change) #52
Pure structural split of the monolithic api/accent_marker.py (493L) +
api/furigana_marker.py (152L) into a focused package layout. Zero
behavior change — verified byte-identical /MarkAccent/ and
/MarkFurigana/ responses before and after.
Layout:
  api/accent/
    __init__.py   re-exports accent_router, furigana_router
    README.md     pipeline + accent_marking_type semantics
    models.py     Request, ErrorInfo, WordResult, WordAccentResult,
                  AccentInfo, FuriganaResponse, AccentResponse
    furigana.py   Yahoo Furigana HTTP client (data layer)
    ojad.py       OJAD scrape (HTTP + BS4 parse)
    align.py      align_accent + numeric_pattern + punctuation_marks
                  + skip_marks + clean_query + is_kana_or_kanji
    pipeline.py   MarkAccent orchestrator (extracted from inline
                  mark_accent handler)
    routes.py     FastAPI routers + thin endpoint handlers
main.py: drop `accent_marker` and `furigana_marker` from the
`from api import ...` line; add `from api.accent import accent_router,
furigana_router`; rename the two `include_router` calls accordingly.
README.md documents the data flow (mermaid), AccentInfo
accent_marking_type semantics (0=LOW, 1=HIGH, 2=FALL), heiban /
atamadaka / nakadaka / odaka detection rules, file responsibilities,
and the alignment algorithm. Forms the foundation for #47/#51 to
extend with override / patch / postprocess layers.
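As a rough illustration of the accent_marking_type values above, here is a minimal sketch. The 0/1/2 values come from this commit message; the `AccentMarkingType` enum, the `classify_pattern` helper, and the rule it applies (the textbook definition based on where the FALL mora sits) are assumptions for illustration, not copied from the README or from align.py.

```python
from enum import IntEnum
from typing import Sequence


class AccentMarkingType(IntEnum):
    """Per-mora marking, using the 0/1/2 values stated in the commit message."""

    LOW = 0
    HIGH = 1
    FALL = 2  # high mora after which the pitch drops


def classify_pattern(marks: Sequence[int]) -> str:
    """Name the pitch pattern of one word from its per-mora marks.

    Assumed rule (hypothetical, textbook definition): the position of
    the first FALL mora decides the class.
    """
    if not marks:
        raise ValueError("a word needs at least one mora")
    falls = [i for i, m in enumerate(marks) if m == AccentMarkingType.FALL]
    if not falls:
        return "heiban"      # no drop anywhere in the word
    first = falls[0]
    if first == 0:
        return "atamadaka"   # drop right after the first mora
    if first == len(marks) - 1:
        return "odaka"       # drop after the last mora (heard on a following particle)
    return "nakadaka"        # drop somewhere in the middle


if __name__ == "__main__":
    print(classify_pattern([0, 1, 1]))     # heiban
    print(classify_pattern([2, 0, 0]))     # atamadaka
    print(classify_pattern([0, 2, 0]))     # nakadaka
    print(classify_pattern([0, 1, 1, 2]))  # odaka
```

The real detection rules live in the package README and may treat edge cases (single-mora words, multiple falls) differently.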
Verification:
  uv run ruff check api/accent/ main.py   # all passed
  uv run mypy api/accent/ main.py         # 8 files, no issues
  POST /api/MarkAccent/    → byte-identical to pre-refactor
  POST /api/MarkFurigana/  → byte-identical to pre-refactor
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure whitespace: single-line collapse where the joined signature / call fits within ruff's 88-char limit. No semantic change; mypy + ruff check still pass; runtime behavior unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten the no-behavior-change refactor based on Copilot's review on PR #52.
Active findings:
- pipeline.py: rename unused `ojad_surface` to `_ojad_surface` so F841 catches it if Ruff's unused-binding rule ever lands; the OJAD echo string isn't consumed here.
- furigana.py: wrap `response.json()` in try/except. The docstring promised malformed payloads would surface via the FuriganaResponse envelope, but an invalid Content-Type / non-JSON body would have raised through. Catch ValueError and return a 500 envelope.
- ojad.py: switch `raise e` -> bare `raise` and `logger.error(f"...")` -> `logger.exception(...)` to preserve the original traceback.
- models.py: "describe" -> "describes" (x3 occurrences); "givent" -> "given".
Low-confidence findings also addressed:
- align.py module docstring used to claim `punctuation_marks`, `skip_marks`, and `clean_query` are consumed by alignment. They aren't; they're carried over from the pre-refactor module for the downstream PR #47 to use. Reword to reflect that.
- clean_query docstring overclaimed punctuation stripping; it only filters ASCII letters. Reword + rename the local comprehension var from `chr` to `char` to stop shadowing the builtin.
Verified:
  uv run ruff check api/accent/ main.py            # all passed
  uv run ruff format --check api/accent/ main.py   # 8/8 formatted
  uv run mypy api/accent/ main.py                  # 8 files, no issues
  POST /api/MarkAccent/ + /MarkFurigana/ routes still register
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
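The furigana.py and ojad.py findings in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FakeResponse`, `parse_body`, `fetch_or_reraise`, and the exact shape of the `error` field are hypothetical; only the `{status, result, error}` envelope keys, the `ValueError` catch, the 500 status, the bare `raise`, and `logger.exception` come from this page.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


class FakeResponse:
    """Hypothetical stand-in for an HTTP client's response object."""

    def __init__(self, text: str) -> None:
        self.text = text

    def json(self) -> Any:
        # json.JSONDecodeError is a subclass of ValueError.
        return json.loads(self.text)


def parse_body(response: FakeResponse) -> Dict[str, Any]:
    """furigana.py-style handling: turn a non-JSON body into an envelope."""
    try:
        payload = response.json()
    except ValueError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("upstream returned a non-JSON body")
        return {
            "status": 500,
            "result": None,
            "error": {"message": "invalid JSON from upstream"},  # assumed shape
        }
    return {"status": 200, "result": payload, "error": None}


def fetch_or_reraise(response: FakeResponse) -> Any:
    """ojad.py-style handling: log with traceback, then re-raise as-is."""
    try:
        return response.json()
    except ValueError:
        logger.exception("failed to parse upstream response")
        raise  # bare raise re-raises the active exception unchanged


if __name__ == "__main__":
    print(parse_body(FakeResponse("<html>oops</html>"))["status"])  # 500
    print(parse_body(FakeResponse('{"ok": true}'))["result"])       # {'ok': True}
```

Catching `ValueError` rather than a client-specific exception keeps the sketch independent of which HTTP library furigana.py actually uses.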
Two cleanups from Copilot's follow-up review on the fix commit:
- README mermaid diagram labels `align_accent` as "DP alignment", but the surrounding prose (lines 99/107) correctly calls it "single-pass greedy". DP is a future PR (#47). Rename the node to "Greedy alignment" so the diagram matches the implementation.
- `align.py:122` had a comment typo "regard as punchutation" → "regarded as punctuation".
No code changes; docstring / comment / diagram only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
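To make the "single-pass greedy" wording concrete, here is a generic sketch of a one-pass, no-backtracking alignment. It is not the `align_accent` implementation, which is not shown on this page: the `greedy_align` name, the `Mora` tuple, and the prefix-matching rule are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# (kana, accent_marking_type); a hypothetical shape for one scraped mora.
Mora = Tuple[str, int]


def greedy_align(readings: List[str], morae: List[Mora]) -> List[List[Mora]]:
    """Assign morae to word readings in a single left-to-right pass.

    For each reading, consume morae while the kana consumed so far is
    still a prefix of that reading. There is no backtracking: on a
    mismatch the word keeps whatever it matched and the cursor stays put.
    """
    aligned: List[List[Mora]] = []
    cursor = 0
    for reading in readings:
        taken: List[Mora] = []
        matched = ""
        while cursor < len(morae):
            kana, _ = morae[cursor]
            if not reading.startswith(matched + kana):
                break
            taken.append(morae[cursor])
            matched += kana
            cursor += 1
        aligned.append(taken)
    return aligned


if __name__ == "__main__":
    words = ["わたし", "は"]
    scraped = [("わ", 0), ("た", 1), ("し", 1), ("は", 1)]
    for word, group in zip(words, greedy_align(words, scraped)):
        print(word, group)
```

A greedy pass like this cannot recover once the two sequences drift apart, which is the usual motivation for moving to a DP alignment later.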
wade00754 left a comment
LGTM. But I have identified some additional issues in align.py and will review whether they still exist in future PRs.
@copilot resolve the merge conflicts in this pull request
Head branch was pushed to by a user without write access
Resolved by merging
0a403cd to fa473c4
🛡️ PR Quality Check Summary
✅ PR Title: Passed (Length: 69/75, Format: OK).
🎉 All checks passed!
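Purpose
Split the two monolithic files api/accent_marker.py (493L) and api/furigana_marker.py (152L) into an api/accent/ package, and add a README.md that spells out the marking pipeline and the accent_marking_type conventions. This is a purely structural refactor: happy-path JSON responses are byte-identical to the pre-refactor ones.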
Layout
Dependency direction (no cycles):
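main.py changes
There are no other changes to main.py: auth, middleware, and the other routers are left untouched. PR #46's changes to main.py (removing the auth/security block) are on different lines, so the rebase has no conflicts.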
Naming notes
The two endpoints share the {status, result, error} envelope shape; only the element type of result differs. To keep the FastAPI response_model typing explicit, two Response classes are retained:
- FuriganaResponse (result: list[WordResult] | None)
- AccentResponse (result: list[WordAccentResult] | None)
The JSON output is byte-identical to that of the `Response` classes the two original modules each defined.
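The shared-envelope idea above can be sketched as follows. Stdlib dataclasses are used here as a stand-in for the models that FastAPI's response_model would normally take, so that the example runs without extra dependencies; the fields of `WordResult` and `WordAccentResult`, the `int` type of `status`, and the `dict` type of `error` are hypothetical, since this page does not list them.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class WordResult:
    surface: str   # hypothetical field
    furigana: str  # hypothetical field


@dataclass
class WordAccentResult:
    surface: str       # hypothetical field
    accent: List[int]  # hypothetical field: per-mora accent_marking_type


@dataclass
class FuriganaResponse:
    status: int
    result: Optional[List[WordResult]]
    error: Optional[dict] = None


@dataclass
class AccentResponse:
    status: int
    result: Optional[List[WordAccentResult]]
    error: Optional[dict] = None


if __name__ == "__main__":
    furigana = FuriganaResponse(200, [WordResult("猫", "ねこ")])
    accent = AccentResponse(200, [WordAccentResult("猫", [2, 0])])
    # Same top-level keys; only the element type of `result` differs.
    print(list(asdict(furigana)))
    print(list(asdict(accent)))
```

Keeping two named classes costs a few duplicated lines but lets each endpoint advertise a precise element type in its schema.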
README contents
Includes:
- WordAccentResult / AccentInfo fields, plus a table mapping accent_marking_type 0/1/2 to the corresponding OJAD CSS classes
Verification
Impact on downstream PRs
feat/docker-compose)
feat/reading-overrides) api/accent/<new file>; adds reading_overrides.py to the package
spike/local-unidic) postprocess.py
Out of scope
- The internal algorithm of align_accent (a later PR will upgrade it to Needleman-Wunsch DP)
🤖 Generated with Claude Code