refactor(accent): split into api/accent/ package (no behavior change) by torrid-fish · Pull Request #52 · sessatakuma/API-tools

torrid-fish · 2026-05-21T05:00:18Z

目的

把 api/accent_marker.py (493L) + api/furigana_marker.py (152L) 兩個 monolithic 檔案拆成 api/accent/ package，並加上 README.md 把 marking pipeline + accent_marking_type 約定寫清楚。

純結構性 refactor，正常路徑 JSON 回應與 refactor 前 byte-identical。

Asterisk on "0 behavior change"：為了配合 Copilot review 反饋，api/accent/furigana.py 在 response.json() 外多包了一層 try/except ValueError，把「Yahoo 回傳 invalid JSON」這條冷門 error path 從 unhandled exception 改成回 FuriganaResponse(status=500, error=...) envelope。pre-refactor 的 furigana_marker.py 在這條 path 上會直接 raise。其它所有 endpoint behavior 完全 byte-identical。

Base = main，跟 PR #46 (feat/docker-compose) 是 sibling，merge 順序無關。下游 #47 (feat/reading-overrides) 與 #51 (spike/local-unidic) merge 之後再 rebase。

Layout

api/accent/
├── __init__.py     # re-exports accent_router, furigana_router (供 main.py 使用)
├── README.md       # marking pipeline + AccentInfo semantics
├── models.py       # Pydantic schemas (兩 endpoint 共用)
├── furigana.py     # Yahoo Furigana HTTP client (data layer)
├── ojad.py         # OJAD scrape (HTTP + BeautifulSoup parse)
├── align.py        # align_accent + 相關 constants & helpers
├── pipeline.py     # MarkAccent orchestrator (從 mark_accent handler 抽出)
└── routes.py       # FastAPI routers + thin endpoint handlers

依賴方向（無循環）：

flowchart LR
  routes[routes.py]
  pipeline[pipeline.py]
  align[align.py]
  furigana[furigana.py]
  ojad[ojad.py]
  models[models.py]

  routes --> pipeline
  routes --> furigana
  pipeline --> align
  pipeline --> furigana
  pipeline --> ojad
  align -.-> models
  furigana -.-> models
  ojad -.-> models
  pipeline -.-> models
  routes -.-> models

main.py 變更

- from api import accent_marker, dict_query, furigana_marker, sentence_query, usage_query
+ from api import dict_query, sentence_query, usage_query
+ from api.accent import accent_router, furigana_router

  app.include_router(
-     accent_marker.router, prefix=\"/api\", dependencies=[Depends(get_api_key)]
+     accent_router, prefix=\"/api\", dependencies=[Depends(get_api_key)]
  )
  app.include_router(
-     furigana_marker.router, prefix=\"/api\", dependencies=[Depends(get_api_key)]
+     furigana_router, prefix=\"/api\", dependencies=[Depends(get_api_key)]
  )

剩下的 main.py 改動是 0 — auth、middleware、其他 router 都原封不動。跟 PR #46 在 main.py 的改動（移除 auth/security 區塊）在不同行，rebase 無衝突。

命名說明

兩個 endpoint 共用 {status, result, error} envelope shape，只是 result element type 不同。為了 FastAPI response_model typing 明確，仍保留兩個 Response class：

FuriganaResponse (result: list[WordResult] | None)
AccentResponse (result: list[WordAccentResult] | None)

JSON 輸出 byte-identical with 原本兩個 module 各自定義的 `Response` class。

README 內容

包含：

兩個 endpoint 的用途
Mermaid data flow diagram（text → Yahoo + OJAD → align → output）
WordAccentResult / AccentInfo 欄位與 accent_marking_type 0/1/2 對應 OJAD CSS class 的對照表
Accent shape 速查（平板 / 頭高 / 中高 / 尾高偵測條件）
各檔案責任分工 + 依賴方向
alignment 演算法概述（含 numeric / punct / kana 分支）
擴充指引（之後 feat(accent): reading override layer + streaming MarkAccent endpoint #47 的 override layer、feat(accent): local UniDic + POS-driven patches (closes #48, #50) #51 的 POS-driven patches 都會在此補章節）

Verification

uv run ruff check api/accent/ main.py        # ✓ All checks passed
uv run mypy api/accent/ main.py              # ✓ no issues in 8 files

# byte-identical output check (input: 「3月5日(土)に学校で会いましょう。今日は寒いですね。」)
diff baseline_accent.json   refactor_accent.json    # ✓ no diff
diff baseline_furigana.json refactor_furigana.json  # ✓ no diff

下游 PR 影響

PR	行動
#46 (`feat/docker-compose`)	對 post-refactor main rebase。main.py 改動在不同區塊，預期 auto-merge
#47 (`feat/reading-overrides`)	重做：regex override 層的改動 map 到 `api/accent/<新檔>`，新增 `reading_overrides.py` 放進 package
#51 (`spike/local-unidic`)	重做：fugashi swap + POS patches + 一系列 polish 改動 map 到新 layout；postprocess passes 拆到新檔 `postprocess.py`

Out of scope

改 endpoint API contract / response schema
重寫 align_accent 內部演算法（後續 PR 會升級為 Needleman-Wunsch DP）
加 unit test (main 沒有 test 基礎，留給後續)
把舊 module 留成 backward-compat shim（main.py 是唯一 Python consumer，直接更新）

🤖 Generated with Claude Code

Pure structural split of the monolithic api/accent_marker.py (493L) + api/furigana_marker.py (152L) into a focused package layout. Zero behavior change — verified byte-identical /MarkAccent/ and /MarkFurigana/ responses before and after. Layout: api/accent/ __init__.py re-exports accent_router, furigana_router README.md pipeline + accent_marking_type semantics models.py Request, ErrorInfo, WordResult, WordAccentResult, AccentInfo, FuriganaResponse, AccentResponse furigana.py Yahoo Furigana HTTP client (data layer) ojad.py OJAD scrape (HTTP + BS4 parse) align.py align_accent + numeric_pattern + punctuation_marks + skip_marks + clean_query + is_kana_or_kanji pipeline.py MarkAccent orchestrator (extracted from inline mark_accent handler) routes.py FastAPI routers + thin endpoint handlers main.py: drop `accent_marker` and `furigana_marker` from the `from api import ...` line; add `from api.accent import accent_router, furigana_router`; rename the two `include_router` calls accordingly. README.md documents the data flow (mermaid), AccentInfo accent_marking_type semantics (0=LOW, 1=HIGH, 2=FALL), heiban / atamadaka / nakadaka / odaka detection rules, file responsibilities, and the alignment algorithm. Forms the foundation for #47/#51 to extend with override / patch / postprocess layers. Verification: uv run ruff check api/accent/ main.py # all passed uv run mypy api/accent/ main.py # 8 files, no issues POST /api/MarkAccent/ → byte-identical to pre-refactor POST /api/MarkFurigana/ → byte-identical to pre-refactor Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pure whitespace — single-line collapse where the joined signature / call fits within ruff's 88-char limit. No semantic change; mypy + ruff check still pass; runtime behavior unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tighten the no-behavior-change refactor based on Copilot's review on PR #52. Active findings: - pipeline.py: rename unused `ojad_surface` to `_ojad_surface` so F841 catches it if Ruff's unused-binding rule ever lands; the OJAD echo string isn't consumed here. - furigana.py: wrap `response.json()` in try/except. The docstring promised malformed payloads would surface via the FuriganaResponse envelope, but an invalid Content-Type / non-JSON body would have raised through. Catch ValueError and return a 500 envelope. - ojad.py: switch `raise e` -> bare `raise` and `logger.error(f"...")` -> `logger.exception(...)` to preserve the original traceback. - models.py: "describe" -> "describes" (x3 occurrences); "givent" -> "given". Low-confidence findings also addressed: - align.py module docstring used to claim `punctuation_marks`, `skip_marks`, and `clean_query` are consumed by alignment. They aren't — they're carried over from the pre-refactor module for the downstream PR #47 to use. Reword to reflect that. - clean_query docstring overclaimed punctuation stripping; it only filters ASCII letters. Reword + rename the local comprehension var from `chr` to `char` to stop shadowing the builtin. Verified: uv run ruff check api/accent/ main.py # all passed uv run ruff format --check api/accent/ main.py # 8/8 formatted uv run mypy api/accent/ main.py # 8 files, no issues POST /api/MarkAccent/ + /MarkFurigana/ routes still register Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two cleanups from Copilot's follow-up review on the fix commit: - README mermaid diagram labels `align_accent` as "DP alignment", but the surrounding prose (lines 99/107) correctly calls it "single-pass greedy". DP is a future PR (#47). Rename the node to "Greedy alignment" so the diagram matches the implementation. - `align.py:122` had a comment typo "regard as punchutation" → "regarded as punctuation". No code changes; docstring / comment / diagram only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wade00754

LGTM. But I have identified some additional issues in align.py and will review whether they still exist in future PRs.

torrid-fish · 2026-05-27T09:12:24Z

@copilot resolve the merge conflicts in this pull request

Copilot · 2026-05-27T09:18:55Z

@copilot resolve the merge conflicts in this pull request

Resolved by merging origin/main and fixing the main.py conflict to keep the api.accent router refactor while aligning with current main behavior. Included in commit 0a403cd.

github-actions · 2026-05-27T09:26:44Z

🛡️ PR Quality Check Summary

✅ PR Title: Passed (Length: 69/75, Format: OK). refactor(accent): split into api/accent/ package (no behavior change)
✅ Branch Name: Follows naming convention (refactor/accent-package)
✅ Commit Messages: All 5 commit(s) passed (Length, Format, Case)
✅ Conflicts: No merge conflict markers found
✅ Python Quality: All checks passed.

🎉 All checks passed!

torrid-fish and others added 2 commits May 21, 2026 12:59

This was referenced May 21, 2026

feat(accent): reading override layer + streaming MarkAccent endpoint #47

Closed

feat(accent): local UniDic + POS-driven patches (closes #48, #50) #51

Closed

torrid-fish marked this pull request as ready for review May 21, 2026 20:00

torrid-fish requested review from minto1226 and wade00754 as code owners May 21, 2026 20:00

Copilot AI review requested due to automatic review settings May 21, 2026 20:00

Copilot started reviewing on behalf of torrid-fish May 21, 2026 20:00 View session

torrid-fish enabled auto-merge May 21, 2026 20:01

This comment was marked as resolved.

Sign in to view

torrid-fish requested a review from Copilot May 21, 2026 20:26

Copilot started reviewing on behalf of torrid-fish May 21, 2026 20:26 View session

This comment was marked as resolved.

Sign in to view

sessatakuma deleted a comment from chatgpt-codex-connector Bot May 21, 2026

wade00754 previously approved these changes May 22, 2026

View reviewed changes

Copilot started work on behalf of torrid-fish May 27, 2026 09:12 View session

auto-merge was automatically disabled May 27, 2026 09:18
Head branch was pushed to by a user without write access

Copilot AI dismissed wade00754’s stale review via 0a403cd May 27, 2026 09:18

Copilot finished work on behalf of torrid-fish May 27, 2026 09:19

fix: resolve conflicts with main by merging

fa473c4

torrid-fish force-pushed the refactor/accent-package branch from 0a403cd to fa473c4 Compare May 27, 2026 09:26

torrid-fish enabled auto-merge May 27, 2026 09:28

wade00754 approved these changes May 27, 2026

View reviewed changes

torrid-fish added this pull request to the merge queue May 27, 2026

Merged via the queue into main with commit e652594 May 27, 2026
6 checks passed

torrid-fish deleted the refactor/accent-package branch May 27, 2026 10:40

torrid-fish mentioned this pull request May 27, 2026

feat(accent): local UniDic + POS-driven patches #53

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(accent): split into api/accent/ package (no behavior change)#52

refactor(accent): split into api/accent/ package (no behavior change)#52
torrid-fish merged 5 commits into
mainfrom
refactor/accent-package

torrid-fish commented May 21, 2026 •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

wade00754 left a comment

Uh oh!

torrid-fish commented May 27, 2026

Uh oh!

Copilot AI commented May 27, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

torrid-fish commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

目的

Layout

main.py 變更

命名說明

README 內容

Verification

下游 PR 影響

Out of scope

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

wade00754 left a comment

Choose a reason for hiding this comment

Uh oh!

torrid-fish commented May 27, 2026

Uh oh!

Copilot AI commented May 27, 2026

Uh oh!

github-actions Bot commented May 27, 2026

🛡️ PR Quality Check Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

torrid-fish commented May 21, 2026 •

edited

Loading