Skip to content

refactor(accent): split into api/accent/ package (no behavior change)#52

Merged
torrid-fish merged 5 commits into
mainfrom
refactor/accent-package
May 27, 2026
Merged

refactor(accent): split into api/accent/ package (no behavior change)#52
torrid-fish merged 5 commits into
mainfrom
refactor/accent-package

Conversation

@torrid-fish
Copy link
Copy Markdown
Member

@torrid-fish torrid-fish commented May 21, 2026

目的

api/accent_marker.py (493L) + api/furigana_marker.py (152L) 兩個 monolithic 檔案拆成 api/accent/ package,並加上 README.md 把 marking pipeline + accent_marking_type 約定寫清楚。

純結構性 refactor,正常路徑 JSON 回應與 refactor 前 byte-identical。

Asterisk on "0 behavior change":為了配合 Copilot review 反饋,api/accent/furigana.pyresponse.json() 外多包了一層 try/except ValueError,把「Yahoo 回傳 invalid JSON」這條冷門 error path 從 unhandled exception 改成回 FuriganaResponse(status=500, error=...) envelope。pre-refactor 的 furigana_marker.py 在這條 path 上會直接 raise。其它所有 endpoint behavior 完全 byte-identical。

Base = main,跟 PR #46 (feat/docker-compose) 是 sibling,merge 順序無關。下游 #47 (feat/reading-overrides) 與 #51 (spike/local-unidic) merge 之後再 rebase。

Layout

api/accent/
├── __init__.py     # re-exports accent_router, furigana_router (供 main.py 使用)
├── README.md       # marking pipeline + AccentInfo semantics
├── models.py       # Pydantic schemas (兩 endpoint 共用)
├── furigana.py     # Yahoo Furigana HTTP client (data layer)
├── ojad.py         # OJAD scrape (HTTP + BeautifulSoup parse)
├── align.py        # align_accent + 相關 constants & helpers
├── pipeline.py     # MarkAccent orchestrator (從 mark_accent handler 抽出)
└── routes.py       # FastAPI routers + thin endpoint handlers

依賴方向(無循環):

flowchart LR
  routes[routes.py]
  pipeline[pipeline.py]
  align[align.py]
  furigana[furigana.py]
  ojad[ojad.py]
  models[models.py]

  routes --> pipeline
  routes --> furigana
  pipeline --> align
  pipeline --> furigana
  pipeline --> ojad
  align -.-> models
  furigana -.-> models
  ojad -.-> models
  pipeline -.-> models
  routes -.-> models
Loading

main.py 變更

- from api import accent_marker, dict_query, furigana_marker, sentence_query, usage_query
+ from api import dict_query, sentence_query, usage_query
+ from api.accent import accent_router, furigana_router

  app.include_router(
-     accent_marker.router, prefix=\"/api\", dependencies=[Depends(get_api_key)]
+     accent_router, prefix=\"/api\", dependencies=[Depends(get_api_key)]
  )
  app.include_router(
-     furigana_marker.router, prefix=\"/api\", dependencies=[Depends(get_api_key)]
+     furigana_router, prefix=\"/api\", dependencies=[Depends(get_api_key)]
  )

剩下的 main.py 改動是 0 — auth、middleware、其他 router 都原封不動。跟 PR #46 在 main.py 的改動(移除 auth/security 區塊)在不同行,rebase 無衝突。

命名說明

兩個 endpoint 共用 {status, result, error} envelope shape,只是 result element type 不同。為了 FastAPI response_model typing 明確,仍保留兩個 Response class:

  • FuriganaResponse (result: list[WordResult] | None)
  • AccentResponse (result: list[WordAccentResult] | None)

JSON 輸出 byte-identical with 原本兩個 module 各自定義的 `Response` class。

README 內容

包含:

Verification

uv run ruff check api/accent/ main.py        # ✓ All checks passed
uv run mypy api/accent/ main.py              # ✓ no issues in 8 files

# byte-identical output check (input: 「3月5日(土)に学校で会いましょう。今日は寒いですね。」)
diff baseline_accent.json   refactor_accent.json    # ✓ no diff
diff baseline_furigana.json refactor_furigana.json  # ✓ no diff

下游 PR 影響

PR 行動
#46 (feat/docker-compose) 對 post-refactor main rebase。main.py 改動在不同區塊,預期 auto-merge
#47 (feat/reading-overrides) 重做:regex override 層的改動 map 到 api/accent/<新檔>,新增 reading_overrides.py 放進 package
#51 (spike/local-unidic) 重做:fugashi swap + POS patches + 一系列 polish 改動 map 到新 layout;postprocess passes 拆到新檔 postprocess.py

Out of scope

  • 改 endpoint API contract / response schema
  • 重寫 align_accent 內部演算法(後續 PR 會升級為 Needleman-Wunsch DP)
  • 加 unit test (main 沒有 test 基礎,留給後續)
  • 把舊 module 留成 backward-compat shim(main.py 是唯一 Python consumer,直接更新)

🤖 Generated with Claude Code

torrid-fish and others added 2 commits May 21, 2026 12:59
Pure structural split of the monolithic api/accent_marker.py (493L) +
api/furigana_marker.py (152L) into a focused package layout. Zero
behavior change — verified byte-identical /MarkAccent/ and
/MarkFurigana/ responses before and after.

Layout:
  api/accent/
    __init__.py     re-exports accent_router, furigana_router
    README.md       pipeline + accent_marking_type semantics
    models.py       Request, ErrorInfo, WordResult, WordAccentResult,
                    AccentInfo, FuriganaResponse, AccentResponse
    furigana.py     Yahoo Furigana HTTP client (data layer)
    ojad.py         OJAD scrape (HTTP + BS4 parse)
    align.py        align_accent + numeric_pattern + punctuation_marks
                    + skip_marks + clean_query + is_kana_or_kanji
    pipeline.py     MarkAccent orchestrator (extracted from inline
                    mark_accent handler)
    routes.py       FastAPI routers + thin endpoint handlers

main.py: drop `accent_marker` and `furigana_marker` from the
`from api import ...` line; add `from api.accent import accent_router,
furigana_router`; rename the two `include_router` calls accordingly.

README.md documents the data flow (mermaid), AccentInfo
accent_marking_type semantics (0=LOW, 1=HIGH, 2=FALL), heiban /
atamadaka / nakadaka / odaka detection rules, file responsibilities,
and the alignment algorithm. Forms the foundation for #47/#51 to
extend with override / patch / postprocess layers.

Verification:
  uv run ruff check api/accent/ main.py        # all passed
  uv run mypy api/accent/ main.py              # 8 files, no issues
  POST /api/MarkAccent/  → byte-identical to pre-refactor
  POST /api/MarkFurigana/ → byte-identical to pre-refactor

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure whitespace — single-line collapse where the joined signature /
call fits within ruff's 88-char limit. No semantic change; mypy + ruff
check still pass; runtime behavior unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This comment was marked as resolved.

Tighten the no-behavior-change refactor based on Copilot's review on
PR #52.

Active findings:
- pipeline.py: rename unused `ojad_surface` to `_ojad_surface` so F841
  catches it if Ruff's unused-binding rule ever lands; the OJAD echo
  string isn't consumed here.
- furigana.py: wrap `response.json()` in try/except. The docstring
  promised malformed payloads would surface via the FuriganaResponse
  envelope, but an invalid Content-Type / non-JSON body would have
  raised through. Catch ValueError and return a 500 envelope.
- ojad.py: switch `raise e` -> bare `raise` and `logger.error(f"...")`
  -> `logger.exception(...)` to preserve the original traceback.
- models.py: "describe" -> "describes" (x3 occurrences); "givent"
  -> "given".

Low-confidence findings also addressed:
- align.py module docstring used to claim `punctuation_marks`,
  `skip_marks`, and `clean_query` are consumed by alignment. They
  aren't — they're carried over from the pre-refactor module for the
  downstream PR #47 to use. Reword to reflect that.
- clean_query docstring overclaimed punctuation stripping; it only
  filters ASCII letters. Reword + rename the local comprehension var
  from `chr` to `char` to stop shadowing the builtin.

Verified:
  uv run ruff check api/accent/ main.py        # all passed
  uv run ruff format --check api/accent/ main.py  # 8/8 formatted
  uv run mypy api/accent/ main.py              # 8 files, no issues
  POST /api/MarkAccent/ + /MarkFurigana/ routes still register

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This comment was marked as resolved.

Two cleanups from Copilot's follow-up review on the fix commit:

- README mermaid diagram labels `align_accent` as "DP alignment", but
  the surrounding prose (lines 99/107) correctly calls it "single-pass
  greedy". DP is a future PR (#47). Rename the node to "Greedy
  alignment" so the diagram matches the implementation.
- `align.py:122` had a comment typo "regard as punchutation" →
  "regarded as punctuation".

No code changes; docstring / comment / diagram only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sessatakuma sessatakuma deleted a comment from chatgpt-codex-connector Bot May 21, 2026
wade00754
wade00754 previously approved these changes May 22, 2026
Copy link
Copy Markdown
Member

@wade00754 wade00754 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But I have identified some additional issues in align.py and will review whether they still exist in future PRs.

@torrid-fish
Copy link
Copy Markdown
Member Author

@copilot resolve the merge conflicts in this pull request

auto-merge was automatically disabled May 27, 2026 09:18

Head branch was pushed to by a user without write access

Copy link
Copy Markdown
Contributor

Copilot AI commented May 27, 2026

@copilot resolve the merge conflicts in this pull request

Resolved by merging origin/main and fixing the main.py conflict to keep the api.accent router refactor while aligning with current main behavior. Included in commit 0a403cd.

@torrid-fish torrid-fish force-pushed the refactor/accent-package branch from 0a403cd to fa473c4 Compare May 27, 2026 09:26
@github-actions
Copy link
Copy Markdown

🛡️ PR Quality Check Summary

PR Title: Passed (Length: 69/75, Format: OK). refactor(accent): split into api/accent/ package (no behavior change)
Branch Name: Follows naming convention (refactor/accent-package)
Commit Messages: All 5 commit(s) passed (Length, Format, Case)
Conflicts: No merge conflict markers found
Python Quality: All checks passed.


🎉 All checks passed!

@torrid-fish torrid-fish enabled auto-merge May 27, 2026 09:28
@torrid-fish torrid-fish added this pull request to the merge queue May 27, 2026
Merged via the queue into main with commit e652594 May 27, 2026
6 checks passed
@torrid-fish torrid-fish deleted the refactor/accent-package branch May 27, 2026 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants