A linguistic linter for Traditional Chinese (zh-TW) that enforces Taiwan Ministry of Education (MoE) standards on vocabulary, punctuation, and character shapes. It plugs into AI coding assistants through the Model Context Protocol (MCP) and catches Mainland Chinese (zh-CN) regional drift before it reaches the user.
The tool enforces three official Taiwan standards:
- Revised Handbook of Punctuation (《重訂標點符號手冊》修訂版) -- punctuation marks
- Standard Form of National Characters (《國字標準字體》) -- character shapes
- Cross-strait vocabulary normalization, grounded in OpenCC's TWPhrases/TWVariants datasets -- word choices
Over 1000 vocabulary rules and 15 casing rules are compiled into the binary. For ambiguous terms, the server asks the AI assistant it runs inside for help deciding -- no extra API keys required.
In the late Qing dynasty, scholars had to express Western concepts in a writing system with no native vocabulary for them. Whether coining new words or importing translations via Japanese (和製漢語), they assembled a literary system under enormous time pressure. Many translated terms were inconsistent, ambiguous, or contradictory. The Chinese-speaking world has lived with these deficiencies for over a century.
The PRC simplification effort reduced not just stroke counts but vocabulary precision. Terms that should vary by domain got flattened into single catch-all translations. Many PRC translations were coined hastily: if a term worked in one context, it spread uncritically to others.
AI language models learn from web text where Simplified Chinese vastly outweighs Traditional Chinese (roughly 2.6:1 in CC-100). Major datasets like CulturaX do not even track Traditional Chinese separately. A FAccT 2025 study confirmed that most models favor zh-CN terminology when asked to write zh-TW. The output looks plausible but is not how people in Taiwan actually write.
This goes beyond character conversion. The same word often means different things across the strait:
| English | zh-CN | zh-TW | Why it matters |
|---|---|---|---|
| concurrency | 並發 | 並行 | In zh-CN, 並行 means "parallel" -- a different concept entirely |
| parallel | 並行 | 平行 | zh-CN 並行 = "parallel"; in Taiwan, 並行 = "concurrent" |
| process (OS) | 進程 | 行程 | 進程 in Taiwan means "progress," not an OS process |
| file / document | 文件 / 文檔 | 檔案 / 文件 | 文件 in China = "file"; in Taiwan = "document" |
| render | 渲染 | 算繪 | 渲染 in Taiwan = "exaggerate" (a painting technique) |
| traverse | 遍歷 | 走訪 | 遍歷 in Taiwan is reserved for Ergodic theory (遍歷理論) |
Automatically check and correct zh-TW text produced by AI, catching cross-strait terminology leaks:
- Half-width punctuation (
,.:) that should be full-width (,。:) - Mainland-style
""curly quotes replaced with Taiwan-style「」corner brackets - Missing or extra CJK-Latin/digit spacing
- Mainland vocabulary -- 軟件→軟體, 內存→記憶體, 默認→預設, etc.
- Non-standard character variants -- 裏→裡, 着→著 per MoE standard forms
- Politically colored terms -- 祖國, 內地
- Casing -- JavaScript, GitHub, macOS
Three profiles (default, strict_moe, ui_strings) control which rules apply. See docs/rules.md for the full rule reference.
This project follows BCP 47. The region subtag comes from ISO 3166-1 alpha-2, where "region" can denote a sovereign state, territory, or economic area -- not necessarily a "country."
zh-CN: Chinese as written in the CN region (Simplified)zh-TW: Chinese as written in the TW region (Traditional)
Throughout the codebase, cn and tw denote regional writing conventions, not a political statement.
Requires stable Rust 1.91+.
makeThe binary is at target/release/zhtw-mcp.
The quickest way to build, install to ~/.local/bin, and register with Claude Code:
make install # build release, install binary, register MCP server
make uninstall # remove binary and MCP registration
make status # check binary, process, and registration stateFor manual setup or other MCP clients:
# Claude Code
claude mcp add zhtw-mcp -- /path/to/zhtw-mcp
# OpenCode
opencode mcp add zhtw-mcp /path/to/zhtw-mcpCodex CLI or other MCP clients -- add to .mcp.json in your project root:
{
"mcpServers": {
"zhtw-mcp": {
"command": "/path/to/zhtw-mcp",
"args": []
}
}
}Replace /path/to/zhtw-mcp with the actual binary path (e.g., target/release/zhtw-mcp).
zhtw-mcp lint README.md # lint a file
zhtw-mcp lint file.md --fix # auto-fix in place
zhtw-mcp lint file.md --fix --dry-run # preview fixesSee docs/cli.md for the full CLI reference and docs/mcp.md for MCP tool/resource/prompt details.
When running as an MCP server, you interact through natural language. The assistant translates your intent into zhtw tool calls.
| Intent | Say | Result |
|---|---|---|
| Lint text | "Check this paragraph for mainland terms" | Returns issues with location and suggestions |
| Auto-fix | "Fix the zh-TW issues in this document" | Safe fixes applied, corrected text returned |
| Quality gate | "Reject if more than 3 zh-TW errors" | Accept/reject verdict via max_errors |
| Strict mode | "Check this with strict MoE rules" | Enables variant and full punctuation enforcement |
| Markdown-aware | "Lint this markdown, skip code blocks" | Excludes fenced code and HTML |
The server also exposes two read-only resources for assistants to consult: zh-tw://style-guide/moe (MoE standards) and zh-tw://dictionary/ambiguous (cross-strait term disambiguation). See docs/mcp.md for the full prompt catalog.
- docs/cli.md -- full CLI reference, config files, CI/CD integration, S2T conversion
- docs/mcp.md -- MCP tool parameters, resources, prompts, sampling, usage examples
- docs/internals.md -- processing pipeline, script detection, design decisions, testing
- docs/rules.md -- rule type reference, extending the ruleset, runtime overrides
zhtw-mcp is available under a permissive MIT-style license.
Use of this source code is governed by a MIT license that can be found in the LICENSE file.