christopherball/japaneseDeinflector

Japanese Deinflector

Japanese Deinflector is a static, browser-based Japanese morphology explorer. It is built around a reverse deinflection engine that walks an inflected surface form back through explicit intermediate steps instead of jumping straight from the surface to a dictionary entry.

The app is designed to be explainable, offline-friendly, and easy to deploy as a plain static site.

What It Does

  • Analyzes inflected Japanese forms and short fixed-pattern phrases
  • Shows explicit intermediate chain steps such as 食べる → 食べさせる → 食べさせられる
  • Renders entry cards with headword, reading, alternate spellings, grammar line, and sense list
  • Uses learner-facing tooltip copy for inflection nodes
  • Highlights common parses with a star badge
  • Ships with an offline JMdict-derived lexicon plus a small morphology supplement
  • Works entirely as a static front-end once the built files are served

Example chain:

やめる → やめて → やめておく → やめとく → やめとけ
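The chain above can be reproduced with a toy reverse search. This is a minimal sketch rather than the project's engine: the `ReverseRule` shape, the four rules, and the one-word lexicon are all assumptions made for illustration, and a real rule set would also need verb-class checks.

```typescript
// Hypothetical rule shape: rewrite an inflected ending back to the ending it came from.
interface ReverseRule {
  label: string;
  suffix: string;      // ending on the more-inflected form
  replacement: string; // ending on the less-inflected form
}

// Illustrative rules only, just enough to walk the example chain.
const rules: ReverseRule[] = [
  { label: "imperative", suffix: "け", replacement: "く" },
  { label: "contraction とく", suffix: "とく", replacement: "ておく" },
  { label: "auxiliary おく", suffix: "ておく", replacement: "て" },
  { label: "て-form", suffix: "て", replacement: "る" },
];

// Stand-in for the real lexicon.
const lexicon = new Set<string>(["やめる"]);

// Depth-first reverse search. Returns the forms from surface to dictionary
// entry, or null when no rule sequence reaches a lexicon entry.
function deinflect(surface: string, maxDepth = 6): string[] | null {
  if (lexicon.has(surface)) return [surface];
  if (maxDepth === 0) return null;
  for (const rule of rules) {
    if (!surface.endsWith(rule.suffix)) continue;
    const stem = surface.slice(0, surface.length - rule.suffix.length);
    const rest = deinflect(stem + rule.replacement, maxDepth - 1);
    if (rest) return [surface, ...rest];
  }
  return null;
}

const chain = deinflect("やめとけ");
// Reverse so the dictionary form comes first, matching the display order above.
console.log(chain ? [...chain].reverse().join(" → ") : "no parse");
```

Every intermediate form is kept in the returned array, which is what lets a UI render each step instead of only the final dictionary entry.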

Tech Stack

  • React 18
  • TypeScript
  • Vite
  • Custom reverse morphology engine
  • Vitest + Testing Library
  • JMdict / EDRDG-derived lexicon data

Local Development

Install dependencies:

npm install

Start the dev server:

npm run dev

Type-check:

npm run check

Run tests:

npm run test:run

Build the production bundle:

npm run build

Preview the built static site locally:

npm run preview

Build Output And dist/

npm run build produces a fully static site in dist/.

Typical output includes:

  • dist/index.html
  • dist/assets/index-*.js
  • dist/assets/index-*.css
  • dist/data/lexicon.generated.json

Important notes:

  • dist/ is generated output, not source.
  • dist/ is ignored by .gitignore in this repository.
  • You generally should not hand-edit anything inside dist/.
  • For source control, commit the source files plus public/data/lexicon.generated.json, not the generated dist/ directory.
  • For deployment to a generic static host, upload the contents of dist/.

Vite is configured with:

  • base: '/' during vite serve
  • base: './' during vite build

That means the built app uses relative asset URLs, so serving dist/ directly works both for local static servers and when the bundle is hosted from a subdirectory.
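One way to express that split is a small helper keyed on the Vite command. The sketch below is an assumption about how the config could be written, not a copy of the repository's vite.config.ts; `resolveBase` is a hypothetical name.

```typescript
// Vite passes "serve" for the dev server and "build" for production builds.
type ViteCommand = "serve" | "build";

// Hypothetical helper: absolute base while developing, relative base for the
// built bundle so it can be hosted from any subdirectory.
function resolveBase(command: ViteCommand): string {
  return command === "serve" ? "/" : "./";
}

// In vite.config.ts this could be wired up roughly as:
//   import { defineConfig } from "vite";
//   export default defineConfig(({ command }) => ({ base: resolveBase(command) }));
```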

Lexicon Data

The checked-in runtime lexicon source lives at:

public/data/lexicon.generated.json

During the production build, Vite copies that file into:

dist/data/lexicon.generated.json

The project also ships a small supplement in:

src/data/lexiconSupplement.ts

This supplement covers morphology-critical helper entries such as:

  • ない
  • いる
  • おく
  • そう
  • そうだ
  • みたいだ
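As a rough illustration of how a supplement like this can sit alongside the generated lexicon, the sketch below merges both sources into one headword index. The `LexiconEntry` shape and `buildIndex` function are hypothetical; the real JSON layout and loading code may differ.

```typescript
// Hypothetical entry shape, for illustration only.
interface LexiconEntry {
  headword: string;
  reading: string;
  senses: string[];
}

// Builds a headword -> entries index. Supplement entries are appended after
// entries from the main lexicon that share the same headword.
function buildIndex(
  main: LexiconEntry[],
  supplement: LexiconEntry[],
): Map<string, LexiconEntry[]> {
  const index = new Map<string, LexiconEntry[]>();
  for (const entry of [...main, ...supplement]) {
    const bucket = index.get(entry.headword) ?? [];
    bucket.push(entry);
    index.set(entry.headword, bucket);
  }
  return index;
}

const demoIndex = buildIndex(
  [{ headword: "食べる", reading: "たべる", senses: ["to eat"] }],
  [{ headword: "ない", reading: "ない", senses: ["not (negative auxiliary)"] }],
);
console.log(demoIndex.has("ない"));
```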

Regenerating The Lexicon

Regenerate the bundled lexicon from a local JMdict snapshot with:

npm run build:lexicon -- /path/to/JMdict_e.gz

If no path is provided, the script defaults to:

/tmp/JMdict_e.gz
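The fallback can be pictured as a single default applied to the first script argument. This is a sketch of the behavior described above, assuming the path arrives as the first argument after the script name; `resolveJmdictPath` is a hypothetical name, not necessarily what scripts/buildLexicon.ts uses.

```typescript
const DEFAULT_JMDICT_PATH = "/tmp/JMdict_e.gz";

// argv follows Node's process.argv layout: [runtime, script, ...args].
function resolveJmdictPath(argv: string[]): string {
  return argv[2] ?? DEFAULT_JMDICT_PATH;
}

console.log(resolveJmdictPath(["node", "buildLexicon.ts"]));
console.log(resolveJmdictPath(["node", "buildLexicon.ts", "/path/to/JMdict_e.gz"]));
```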


Current Built-In Examples

The app currently ships these example searches:

  • やめとけ
  • 食べさせられなかった
  • 行かされていた
  • やめさせられちゃった
  • 見られなくなってきた
  • 行っておかなきゃ
  • 読んどいてくれ
  • 行けなくはない
  • 食べないではいられない
  • 行かざるを得ない
  • 見ないわけにはいかない
  • 食べられなくもない
  • しなければならなかった
  • させられっぱなし
  • 行っとけばよかった
  • ではありませんでした
  • じゃなくて
  • 高くなきゃいけない
  • 食べさせといて
  • 見ていられない

Supported Coverage

The current rule set covers a fairly broad slice of common learner-facing morphology, including:

  • Ichidan, godan, する, 来る, and 行く behavior where special handling is needed
  • Core verb inflections such as negative, past, polite, imperative, volitional, potential, passive, causative, and desiderative
  • Conjunctive chaining through て / で
  • Auxiliary chains such as いる, おく, しまう, くる, and なる
  • Common contractions such as てる, とく, といて, ちゃう, じゃ, and なきゃ
  • I-adjective connective and obligation-building patterns
  • Copula chains such as です, でした, では, ではありません, ではありませんでした, じゃない, and じゃなくて
  • Fixed patterns such as:
    • obligation
    • internal compulsion
    • external compulsion
    • soft double negation
    • regret / hindsight via conditional よい plus past
    • request
    • hearsay
    • seeming / appearance
    • っぱなし

The project intentionally prefers transparent intermediate analysis over maximal compression.

Architecture

Key files:

  • src/morphology/rules.ts Declarative reverse rules for inflection, auxiliaries, fixed patterns, and contractions.
  • src/morphology/analysis.ts Reverse search, path ranking, analyzable chunk selection, and graph construction.
  • src/morphology/tooltips.ts Tooltip titles and learner-facing explanation copy for node labels.
  • src/morphology/pathLabels.ts UI-facing label formatting for inflection edges.
  • src/morphology/examples.ts Built-in example list and expected example paths.
  • src/morphology/lexicon.ts Runtime lexicon loading and indexing.
  • src/components/GraphView.tsx Result grouping, entry-card rendering, inflection-chain rendering, and tooltip behavior.
  • src/app/App.tsx Top-level app shell, search state, example picker, and lexicon loading.
  • src/styles.css App styling, responsive layout, tooltip presentation, and mobile behavior.
  • scripts/buildLexicon.ts JMdict-to-runtime JSON conversion script.

Testing

The test suite currently covers:

  • engine-level deinflection path expectations
  • invalid chain prevention
  • regression cases for impossible stacks
  • UI rendering for paths, labels, tooltips, examples, and empty states

Main test files:

  • src/morphology/__tests__/engine.test.ts
  • src/app/App.test.tsx

Recommended Source-Control Contents

Before pushing, make sure these files are committed:

  • source under src/
  • public/data/lexicon.generated.json
  • scripts/buildLexicon.ts
  • package.json
  • package-lock.json
  • vite.config.ts
  • tsconfig.json
  • README.md

Generated files you typically should not commit:

  • dist/
  • node_modules/
  • coverage or cache directories

Current Limitations

The analyzer is intentionally conservative. Known limitations include:

  • honorific and humble systems are not modeled comprehensively
  • sentence segmentation remains lightweight and focuses on the analyzable chunk
  • dialect-heavy colloquials are only partially covered
  • coverage is strongest for common educational morphology, not every literary or highly idiomatic construction
  • some ambiguous surfaces still legitimately produce multiple plausible paths

Adding Or Adjusting Rules

When extending the project:

  1. update src/morphology/rules.ts
  2. add or update tests in src/morphology/__tests__/engine.test.ts
  3. update tooltip copy in src/morphology/tooltips.ts if a new learner-facing node appears
  4. update src/morphology/pathLabels.ts if the UI label should differ from the internal rule label
  5. update src/morphology/examples.ts if the new behavior deserves a built-in showcase or regression example
  6. regenerate public/data/lexicon.generated.json only if the lexicon itself changed
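Step 4 above amounts to a lookup with a fallback: the UI shows a custom label when one exists and the internal rule label otherwise. The sketch below illustrates that idea only; the label keys, the display strings, and `formatPathLabel` are hypothetical and not taken from src/morphology/pathLabels.ts.

```typescript
// Hypothetical internal rule labels mapped to UI-facing text.
const uiLabels: Record<string, string> = {
  "contraction-toku": "とく (contracted ておく)",
  "aux-oku": "ておく (do in advance)",
};

// Falls back to the internal label when no UI override is registered.
function formatPathLabel(ruleLabel: string): string {
  return uiLabels[ruleLabel] ?? ruleLabel;
}

console.log(formatPathLabel("contraction-toku"));
console.log(formatPathLabel("imperative"));
```

With a fallback like this, a new rule stays visible in the UI even before it has a dedicated label.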

Rule-authoring guidelines:

  • Prefer explicit intermediate steps over shortcut analyses.
  • Model contractions as their own edges when that improves teachability.
  • Avoid impossible morphological stacks even if they produce superficially neat paths.
  • Keep tooltip copy independently instructive and learner-friendly.
