Speedup frequency retrieval for heavy workflows by ShanaryS · Pull Request #2305 · yomidevs/yomitan

ShanaryS · 2026-02-14T18:00:47Z

This is essentially a follow-up to #2251, which added batch tokenization and passed through some /termEntries fields. This PR does the same for frequencies, so a single batched /tokenize call can return them as well. The benchmarks are the same as the lemmatize results in the previous PR's table.

The full changes are:

- /tokenize now returns the frequencies matching each headwordIndex along with the filtered headwords.
- Added TermHeadword.headwordIndex, which tracks the headword's position in the TermDictionaryEntry.headwords array. TermHeadword.index is not updated in _removeUnusedHeadwords(), unlike the headwordIndex on related arrays such as TermFrequency.
  - The comment on TermHeadword.index suggests this is intentional, so I didn't treat it as a bug.
- Added TermFrequency.frequencyMode, which exposes the frequency dictionary type so that consumers can process values accordingly.
  - This is cached to reduce db queries. However, the cache uses Summary.title as the key, which may not be sufficient and needs dev input.
- Added optional FindTermsOptions.useAllFrequencyDictionaries so that all frequency dictionaries can be processed, even in simple mode, when the request comes through the /tokenize endpoint.
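
To illustrate why a separate headwordIndex is useful, here is a minimal sketch of the index bookkeeping when unused headwords are filtered out. The function and the object shapes are simplified and hypothetical; this is not the actual _removeUnusedHeadwords() implementation, only the general idea of leaving `index` alone while keeping `headwordIndex` aligned with the filtered array.

```javascript
// Simplified, hypothetical shapes: real Yomitan entries carry many more fields.
function removeUnusedHeadwords(entry, usedIndices) {
    const indexRemap = new Map(); // old array position -> new array position
    const headwords = [];
    entry.headwords.forEach((headword, oldPosition) => {
        if (!usedIndices.has(oldPosition)) { return; }
        const newPosition = headwords.length;
        indexRemap.set(oldPosition, newPosition);
        // `index` keeps its original value; `headwordIndex` follows the filtered array.
        headwords.push({...headword, headwordIndex: newPosition});
    });
    // Drop frequencies that belong to removed headwords and repoint the rest.
    const frequencies = entry.frequencies
        .filter((frequency) => indexRemap.has(frequency.headwordIndex))
        .map((frequency) => ({
            ...frequency,
            headwordIndex: indexRemap.get(frequency.headwordIndex),
        }));
    return {...entry, headwords, frequencies};
}

// Illustrative data only.
const sampleEntry = {
    headwords: [
        {index: 0, term: 'alpha'},
        {index: 1, term: 'beta'},
        {index: 2, term: 'gamma'},
    ],
    frequencies: [
        {headwordIndex: 1, frequency: 900},
        {headwordIndex: 2, frequency: 120},
    ],
};
const filteredEntry = removeUnusedHeadwords(sampleEntry, new Set([0, 2]));
```

After filtering, a consumer can match each remaining frequency to its headword by comparing `headwordIndex` values, without having to know how many headwords were removed.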

The outstanding question is whether Summary.title is sufficient as a cache key. Summary.revision seems like it should be added, but I can't find a good way to get it.
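
For discussion, a rough sketch of what a composite cache key could look like if the revision were available. The class name, the `lookup` callback, and the synchronous lookup are all assumptions made for brevity; the real lookup would be a database query, and whether the revision can be obtained at this call site is exactly the open question.

```javascript
// Hypothetical cache sketch; not the code in this PR.
class FrequencyModeCache {
    /** @param {(title: string) => string|null} lookup stand-in for the db query */
    constructor(lookup) {
        this._lookup = lookup;
        this._cache = new Map();
    }

    // JSON-encode the pair so a title containing a separator cannot collide
    // with a different title/revision combination.
    static makeKey(title, revision) {
        return JSON.stringify([title, revision ?? null]);
    }

    get(title, revision) {
        const key = FrequencyModeCache.makeKey(title, revision);
        if (this._cache.has(key)) { return this._cache.get(key); }
        const mode = this._lookup(title);
        this._cache.set(key, mode);
        return mode;
    }
}

// Illustrative usage: count how often the (fake) lookup runs.
let lookupCount = 0;
const modeCache = new FrequencyModeCache(() => {
    lookupCount += 1;
    return 'rank-based';
});
modeCache.get('Example Freq', 'rev1'); // miss, runs the lookup
modeCache.get('Example Freq', 'rev1'); // hit, served from the cache
modeCache.get('Example Freq', 'rev2'); // miss, revision changed
```

With title alone as the key, the third call would be a cache hit and could return a stale mode if a dictionary with the same title but a different revision had been imported in the meantime.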

Docs: yomidevs/yomitan-api#18

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3556c05efb


ext/js/language/translator.js

ShanaryS requested a review from a team as a code owner February 14, 2026 18:00

ShanaryS mentioned this pull request Feb 14, 2026

Expose frequencies to tokenize yomidevs/yomitan-api#18

Open

chatgpt-codex-connector bot reviewed Feb 14, 2026


expose frequencies to tokenize

e9e4769

ShanaryS force-pushed the frequency-speedup branch from 3556c05 to e9e4769 Compare February 14, 2026 20:05

ShanaryS wants to merge 1 commit into yomidevs:master from ShanaryS:frequency-speedup
