Speedup frequency retrieval for heavy workflows #2305

Open
ShanaryS wants to merge 1 commit into yomidevs:master from ShanaryS:frequency-speedup

@ShanaryS ShanaryS commented Feb 14, 2026

This is essentially a follow-up to #2251, which added batch tokenization and passed through some /termEntries fields. This PR does the same for frequencies, so that a single batched /tokenize call can return them. The benchmarks match the lemmatize figures in the table of the previous PR.

The full changes are:

  • /tokenize now returns the frequencies matching each headwordIndex, along with the filtered headwords
  • Added TermHeadword.headwordIndex, which tracks the headword's position in the TermDictionaryEntry.headwords array. Unlike the headwordIndex on related arrays such as TermFrequency, TermHeadword.index is not updated in _removeUnusedHeadwords()
    • The comment on TermHeadword.index suggests that this is intentional, so I didn't treat it as a bug
  • Added TermFrequency.frequencyMode, which exposes the frequency dictionary type so that consumers can process values accordingly
    • This is cached to reduce db queries. However, the cache uses Summary.title as the key, which may not be sufficient and needs dev input
  • Added optional FindTermsOptions.useAllFrequencyDictionaries, so that all frequency dictionaries are processed even in simple mode when the request comes through the /tokenize endpoint
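
To illustrate the headwordIndex point, here is a minimal sketch of the reindexing idea, not the actual _removeUnusedHeadwords() implementation. The interfaces are simplified, hypothetical stand-ins for TermHeadword, TermFrequency, and TermDictionaryEntry, and the function name and the `keep` parameter are assumptions made for the example.

```typescript
// Hypothetical, simplified shapes; the real Yomitan types carry more fields.
interface Headword {
    index: number;          // original index, deliberately left untouched
    headwordIndex: number;  // position in the current headwords array
    term: string;
}

interface Frequency {
    headwordIndex: number;  // position of the headword this value belongs to
    dictionary: string;
    frequency: number;
}

interface Entry {
    headwords: Headword[];
    frequencies: Frequency[];
}

// Drops headwords whose position is not in `keep`, then rewrites every
// headwordIndex so that it points into the filtered headwords array.
function removeUnusedHeadwords(entry: Entry, keep: Set<number>): Entry {
    const remap = new Map<number, number>();
    const headwords: Headword[] = [];
    entry.headwords.forEach((headword, oldPosition) => {
        if (!keep.has(oldPosition)) { return; }
        const newPosition = headwords.length;
        remap.set(oldPosition, newPosition);
        // `index` is copied as-is; only `headwordIndex` follows the new array.
        headwords.push({...headword, headwordIndex: newPosition});
    });

    const frequencies: Frequency[] = [];
    for (const frequency of entry.frequencies) {
        const newPosition = remap.get(frequency.headwordIndex);
        if (newPosition === undefined) { continue; } // headword was removed
        frequencies.push({...frequency, headwordIndex: newPosition});
    }
    return {headwords, frequencies};
}
```

With this shape, a consumer can pair each returned frequency with its headword by comparing headwordIndex values, without relying on TermHeadword.index.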

The outstanding question is whether Summary.title is sufficient as a cache key. Summary.revision seems like it should be part of the key, but I can't find a good way to get it.
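
For discussion, a rough sketch of what a composite key could look like. This is not the code in this PR: the class name, the `load` callback, and the FrequencyMode values are assumptions for illustration, and the sketch presumes the revision is already available at lookup time, which is exactly the open problem.

```typescript
// Assumed mode values; treat them as placeholders for whatever Summary exposes.
type FrequencyMode = 'occurrence-based' | 'rank-based' | null;

// Hypothetical cache keyed by title AND revision, so that re-importing a
// dictionary under the same title with a new revision does not serve a stale mode.
class FrequencyModeCache {
    private readonly cache = new Map<string, FrequencyMode>();

    get(title: string, revision: string, load: () => FrequencyMode): FrequencyMode {
        // JSON.stringify avoids collisions that a plain separator could cause.
        const key = JSON.stringify([title, revision]);
        if (this.cache.has(key)) {
            return this.cache.get(key) ?? null;
        }
        const mode = load(); // stands in for the db query
        this.cache.set(key, mode);
        return mode;
    }
}
```

A title-only key would return the first cached mode for both revisions, which is the staleness concern above.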

Docs: yomidevs/yomitan-api#18

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3556c05efb
