
Optimize Dictionary Import Memory Use#2319

Open
Casheeew wants to merge 11 commits into master from perf-optimize-import-memory

Conversation

Casheeew (Member) commented Feb 25, 2026

Loading all term files into memory at once causes massive memory usage, which can slow Chrome down and sometimes cause issues such as #1420.

This PR instead achieves constant memory usage via a streaming pattern: it pipes zip.js' getData() stream into a custom JSON parser, which emits onBatch events whenever maxTransactionLength entries have been parsed. This effectively halves memory usage even when the term files are small (1000 terms per term bank), and caps peak memory at about 40 MB.
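A minimal sketch of the batching half of this pattern, assuming entries arrive one at a time from a streaming parser. The name `BatchEmitter` is hypothetical, not the actual PR code; in the real implementation the entries come from a custom JSON parser fed by zip.js' `getData()` stream.

```javascript
// Sketch: buffer streamed entries and flush them in fixed-size batches
// so that at most maxTransactionLength entries are held in memory at once.
class BatchEmitter {
    constructor(maxTransactionLength, onBatch) {
        this.maxTransactionLength = maxTransactionLength;
        this.onBatch = onBatch;
        this.buffer = [];
    }

    // Called by the streaming parser for each parsed entry.
    push(entry) {
        this.buffer.push(entry);
        if (this.buffer.length >= this.maxTransactionLength) {
            this.flush();
        }
    }

    // Emit whatever is buffered; must also be called at end of stream.
    flush() {
        if (this.buffer.length > 0) {
            this.onBatch(this.buffer);
            this.buffer = [];
        }
    }
}

// Usage: 2500 streamed entries with a batch size of 1000
// produce batches of 1000, 1000, and 500 entries.
const batches = [];
const emitter = new BatchEmitter(1000, (batch) => batches.push(batch.length));
for (let i = 0; i < 2500; i++) {
    emitter.push(['term', 'reading']);
}
emitter.flush();
// batches is now [1000, 1000, 500]
```

Note the trailing `flush()`: the last partial batch is only emitted when the stream signals completion, which is why end-of-stream handling matters in this pattern.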

Works flawlessly even when combining the full Pixiv dictionary into one enormous term bank.

Based on my testing, there is a negligible difference in import time; this is slightly faster due to less GC.

Maybe a partial fix to the many "Unknown error" issues... #536

Casheeew requested a review from a team as a code owner February 25, 2026 13:26
Casheeew (Member, Author) commented:

@codex review

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1694dafcd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Casheeew (Member, Author) commented:

@codex review

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00ca6997b5


chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1981762c53


Casheeew and others added 4 commits February 25, 2026 23:50
The test directories were gitignored by the `dictionaries/` rule
and needed to be force-added, matching how existing fixtures are tracked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Casheeew (Member, Author) commented:

Should be good to go. We don't have to cover every edge case: realistically, dictionaries are for the most part generated by code, so there is little reason to check every form of malformed JSON. The main check is the AJV layer; if a dictionary passes that, malformed data should not break anything for the user.
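To illustrate the point about the validation layer being the gate: the sketch below uses a hand-rolled predicate in place of the real check (Yomitan validates term banks with AJV against a JSON schema; `isValidTermEntry` and the entry shape shown are illustrative assumptions, not the project's actual code).

```javascript
// Stand-in for the schema gate: reject entries before import so the
// streaming importer never has to defend against malformed data itself.
function isValidTermEntry(entry) {
    // Assume an entry is an array beginning with [expression, reading, ...].
    return Array.isArray(entry) &&
        entry.length >= 2 &&
        typeof entry[0] === 'string' &&
        typeof entry[1] === 'string';
}

// One well-formed entry, one truncated entry, one wrong type entirely.
const entries = [
    ['猫', 'ねこ', 'n', '', 1, ['cat'], 0, ''],
    ['truncated'],
    42,
];
const validEntries = entries.filter(isValidTermEntry);
// Only the first, well-formed entry survives the gate.
```

The design point is that validation happens once, up front, so the batch/import path can assume well-formed input.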

Casheeew (Member, Author) commented:

@codex review

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdf7851b2b


Kuuuube (Member) commented Feb 26, 2026

> Based on my testing, there is a negligible difference in import time; this is slightly faster due to less GC.

Can you give a few timed tests of imports and the peak memory usage you're seeing before and after?

Casheeew (Member, Author) commented:

Okay, I tested again. For dictionaries whose term files have 1000 entries each, performance is comparable (both versions load 1000 entries into memory at a time; I probably got the impression it was halved from looking at the wrong tab).
Import time has quite high variance (a few seconds for EN-KRDICT, >10 s for Pixiv).

PixivFull before (210.54s):
[screenshot: validation step]
[screenshot: import step]

PixivFull after (203.00s):
[screenshot: validation step]
[screenshot: import step]

For EN-KRDICT, where each term file has a different number of entries, memory use is effectively halved:

Before (12.41s): [screenshot]
After (13.20s): [screenshot]

Bonus: Pixiv with a single term bank (700k+ entries)

Before: ∞ (import error)
After: 205.44s
[screenshot: validation step]
[screenshot: import step]

Basically equivalent to when it is split into 1000-entry term banks.
