Fix examples/llm_inference/js failing to load larger models by implementing a buffered loading system, allowing models like Gemma 3n E2B & E4B to load #640
Conversation
@tyrmullen
Thanks! For this particular sample/demo, we're hoping to keep things as simple as possible for first-time users. In particular, we wanted this demo to be (a) super easy to run, and (b) pretty minimal in code. To that end, we just submitted a few changes to simplify things. Instead of requiring users to serve models themselves locally with Python, we switched to a simple "file chooser" button, so users can now simply open the demo .html file in their browsers and immediately try out any models they have (no Python required). As a side effect, the models skip the remote fetching, so the streaming loading should be very fast (and we also ensure streaming loading occurs for the file upload by using […]). For more advanced users, such as those interested in local caching, the other demos (LLM chat and 3N) should be a better and more complete reference, so we're focusing our local-caching examples there to keep this code a little smaller (since local caching can be a bit tricky, is domain-specific, can be done in a few different ways, and isn't for everyone).
That sounds great! This issue can be closed now, then.
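As a rough illustration of the buffered loading this PR describes, here is a minimal sketch of reading a user-selected model `Blob` in fixed-size slices rather than as one contiguous `ArrayBuffer` (which is what tends to fail for multi-GB models). The function name, parameter names, and chunk size are hypothetical, not taken from the actual PR:

```javascript
// Hypothetical sketch of buffered model loading: read a large Blob in
// fixed-size slices instead of one contiguous ArrayBuffer. Names
// (readInChunks, chunkBytes, onChunk) are illustrative only.
async function readInChunks(blob, chunkBytes, onChunk) {
  let offset = 0;
  while (offset < blob.size) {
    // slice() is lazy; bytes are only materialized by arrayBuffer().
    const slice = blob.slice(offset, offset + chunkBytes);
    onChunk(new Uint8Array(await slice.arrayBuffer()), offset);
    offset += slice.size;
  }
  return offset; // total bytes consumed
}
```

In a browser, the `Blob` would come from the file chooser's `change` event (`event.target.files[0]`); because each slice is materialized only while it is being consumed, peak memory stays bounded by the chunk size instead of the model size.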
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
Fixes # (issue)
Checklist
Please ensure the following items are complete before submitting a pull request:
Type of Change
Please check the relevant option below:
Additional Notes
The demo is now capable of loading larger models (e.g., Google's Gemma 3n E2B & E4B) without crashing.