Fix examples/llm_inference/js failing to load larger models by implementing a buffered loading system, allowing models like Gemma 3n E2B & E4B to load #640
Conversation
@tyrmullen
Thanks! For this particular sample/demo, we're hoping to keep things as simple as possible for first-time users. In particular, we wanted this demo to be (a) super easy to run, and (b) pretty minimal in code. To that end, we just submitted a few changes to simplify things. Instead of requiring users to serve models themselves locally with Python, we switched to a simple "file chooser" button, so users can now simply open the demo .html file in their browsers and immediately try out any models they have (no Python required). As a side effect, the models skip the remote fetching, so the streaming loading should be very fast (and we also ensure streaming loading occurs for the file upload by using […]). For more advanced users, such as those interested in local caching, the other demos (LLM chat and 3N) should be a better and more complete reference, so we're focusing our local-caching examples there to keep this code a little smaller (since local caching can be a bit tricky, is domain-specific, can be done in a few different ways, and isn't for everyone).
That sounds great! This issue can be closed now, then.
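As a rough illustration of the buffered loading this PR describes, here is a minimal sketch of reading a user-selected model `Blob` in fixed-size slices rather than as one contiguous `ArrayBuffer` (which is what tends to fail for multi-GB models). The function name, parameter names, and chunk size are hypothetical, not taken from the actual PR:

```javascript
// Hypothetical sketch of buffered model loading: read a large Blob in
// fixed-size slices instead of one contiguous ArrayBuffer. Names
// (readInChunks, chunkBytes, onChunk) are illustrative only.
async function readInChunks(blob, chunkBytes, onChunk) {
  let offset = 0;
  while (offset < blob.size) {
    // slice() is lazy; bytes are only materialized by arrayBuffer().
    const slice = blob.slice(offset, offset + chunkBytes);
    onChunk(new Uint8Array(await slice.arrayBuffer()), offset);
    offset += slice.size;
  }
  return offset; // total bytes consumed
}
```

In a browser, the `Blob` would come from the file chooser's `change` event (`event.target.files[0]`); because each slice is materialized only while it is being consumed, peak memory stays bounded by the chunk size instead of the model size.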
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
Fixes # (issue)
Checklist
Please ensure the following items are complete before submitting a pull request:
Type of Change
Please check the relevant option below:
Additional Notes
The demo is now capable of loading larger models (e.g., Google's Gemma 3n E2B & E4B) without crashing.