Description
Note: This issue was copied from ggml-org#6311
Original Author: @asg017
Original Issue Number: ggml-org#6311
Created: 2024-03-26T02:03:02Z
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
A llama_load_model_from_buffer() function should be added to llama.h/llama.cpp to complement llama_load_model_from_file(). Instead of loading a model from a file, it would read the model from a user-provided in-memory buffer.
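A possible declaration, mirroring the existing file-based loader. This is only a sketch: the parameter names and types here are my assumptions, not an agreed-upon API.

```c
// Hypothetical declaration (not in llama.h today), modeled on
// llama_load_model_from_file(const char *, struct llama_model_params).
struct llama_model * llama_load_model_from_buffer(
        const void * buffer,      // serialized GGUF model bytes
        size_t       buffer_len,  // length of `buffer` in bytes
        struct llama_model_params params);
```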
Motivation
I'm working on a tool that can load multiple llama models from different sources. Ideally, I'd like to store these models in a SQLite database and load them entirely from memory. However, since the only way to load llama models is with llama_load_model_from_file(), I'd need to serialize each one to disk first and pass in a path to that file. That's pretty wasteful, as the models are already in memory and don't need to be persisted to disk.
In my case, I'm working with small embedding models (tens to hundreds of MB), but I'm sure this could also be useful for larger models on machines with more memory.
Possible Implementation
Hmm, it looks like gguf_init_from_buffer() has been commented out of ggml.h. So maybe this will be more difficult than I thought?
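Until a buffer API exists, one Linux-only workaround is to expose the in-memory buffer as a pseudo-file path via memfd_create(2) and hand that path to llama_load_model_from_file(), avoiding any real disk I/O. A minimal sketch, assuming Linux with glibc >= 2.27; the helper name is illustrative:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

// Write `buf` into an anonymous in-memory file and return its fd, or -1
// on failure. A readable "/proc/self/fd/N" path is written into `path`.
// The caller must keep the fd open while the path is in use, then close it.
static int buffer_to_path(const void * buf, size_t len,
                          char * path, size_t path_len) {
    int fd = memfd_create("model.gguf", 0);
    if (fd < 0) return -1;
    if (write(fd, buf, len) != (ssize_t) len) {
        close(fd);
        return -1;
    }
    snprintf(path, path_len, "/proc/self/fd/%d", fd);
    return fd;
}
```

Intended use (hypothetical, since the model bytes would come from SQLite in my case): call buffer_to_path(model_bytes, model_size, path, sizeof(path)), pass `path` to llama_load_model_from_file(), then close the fd. Opening the /proc/self/fd path yields a fresh file description at offset 0, so the loader reads the buffer from the start.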