modelsio providers for models resource #1528
-
Can the /files API (https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/files/files.py) fulfill the modelsio job here?
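If it can, the workflow might collapse into a file upload plus a model registration. A minimal sketch, assuming hypothetical routes and field names (none of this is the actual files.py surface, and the file-backed source URI on registration is an assumption, not something the proposal confirms):

# Hypothetical: upload the weights archive as a file
curl -X POST -F "file=@./my-hf-model.tar" https://myllamastack/api/files

# Hypothetical: register a model that points at the stored file by ID
curl -X POST -d '{"model_id": "my-hf-model", "source": "file://<returned-file-id>"}' https://myllamastack/api/models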
-
One part of this that I think will be tricky is models.register, which in this proposal seems to be a model retriever. But won't we also then want models.register to modify, say, an inference provider to use this new model? If we go down the route of "registration" being a downloader in a way, we might not want to overload it and instead add a new API for plopping a new model onto an inference endpoint. I could also be missing something here.
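To make the tension concrete, a hedged sketch of the two shapes being weighed; every route, field, and provider id here is hypothetical:

# Option A: overload models.register as a downloader; a single call fetches
# the weights and points an inference provider at them
curl -X POST -d '{"model_id": "my-hf-model", "source_uri": "s3://bucket/my-hf-model.tar", "provider_id": "vllm-0"}' https://myllamastack/api/models

# Option B: keep retrieval separate; modelsio fetches the artifact, and a
# second (new) API attaches the now-local model to an inference endpoint
curl -X POST -d '{"source_uri": "s3://bucket/my-hf-model.tar"}' https://myllamastack/api/modelsio/fetch
curl -X POST -d '{"model_id": "my-hf-model", "provider_id": "vllm-0"}' https://myllamastack/api/inference/models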
-
Llama Stack has prioritized Llama models for good reasons, but recent contributions have moved the project in the direction of model neutrality. Users may want to bring model parameters from cloud object storage and use those models uniformly across LLS.
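For example, a hedged sketch of what pulling parameters from object storage might look like; the s3 route, URI scheme, and field names are all assumptions, not part of the proposal text:

# Hypothetical: a modelsio provider pulls weights out of object storage so
# downstream APIs (inference, eval) can treat the model like any other
curl -X POST -d '{"source_uri": "s3://my-bucket/checkpoints/my-model/", "unique_model_uri": "s3-my-bucket-my-model"}' https://myllamastack/api/modelsio/s3/fetch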
modelsio providers would expose a few endpoints so they can bridge to remote providers as well (a hypothetical sketch of that surface follows the workflow below). The abstraction offered by inline vs. remote providers breaks down a bit when plumbing large artifacts around, but this way we can implement the following user behavior:
curl -X POST -F "file=@./my-hf-model.tar" -F 'unique_model_uri=localfs-huggingface-myusername-my-hf-model' https://myllamastack/api/modelsio/localfs-huggingface/tar
curl -X POST -d '{"provider_id": "localfs-huggingface", "model_id": "localfs-huggingface-myusername-my-hf-model"}' https://myllamastack/api/models
curl -X POST -d '{...}' https://myllamastack/api/evaluation/<evaluation_type>
This example workflow is pretty far from the existing APIs, but it illustrates the intended flow.
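To ground the "few endpoints" mentioned above, here is one possible modelsio surface expressed as curl calls; every route and name is illustrative only, a sketch rather than part of the proposal:

# Hypothetical modelsio surface mirroring the workflow above
curl -X POST -F "file=@./my-hf-model.tar" https://myllamastack/api/modelsio/localfs-huggingface/tar   # inline: upload a local artifact
curl -X POST -d '{"source_uri": "s3://bucket/model.tar"}' https://myllamastack/api/modelsio/s3/fetch   # remote: bridge to object storage
curl https://myllamastack/api/modelsio/localfs-huggingface/models                                      # list artifacts a provider holds
curl -X DELETE https://myllamastack/api/modelsio/localfs-huggingface/models/my-hf-model                # evict an artifact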
Gaps