There are 2 distinct scenarios.
- Someone downloads a model from HF or OCI in onnx format, runs onnxruntime to compile it to a `.bundle`, saves it to the local cache, and then runs it.
- Someone downloads a model from HF or OCI in `.bundle` format and runs onnxruntime to run it; no compilation is necessary.
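
A rough sketch of those two flows as they stand today; the image names, model references, and subcommands are hypothetical placeholders, not the actual nekko or onnxruntime CLI:

```sh
# Scenario 1: model is distributed as onnx; compile to .bundle, cache it, then run it.
docker run -v ~/.cache/models:/cache example/onnxruntime \
  compile hf://some-org/some-model -o /cache/some-model.bundle
docker run -v ~/.cache/models:/cache example/onnxruntime \
  run /cache/some-model.bundle

# Scenario 2: model is already distributed as a .bundle; no compilation needed,
# yet it still goes through the same toolchain-heavy onnxruntime image.
docker run example/onnxruntime run oci://some-org/some-model.bundle
```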
There are two image-related issues with the second scenario.
First, the image is much larger than it needs to be. It still needs the inference server, and maybe neuralizer, but it does not need glow, gcc, llvm, etc. This makes the image harder to use and a worse user experience.
Second, the image name is confusing. It is no longer an _onnx_ runner; it is a _bundle_ runner. This is equivalent to using gcc tools to run an executable. Sure, developers can (and sometimes do) use gdb to run something, but the vast majority of users do not need or want the toolchain; they just want the compiled binary.
We probably need to separate the image into two:
- bundlerunner - runs a compiled `.bundle`
- onnxruntime - compiles an onnx model into a `.bundle` and then optionally runs it
We can discuss whether onnxruntime should also be able to run the compiled bundle, or whether the correct process is to say "run this" and then "run that". Since it is all wrapped in nekko, I would favour keeping them separate.
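
For comparison, a sketch of the separated flow, again with hypothetical image names and subcommands; the idea is that onnxruntime only compiles, bundlerunner only runs, and nekko sequences the two steps:

```sh
# Compile step: toolchain image (glow, gcc, llvm) produces the .bundle.
docker run -v ~/.cache/models:/cache example/onnxruntime \
  compile hf://some-org/some-model -o /cache/some-model.bundle

# Run step: slim image with only the inference server (and maybe neuralizer).
docker run -v ~/.cache/models:/cache example/bundlerunner \
  run /cache/some-model.bundle
```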
cc @jerenkrantz @rvs