
Separate images for runtime #4

@deitch

There are two distinct scenarios.

  1. Someone downloads a model in onnx format from HF or OCI, uses onnxruntime to compile it to a .bundle, saves that to the local cache, and then runs it.
  2. Someone downloads a model already in .bundle format from HF or OCI and uses onnxruntime only to run it; no compilation is necessary.

There are two image-related issues with the second scenario.

First, the image is much larger than it needs to be. It still needs the inference server, and maybe neuralizer, but it does not need glow, gcc, llvm, and the rest of the toolchain. A larger image takes longer to pull and is harder to use, which makes for a worse user experience.

Second, the image name is confusing. In this scenario it is no longer an _onnx_ runner; it is a _bundle_ runner. This is like shipping the gcc toolchain just to run an executable. Sure, developers can (and sometimes do) use gdb to run something, but the vast majority of users do not need or want the toolchain; they just want the compiled binary.

We probably need to separate the image into two:

  • bundlerunner - runs a compiled .bundle
  • onnxruntime - compiles an onnx model into a .bundle and then optionally runs it

We can discuss whether onnxruntime should also be able to run the compiled bundle, or whether the correct process is two steps: "run this" (compile) and then "run that" (run the bundle). Since it is all wrapped in nekko, I would favour keeping them separate.
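To make the "separate" option concrete, here is a minimal sketch of how a wrapper might pick images per scenario. The function `plan_steps`, detecting the model format from the file suffix, and using the bare names `onnxruntime` and `bundlerunner` as image references are all assumptions for illustration, not nekko's actual interface.

```python
from pathlib import Path

# Hypothetical image references; real registry paths and tags are not specified in this issue.
COMPILER_IMAGE = "onnxruntime"   # full toolchain: glow, gcc, llvm, ...
RUNNER_IMAGE = "bundlerunner"    # slim: inference server only


def plan_steps(model_path: str) -> list[tuple[str, str]]:
    """Return the (image, action) steps needed to run a model.

    Assumes (hypothetically) that the format can be inferred from the path suffix.
    """
    suffix = Path(model_path).suffix.lower()
    if suffix == ".onnx":
        # Scenario 1: compile with the toolchain image, then run the result with the runner.
        return [(COMPILER_IMAGE, "compile"), (RUNNER_IMAGE, "run")]
    if suffix == ".bundle":
        # Scenario 2: already compiled, so only the slim runner image is pulled.
        return [(RUNNER_IMAGE, "run")]
    raise ValueError(f"unsupported model format: {suffix or '(none)'}")
```

With this split, `plan_steps("model.onnx")` yields a compile step followed by a run step, while `plan_steps("model.bundle")` yields only the run step, so users with precompiled bundles never pull the toolchain image.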

cc @jerenkrantz @rvs
