
Discussion: Cortex.cpp Model Loading and Inference Errors #1091

@freelerobot

Description


At the moment we fail silently with a generic "model failed to load" message, and users have to send us logs.

Can we get a handle on all the potential reasons why a model fails to load, and discuss how to handle each one?

Goal:

  1. Graceful failures
  2. Predefined errors (see the sketch after this list)
  3. Though the space of possible errors is endless, let's adopt the Pareto rule: ~80% of our bugs come from ~20% of the common model-loading failure modes
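
As a starting point, here is a minimal sketch of what a predefined error taxonomy could look like, assuming a C++ codebase like Cortex.cpp; the `LoadError` and `LoadResult` names are hypothetical, not existing Cortex.cpp types.

```cpp
#include <optional>
#include <string>

// Hypothetical taxonomy of predefined load errors, covering the common
// failure modes enumerated in the examples below.
enum class LoadError {
  kInsufficientMemory,    // model won't fit in RAM/VRAM
  kModelAlreadyRunning,   // another model already holds the engine
  kUnsupportedFormat,     // wrong model format / unsupported runtime
  kEngineVersionConflict, // e.g. trt-llm engine built for another version
  kMissingConfig,         // model.yaml, template, or key inputs absent
  kCorruptedBinary,       // missing or corrupted model binaries
  kIncompatibleHardware,  // GPU arch / instruction set not supported
  kUnknown,               // fall-through bucket for the long tail
};

// A load either succeeds or surfaces a predefined error plus a
// human-readable detail string, instead of failing silently.
struct LoadResult {
  std::optional<LoadError> error; // std::nullopt on success
  std::string detail;             // shown to the user and written to logs
};
```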

Examples

  1. Model won't fit in RAM/VRAM (a preflight check is sketched after this list)
  2. Another model is already running, plus other edge cases and race conditions
  3. Wrong model format (i.e. unsupported runtime)
  4. Version conflicts (in the trt-llm engine scenario)
  5. Missing model.yaml, template, or key inputs/configs
  6. Corrupted or missing model binaries
  7. Incompatible hardware
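
For case 1, a cheap preflight check can turn a mid-load crash into a graceful, predefined failure. This is only a sketch under the assumption that file size is a rough lower bound on required memory; `FitsInMemory` and the 1.2x headroom factor are illustrative, not measured.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical preflight: compare the model file size (a rough lower
// bound on required memory) against available RAM/VRAM before the engine
// starts loading, so we can fail fast with kInsufficientMemory.
bool FitsInMemory(const std::filesystem::path& model_path,
                  std::uint64_t available_bytes) {
  std::error_code ec;
  const auto size = std::filesystem::file_size(model_path, ec);
  if (ec) return false; // missing/corrupted binary: also fail fast
  // Weights plus KV cache and activations need headroom; 1.2x is a
  // placeholder heuristic, not a tuned constant.
  return static_cast<std::uint64_t>(size * 1.2) <= available_bytes;
}
```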

Questions:

  1. What are the other common issues?
  2. We support various engines, but should we standardize failure modes? Standardizing would let us offer better DX/UX down the road (a mapping sketch follows this list).
  3. How do llamacpp, trtllm, and directml currently handle errors? Do they have a predefined, neat list we can adopt?
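
If the engines don't share a common list, one option is a thin adapter per engine that maps raw failure output onto the shared taxonomy. A rough sketch, reusing the hypothetical `LoadError` enum from above; the matched substrings are illustrative placeholders, not actual llamacpp/trtllm/directml output.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical adapter: map engine-specific failure text onto the shared
// LoadError taxonomy so every engine surfaces the same predefined errors.
LoadError Standardize(const std::string& raw_engine_error) {
  static const std::unordered_map<std::string, LoadError> kPatterns = {
      {"out of memory", LoadError::kInsufficientMemory},
      {"unknown model architecture", LoadError::kUnsupportedFormat},
      {"version mismatch", LoadError::kEngineVersionConflict},
  };
  for (const auto& [needle, err] : kPatterns) {
    if (raw_engine_error.find(needle) != std::string::npos) return err;
  }
  return LoadError::kUnknown; // long tail falls through to a generic bucket
}
```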

Related issues:
