At the moment we fail silently: users only see "model failed to load" and have to send us logs.
Can we get a handle on all the potential reasons why their model failed to load, and discuss how to handle each issue?
Goal:
- Graceful failures
- Predefined errors
- Though there are endless possible errors, let's adopt the Pareto principle: about 80% of our bugs come from the 20% most common model loading problems
Examples:
- Model won't fit in RAM/VRAM
- Another model is already running, plus other edge cases & race conditions
- Wrong model format (i.e. unsupported runtime)
- Version conflicts (in the trt-llm engine scenario)
- Missing model.yaml, template, key input/configs
- Corrupted or missing model binaries
- Incompatible hardware. See
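To make the list above concrete, here is a rough sketch of what predefined errors plus a pre-load check could look like. Everything in it is an assumption for discussion, not existing code: the `LoadErrorCode` names, the `preflight` helper, the `.gguf` default suffix, and the use of on-disk file size as a stand-in for the RAM/VRAM a model needs.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional, Tuple


class LoadErrorCode(Enum):
    # Hypothetical codes, one per example in the list above.
    INSUFFICIENT_MEMORY = "insufficient_memory"
    MODEL_ALREADY_RUNNING = "model_already_running"
    UNSUPPORTED_FORMAT = "unsupported_format"
    VERSION_CONFLICT = "version_conflict"
    MISSING_CONFIG = "missing_config"
    CORRUPT_OR_MISSING_BINARY = "corrupt_or_missing_binary"
    INCOMPATIBLE_HARDWARE = "incompatible_hardware"


@dataclass
class LoadError:
    code: LoadErrorCode
    message: str


def preflight(
    model_dir: Path,
    available_bytes: int,
    supported_suffixes: Tuple[str, ...] = (".gguf",),  # assumed default
) -> Optional[LoadError]:
    """Return the first predefined error found, or None if all checks pass."""
    if not (model_dir / "model.yaml").is_file():
        return LoadError(LoadErrorCode.MISSING_CONFIG,
                         f"model.yaml not found in {model_dir}")

    # Treat every other regular file in the folder as a model binary.
    binaries = [p for p in model_dir.iterdir()
                if p.is_file() and p.name != "model.yaml"]
    if not binaries:
        return LoadError(LoadErrorCode.CORRUPT_OR_MISSING_BINARY,
                         f"no model binaries found in {model_dir}")

    unsupported = [p.name for p in binaries
                   if p.suffix not in supported_suffixes]
    if unsupported:
        return LoadError(LoadErrorCode.UNSUPPORTED_FORMAT,
                         f"unsupported files: {', '.join(sorted(unsupported))}")

    # Rough heuristic only: on-disk size as a lower bound on memory needed.
    size = sum(p.stat().st_size for p in binaries)
    if size > available_bytes:
        return LoadError(LoadErrorCode.INSUFFICIENT_MEMORY,
                         f"model needs at least {size} bytes, "
                         f"{available_bytes} available")
    return None
```

Only four of the seven codes are checked here; the rest (running model, version conflict, hardware) would need information from the engine or the process manager, which is part of what this issue is asking about.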
Questions:
- What are the other common issues?
- We support various engines, but should we standardize failure modes across them? That would let us offer better DX/UX down the road.
- How do llama.cpp, TensorRT-LLM, and DirectML currently handle errors? Do they have a predefined, neat list we can adopt?
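If the answer to the standardization question is yes, one possible shape is a thin layer that maps whatever each engine reports onto a shared set of codes. The sketch below is only an illustration: the code names, the `normalize` function, and the regex patterns are placeholders, not actual error strings from llama.cpp, TensorRT-LLM, or DirectML.

```python
import re
from enum import Enum


class LoadErrorCode(Enum):
    # Hypothetical shared codes; the real list is what this issue should decide.
    INSUFFICIENT_MEMORY = "insufficient_memory"
    VERSION_CONFLICT = "version_conflict"
    UNSUPPORTED_FORMAT = "unsupported_format"
    UNKNOWN = "unknown"


# Placeholder patterns, checked in order. Real ones would come from
# auditing what each engine actually emits.
RULES = [
    (re.compile(r"out of memory", re.I), LoadErrorCode.INSUFFICIENT_MEMORY),
    (re.compile(r"version", re.I), LoadErrorCode.VERSION_CONFLICT),
    (re.compile(r"unsupported", re.I), LoadErrorCode.UNSUPPORTED_FORMAT),
]


def normalize(engine: str, raw_message: str) -> dict:
    """Map an engine-specific failure message onto a shared error code."""
    code = LoadErrorCode.UNKNOWN
    for pattern, candidate in RULES:
        if pattern.search(raw_message):
            code = candidate
            break
    # Keep the raw message so nothing is lost for debugging.
    return {"engine": engine, "code": code.value, "detail": raw_message}
```

Counting how often `unknown` comes back per engine would also show which failures are common enough to deserve their own code, which fits the Pareto approach above.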
Related issues: