Discussion: Cortex.cpp Model Loading and Inference Errors #1091

At the moment we fail silently: users only see "model failed to load" and have to send us logs.

Can we get a handle on all the potential reasons why their model failed to load, and discuss how to handle each issue?

**Goals**:
1. Graceful failures
2. Predefined errors
3. Though there are endless possible errors, let's adopt the Pareto principle, as 80% of our bugs are due to the 20% most common model loading challenges

**Examples**
1. Model won't fit in RAM/VRAM
2. Another model is running... other edge cases & race conditions
3. Wrong model format (i.e. unsupported runtime)
4. Version conflicts (in the trt-llm engine scenario)
5. Missing model.yaml, template, key input/configs
6. Corrupted or missing model binaries
7. Incompatible hardware. See 

Questions: 
1. What are the other common issues?
2. We support various engines, but should we **standardize** failure modes? This would let us offer better DX/UX down the road.
3. How do llama.cpp, TensorRT-LLM, and DirectML currently handle errors? Do they have a predefined, neat list we can adopt?
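
On the standardization question, one possible shape is a small engine-agnostic result type that every engine adapter fills in, while keeping the engine's raw message for logs. This is only a sketch for discussion: `FailureKind`, `LoadResult`, `KindName`, and `Describe` are hypothetical names, and the categories are a guess at a coarse grouping, not something any of the engines define.

```cpp
#include <string>

// Hypothetical engine-agnostic failure categories (an assumption, not an
// existing Cortex.cpp or engine type).
enum class FailureKind { kNone, kResource, kFormat, kConfig, kHardware, kUnknown };

// What each engine adapter would return from a load attempt.
struct LoadResult {
  FailureKind kind = FailureKind::kNone;
  std::string engine;         // which engine produced the result
  std::string native_detail;  // raw engine message, preserved for logs
  bool ok() const { return kind == FailureKind::kNone; }
};

// Short, stable name for each category.
inline const char* KindName(FailureKind k) {
  switch (k) {
    case FailureKind::kNone:     return "none";
    case FailureKind::kResource: return "resource";
    case FailureKind::kFormat:   return "format";
    case FailureKind::kConfig:   return "config";
    case FailureKind::kHardware: return "hardware";
    case FailureKind::kUnknown:  break;
  }
  return "unknown";
}

// Builds one consistent, user-facing line regardless of the engine.
inline std::string Describe(const LoadResult& r) {
  if (r.ok()) return "[" + r.engine + "] model loaded";
  std::string out = "[" + r.engine + "] " + KindName(r.kind) + " error";
  if (!r.native_detail.empty()) out += ": " + r.native_detail;
  return out;
}
```

With this shape the UI only needs to branch on `kind`, and whatever each engine reports natively rides along in `native_detail` without having to be standardized first.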

Related issues: 
- #130 
- https://github.com/janhq/jan/issues/2556
- https://github.com/janhq/jan/issues/3517
