feat: automatically adapt to current free VRAM state (#182)
* feat: read tensor info from `gguf` files
* feat: `inspect gguf` command
* feat: `inspect measure` command
* feat: `readGgufFileInfo` function (see the sketch after this list)
* feat: GGUF file info on `LlamaModel`
* feat: estimate the VRAM usage of the model and context for a given set of options, adapt to the current free VRAM state, and set good defaults for `gpuLayers` and `contextSize`; manual configuration of those options is no longer needed to maximize performance (see the example after this list)
* feat: `JinjaTemplateChatWrapper`
* feat: use the `tokenizer.chat_template` header from the `gguf` file when available - use it to find a better-specialized chat wrapper, or fall back to a `JinjaTemplateChatWrapper` built from it (see the example after this list)
* feat: improve `resolveChatWrapper`
* feat: simplify generation CLI commands: `chat`, `complete`, `infill`
* feat: read GPU device names
* feat: get token type
* refactor: gguf
* test: separate gguf tests into model-dependent and model-independent tests
* test: switch to new vitest test signature
* fix: use the new `llama.cpp` CUDA flag
* fix: improve chat wrapper tokenization
* fix: miscellaneous bugs
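
A minimal sketch of the new `readGgufFileInfo` function, which parses the GGUF header and tensor info without loading the model into memory. The exact shape of the returned object (`metadata.general.architecture`, `tensorInfo`) is an assumption based on standard GGUF header keys:

```typescript
import {readGgufFileInfo} from "node-llama-cpp";

// parse the GGUF header and tensor info without loading the model into memory
const ggufInfo = await readGgufFileInfo("path/to/model.gguf");

// the metadata keys mirror the GGUF header keys (assumed shape)
console.log(ggufInfo.metadata.general.architecture);
console.log(ggufInfo.tensorInfo?.length, "tensors");
```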
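
A sketch of the adaptive VRAM defaults in use, following the v3 beta `getLlama`/`loadModel` API: no `gpuLayers` or `contextSize` is passed, so the library estimates the VRAM usage of candidate configurations and resolves values that fit the currently free VRAM:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// no `gpuLayers` or `contextSize` passed - the library estimates the VRAM
// usage of each candidate configuration and picks defaults that fit the
// currently free VRAM
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf"
});
const context = await model.createContext();

console.log("resolved context size:", context.contextSize);
```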
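
A sketch of building a `JinjaTemplateChatWrapper` from the `tokenizer.chat_template` header. The library performs this resolution automatically, so manual wiring like this is only needed for custom setups; the `model.fileInfo` property name and the metadata path below are assumptions mirroring the GGUF metadata key `tokenizer.chat_template`:

```typescript
import {getLlama, JinjaTemplateChatWrapper, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// the parsed GGUF file info is exposed on the model; the
// `fileInfo.metadata.tokenizer.chat_template` path is an assumption
const chatTemplate = model.fileInfo?.metadata?.tokenizer?.chat_template;

if (chatTemplate != null) {
    // build a chat wrapper directly from the model's own Jinja template
    const chatWrapper = new JinjaTemplateChatWrapper({template: chatTemplate});

    const context = await model.createContext();
    const session = new LlamaChatSession({
        contextSequence: context.getSequence(),
        chatWrapper
    });

    console.log(await session.prompt("Hi there"));
}
```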