
changelog : libllama API #246

@jakexcosme


Note: This issue was copied from ggml-org#9289

Original Author: @ggerganov
Original Issue Number: ggml-org#9289
Created: 2024-09-03T06:48:45Z


Overview

This is a list of changes to the public interface of the llama library. Collaborators are encouraged to edit this post to reflect important API changes that get merged into the master branch.

If you are building a third-party project that relies on libllama, it is recommended to follow this issue and to check it before upgrading to a new version.
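
For example, a deliberate upgrade of a vendored copy might look like the sketch below, where `vendor/llama.cpp` is a hypothetical submodule path and the tags are build tags from the table that follows:

```sh
# fetch the release tags and review the libllama API changes
# between the currently pinned build and the upgrade target
git -C vendor/llama.cpp fetch --tags
git -C vendor/llama.cpp log --oneline b3614..b3681 -- include/llama.h

# pin the new build tag once the changes have been reviewed
git -C vendor/llama.cpp checkout b3681
```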

Recent API changes (most recent at the top)

| version | PR | description |
| --- | --- | --- |
| TBD | ggml-org#15665 | Remove `llama_sampler_init_softmax()` + dist sampler no longer implicitly sorts |
| b6239 | ggml-org#15472 | Remove `llama_kv_self_...` API |
| b6157 | ggml-org#15293 | Add `llama_state_seq_..._ext` API |
| b5913 | ggml-org#14363 | Update `llama_context_params` - add `bool kv_unified` |
| b5870 | ggml-org#14631 | Remove `enum llama_vocab_pre_type` |
| b5740 | ggml-org#13037 | Update `llama_model_quantize_params` |
| b5435 | ggml-org#13653 | Remove `llama_kv_cache_view_*` API |
| b5429 | ggml-org#13194 | Update `llama_context_params` - add `bool swa_full` |
| b5311 | ggml-org#13284 | Update `llama_context_params` - remove `logits_all` + rearrange flags |
| b5125 | ggml-org#12511 | Update `llama_model_quantize_params` |
| b5028 | ggml-org#11397 | Update `llama_model_params` |
| b4882 | ggml-org#12181 | Change `llama_kv_cache_...` -> `llama_kv_self_...` |
| b4599 | ggml-org#9639 | Add `llama_sampler_init_grammar_lazy` to support lazy grammars w/ trigger words & tokens |
| b4524 | ggml-org#11016 | Add `name` parameter to `llama_model_chat_template` (uses default template if `NULL`) |
| b4501 | ggml-org#11262 | Remove `rpc_servers` from `llama_model` and `llama_model_params` |
| b4464 | ggml-org#11110 | Add `llama_vocab` and rename various structs and calls |
| b4424 | ggml-org#11063 | Update `llama_model` API naming |
| b4357 | ggml-org#10784 | Remove `llama_model_get_tensor()` |
| b4337 | ggml-org#10803 | Change `llama_sampler_init_penalties()` |
| b4282 | ggml-org#10446 | Remove support for `Q4_0_N_M` model files in favor of automatic repacking of `Q4_0` |
| b4167 | ggml-org#10497 | Add `devices` to `llama_model_params` |
| b3988 | ggml-org#10071 | Remove Tail-Free sampling |
| b3948 | ggml-org#9897 | Deprecate softmax sampler and update dist sampler |
| b3943 | ggml-org#9745 | Remove `all_pos_0`, `all_pos_1`, `all_seq_id` from `llama_batch` |
| b3908 | ggml-org#9798 | Update FIM-related API |
| b3841 | ggml-org#9510 | Add `LLAMA_POOLING_TYPE_RANK` |
| b3774 | ggml-org#9512 | Add `llama_n_head()` |
| b3750 | ggml-org#9355 | Add `llama_perf` API + param to disable internal profiling |
| b3749 | ggml-org#9445 | Add `llama_sampler_chain_remove()` |
| b3681 | ggml-org#9294 | Major changes to the sampling API (see PR for more info) |
| b3651 | ggml-org#8980 | Add `LLAMA_VOCAB_TYPE_RWKV` enum value |
| b3644 | ggml-org#8672 | Add `llama_threadpool` API + change `uint32_t` -> `int32_t` |
| b3614 | ggml-org#8526 | Add `llama_model_is_recurrent` |
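
Several of the sampling rows above trace back to the b3681 rework (ggml-org#9294), which replaced the old per-call sampling functions with composable sampler chains. The sketch below shows the chain API as introduced there; later rows (the softmax deprecation, the dist sampler changes, the penalties change) touched this same area, so verify the names against the llama.h you actually build with:

```c
#include "llama.h"

// Build a top-k -> top-p -> temperature -> dist sampler chain
// (post-b3681 API; signatures may differ in newer builds).
static struct llama_sampler * make_chain(void) {
    struct llama_sampler_chain_params sparams = llama_sampler_chain_default_params();
    struct llama_sampler * chain = llama_sampler_chain_init(sparams);

    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(chain, llama_sampler_init_top_p(0.9f, 1));
    llama_sampler_chain_add(chain, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));

    return chain;
}

// Usage: llama_token tok = llama_sampler_sample(chain, ctx, -1);
// Free with llama_sampler_free(chain) when done.
```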

For older changes, use:

```sh
git log --oneline -p b3614 -- include/llama.h
```

(For collaborators) To map a PR number to its build number:

```sh
git log --oneline | tail -r | nl
```
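
Note that `tail -r` is BSD/macOS-specific; on GNU/Linux the equivalent is `tac`:

```sh
git log --oneline | tac | nl
```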

Upcoming API changes

  • TBD
