Releases: zhudotexe/kani
v1.9.0 - OpenAI Responses API
New Features
- OpenAI: Added support for using the Responses API by setting `api_type="responses"` when initializing an `OpenAIEngine`. The `OpenAIEngine` will automatically default to the Responses API for "deep-research" style reasoning models.
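For example, opting into the Responses API explicitly might look like this (a sketch based on the description above; the model ID is illustrative, and an OpenAI API key is assumed to be set in the environment):

```python
from kani import Kani
from kani.engines.openai import OpenAIEngine

# explicitly opt in to the Responses API; reasoning models may default to it
engine = OpenAIEngine(model="gpt-5", api_type="responses")
ai = Kani(engine)
```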
Fixes et al
- HF: Fixed an issue when using multi-turn tool calling with Qwen-3 series models where the thinking content from earlier tool calling turns would not be passed to subsequent turns within the same round
- OpenAI: Improved the ergonomics of the Kani → OpenAI translation somewhat to allow for easier overrides for custom engines (e.g., the vLLM OpenAI-compatible API)
v1.8.0 - MCP Tools
Model Context Protocol (MCP) is a standard for defining tools and making them available to LLMs over the Internet or
local communication. Kani now supports using local or remote MCP tools, so that you can use the broad MCP ecosystem with
the flexibility of Kani.
In order to use MCP tools, first specify a list of MCP servers to connect to. Then, use the tools_from_mcp_servers
context manager to connect to these servers and retrieve the list of available tools. You can pass these tools like
normal Kani AIFunctions.
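As an illustrative sketch only (the import path for the MCP helpers and the shape of the server specifications are assumptions here; the MCP docs linked below have the real API):

```python
import asyncio

from kani import Kani
from kani.engines.openai import OpenAIEngine
# NOTE: assumed import path; check the kani MCP docs for the real one
from kani.tools.mcp import tools_from_mcp_servers


async def main():
    engine = OpenAIEngine(model="gpt-4.1-nano")
    servers = []  # fill in your local or remote MCP server specifications here
    # connect to the servers and retrieve the list of available tools
    async with tools_from_mcp_servers(servers) as tools:
        # pass the MCP tools like normal Kani AIFunctions
        ai = Kani(engine, functions=tools)
        print(await ai.chat_round_str("What tools do you have available?"))


asyncio.run(main())
```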
Check out the MCP tool docs at https://kani.readthedocs.io/en/latest/function_calling.html#mcp-tools for more
information!
New Features
- MCP: added `tools_from_mcp_servers` context manager to retrieve MCP tools from a remote server and provide them as Kani `AIFunction`s
- Improved automatic serialization of `AIFunction` returns: if an `AIFunction` returns a Pydantic model or dict/list, it is automatically cast to JSON instead of naively calling `str()`
- Allowed `AIFunction`s to return `ChatMessage`s or `list[MessagePart]` directly for multimodal function returns
- Extension packages (i.e., those using the `kani.ext.*` namespace) can now define a list of `CLI_PROVIDERS` for use with the `kani` CLI
- Improved error logging when context length counting fails
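The new return serialization can be illustrated in plain Python (a minimal sketch of the described behavior, not kani's actual implementation):

```python
import json


def serialize_return(value):
    """Sketch: JSON-encode dicts/lists; fall back to str() for everything else."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return str(value)


# valid JSON the model can parse, instead of Python repr syntax
print(serialize_return({"temp_c": 21, "conditions": "sunny"}))
# the old naive behavior for comparison:
print(str({"temp_c": 21, "conditions": "sunny"}))
```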
Fixes
- HF: Fixed an issue where certain prompts would be built suboptimally in the face of chat template restrictions
- Qwen 3 (HF): Ensure the correct thinking parser is applied to all thinking models
- OpenAI: Fixed an issue when using `Kani.save()` on messages with attached `extra` metadata from OpenAI
- OpenAI: Fixed an issue with empty tool call deltas while streaming with tools
v1.7.0 - Token Counting Refactor, HF Multimodal Inputs
Token Counting Refactor
Under the hood, kani now uses full prompts (i.e., a list of messages + functions) to count tokens, rather than summing
the token counts of messages individually. This makes token counting more reliable for models which do not expose their
tokenizer (e.g. Claude and Gemini) and models with strict chat templates (HF transformers, llama.cpp).
If you do not manually count tokens using `Kani.message_token_len`, `BaseEngine.message_len`, `BaseEngine.token_reserve`, or `BaseEngine.function_token_reserve`, no change is needed.
If you have implemented your own engine, the methods above are now deprecated. To implement token counting functionality, replace `BaseEngine.message_len`, `.token_reserve`, and `.function_token_reserve` with the new method `.prompt_len(messages, functions)`. This method may be asynchronous.
This change aims to make implementing new engines simpler and more streamlined, as prompt-wise token counting can reuse
much of the same code as inference.
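The shape of the new interface can be shown with a self-contained toy engine (not a real kani `BaseEngine` subclass; the whitespace "tokenizer" is purely illustrative):

```python
import asyncio


class ToyEngine:
    """Toy illustration: one async method counts tokens for a full prompt at once."""

    async def prompt_len(self, messages, functions=None):
        # render the whole prompt the same way inference would, then count
        rendered = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        if functions:
            rendered += "\n" + " ".join(functions)
        # toy "tokenizer": whitespace word count
        return len(rendered.split())


msgs = [{"role": "user", "content": "hello world"}]
print(asyncio.run(ToyEngine().prompt_len(msgs)))
```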
Breaking Changes
- Deprecated `Kani.message_token_len` - use `await Kani.prompt_token_len` instead
- Deprecated `BaseEngine.message_len`, `.token_reserve`, `.function_token_reserve` - implement `BaseEngine.prompt_len` instead
- `AIFunction.auto_truncate` now truncates to a certain number of characters instead of tokens
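The new character-based behavior of `AIFunction.auto_truncate` amounts to a simple slice (an illustrative sketch only; kani's actual truncation logic may differ):

```python
def truncate_chars(result: str, max_chars: int) -> str:
    # sketch: limit a function's return by character count, not token count
    return result if len(result) <= max_chars else result[:max_chars]


print(truncate_chars("a" * 10, 4))  # "aaaa"
```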
New Features
- Added `BaseEngine.prompt_len` and `Kani.prompt_token_len`
- Added native multimodal support to the `HuggingEngine`
- Added `TextPart` message part for rich extras
- Added documentation for low-level hackability overrides for certain API-based engines (e.g., for server-side tool calling)
Fixes
- Fixed a case where decoding params specified in multiple places could overlap and cause errors
- Fixed an issue where a Kani instance's property getters would be called on construction while searching for `AIFunction`s
- Fixed the process hanging forever if the call to `PreTrainedModel.generate` crashes during a call to `HuggingEngine.stream`
- Fixed an issue where the `LlamaCppEngine` would not release its resources immediately when closed
- Fixed an issue when parsing Mistral-style function calls for functions without arguments
- Fixed an issue where certain base models would not be found when trying to automatically identify base models for quantized variants
- Removed some unused exception types after the `HTTPEngine` was removed
- `GoogleAIEngine`: Raise better warnings when the Gemini API returns an unexpectedly empty response
v1.6.1
- Removed deprecated `HTTPClient` (since v1.0.0)
- Increased default `desired_response_tokens` to 10% of the model's max context length or 8192 tokens, whichever is shorter
- Anthropic: reasoning is now returned in a `ReasoningPart` instead of an `AnthropicUnknownPart`
- Anthropic: increased default `max_tokens` to 2048
- Anthropic: fixed tool results not being sent to the model correctly
- OpenAI: better warning about which tokenizer is used when a model ID is not found
- Google AI: reasoning is now returned in a `ReasoningPart` instead of a `str` and is forwarded correctly for multi-turn function calls
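The new `desired_response_tokens` default described above reduces to taking the smaller of the two values (a sketch of the stated rule, not kani's exact code):

```python
def default_desired_response_tokens(max_context_len: int) -> int:
    # 10% of the model's max context length, capped at 8192 tokens
    return min(max_context_len // 10, 8192)


print(default_desired_response_tokens(128000))  # cap applies: 8192
print(default_desired_response_tokens(32768))   # 10% applies: 3276
```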
v1.6.0
New feature: Multimodal inputs
kani-multimodal-core should be installed alongside the core kani install using an extra:

```shell
$ pip install "kani[multimodal]"
```

However, you can also explicitly specify a version and install the core package itself:

```shell
$ pip install kani-multimodal-core
```

Features
This package provides the core multimodal extensions that engine implementations can use -- it does not provide any
engine implementations on its own.
The package adds support for:
- Images (`kani.ext.multimodal_core.ImagePart`)
- Audio (`kani.ext.multimodal_core.AudioPart`)
- Video (`kani.ext.multimodal_core.VideoPart`)
- Other binary files, such as PDFs (`kani.ext.multimodal_core.BinaryFilePart`)
When installed, these core kani engines will automatically use the multimodal parts:
- OpenAIEngine
- AnthropicEngine
- GoogleAIEngine
Additionally, the core kani `chat_in_terminal` method will support attaching multimodal data from a local drive or from the internet using `@/path/to/media` or `@https://example.com/media`.
Message Parts
The main feature you need to be familiar with is the MessagePart, the core way of sending messages to the engine.
To do this, when you call the kani round methods (i.e. Kani.chat_round or Kani.full_round or their str variants),
pass a list of multimodal parts rather than a string:
```python
from kani import Kani
from kani.engines.openai import OpenAIEngine
from kani.ext.multimodal_core import ImagePart

engine = OpenAIEngine(model="gpt-4.1-nano")
ai = Kani(engine)

# notice how the arg is a list of parts rather than a single str!
msg = await ai.chat_round_str([
    "Please describe this image:",
    ImagePart.from_file("path/to/image.png"),
])
print(msg)
```

See the docs (https://kani-multimodal-core.readthedocs.io) for more information about the provided message parts.
Terminal Utility
When installed, kani-multimodal-core augments the chat_in_terminal utility provided by kani.
This utility allows you to provide multimodal media on your disk or on the internet inline by prepending it with an
@ symbol:
```
>>> from kani import chat_in_terminal
>>> chat_in_terminal(ai)
USER: Please describe this image: @path/to/image.png and also this one: @https://example.com/image.png
```

- Added native support for multimodal (image, video, audio) models using the `kani-multimodal-core` package (https://github.com/zhudotexe/kani-multimodal-core)!
  - The AnthropicEngine, OpenAIEngine, and GoogleAIEngine will automatically support multimodal inputs when `kani-multimodal-core` is installed
New feature: Native Google Gemini support
```shell
$ pip install "kani[google]"
```

```python
from kani import Kani
from kani.engines.google import GoogleAIEngine

engine = GoogleAIEngine(model="gemini-2.5-flash")
```

This engine supports all Google AI models through the Google AI Studio API.
See https://ai.google.dev/gemini-api/docs/models for a list of available models.
Multimodal support: images, audio, video.

- Added the `GoogleAIEngine` for Google Gemini support - supports function calling & multimodal inputs
New feature: kani CLI tool
When kani is installed, you can run `$ kani provider:model-id` to begin chatting with a model in your terminal!
Examples:

```shell
$ kani openai:gpt-4.1-nano
$ kani huggingface:meta-llama/Meta-Llama-3-8B-Instruct
$ kani anthropic:claude-sonnet-4-0
$ kani google:gemini-2.5-flash
```

This CLI helper automatically creates an Engine and Kani instance and calls `chat_in_terminal()` so you can test LLMs faster. Just as with `chat_in_terminal()`, you can use `@/path/to/file` or `@https://example.com/file` to attach multimodal parts to your CLI inputs.

- Added a `kani` CLI tool for easy chatting in terminal
  - Use `@/path/to/file` or `@https://example.com/file` to upload a multimodal file to your CLI
Additional features & fixes
- Added `Message.extras` to store arbitrary additional information with a Message object
  - Certain engines will set an extra to store the raw response returned by the API (see engine docs)
  - For example, to access the detailed usage object returned by an `OpenAIEngine`, you can use:

    ```python
    msg = await ai.chat_round(...)  # or .full_round
    openai_usage = msg.extra["openai_usage"]
    ```
- Added a `save_format` parameter to `Kani.save()` to allow saving to a `.kani` file instead of `.json`
  - Saving to a `.kani` file is used by default unless the filename given to `Kani.save()` ends with `.json`
  - A `.kani` file is a ZIP file containing the saved chat state of the Kani instance
  - Certain extensions (e.g., `kani-multimodal-core`) may save additional files to the `.kani` archive to save multimodal MessageParts without inflating the size of the saved JSON file
- Fixed the kani CLI not always quitting on ^C
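Since a `.kani` file is just a ZIP archive, it can be inspected with the standard library. The sketch below mimics that layout to show the idea; the member name `state.json` and its contents are assumptions, not kani's actual spec:

```python
import io
import json
import zipfile

# build an in-memory ZIP mimicking the described ".kani" layout (hypothetical contents)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("state.json", json.dumps({"chat_history": []}))

# reading it back is plain zipfile usage
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    members = zf.namelist()
    state = json.loads(zf.read("state.json"))

print(members, state)
```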
Engine-specific
- Anthropic: Handle PDF inputs using `kani.ext.multimodal_core.BinaryFilePart`
- Hugging Face: Load on MPS by default when detected on a macOS system
v1.5.1
v1.5.0
GPT-OSS and GPT-5
GPT-OSS and GPT-5 are now supported in kani>=1.5.0! You can use this code to get started with full function calling + reasoning capabilities:
```python
from kani import Kani, chat_in_terminal
from kani.engines.huggingface import HuggingEngine
from kani.model_specific.gpt_oss import GPTOSSParser

# this method is the same for the 20B and 120B variants - simply replace the model ID!
model = HuggingEngine(
    model_id="openai/gpt-oss-20b",
    chat_template_kwargs=dict(reasoning_effort="low"),  # set this to "low", "medium", or "high"
    eos_token_id=[200002, 199999, 200012],  # ensures the model stops correctly on tool calls
    temperature=1.0,  # suggested decoding parameter
    top_k=None,  # ensure we do not use top_k (transformers default = 50)
)
engine = GPTOSSParser(model, show_reasoning_in_stream=True)
ai = Kani(engine)
chat_in_terminal(ai)
```

Full release notes
- Added support for GPT-OSS with a model-specific parser/pipeline
- Added support for GPT-5 in the OpenAIEngine
- Added automatic handwritten model pipelines: when using a HuggingEngine whose model requires slightly more logic than the provided chat template, kani automatically selects a correct handwritten prompt pipeline (in `kani.model_specific`)
- Fixed some issues with the HF Chat Template based pipeline not sending tools in the correct schema
- Fixed an issue where certain argument names could not be passed to `ToolCall.from_function`
- Fixed an issue with importing the OpenAIEngine on `openai-python>=1.99.2`
- Made the HuggingEngine return non-EOS special tokens
- Optimized the throughput and memory usage of the HuggingEngine
- BREAKING: Moved existing handwritten model pipelines from `prompts/impl` to `model_specific`
- BREAKING: Moved existing tool parsers from `tool_parsers` to `model_specific`
- BREAKING: Removed Vicuna 1.3: this model was a very old fine-tune of Llama v1, removed to reduce the maintenance burden of the library.
v1.4.3
- Llama.cpp: Add `model_path` kwarg to allow loading local GGUF models (thanks @lawrenceakka!)

  Note: This is technically a minor breaking change, as the position of arguments has changed. I recommend using keyword arguments to load any models.

- Hugging Face: Do not set the `max_length` generation parameter if `max_new_tokens` is set, to avoid a verbose warning
- OpenAI: Add default context lengths for o-series models and GPT-4.1; add a warning for models without default context lengths
v1.4.2
v1.4.1
- Added better options for controlling the JSON Schema generated by an AIFunction
  - Generated JSON Schema now includes a function's docstring by default as the top-level `description` key
  - Generated JSON Schema's top-level `title` key is now a function's name instead of `_FunctionSpec` by default
  - Generated JSON Schema's fields only include a `title` key if a `title` kwarg is explicitly passed to `AIParam` (fixing a regression introduced some time ago)

These changes should have no effect on OpenAI function calling; these changes are made to improve compatibility with open models that use raw JSON Schema to define functions (e.g., Step-Audio).
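A hypothetical example of what the described schema shape looks like for a function like `def get_weather(location: str)` with the docstring "Get the current weather in a location." (the exact schema kani emits may differ; this illustrates only the `title`/`description` changes above):

```python
# illustrative schema only - not generated by kani
schema = {
    "title": "get_weather",  # the function's name, not "_FunctionSpec"
    "description": "Get the current weather in a location.",  # from the docstring
    "type": "object",
    "properties": {
        # no per-field "title" key unless AIParam(title=...) was passed explicitly
        "location": {"type": "string"},
    },
    "required": ["location"],
}
print(schema["title"])
```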