Add Z.AI (Zhipu AI) GLM provider#109
Merged
priyanshujain merged 14 commits into master on Mar 20, 2026
Standalone provider implementation with an OpenAI-compatible request/response format. Supports Chat, StreamChat, SSE parsing, and tool calling. Uses Z.AI's `/v4/` API prefix and handles its error format (`code`/`message`).
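The streaming path mentioned above parses OpenAI-style server-sent events. A minimal sketch of that framing (payload lines prefixed with `data:`, stream terminated by `data: [DONE]`); `parseSSELine` is an illustrative helper, not the PR's actual parser:

```go
package main

import (
	"fmt"
	"strings"
)

// parseSSELine handles one line of an OpenAI-compatible SSE stream.
// It returns the JSON payload, done=true for the [DONE] sentinel,
// and ok=false for blank lines or comments between events.
func parseSSELine(line string) (payload string, done bool, ok bool) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "data:") {
		return "", false, false // blank separator line or SSE comment
	}
	payload = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
	if payload == "[DONE]" {
		return "", true, true
	}
	return payload, false, true
}

func main() {
	p, done, ok := parseSSELine(`data: {"choices":[{"delta":{"content":"hi"}}]}`)
	fmt.Println(p, done, ok)
	_, done, _ = parseSSELine("data: [DONE]")
	fmt.Println(done)
}
```

Each extracted payload would then be unmarshaled into a chunk struct and its delta appended to the accumulated response.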
Unit tests with a mock HTTP server covering text responses, tool use, error handling, streaming text, streaming tool calls, and request format validation. Integration test against the real API using `glm-4.5-flash`.
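The mock-server approach can be sketched with the standard library's `httptest` package: a local server plays the role of the Z.AI API and returns a canned response, and the client is pointed at its URL. The handler payload and `fetchFromMock` helper are illustrative, not the PR's actual test code:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchFromMock spins up an httptest.Server that returns a canned
// OpenAI-style chat completion, then issues a request against it,
// mirroring how the unit tests exercise the provider without a real API.
func fetchFromMock() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"choices":[{"message":{"content":"hello"}}]}`)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/v4/chat/completions")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetchFromMock()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

In the real tests the provider's base URL would be set to `srv.URL`, so the same request-building and response-parsing code runs against the mock.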
Context windows: GLM-4.5 family at 128K tokens, GLM-4.7/5 family at 200K.
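The family-to-window mapping above can be sketched as a prefix lookup; the prefix-matching strategy, exact token counts, and fallback value here are assumptions for illustration, not the PR's `context_window.go`:

```go
package main

import (
	"fmt"
	"strings"
)

// contextWindow returns the context window in tokens for a GLM model ID,
// following the families described in the PR: GLM-4.5 at 128K,
// GLM-4.7/GLM-5 at 200K. The default fallback is an assumption.
func contextWindow(model string) int {
	switch {
	case strings.HasPrefix(model, "glm-4.5"):
		return 128_000
	case strings.HasPrefix(model, "glm-4.7"), strings.HasPrefix(model, "glm-5"):
		return 200_000
	default:
		return 128_000 // conservative fallback for unknown GLM models
	}
}

func main() {
	fmt.Println(contextWindow("glm-4.5-flash"))
	fmt.Println(contextWindow("glm-4.7"))
}
```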
Fetches models from Z.AI API and filters for GLM models.
Uses `glm-4.5-flash` (free, generous rate limits) for the default/fast/nano tiers and `glm-4.7` (paid) for the complex tier.
Z.AI models return reasoning in a separate `reasoning_content` field. When `content` is empty, fall back to `reasoning_content` for both non-streaming and streaming responses.
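The fallback described above is a simple preference order. A minimal sketch, assuming illustrative struct and field names (the `reasoning_content` JSON field itself is from the PR):

```go
package main

import "fmt"

// zaiMessage mirrors the relevant fields of a Z.AI chat message.
// The Go type and field names are illustrative; the JSON keys match
// what the PR describes.
type zaiMessage struct {
	Content          string `json:"content"`
	ReasoningContent string `json:"reasoning_content"`
}

// messageText returns content, falling back to reasoning_content when
// content is empty, as the PR does for both non-streaming and
// streaming (delta) paths.
func messageText(m zaiMessage) string {
	if m.Content != "" {
		return m.Content
	}
	return m.ReasoningContent
}

func main() {
	fmt.Println(messageText(zaiMessage{Content: "", ReasoningContent: "thinking step 1"}))
	fmt.Println(messageText(zaiMessage{Content: "final answer", ReasoningContent: "ignored"}))
}
```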
Verifies that the provider correctly parses Z.AI's error format, which uses `code` (e.g. "1302") instead of OpenAI's `type` field.
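A sketch of parsing that error shape; the exact JSON envelope (an `error` object wrapping `code` and `message`) and the Go struct names are assumptions for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// zaiError mirrors an assumed Z.AI error envelope, which carries a
// string code (e.g. "1302") where OpenAI's errors carry a type field.
type zaiError struct {
	Error struct {
		Code    string `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// parseZAIError extracts the code and message from an error response body.
func parseZAIError(body []byte) (code, msg string, err error) {
	var e zaiError
	if err := json.Unmarshal(body, &e); err != nil {
		return "", "", err
	}
	return e.Error.Code, e.Error.Message, nil
}

func main() {
	body := []byte(`{"error":{"code":"1302","message":"rate limit exceeded"}}`)
	code, msg, _ := parseZAIError(body)
	fmt.Println(code, msg)
}
```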
Summary
- `glm-4.5-flash` (free, generous rate limits) as the default model, not `glm-4.7-flash`, which has aggressive rate limiting (~3 requests, then locked out for minutes)
- Models are fetched from the `/models` endpoint and cached, not hardcoded
- `reasoning_content` field: when `content` is empty, the reasoning text is surfaced instead

Key decisions
- `glm-4.5-flash` over `glm-4.7-flash`: both are free, but `glm-4.5-flash` handles 20+ rapid sequential calls without rate limiting, while `glm-4.7-flash` locks out after ~3 calls. Verified via burst testing against the real API.
- Standalone provider: Z.AI uses a `/v4/` prefix (not `/v1/`), has a different error format (`code`/`message` vs `type`/`message`), returns `reasoning_content` alongside `content`, and this gives us full control for Z.AI-specific behavior.

Files changed
- `provider/zai/zai.go`
- `provider/zai/zai_test.go`
- `provider/registry.go`: `ZAI_API_KEY` env var
- `provider/context_window.go`
- `provider/context_window_test.go`
- `provider/models.go`: `listModelsZAI` fetches and filters GLM models
- `config/profiles.go`
- `settings/registry.go`
- `internal/cli/*.go`

Test plan
- `go build ./...` compiles cleanly
- `go test ./provider/zai/...`: 9 unit tests pass (mock server)
- `go test ./provider/...`: all 7 provider packages pass
- Integration test with `ZAI_API_KEY`: `glm-4.5-flash` returns a correct response
- Burst test of `glm-4.5-flash`: 20/20 succeeded
- `reasoning_content` fallback verified: empty `content` with reasoning returns text