
Conversation

@YunKuiLu (Contributor) commented Oct 31, 2025

- Introduced `prompt_tokens_details` with a `cached_tokens` field in `ZhiPuAiApi.Usage`
- Updated test cases to replace inline `ChatOptions` with `DEFAULT_CHAT_OPTIONS`
- Refactored test models so that `glm-4-flash` and `glm-4v-flash` are used as defaults

The `usage.prompt_tokens_details.cached_tokens` field reports the number of prompt tokens that were served from the cache.
Related documentation: https://docs.z.ai/api-reference/llm/chat-completion
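
For reference, below is a minimal sketch of how the new field could be modeled on `ZhiPuAiApi.Usage`, assuming Jackson-annotated records; only the JSON property names (`prompt_tokens_details`, `cached_tokens`) come from the linked documentation, and the surrounding record shape is illustrative rather than the exact source.

```java
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;

// Illustrative shape only; the JSON property names follow the ZhiPu API docs,
// the rest of the record layout is an assumption.
@JsonInclude(JsonInclude.Include.NON_NULL)
public record Usage(
		@JsonProperty("prompt_tokens") Integer promptTokens,
		@JsonProperty("completion_tokens") Integer completionTokens,
		@JsonProperty("total_tokens") Integer totalTokens,
		@JsonProperty("prompt_tokens_details") PromptTokensDetails promptTokensDetails) {

	// cached_tokens is the number of prompt tokens that were served from the cache.
	@JsonInclude(JsonInclude.Include.NON_NULL)
	public record PromptTokensDetails(@JsonProperty("cached_tokens") Integer cachedTokens) {
	}
}
```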

All the tests have passed.


Commit: …tions for tests

- Introduced `prompt_tokens_details` with `cached_tokens` field to `ZhiPuAiApi.Usage`
- Updated test cases to replace inline `ChatOptions` with `DEFAULT_CHAT_OPTIONS`
- Refactored test models to ensure usage of `glm-4-flash` and `glm-4v-flash` as defaults
- Added metadata validations for `promptTokensDetails` in response

Signed-off-by: YunKui Lu <[email protected]>
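
Regarding the metadata validations mentioned in the commit above, here is a minimal sketch of what such an assertion could look like, written against the illustrative `Usage` record from the earlier sketch (AssertJ-style; the class name and values here are examples, not the actual test code):

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;

class UsageCachedTokensSketchTest {

	@Test
	void cachedTokensAreExposedInUsage() {
		// Construct the record directly to illustrate the new field; in the real
		// tests the usage would come from a live chat completion response.
		var details = new Usage.PromptTokensDetails(64);
		var usage = new Usage(120, 30, 150, details);

		assertThat(usage.promptTokensDetails()).isNotNull();
		assertThat(usage.promptTokensDetails().cachedTokens()).isEqualTo(64);
	}
}
```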
```diff
 void enabledThinkingTest(String modelName) {
-    UserMessage userMessage = new UserMessage(
-            "Are there an infinite number of prime numbers such that n mod 4 == 3?");
+    UserMessage userMessage = new UserMessage("9.11 and 9.8, which is greater?");
```
@YunKuiLu (Contributor, Author) commented on this change:
To reduce model output latency, use simpler questions

@YunKuiLu (Contributor, Author) commented Nov 3, 2025

Hi @mxsl-gr, please help with the code review.
