diff --git a/models/spring-ai-moonshot/src/main/java/org/springframework/ai/moonshot/api/MoonshotApi.java b/models/spring-ai-moonshot/src/main/java/org/springframework/ai/moonshot/api/MoonshotApi.java
index 532fb851b8b..3b4b9f66f5c 100644
--- a/models/spring-ai-moonshot/src/main/java/org/springframework/ai/moonshot/api/MoonshotApi.java
+++ b/models/spring-ai-moonshot/src/main/java/org/springframework/ai/moonshot/api/MoonshotApi.java
@@ -48,6 +48,7 @@
  *
  * @author Geng Rong
  * @author Thomas Vitale
+ * @author Wang Xiaojie
  */
 public class MoonshotApi {
 
@@ -207,14 +208,47 @@ public enum ChatCompletionFinishReason {
 	 * Moonshot Chat Completion Models:
 	 *
 	 *
+	 *
+	 * {@code moonshot-v1-auto} can select the appropriate model based on the number of
+	 * Tokens occupied by the current context. The available models for selection include:
+	 *
+	 *

+	 * {@code moonshot-v1-auto} can be regarded as a model router, which decides which
+	 * specific model to select based on the number of Tokens occupied by the current
+	 * context. In terms of performance and output, {@code moonshot-v1-auto} is
+	 * indistinguishable from the aforementioned models.
+	 *

+	 * The routing rules for the model selected by {@code moonshot-v1-auto} are as
+	 * follows:
+	 *
+	 * The calculation formula is: {@code total_tokens = prompt_tokens + max_tokens}
+	 *

+	 * The total number of Tokens is composed of two parts:
+	 *

 	 */
 	public enum ChatModel implements ChatModelDescription {
 		// @formatter:off
+		MOONSHOT_V1_AUTO("moonshot-v1-auto"),
 		MOONSHOT_V1_8K("moonshot-v1-8k"),
 		MOONSHOT_V1_32K("moonshot-v1-32k"),
 		MOONSHOT_V1_128K("moonshot-v1-128k");
diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/moonshot-chat.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/moonshot-chat.adoc
index eafbd7c129d..62a4eb7d9bb 100644
--- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/moonshot-chat.adoc
+++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/moonshot-chat.adoc
@@ -89,7 +89,7 @@ The prefix `spring.ai.moonshot.chat` is the property prefix that lets you config
 | spring.ai.moonshot.chat.enabled | Enable Moonshot chat model. | true
 | spring.ai.moonshot.chat.base-url | Optional overrides the spring.ai.moonshot.base-url to provide chat specific url | -
 | spring.ai.moonshot.chat.api-key | Optional overrides the spring.ai.moonshot.api-key to provide chat specific api-key | -
-| spring.ai.moonshot.chat.options.model | This is the Moonshot Chat model to use | `moonshot-v1-8k` (the `moonshot-v1-8k`, `moonshot-v1-32k`, and `moonshot-v1-128k` point to the latest model versions)
+| spring.ai.moonshot.chat.options.model | This is the Moonshot Chat model to use | `moonshot-v1-8k` (the `moonshot-v1-auto`, `moonshot-v1-8k`, `moonshot-v1-32k`, and `moonshot-v1-128k` point to the latest model versions)
 | spring.ai.moonshot.chat.options.maxTokens | The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. | -
 | spring.ai.moonshot.chat.options.temperature | The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. | 0.7
 | spring.ai.moonshot.chat.options.topP | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | 1.0
@@ -103,6 +103,10 @@ NOTE: You can override the common `spring.ai.moonshot.base-url` and `spring.ai.m
 The `spring.ai.moonshot.chat.base-url` and `spring.ai.moonshot.chat.api-key` properties if set take precedence over the common properties.
 This is useful if you want to use different Moonshot accounts for different models and different model endpoints.
 
+NOTE: When the value of `spring.ai.moonshot.chat.options.model` is set to `moonshot-v1-auto`, it can select the appropriate model based on the number of Tokens occupied by the current context.
+The available models for selection include: `moonshot-v1-8k`, `moonshot-v1-32k` and `moonshot-v1-128k`.
+`moonshot-v1-auto` can be considered as a model router. It decides which specific model to select based on the number of Tokens occupied by the current context. In terms of performance and output, `moonshot-v1-auto` is indistinguishable from the aforementioned models.
+
 TIP: All properties prefixed with `spring.ai.moonshot.chat.options` can be overridden at runtime by adding a request specific <> to the `Prompt` call.
 
 == Runtime Options [[chat-options]]
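
The routing that the Javadoc and the NOTE describe for `moonshot-v1-auto` can be sketched roughly as below. This is an illustration only and is not part of the change: the class `MoonshotAutoRoutingSketch`, its method names, and the two threshold constants are hypothetical. The thresholds are an assumption inferred from the model names (8k = 8 * 1024 tokens, 32k = 32 * 1024 tokens), since the routing-rule list itself is not visible in this diff. Only the formula `total_tokens = prompt_tokens + max_tokens` and the three candidate model names come from the text above. The actual routing happens on the Moonshot side, not in Spring AI.

```java
// Hypothetical sketch of how a router like moonshot-v1-auto might pick a model.
class MoonshotAutoRoutingSketch {

    // ASSUMPTION: context limits inferred from the model names, not stated in the diff.
    static final int LIMIT_8K = 8 * 1024;
    static final int LIMIT_32K = 32 * 1024;

    // Formula quoted in the Javadoc: total_tokens = prompt_tokens + max_tokens.
    static int totalTokens(int promptTokens, int maxTokens) {
        return promptTokens + maxTokens;
    }

    // Returns the smallest candidate model whose assumed limit fits the total.
    static String route(int promptTokens, int maxTokens) {
        int total = totalTokens(promptTokens, maxTokens);
        if (total <= LIMIT_8K) {
            return "moonshot-v1-8k";
        }
        if (total <= LIMIT_32K) {
            return "moonshot-v1-32k";
        }
        return "moonshot-v1-128k";
    }

    public static void main(String[] args) {
        System.out.println(route(1_000, 1_024));   // total 2,024
        System.out.println(route(10_000, 2_048));  // total 12,048
        System.out.println(route(40_000, 4_096));  // total 44,096
    }
}
```

Under these assumed thresholds, raising `maxTokens` alone can move a request to a larger model even when the prompt is unchanged, because both terms count toward the total.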