
Commit 43b5942

Add available models for Moonshot Chat Completion Models.
Signed-off-by: Xiaojie Wang <[email protected]>
1 parent: c623264

File tree

2 files changed: +29 −1 lines changed

models/spring-ai-moonshot/src/main/java/org/springframework/ai/moonshot/api/MoonshotApi.java

Lines changed: 24 additions & 0 deletions

@@ -48,6 +48,7 @@
  *
  * @author Geng Rong
  * @author Thomas Vitale
+ * @author Wang Xiaojie
  */
 public class MoonshotApi {

@@ -207,14 +208,37 @@ public enum ChatCompletionFinishReason {
  * Moonshot Chat Completion Models:
  *
  * <ul>
+ * <li><b>MOONSHOT_V1_AUTO</b> - moonshot-v1-auto</li>
  * <li><b>MOONSHOT_V1_8K</b> - moonshot-v1-8k</li>
  * <li><b>MOONSHOT_V1_32K</b> - moonshot-v1-32k</li>
  * <li><b>MOONSHOT_V1_128K</b> - moonshot-v1-128k</li>
  * </ul>
+ *
+ * {@code moonshot-v1-auto} can select the appropriate model based on the number of Tokens occupied by the current context. The available models for selection include:
+ * <ul>
+ * <li>{@code moonshot-v1-8k}</li>
+ * <li>{@code moonshot-v1-32k}</li>
+ * <li>{@code moonshot-v1-128k}</li>
+ * </ul>
+ * <p>{@code moonshot-v1-auto} can be regarded as a model router, which decides which specific model to select based on the number of Tokens occupied by the current context. In terms of performance and output, {@code moonshot-v1-auto} is indistinguishable from the aforementioned models.</p>
+ * The routing rules for the model selected by {@code moonshot-v1-auto} are as follows:
+ * <ul>
+ * <li>If {@code total_tokens ≤ 8 * 1024}, choose {@code moonshot-v1-8k}.</li>
+ * <li>If {@code 8 * 1024 < total_tokens ≤ 32 * 1024}, choose {@code moonshot-v1-32k}.</li>
+ * <li>If {@code total_tokens > 32 * 1024}, choose {@code moonshot-v1-128k}.</li>
+ * </ul>
+ * The calculation formula is:
+ * {@code total_tokens = prompt_tokens + max_tokens}
+ * <p>The total number of Tokens is composed of two parts:
+ * <ul>
+ * <li>{@code prompt_tokens}: The number of Tokens occupied by the input prompt (Prompt).</li>
+ * <li>{@code max_tokens}: The maximum number of Tokens expected to be generated as output.</li>
+ * </ul>
  */
 public enum ChatModel implements ChatModelDescription {

 	// @formatter:off
+	MOONSHOT_V1_AUTO("moonshot-v1-auto"),
 	MOONSHOT_V1_8K("moonshot-v1-8k"),
 	MOONSHOT_V1_32K("moonshot-v1-32k"),
 	MOONSHOT_V1_128K("moonshot-v1-128k");
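
The routing behaviour documented in the Javadoc above reduces to a threshold check on total_tokens = prompt_tokens + max_tokens. The following standalone sketch is illustrative only and is not part of this commit; the class and method names (ModelRoutingExample, pickModel) are made up for the example, while the thresholds and the formula come from the Javadoc:

public class ModelRoutingExample {

	// Illustrative only: mirrors the routing rules described in the Javadoc,
	// where total_tokens = prompt_tokens + max_tokens.
	static String pickModel(int promptTokens, int maxTokens) {
		int totalTokens = promptTokens + maxTokens;
		if (totalTokens <= 8 * 1024) {
			return "moonshot-v1-8k"; // total_tokens <= 8 * 1024
		}
		if (totalTokens <= 32 * 1024) {
			return "moonshot-v1-32k"; // 8 * 1024 < total_tokens <= 32 * 1024
		}
		return "moonshot-v1-128k"; // total_tokens > 32 * 1024
	}

	public static void main(String[] args) {
		// 6_000 prompt tokens + 4_000 requested output tokens = 10_000 total,
		// which exceeds 8 * 1024, so the router would pick moonshot-v1-32k.
		System.out.println(pickModel(6_000, 4_000));
	}
}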

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/moonshot-chat.adoc

Lines changed: 5 additions & 1 deletion

@@ -89,7 +89,7 @@ The prefix `spring.ai.moonshot.chat` is the property prefix that lets you config
 | spring.ai.moonshot.chat.enabled | Enable Moonshot chat model. | true
 | spring.ai.moonshot.chat.base-url | Optional overrides the spring.ai.moonshot.base-url to provide chat specific url | -
 | spring.ai.moonshot.chat.api-key | Optional overrides the spring.ai.moonshot.api-key to provide chat specific api-key | -
-| spring.ai.moonshot.chat.options.model | This is the Moonshot Chat model to use | `moonshot-v1-8k` (the `moonshot-v1-8k`, `moonshot-v1-32k`, and `moonshot-v1-128k` point to the latest model versions)
+| spring.ai.moonshot.chat.options.model | This is the Moonshot Chat model to use | `moonshot-v1-8k` (the `moonshot-v1-auto`, `moonshot-v1-8k`, `moonshot-v1-32k`, and `moonshot-v1-128k` point to the latest model versions)
 | spring.ai.moonshot.chat.options.maxTokens | The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. | -
 | spring.ai.moonshot.chat.options.temperature | The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. | 0.7
 | spring.ai.moonshot.chat.options.topP | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | 1.0

@@ -103,6 +103,10 @@ NOTE: You can override the common `spring.ai.moonshot.base-url` and `spring.ai.m
 The `spring.ai.moonshot.chat.base-url` and `spring.ai.moonshot.chat.api-key` properties if set take precedence over the common properties.
 This is useful if you want to use different Moonshot accounts for different models and different model endpoints.

+NOTE: When the value of `spring.ai.moonshot.chat.options.model` is set to `moonshot-v1-auto`, it can select the appropriate model based on the number of Tokens occupied by the current context.
+The available models for selection include: `moonshot-v1-8k`, `moonshot-v1-32k` and `moonshot-v1-128k`.
+`moonshot-v1-auto` can be considered as a model router. It decides which specific model to select based on the number of Tokens occupied by the current context. In terms of performance and output, `moonshot-v1-auto` is indistinguishable from the aforementioned models.
+
 TIP: All properties prefixed with `spring.ai.moonshot.chat.options` can be overridden at runtime by adding a request specific <<chat-options>> to the `Prompt` call.

 == Runtime Options [[chat-options]]
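
As a usage note (not part of the diff): the documented `spring.ai.moonshot.chat.options.model` property is a standard Spring Boot configuration property, so opting into the auto-routing model would be a one-line setting, for example in `application.properties`:

spring.ai.moonshot.chat.options.model=moonshot-v1-auto

The property name and value are taken from the table and NOTE above; where the property is declared (properties file, YAML, environment) is left to the application.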
