
Commit 0c4d6ab

Improve Ollama docs
1 parent: 27354cd

File tree: 4 files changed (+43, -27 lines)


models/spring-ai-ollama/src/main/java/org/springframework/ai/ollama/api/OllamaOptions.java

Lines changed: 8 additions & 3 deletions
@@ -149,7 +149,7 @@ public class OllamaOptions implements FunctionCallingOptions, ChatOptions, Embed
 	/**
 	 * Sets the random number seed to use for generation. Setting this to a
 	 * specific number will make the model generate the same text for the same prompt.
-	 * (Default: 0)
+	 * (Default: -1)
 	 */
 	@JsonProperty("seed") private Integer seed;
 
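For reference, a minimal sketch of pinning the seed for reproducible output. The `create()` factory and fluent setters follow the style used elsewhere in this class; treat the exact method names as assumptions:

----
// Sketch only: a fixed seed should make repeated calls with the same prompt
// return the same completion; -1 (the new default) means a random seed.
OllamaOptions options = OllamaOptions.create()
        .withSeed(42)
        .withTemperature(0.0); // low temperature further reduces variation (illustrative)
----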
@@ -268,8 +268,8 @@ public class OllamaOptions implements FunctionCallingOptions, ChatOptions, Embed
 	 */
 	@JsonProperty("keep_alive") private String keepAlive;
 
-	/**
-	 * OpenAI Tool Function Callbacks to register with the ChatModel.
+	/**
+	 * Tool Function Callbacks to register with the ChatModel.
 	 * For Prompt Options the functionCallbacks are automatically enabled for the duration of the prompt execution.
 	 * For Default Options the functionCallbacks are registered but disabled by default. Use the enableFunctions to set the functions
 	 * from the registry to be used by the ChatModel chat completion requests.
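As a hedged illustration of the Javadoc above, this sketch registers a callback as a default option. `FunctionCallbackWrapper` is documented elsewhere in this commit; `MockWeatherService` and the exact setter names are hypothetical:

----
// Sketch: default-option registration. Per the Javadoc, such callbacks are
// registered but stay disabled until explicitly enabled for a request.
OllamaOptions defaults = OllamaOptions.create()
        .withFunctionCallbacks(List.of(
                FunctionCallbackWrapper.builder(new MockWeatherService())
                        .withName("CurrentWeather")
                        .withDescription("Get the weather in location")
                        .build()));
----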
@@ -307,6 +307,11 @@ public OllamaOptions withModel(String model) {
 		return this;
 	}
 
+	public OllamaOptions withModel(OllamaModel model) {
+		this.model = model.getName();
+		return this;
+	}
+
 	public String getModel() {
 		return model;
 	}
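The new overload accepts the `OllamaModel` enum instead of a raw model-name string. A usage sketch; the `LLAMA3_1` constant appears in this commit's docs, and its underlying name is assumed to be "llama3.1":

----
// Sketch: enum-based model selection via the overload added above.
OllamaOptions options = OllamaOptions.create()
        .withModel(OllamaModel.LLAMA3_1) // stores model.getName()
        .withTemperature(0.4);
----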
(Binary image file changed: -22.5 KB)

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/functions/ollama-chat-functions.adoc

Lines changed: 4 additions & 7 deletions
@@ -2,7 +2,7 @@
 
 TIP: You need Ollama 0.2.8 or newer.
 
-TIP: You need https://ollama.com/library[Models] pre-trained for Tools support.
+TIP: You need https://ollama.com/search?c=tools[Models] pre-trained for Tools support.
 Usually, such models are tagged with a `Tools` tag.
 For example `mistral`, `firefunction-v2` or `llama3.1:70b`.
 
@@ -121,9 +121,6 @@ public record Request(String location, Unit unit) {}
 
 It is a best practice to annotate the request object with information such that the generated JSON schema of that function is as descriptive as possible to help the AI model pick the correct function to invoke.
 
-The link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/openai/tool/FunctionCallbackWithPlainFunctionBeanIT.java[FunctionCallbackWithPlainFunctionBeanIT.java] demonstrates this approach.
-
-
 ==== FunctionCallback Wrapper
 
 Another way to register a function is to create a `FunctionCallbackWrapper` wrapper like this:
@@ -179,7 +176,7 @@ Here is the current weather for the requested cities:
 - Paris, France: 15.0°C
 ----
 
-The link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/openai/tool/FunctionCallbackWrapperIT.java[FunctionCallbackWrapperIT.java] test demonstrates this approach.
+The link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/ollama/tool/FunctionCallbackWrapperIT.java[FunctionCallbackWrapperIT.java] test demonstrates this approach.
 
 
 === Register/Call Functions with Prompt Options
@@ -206,7 +203,7 @@ NOTE: The in-prompt registered functions are enabled by default for the duration
 
 This approach allows you to dynamically choose different functions to be called based on the user input.
 
-The https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/ollama/tool/FunctionCallbackInPromptIT.java[FunctionCallbackInPromptIT.java] integration test provides a complete example of how to register a function with the `OllamaChatModel` and use it in a prompt request.
+The link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/ollama/tool/FunctionCallbackInPromptIT.java[FunctionCallbackInPromptIT.java] integration test provides a complete example of how to register a function with the `OllamaChatModel` and use it in a prompt request.
 
 == Appendices:
 
@@ -222,4 +219,4 @@ The following diagram illustrates the flow of the Ollama API:
 
 image:ollama-function-calling-flow.jpg[title="Ollama API Function Calling Flow", width=800]
 
-The link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-ollama/src/test/java/org/springframework/ai/ollama/chat/api/tool/OpenAiApiToolFunctionCallIT.java[OllamaApiToolFunctionCallIT.java] provides a complete example on how to use the Ollama API function calling.
+The link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-ollama/src/test/java/org/springframework/ai/ollama/api/tool/OllamaApiToolFunctionCallIT.java[OllamaApiToolFunctionCallIT.java] provides a complete example of how to use the Ollama API function calling.
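To illustrate the documented flow end to end, a hedged sketch of enabling a registered function for a single prompt. The function name "CurrentWeather" and the `withFunction` option are assumptions based on this page's description:

----
// Sketch: enable a previously registered callback for one request only.
ChatResponse response = chatModel.call(new Prompt(
        "What's the weather like in Paris?",
        OllamaOptions.builder()
                .withFunction("CurrentWeather") // must match a registered callback name
                .build()));
----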

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/ollama-chat.adoc

Lines changed: 31 additions & 17 deletions
@@ -3,6 +3,10 @@
 With https://ollama.ai/[Ollama] you can run various Large Language Models (LLMs) locally and generate text from them.
 Spring AI supports the Ollama text generation with `OllamaChatModel`.
 
+
+TIP: Ollama offers an OpenAI API compatible endpoint as well.
+Check the xref:_openai_api_compatibility[OpenAI API compatibility] section to learn how to use the xref:api/chat/openai-chat.adoc[Spring AI OpenAI] to talk to an Ollama server.
+
 == Prerequisites
 
 You first need to run Ollama on your local machine.
@@ -12,7 +16,8 @@ NOTE: installing `ollama run llama3` will download a 4.7GB model artifact.
 
 === Add Repositories and BOM
 
-Spring AI artifacts are published in Spring Milestone and Snapshot repositories. Refer to the xref:getting-started.adoc#repositories[Repositories] section to add these repositories to your build system.
+Spring AI artifacts are published in Spring Milestone and Snapshot repositories.
+Refer to the xref:getting-started.adoc#repositories[Repositories] section to add these repositories to your build system.
 
 To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build system.
 
@@ -74,32 +79,32 @@ The remaining `options` properties are based on the link:https://github.com/olla
 | Property | Description | Default
 | spring.ai.ollama.chat.options.numa | Whether to use NUMA. | false
 | spring.ai.ollama.chat.options.num-ctx | Sets the size of the context window used to generate the next token. | 2048
-| spring.ai.ollama.chat.options.num-batch | ??? | 512
+| spring.ai.ollama.chat.options.num-batch | Prompt processing maximum batch size. | 512
 | spring.ai.ollama.chat.options.num-gpu | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable metal support, 0 to disable. -1 here indicates that NumGPU should be set dynamically | -1
-| spring.ai.ollama.chat.options.main-gpu | ??? | -
-| spring.ai.ollama.chat.options.low-vram | ??? | false
-| spring.ai.ollama.chat.options.f16-kv | ??? | true
-| spring.ai.ollama.chat.options.logits-all | ??? | -
-| spring.ai.ollama.chat.options.vocab-only | ??? | -
-| spring.ai.ollama.chat.options.use-mmap | ??? | true
-| spring.ai.ollama.chat.options.use-mlock | ??? | false
+| spring.ai.ollama.chat.options.main-gpu | When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. | 0
+| spring.ai.ollama.chat.options.low-vram | - | false
+| spring.ai.ollama.chat.options.f16-kv | - | true
+| spring.ai.ollama.chat.options.logits-all | Return logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true. | -
+| spring.ai.ollama.chat.options.vocab-only | Load only the vocabulary, not the weights. | -
+| spring.ai.ollama.chat.options.use-mmap | By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you're not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all. | null
+| spring.ai.ollama.chat.options.use-mlock | Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM. | false
 | spring.ai.ollama.chat.options.num-thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide | 0
-| spring.ai.ollama.chat.options.num-keep | ??? | 0
+| spring.ai.ollama.chat.options.num-keep | - | 4
 | spring.ai.ollama.chat.options.seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. | -1
 | spring.ai.ollama.chat.options.num-predict | Maximum number of tokens to predict when generating text. (-1 = infinite generation, -2 = fill context) | -1
 | spring.ai.ollama.chat.options.top-k | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. | 40
 | spring.ai.ollama.chat.options.top-p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. | 0.9
 | spring.ai.ollama.chat.options.tfs-z | Tail-free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. | 1.0
-| spring.ai.ollama.chat.options.typical-p | ??? | 1.0
-| spring.ai.ollama.chat.options.repeat-last-n | Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | 64
+| spring.ai.ollama.chat.options.typical-p | - | 1.0
+| spring.ai.ollama.chat.options.repeat-last-n | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | 64
 | spring.ai.ollama.chat.options.temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. | 0.8
 | spring.ai.ollama.chat.options.repeat-penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. | 1.1
-| spring.ai.ollama.chat.options.presence-penalty | ??? | 0.0
-| spring.ai.ollama.chat.options.frequency-penalty | ??? | 0.0
+| spring.ai.ollama.chat.options.presence-penalty | - | 0.0
+| spring.ai.ollama.chat.options.frequency-penalty | - | 0.0
 | spring.ai.ollama.chat.options.mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | 0
 | spring.ai.ollama.chat.options.mirostat-tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. | 5.0
 | spring.ai.ollama.chat.options.mirostat-eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. | 0.1
-| spring.ai.ollama.chat.options.penalize-newline | ??? | true
+| spring.ai.ollama.chat.options.penalize-newline | - | true
 | spring.ai.ollama.chat.options.stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | -
 | spring.ai.ollama.chat.options.functions | List of functions, identified by their names, to enable for function calling in a single prompt request. Functions with those names must exist in the functionCallbacks registry. | -
 |====
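Taken together, a minimal application.properties sketch using a few rows from the table above (the values are illustrative, not recommendations):

----
# Illustrative values; property names come from the table above.
spring.ai.ollama.chat.options.model=mistral
spring.ai.ollama.chat.options.temperature=0.7
spring.ai.ollama.chat.options.num-ctx=4096
spring.ai.ollama.chat.options.seed=42
----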
@@ -120,9 +125,10 @@ For example to override the default model and temperature for a specific request
 ChatResponse response = chatModel.call(
     new Prompt(
         "Generate the names of 5 famous pirates.",
-        OllamaOptions.create()
-            .withModel("llama2")
+        OllamaOptions.builder()
+            .withModel(OllamaModel.LLAMA3_1)
             .withTemperature(0.4)
+            .build()
     ));
 ----
 
@@ -180,6 +186,14 @@ photo was taken in an area with metallic decorations or fixtures. The overall se
 where fruits are being displayed, possibly for convenience or aesthetic purposes.
 ----
 
+== OpenAI API Compatibility
+
+Ollama is OpenAI API compatible and you can use the xref:api/chat/openai-chat.adoc[Spring AI OpenAI] client to talk to Ollama and use tools.
+For this, you need to set the OpenAI base-url: `spring.ai.openai.chat.base-url=http://localhost:11434` and select one of the provided Ollama models: `spring.ai.openai.chat.options.model=mistral`.
+
+Check the link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/chat/OllamaWithOpenAiChatModelIT.java[OllamaWithOpenAiChatModelIT.java] tests for examples of using Ollama over Spring AI OpenAI.
+
+
 == Sample Controller
 
 https://start.spring.io/[Create] a new Spring Boot project and add the `spring-ai-ollama-spring-boot-starter` to your pom (or gradle) dependencies.
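For the OpenAI API compatibility section above, a hedged application.properties sketch (the placeholder api-key is an assumption; the OpenAI client generally requires a non-empty key even though Ollama ignores it):

----
spring.ai.openai.chat.base-url=http://localhost:11434
spring.ai.openai.chat.options.model=mistral
# Assumption: dummy key to satisfy the client; Ollama does not validate it.
spring.ai.openai.api-key=dummy
----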
