File: spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/functions/ollama-chat-functions.adoc
TIP: You need Ollama 0.2.8 or newer.

TIP: You need https://ollama.com/search?c=tools[Models] pre-trained for Tools support.
Usually, such models are tagged with a `Tools` tag.
For example, `mistral`, `firefunction-v2`, or `llama3.1:70b`.
It is a best practice to annotate the request object with information such that the generated JSON schema of that function is as descriptive as possible, to help the AI model pick the correct function to invoke.
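To make the request/response shape concrete, here is a minimal, framework-free sketch of the kind of function a callback wraps. The names (`WeatherFunctionSketch`, `CURRENT_WEATHER`) and the mock temperature are illustrative assumptions, not code from this PR; in real code the record components would additionally carry schema annotations (for example Jackson's `@JsonPropertyDescription`):

```java
import java.util.function.Function;

class WeatherFunctionSketch {

    public enum Unit { C, F }

    // In real code, annotate the record components so the generated
    // JSON schema is as descriptive as possible.
    public record Request(String location, Unit unit) {}

    public record Response(double temperature, Unit unit) {}

    // A function callback ultimately wraps a plain java.util.function.Function.
    public static final Function<Request, Response> CURRENT_WEATHER =
            request -> new Response(15.0, request.unit()); // mock weather value

    public static void main(String[] args) {
        Response response = CURRENT_WEATHER.apply(new Request("Paris, France", Unit.C));
        System.out.println(response.temperature() + " " + response.unit()); // prints "15.0 C"
    }
}
```

The AI model never calls this function directly; it returns a JSON tool call, and the framework invokes the function and feeds the result back into the conversation.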
==== FunctionCallback Wrapper
Another way to register a function is to create a `FunctionCallbackWrapper` like this:
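A sketch of such a registration, exposed as a Spring bean. `MockWeatherService`, the function name, and the description are illustrative assumptions; check the linked integration tests for the authoritative version:

```java
// Sketch only: assumes a MockWeatherService implementing
// Function<MockWeatherService.Request, MockWeatherService.Response>.
@Configuration
class FunctionConfig {

    @Bean
    public FunctionCallback weatherFunctionInfo() {
        return FunctionCallbackWrapper.builder(new MockWeatherService())
            .withName("CurrentWeather")                     // name referenced from chat requests
            .withDescription("Get the weather in location") // helps the model pick this function
            .build();
    }
}
```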
Here is the current weather for the requested cities:
- Paris, France: 15.0°C
----

The link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/ollama/tool/FunctionCallbackWrapperIT.java[FunctionCallbackWrapperIT.java] test demonstrates this approach.
=== Register/Call Functions with Prompt Options
NOTE: The in-prompt registered functions are enabled by default for the duration of this request.

This approach allows you to dynamically choose different functions to be called based on the user input.

The link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/ollama/tool/FunctionCallbackInPromptIT.java[FunctionCallbackInPromptIT.java] integration test provides a complete example of how to register a function with the `OllamaChatModel` and use it in a prompt request.
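Sketched against the Spring AI API, in-prompt enabling of a function looks roughly like the following; the function name `CurrentWeather` and the question are illustrative assumptions, and `withFunction` is the builder method expected for this purpose:

```java
// Sketch only: assumes a FunctionCallback named "CurrentWeather"
// is already registered as a bean.
ChatResponse response = chatModel.call(new Prompt(
    "What's the weather like in San Francisco, Tokyo, and Paris?",
    OllamaOptions.builder()
        .withFunction("CurrentWeather")
        .build()));
```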
== Appendices

The following diagram illustrates the flow of the Ollama API:

image:ollama-function-calling-flow.jpg[title="Ollama API Function Calling Flow", width=800]

The link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-ollama/src/test/java/org/springframework/ai/ollama/api/tool/OllamaApiToolFunctionCallIT.java[OllamaApiToolFunctionCallIT.java] test provides a complete example of how to use the Ollama API function calling.
File: spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/ollama-chat.adoc
With https://ollama.ai/[Ollama] you can run various Large Language Models (LLMs) locally and generate text from them.
Spring AI supports the Ollama text generation with `OllamaChatModel`.

TIP: Ollama offers an OpenAI API compatible endpoint as well.
Check the xref:_openai_api_compatibility[OpenAI API compatibility] section to learn how to use the xref:api/chat/openai-chat.adoc[Spring AI OpenAI] client to talk to an Ollama server.

== Prerequisites

You first need to run Ollama on your local machine.
NOTE: Running `ollama run llama3` will download a 4.7GB model artifact.

=== Add Repositories and BOM

Spring AI artifacts are published in Spring Milestone and Snapshot repositories.
Refer to the xref:getting-started.adoc#repositories[Repositories] section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build system.
The remaining `options` properties are based on the Ollama valid parameters and values:

|====
| Property | Description | Default

| spring.ai.ollama.chat.options.numa | Whether to use NUMA. | false
| spring.ai.ollama.chat.options.num-ctx | Sets the size of the context window used to generate the next token. | 2048
| spring.ai.ollama.chat.options.num-batch | Prompt processing maximum batch size. | 512
| spring.ai.ollama.chat.options.num-gpu | The number of layers to send to the GPU(s). On macOS, it defaults to 1 to enable Metal support, 0 to disable. -1 here indicates that NumGPU should be set dynamically. | -1
| spring.ai.ollama.chat.options.main-gpu | When using multiple GPUs, this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. | 0
| spring.ai.ollama.chat.options.logits-all | Return logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true. | -
| spring.ai.ollama.chat.options.vocab-only | Load only the vocabulary, not the weights. | -
| spring.ai.ollama.chat.options.use-mmap | By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM, or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you're not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all. | null
| spring.ai.ollama.chat.options.use-mlock | Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM. | false
| spring.ai.ollama.chat.options.num-thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide. | 0
| spring.ai.ollama.chat.options.seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. | -1
| spring.ai.ollama.chat.options.num-predict | Maximum number of tokens to predict when generating text. (-1 = infinite generation, -2 = fill context) | -1
| spring.ai.ollama.chat.options.top-k | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. | 40
| spring.ai.ollama.chat.options.top-p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. | 0.9
| spring.ai.ollama.chat.options.tfs-z | Tail-free sampling is used to reduce the impact of less probable tokens on the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. | 1.0
| spring.ai.ollama.chat.options.repeat-last-n | Sets how far back the model looks to prevent repetition. (0 = disabled, -1 = num_ctx) | 64
| spring.ai.ollama.chat.options.temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. | 0.8
| spring.ai.ollama.chat.options.repeat-penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. | 1.1
| spring.ai.ollama.chat.options.mirostat-tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. | 5.0
| spring.ai.ollama.chat.options.mirostat-eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. | 0.1
| spring.ai.ollama.chat.options.stop | Sets the stop sequences to use. When this pattern is encountered, the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | -
| spring.ai.ollama.chat.options.functions | List of functions, identified by their names, to enable for function calling in a single prompt request. Functions with those names must exist in the functionCallbacks registry. | -
|====
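Each table entry maps directly onto a configuration property. For instance, the sampling options above could be tuned in `application.properties` like this (the values shown simply restate the defaults):

```properties
spring.ai.ollama.chat.options.temperature=0.8
spring.ai.ollama.chat.options.top-k=40
spring.ai.ollama.chat.options.top-p=0.9
spring.ai.ollama.chat.options.num-ctx=2048
```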
For example, to override the default model and temperature for a specific request:

[source,java]
----
ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaOptions.builder()
            .withModel(OllamaModel.LLAMA3_1)
            .withTemperature(0.4)
            .build()
    ));
----
where fruits are being displayed, possibly for convenience or aesthetic purposes.
----

== OpenAI API Compatibility

Ollama is OpenAI API-compatible, and you can use the xref:api/chat/openai-chat.adoc[Spring AI OpenAI] client to talk to Ollama and use tools.
For this, you need to set the OpenAI base URL, `spring.ai.openai.chat.base-url=http://localhost:11434`, and select one of the provided Ollama models, for example `spring.ai.openai.chat.options.model=mistral`.
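Put together in `application.properties`, the override looks like the following sketch. The `api-key` entry is an assumption: Ollama ignores credentials, but the OpenAI starter may still require the property to be set:

```properties
spring.ai.openai.chat.base-url=http://localhost:11434
spring.ai.openai.chat.options.model=mistral
# Ollama does not check credentials; a placeholder may still be needed by the starter.
spring.ai.openai.api-key=dummy-key
```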

Check the link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/chat/OllamaWithOpenAiChatModelIT.java[OllamaWithOpenAiChatModelIT.java] tests for examples of using Ollama over Spring AI OpenAI.
== Sample Controller

https://start.spring.io/[Create] a new Spring Boot project and add the `spring-ai-ollama-spring-boot-starter` to your pom (or gradle) dependencies.
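Assuming Maven and the Spring AI BOM described earlier, the starter dependency entry would look like this (the version is managed by the BOM):

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
```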