File: spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/functions/ollama-chat-functions.adoc
TIP: You need Ollama 0.2.8 or newer.

TIP: You need https://ollama.com/search?c=tools[Models] pre-trained for Tools support.
Usually, such models are tagged with a `Tools` tag.
For example, `mistral`, `firefunction-v2`, or `llama3.1:70b`.
It is a best practice to annotate the request object with information such that the generated JSON schema of that function is as descriptive as possible, to help the AI model pick the correct function to invoke.
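To make the request/response shape concrete, here is a minimal, framework-free sketch of the kind of function a callback wraps. The names (`WeatherFunctionSketch`, `CURRENT_WEATHER`) and the mock temperature are illustrative assumptions, not code from this PR; in real code the record components would additionally carry schema annotations (for example Jackson's `@JsonPropertyDescription`):

```java
import java.util.function.Function;

class WeatherFunctionSketch {

    public enum Unit { C, F }

    // In real code, annotate the record components so the generated
    // JSON schema is as descriptive as possible.
    public record Request(String location, Unit unit) {}

    public record Response(double temperature, Unit unit) {}

    // A function callback ultimately wraps a plain java.util.function.Function.
    public static final Function<Request, Response> CURRENT_WEATHER =
            request -> new Response(15.0, request.unit()); // mock weather value

    public static void main(String[] args) {
        Response response = CURRENT_WEATHER.apply(new Request("Paris, France", Unit.C));
        System.out.println(response.temperature() + " " + response.unit()); // prints "15.0 C"
    }
}
```

The AI model never calls this function directly; it returns a JSON tool call, and the framework invokes the function and feeds the result back into the conversation.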
==== FunctionCallback Wrapper
Another way to register a function is to create a `FunctionCallbackWrapper` like this:
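A sketch of such a registration, exposed as a Spring bean. `MockWeatherService`, the function name, and the description are illustrative assumptions; check the linked integration tests for the authoritative version:

```java
// Sketch only: assumes a MockWeatherService implementing
// Function<MockWeatherService.Request, MockWeatherService.Response>.
@Configuration
class FunctionConfig {

    @Bean
    public FunctionCallback weatherFunctionInfo() {
        return FunctionCallbackWrapper.builder(new MockWeatherService())
            .withName("CurrentWeather")                     // name referenced from chat requests
            .withDescription("Get the weather in location") // helps the model pick this function
            .build();
    }
}
```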
Here is the current weather for the requested cities:
- Paris, France: 15.0°C
----

The link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/ollama/tool/FunctionCallbackWrapperIT.java[FunctionCallbackWrapperIT.java] test demonstrates this approach.
=== Register/Call Functions with Prompt Options
NOTE: The in-prompt registered functions are enabled by default for the duration of this request.

This approach allows you to dynamically choose different functions to be called based on the user input.

The link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/ollama/tool/FunctionCallbackInPromptIT.java[FunctionCallbackInPromptIT.java] integration test provides a complete example of how to register a function with the `OllamaChatModel` and use it in a prompt request.
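Sketched against the Spring AI API, in-prompt enabling of a function looks roughly like the following; the function name `CurrentWeather` and the question are illustrative assumptions, and `withFunction` is the builder method expected for this purpose:

```java
// Sketch only: assumes a FunctionCallback named "CurrentWeather"
// is already registered as a bean.
ChatResponse response = chatModel.call(new Prompt(
    "What's the weather like in San Francisco, Tokyo, and Paris?",
    OllamaOptions.builder()
        .withFunction("CurrentWeather")
        .build()));
```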
== Appendices

The following diagram illustrates the flow of the Ollama API:

image:ollama-function-calling-flow.jpg[title="Ollama API Function Calling Flow", width=800]

The link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-ollama/src/test/java/org/springframework/ai/ollama/api/tool/OllamaApiToolFunctionCallIT.java[OllamaApiToolFunctionCallIT.java] test provides a complete example of how to use the Ollama API function calling.
File: spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/ollama-chat.adoc
With https://ollama.ai/[Ollama] you can run various Large Language Models (LLMs) locally and generate text from them.
Spring AI supports the Ollama text generation with `OllamaChatModel`.

TIP: Ollama offers an OpenAI API compatible endpoint as well.
Check the xref:_openai_api_compatibility[OpenAI API compatibility] section to learn how to use the xref:api/chat/openai-chat.adoc[Spring AI OpenAI] client to talk to an Ollama server.

== Prerequisites

You first need to run Ollama on your local machine.
NOTE: Running `ollama run llama3` will download a 4.7GB model artifact.

=== Add Repositories and BOM

Spring AI artifacts are published in Spring Milestone and Snapshot repositories.
Refer to the xref:getting-started.adoc#repositories[Repositories] section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build system.
The remaining `options` properties are based on the Ollama valid parameters and values:

|====
| Property | Description | Default

| spring.ai.ollama.chat.options.numa | Whether to use NUMA. | false
| spring.ai.ollama.chat.options.num-ctx | Sets the size of the context window used to generate the next token. | 2048
| spring.ai.ollama.chat.options.num-batch | Prompt processing maximum batch size. | 512
| spring.ai.ollama.chat.options.num-gpu | The number of layers to send to the GPU(s). On macOS, it defaults to 1 to enable Metal support, 0 to disable. -1 here indicates that NumGPU should be set dynamically. | -1
| spring.ai.ollama.chat.options.main-gpu | When using multiple GPUs, this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. | 0
| spring.ai.ollama.chat.options.logits-all | Return logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true. | -
| spring.ai.ollama.chat.options.vocab-only | Load only the vocabulary, not the weights. | -
| spring.ai.ollama.chat.options.use-mmap | By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM, or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you're not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all. | null
| spring.ai.ollama.chat.options.use-mlock | Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM. | false
| spring.ai.ollama.chat.options.num-thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide. | 0
| spring.ai.ollama.chat.options.seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. | -1
| spring.ai.ollama.chat.options.num-predict | Maximum number of tokens to predict when generating text. (-1 = infinite generation, -2 = fill context) | -1
| spring.ai.ollama.chat.options.top-k | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. | 40
| spring.ai.ollama.chat.options.top-p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. | 0.9
| spring.ai.ollama.chat.options.tfs-z | Tail-free sampling is used to reduce the impact of less probable tokens on the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. | 1.0
| spring.ai.ollama.chat.options.repeat-last-n | Sets how far back the model looks to prevent repetition. (0 = disabled, -1 = num_ctx) | 64
| spring.ai.ollama.chat.options.temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. | 0.8
| spring.ai.ollama.chat.options.repeat-penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. | 1.1
| spring.ai.ollama.chat.options.mirostat-tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. | 5.0
| spring.ai.ollama.chat.options.mirostat-eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. | 0.1
| spring.ai.ollama.chat.options.stop | Sets the stop sequences to use. When this pattern is encountered, the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | -
| spring.ai.ollama.chat.options.functions | List of functions, identified by their names, to enable for function calling in a single prompt request. Functions with those names must exist in the functionCallbacks registry. | -
|====
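Each table entry maps directly onto a configuration property. For instance, the sampling options above could be tuned in `application.properties` like this (the values shown simply restate the defaults):

```properties
spring.ai.ollama.chat.options.temperature=0.8
spring.ai.ollama.chat.options.top-k=40
spring.ai.ollama.chat.options.top-p=0.9
spring.ai.ollama.chat.options.num-ctx=2048
```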
For example, to override the default model and temperature for a specific request:

[source,java]
----
ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaOptions.builder()
            .withModel(OllamaModel.LLAMA3_1)
            .withTemperature(0.4)
            .build()
    ));
----
where fruits are being displayed, possibly for convenience or aesthetic purposes.
----

== OpenAI API Compatibility

Ollama is OpenAI API-compatible, and you can use the xref:api/chat/openai-chat.adoc[Spring AI OpenAI] client to talk to Ollama and use tools.
For this, you need to set the OpenAI base URL, `spring.ai.openai.chat.base-url=http://localhost:11434`, and select one of the provided Ollama models, for example `spring.ai.openai.chat.options.model=mistral`.
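Put together in `application.properties`, the override looks like the following sketch. The `api-key` entry is an assumption: Ollama ignores credentials, but the OpenAI starter may still require the property to be set:

```properties
spring.ai.openai.chat.base-url=http://localhost:11434
spring.ai.openai.chat.options.model=mistral
# Ollama does not check credentials; a placeholder may still be needed by the starter.
spring.ai.openai.api-key=dummy-key
```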

Check the link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/chat/OllamaWithOpenAiChatModelIT.java[OllamaWithOpenAiChatModelIT.java] tests for examples of using Ollama over Spring AI OpenAI.
== Sample Controller

https://start.spring.io/[Create] a new Spring Boot project and add the `spring-ai-ollama-spring-boot-starter` to your pom (or gradle) dependencies.
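Assuming Maven and the Spring AI BOM described earlier, the starter dependency entry would look like this (the version is managed by the BOM):

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
```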