
Commit 9074aff

Merge pull request #1646 from cescoffier/ollama-guide

Ollama guide

2 parents 42d19f5 + 6441985

File tree

2 files changed: +232 -2 lines changed

docs/modules/ROOT/nav.adoc

Lines changed: 1 addition & 2 deletions
@@ -13,6 +13,7 @@

 * xref:guide-few-shots.adoc[Using few-shots in prompts]
 * xref:guide-prompt-engineering.adoc[Prompt engineering patterns]
+* xref:guide-ollama.adoc[Using Ollama models]
 // * xref:guide-ai-services-patterns.adoc[AI Services patterns]
 * xref:guide-fault-tolerance.adoc[Fault Tolerance]
 * xref:guide-csv.adoc[Index CSVs in a RAG pipeline]
@@ -26,10 +27,8 @@
 // * xref:guide-log.adoc[Logging Model Interactions]
 // * xref:guide-token.adoc[Tracking token usages]

-// * xref:guide-local-models.adoc[Using local models]
 // * xref:guide-in-process-models.adoc[Using in-process models]

-// * xref:guide-generating-images.adoc[Generating Images from Prompts]
 // Add evaluation and guardrails and testing guides
 // Give knowledge to AI models

Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@

= Using Ollama with Quarkus LangChain4j

include::./includes/attributes.adoc[]
include::./includes/customization.adoc[]

This guide shows how to use local Ollama models with the Quarkus LangChain4j extension.
You'll learn how to:

* Set up the environment and dependencies
* Use an Ollama-powered chat model
* Use function calling (tool execution)
* Use an Ollama embedding model

== 1. Setup

=== Install Ollama

First, install Ollama from https://ollama.com. It lets you run LLMs locally with minimal setup.

To verify installation:

[source, bash]
----
ollama run llama3
----

You can pull other models using:

[source, bash]
----
ollama pull qwen3:1.7b
ollama pull snowflake-arctic-embed:latest
----

NOTE: Some models may require more RAM or GPU acceleration. Check the Ollama model card for details.

TIP: In dev mode, Quarkus automatically starts the Ollama server if it is not already running, so you can test your application without starting the server manually. It also automatically pulls the models your application uses if they are not already available locally. This can take some time, so pre-pulling the models is recommended for faster startup.

=== Add Maven Dependencies

Add the following dependencies to your `pom.xml`:

[source, xml, subs=attributes+]
----
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
    <version>{project-version}</version>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-jackson</artifactId>
</dependency>
----

The `quarkus-langchain4j-ollama` extension provides the necessary integration with Ollama models.
The `quarkus-rest-jackson` dependency is needed for the REST endpoints used in this guide (for demo purposes).

=== Configure the Application

In your `application.properties`, configure the chat and embedding models:

[source, properties]
----
# Chat model
quarkus.langchain4j.ollama.chat-model.model-name=qwen3:1.7b <1>
quarkus.langchain4j.ollama.chat-model.temperature=0 <2>
quarkus.langchain4j.timeout=60s <3>

# Embedding model
quarkus.langchain4j.ollama.embedding-model.model-name=snowflake-arctic-embed:latest <4>
----
<1> Specify the Ollama chat model to use (e.g., `qwen3:1.7b`).
<2> Set the temperature to 0 for deterministic outputs (especially useful for function calling).
<3> Local inference can take time, so set a reasonable timeout (e.g., 60 seconds).
<4> Specify the Ollama embedding model (e.g., `snowflake-arctic-embed:latest`).
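
If you already run Ollama elsewhere (for example on another machine), you can point the extension at that instance instead of relying on the locally managed one. A minimal sketch, assuming the extension exposes a `base-url` property for the Ollama client (check the extension's configuration reference for the exact key); `11434` is Ollama's default port:

[source, properties]
----
# Hypothetical example: use an Ollama server running on another host
quarkus.langchain4j.ollama.base-url=http://192.168.1.50:11434
----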

== 2. Using the Ollama Chat Model

To interact with an Ollama chat model, define an AI service interface:

[source, java]
----
@RegisterAiService
public interface Assistant {

    @UserMessage("Say 'hello world' using a 4 line poem.")
    String greeting();
}
----

Quarkus will automatically generate the implementation using the configured Ollama model.

You can expose this through a simple REST endpoint:

[source, java]
----
@Path("/hello")
public class GreetingResource {

    @Inject
    Assistant assistant;

    @GET
    public String hello() {
        return assistant.greeting();
    }
}
----

Visit http://localhost:8080/hello to see the model generate a 4-line "hello world" poem:

[source, text]
----
In the quiet dawn, a whisper breaks the silence,
Hello, world, where dreams take flight and light.
The sun ascends, a golden, warm embrace,
A greeting to the earth, a heart's soft grace.
----

== 3. Using Function Calling

Ollama also provides reasoning models (like `qwen3:1.7b`) that support function calling, allowing the model to invoke external tools or business logic.

Here, we declare a tool method that logs a message:

[source, java]
----
@ApplicationScoped
public class SenderService {

    @Tool
    public void sendMessage(String message) {
        Log.infof("Sending message: %s", message);
    }
}
----

Then we declare an AI service that uses this tool:

[source, java]
----
@RegisterAiService
public interface Assistant {

    @UserMessage("Say 'hello world' using a 4 line poem and send it using the SenderService.")
    @ToolBox(SenderService.class)
    String greetingAndSend();
}
----

The assistant will:

1. Generate a poem
2. Call the `sendMessage(...)` tool with the poem

You can test this via a new endpoint in the `GreetingResource` class:

[source, java]
----
@GET
@Path("/function-calling")
public String helloWithFunctionCalling() {
    return assistant.greetingAndSend();
}
----

Visit http://localhost:8080/hello/function-calling to trigger the tool.
If you check the logs, you should see:

[source, text]
----
.... INFO [org.acm.SenderService] (executor-thread-1) Sending message: Hello, world!
A simple message.
In this, we go.
Peace and joy.
----

NOTE: Lowering the temperature helps ensure the model uses the tool consistently.
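
Tools are not limited to `void` methods like `sendMessage(...)`: a tool can also take parameters and return a value that the model folds into its answer. Below is a minimal sketch; the `ContactService` class and its stubbed data are hypothetical and not part of this guide's example application:

[source, java]
----
@ApplicationScoped
public class ContactService {

    // Hypothetical tool: the description helps the model decide when to call it.
    @Tool("Returns the email address of the given contact")
    public String emailOf(String contactName) {
        // Stubbed lookup for the sketch; a real implementation would query a directory.
        return contactName.toLowerCase() + "@example.com";
    }
}
----

Such a tool is wired up the same way, by listing its class in the `@ToolBox` annotation of the AI service method.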

== 4. Using the Ollama Embedding Model

You can also use Ollama to generate text embeddings for vector-based tasks.
This is useful for Retrieval-Augmented Generation (RAG) or semantic search.

Inject the `EmbeddingModel`:

[source, java]
----
@Inject
EmbeddingModel embeddingModel;
----

Then use it like this:

[source, java]
----
@POST
@Path("/embed")
public List<Float> embed(String text) {
    return embeddingModel.embed(text).content().vectorAsList();
}
----

Send a POST request with plain text to `/hello/embed`, and you'll get a float vector representing the input:

[source, shell]
----
curl -X POST http://localhost:8080/hello/embed \
  -H "Content-Type: text/plain" \
  --data-binary @- <<EOF
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
EOF
----

You will receive a list of floats representing the embedding.
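
To get a feel for how these vectors support semantic search, you can compare two embeddings with cosine similarity. Here is a minimal sketch; the `similarity` helper is illustrative and not part of the guide's example application:

[source, java]
----
double similarity(String first, String second) {
    // Embed both texts with the configured Ollama embedding model.
    List<Float> a = embeddingModel.embed(first).content().vectorAsList();
    List<Float> b = embeddingModel.embed(second).content().vectorAsList();

    // Plain cosine similarity: dot(a, b) / (|a| * |b|).
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.size(); i++) {
        dot += a.get(i) * b.get(i);
        normA += a.get(i) * a.get(i);
        normB += b.get(i) * b.get(i);
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
----

Texts with closely related meanings yield a similarity close to 1; this comparison is the basis of retrieval in a RAG pipeline.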

== 5. Conclusion

Ollama enables local inference with a wide variety of LLMs, and Quarkus LangChain4j makes it easy to integrate them into Java applications.

Next steps:

* Try other Ollama models (e.g. `llama3`, `mistral`)
* Switch the xref:quickstart-rag.adoc[RAG quickstart] to use Ollama-served models (both chat and embedding)
* Implement more xref:rag.adoc[complex RAG workflows] using Ollama models
