Commit 1f9fd06

Streamed response guide

1 parent 42d19f5 commit 1f9fd06

2 files changed: +166 −1 lines changed

docs/modules/ROOT/nav.adoc

Lines changed: 3 additions & 1 deletion
@@ -19,10 +19,12 @@
 * xref:guide-web-search.adoc[Using Tavily Web Search]
 * xref:guide-passing-image.adoc[Passing Images to Models]
 * xref:guide-generating-image.adoc[Generating Images]
+* xref:guide-streamed-responses.adoc[Using streamed responses]
 * xref:guide-semantic-compression.adoc[Compressing Chat History]
 // * xref:guide-agentic-patterns.adoc[Implementing Agentic patterns]
 // * xref:guide-structured-output.adoc[Returning structured data from a model]
-// * xref:guide-streamed-responses.adoc[Using function calling]
+
+
 // * xref:guide-log.adoc[Logging Model Interactions]
 // * xref:guide-token.adoc[Tracking token usages]

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
= Using Streamed Responses with Quarkus LangChain4j

include::./includes/attributes.adoc[]
include::./includes/customization.adoc[]

Streamed responses allow large language models to return partial answers as they are generated.
Streaming significantly reduces perceived latency and improves responsiveness for end users.
With Quarkus LangChain4j, you can integrate streaming via REST (SSE) or WebSockets, leveraging `Multi<String>` for reactive, non-blocking processing.

This guide shows how to:

* Define AI services that return streamed responses
* Implement both SSE and WebSocket endpoints
* Test your application using `curl` and `wscat`

== Why Use Streamed Responses?

Traditional AI services generate the entire response before returning it, which can lead to:

* Perceived latency (long pause before the first word appears)
* Higher memory usage (especially for long completions)

Streaming addresses this by sending tokens as they are produced. Benefits include:

* Better user experience (progressive rendering)
* Reduced memory pressure on both server and client
* Easier integration with frontend frameworks (chat bots, dashboards)

== Project Setup

Add the following dependencies in your `pom.xml`:

[source,xml,subs=attributes+]
----
<!-- Or any other model provider that supports streaming, such as OpenAI -->
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
    <version>{project-version}</version>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-jackson</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-websockets-next</artifactId>
</dependency>
----

If you are using Ollama, configure your model in `application.properties`:

[source,properties]
----
quarkus.langchain4j.ollama.chat-model.model-name=qwen3:1.7b
----
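
If you prefer a hosted provider instead of Ollama, the configuration is analogous. A minimal sketch, assuming the `quarkus-langchain4j-openai` extension is on the classpath (the model name below is only an example):

[source,properties]
----
# Assumes quarkus-langchain4j-openai instead of quarkus-langchain4j-ollama;
# the model name is illustrative.
quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY}
quarkus.langchain4j.openai.chat-model.model-name=gpt-4o-mini
----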

== Streamed Responses in AI Services

To enable streaming, your AI service method must return a `Multi<String>`.
Each emitted item represents a token or part of the final response.

[source,java]
----
@RegisterAiService
@SystemMessage("You are a helpful AI assistant. Be concise and to the point.")
public interface StreamedAssistant {

    @UserMessage("Answer the question: {question}")
    Multi<String> respondToQuestion(String question);

}
----

Quarkus uses https://smallrye.io/smallrye-mutiny/latest/[Mutiny] under the hood.
In Quarkus, methods returning `Multi` are considered non-blocking.
Do not use blocking code inside streaming pipelines. For details, refer to the https://quarkus.io/guides/quarkus-reactive-architecture[Quarkus Reactive Architecture].
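
Because the service returns a plain Mutiny `Multi`, you can also consume the stream programmatically, for instance in a test or a CLI command. A minimal sketch, assuming an injected `StreamedAssistant` (the printing logic is illustrative):

[source,java]
----
// Illustrative only: subscribe to the stream and print each token as it arrives.
assistant.respondToQuestion("Why is the sky blue?")
        .subscribe().with(
                token -> System.out.print(token),     // called per emitted fragment
                failure -> failure.printStackTrace(), // called on error
                () -> System.out.println("[done]"));  // called on completion
----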

== Streaming with Server-Sent Events (SSE)

SSE is a simple way to stream text over HTTP. Let’s expose an endpoint returning `Multi<String>` as an event stream:

[source,java]
----
@Path("/stream")
public class Endpoint {

    @Inject
    StreamedAssistant assistant;

    @POST
    @Produces(MediaType.SERVER_SENT_EVENTS)
    public Multi<String> stream(String question) {
        return assistant.respondToQuestion(question);
    }
}
----

Run the application (`mvn quarkus:dev`), then use `curl`:

[source,bash]
----
curl -N -X POST http://localhost:8080/stream -d "Why is the sky blue?" \
     -H "Content-Type: text/plain"
----

The `-N` option disables buffering so you see the stream as it arrives.
You’ll receive a stream of tokens, each appearing as a new line in the terminal.
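
Since the endpoint returns a regular `Multi`, you can also post-process the stream with Mutiny operators before it reaches the client. A hedged sketch, adding a hypothetical variant endpoint that drops blank fragments:

[source,java]
----
// Hypothetical variant of the endpoint above: same stream,
// with empty fragments filtered out before they are sent.
@POST
@Path("/filtered")
@Produces(MediaType.SERVER_SENT_EVENTS)
public Multi<String> streamFiltered(String question) {
    return assistant.respondToQuestion(question)
            .filter(token -> !token.isBlank());
}
----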

== Streaming with WebSockets

For more interactive use cases (chat UIs, dashboards), you can expose a WebSocket endpoint using Quarkus WebSockets Next.

[source,java]
----
@WebSocket(path = "/ws/stream")
public class WebSocketEndpoint {

    @Inject
    WSStreamedAssistant assistant;

    @OnTextMessage
    public Multi<String> onTextMessage(String question) {
        return assistant.respondToQuestion(question);
    }
}
----
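
Streaming failures (for example, the model server going down mid-response) can be surfaced to the client as well. A hedged sketch using the WebSockets Next `@OnError` callback; the message format is illustrative:

[source,java]
----
// Hypothetical addition to the endpoint above: the returned text
// is sent to the client when the stream fails.
@OnError
public String onError(Throwable failure) {
    return "Sorry, something went wrong: " + failure.getMessage();
}
----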

To maintain state across messages on the same connection (local message history), annotate the AI service with `@SessionScoped`:

[source,java]
----
@RegisterAiService
@SystemMessage("You are a helpful AI assistant. Be concise and to the point.")
@SessionScoped
public interface WSStreamedAssistant {

    @UserMessage("Answer the question: {question}")
    Multi<String> respondToQuestion(String question);
}
----

Install a WebSocket client like `wscat`:

[source,bash]
----
npm install -g wscat
----

Connect and send a message:

[source,bash]
----
wscat -c ws://localhost:8080/ws/stream
> Why is swimming pool water blue?
----

You’ll see a token stream printed as separate lines in real time.

== Summary

* Use `Multi<String>` in your AI services to enable streaming
* Streaming improves user experience and scalability
* SSE offers a simple HTTP-based solution
* WebSockets provide a more interactive and stateful option
