add Qwen tts model support. #541

flystar32 · 2026-01-13T08:51:17Z

AgentScope-Java Version

1.0.7-SNAPSHOT

Description

Add Qwen TTS model support.

AgentScope provides three ways to use TTS:

TTSHook - Auto-speak all Agent responses (non-invasive, speak while generating)
TTSModel - Standalone speech synthesis (independent of Agent, flexible calling)
DashScopeMultiModalTool - Agent invokes TTS as tool actively (Agent converts text to speech when needed)

Checklist

Please check the following items before code is ready to be reviewed.

Code has been formatted with mvn spotless:apply
All tests are passing (mvn test)
Javadoc comments are complete and follow project conventions
Related documentation has been updated (e.g. links, examples, etc.)
Code is ready for review

Change-Id: Ie83648211c2578fa814b351f7a106be0f4387d85

Change-Id: I1bbfffc706544db8abc8188d613f2853efde4427

Copilot

Pull request overview

This PR adds comprehensive Qwen TTS (Text-to-Speech) model support to AgentScope Java, enabling agents to "speak" their responses in real-time. The implementation supports the qwen3-tts-flash and qwen-tts models alongside existing Sambert models.

Changes:

Adds core TTS infrastructure (TTSModel interface, TTSOptions, TTSResponse, TTSException)
Implements DashScopeTTSModel for non-streaming TTS and DashScopeRealtimeTTSModel for streaming synthesis
Provides TTSHook for automatic agent speech synthesis and AudioPlayer for local audio playback
Updates DashScopeMultiModalTool to support Qwen TTS models alongside existing Sambert models
Includes comprehensive documentation in both English and Chinese with usage examples
Adds example applications (CLI and web-based) demonstrating all three TTS usage patterns

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 10 comments.


File	Description
docs/zh/task/tts.md	Chinese documentation for TTS functionality
docs/en/task/tts.md	English documentation for TTS functionality
TTSExample.java	Quickstart example demonstrating three TTS usage patterns
ReActAgentWithTTSDemo.java	Interactive CLI demo with real-time TTS
ChatTTSSpringBootApplication.java	Spring Boot web application entry point
ChatController.java	REST controller for TTS-enabled chat with SSE streaming
index.html	Frontend UI for real-time chat with audio playback
TTSModel.java	Base interface for all TTS model implementations
TTSOptions.java	Configuration options for TTS synthesis
TTSResponse.java	Response object encapsulating TTS synthesis results
TTSException.java	Exception class for TTS operation failures
DashScopeTTSModel.java	Non-streaming TTS model implementation using DashScope API
DashScopeRealtimeTTSModel.java	Streaming TTS model with incremental input support
AudioPlayer.java	Local audio playback using Java Sound API
TTSHook.java	Hook for real-time TTS during agent execution
DashScopeMultiModalTool.java	Extended to support Qwen TTS models via multimodal API
DashScopeTTSModelTest.java	Unit tests for DashScopeTTSModel
Test files	Updated test cases for DashScopeMultiModalTool
POM files	Added dependencies and module configurations


Copilot · 2026-01-13T08:58:03Z

agentscope-core/src/main/java/io/agentscope/core/tool/multimodal/DashScopeMultiModalTool.java

+    /**
+     * Synthesize audio using Qwen TTS models via multimodal-generation API.
+     */
+    private Mono<ToolResultBlock> synthesizeWithQwenTTS(
+            String text, String model, String voice, String language) {
+        String finalVoice =
+                Optional.ofNullable(voice).filter(s -> !s.trim().isEmpty()).orElse("Cherry");
+        String finalLanguage =
+                Optional.ofNullable(language).filter(s -> !s.trim().isEmpty()).orElse("Chinese");
+
+        return Mono.fromCallable(
+                        () -> {
+                            // Build request for Qwen TTS API
+                            Map<String, Object> input = new java.util.HashMap<>();
+                            input.put("text", text);
+                            input.put("voice", finalVoice);
+                            input.put("language_type", finalLanguage);
+
+                            Map<String, Object> request = new java.util.HashMap<>();
+                            request.put("model", model);
+                            request.put("input", input);
+
+                            String requestBody =
+                                    io.agentscope.core.util.JsonUtils.getJsonCodec()
+                                            .toJson(request);
+
+                            // Call DashScope API using Java HttpClient
+                            java.net.http.HttpClient client =
+                                    java.net.http.HttpClient.newHttpClient();
+                            java.net.http.HttpRequest httpRequest =
+                                    java.net.http.HttpRequest.newBuilder()
+                                            .uri(
+                                                    URI.create(
+                                                            "https://dashscope.aliyuncs.com/api/v1/services"
+                                                                + "/aigc/multimodal-generation/generation"))
+                                            .header("Authorization", "Bearer " + this.apiKey)
+                                            .header("Content-Type", "application/json")
+                                            .header("User-Agent", Version.getUserAgent())
+                                            .POST(
+                                                    java.net.http.HttpRequest.BodyPublishers
+                                                            .ofString(requestBody))
+                                            .build();
+
+                            java.net.http.HttpResponse<String> response =
+                                    client.send(
+                                            httpRequest,
+                                            java.net.http.HttpResponse.BodyHandlers.ofString());
+
+                            if (response.statusCode() != 200) {
+                                log.error(
+                                        "Qwen TTS API failed: status={}, body={}",
+                                        response.statusCode(),
+                                        response.body());
+                                return ToolResultBlock.error(
+                                        "TTS API failed: " + response.statusCode());
+                            }
+
+                            return parseQwenTTSResponse(response.body());
+                        })
+                .onErrorResume(
+                        e -> {
+                            log.error(
+                                    "Failed to generate audio with Qwen TTS: '{}'",
+                                    e.getMessage(),
+                                    e);
+                            return Mono.just(ToolResultBlock.error(e.getMessage()));
+                        });
+    }


The parseQwenTTSResponse and synthesizeWithQwenTTS methods are private but lack Javadoc documentation. While the guideline focuses on public methods, these are substantial private methods that would benefit from documentation explaining their purpose and behavior, especially since they handle complex API response parsing.

Copilot · 2026-01-13T08:58:04Z

agentscope-core/src/main/java/io/agentscope/core/model/tts/DashScopeRealtimeTTSModel.java

+    // Streaming input support
+    private final boolean supportsStreamingInput = true;
+    private final StringBuilder textBuffer = new StringBuilder();
+    private final BlockingQueue<AudioBlock> audioQueue = new LinkedBlockingQueue<>();
+    private final Sinks.Many<AudioBlock> audioSink =
+            Sinks.many().multicast().onBackpressureBuffer();
+    private final AtomicBoolean sessionActive = new AtomicBoolean(false);
+    private Thread synthesisThread;


The sessionActive, textBuffer, and audioQueue fields are accessed from multiple threads (the main thread and the synthesis thread) without comprehensive synchronization. sessionActive is an AtomicBoolean and textBuffer is guarded by synchronized blocks, but audioQueue operations combined with reads of sessionActive in multiple places could still lead to race conditions. Consider using more comprehensive synchronization or documenting the thread-safety guarantees.
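
One way to address this kind of concern is to make the "is the session active?" check and the buffer mutation happen under the same lock, so that no text can be accepted after the session has stopped. The sketch below is a minimal illustration using only the JDK. The class `SessionBufferSketch` and its methods are hypothetical stand-ins, not the PR's actual `DashScopeRealtimeTTSModel` code. Note that individual `LinkedBlockingQueue` operations are already thread-safe, so the sketch focuses on the flag and the text buffer.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the realtime model's session state (not the PR's class). */
final class SessionBufferSketch {
    private final AtomicBoolean sessionActive = new AtomicBoolean(false);
    private final Object textLock = new Object();
    // Guarded by textLock: StringBuilder itself is not thread-safe.
    private final StringBuilder textBuffer = new StringBuilder();

    /** Returns true only for the single caller that actually starts the session. */
    boolean startSession() {
        return sessionActive.compareAndSet(false, true);
    }

    /** Appends text if the session is active; returns whether the text was accepted. */
    boolean push(String text) {
        synchronized (textLock) {
            // The check and the append share one lock, so stopSession() cannot
            // slip in between them.
            if (!sessionActive.get()) {
                return false;
            }
            textBuffer.append(text);
            return true;
        }
    }

    /** Atomically takes and clears the buffered text (intended for the synthesis thread). */
    String drainText() {
        synchronized (textLock) {
            String pending = textBuffer.toString();
            textBuffer.setLength(0);
            return pending;
        }
    }

    /** Returns true only for the single caller that actually stops the session. */
    boolean stopSession() {
        synchronized (textLock) {
            return sessionActive.compareAndSet(true, false);
        }
    }
}
```

With this shape, `push` returns `false` both before `startSession()` and after `stopSession()`, which gives callers a documented guarantee instead of a silent race.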

Copilot · 2026-01-13T08:58:04Z

agentscope-core/src/main/java/io/agentscope/core/hook/TTSHook.java

+    public void stop() {
+        if (audioPlayer != null && playerStarted) {
+            audioPlayer.stop();
+            playerStarted = false;
+        }
+        sessionStarted = false;
+        audioSink.tryEmitComplete();
+    }


In the stop() method, audioSink.tryEmitComplete() is called without checking the result. If emission fails, the sink may not be properly closed. Consider logging a warning if tryEmitComplete() returns a failure result.

Copilot · 2026-01-13T08:58:04Z

agentscope-core/src/main/java/io/agentscope/core/hook/TTSHook.java

+    private void emitAudio(AudioBlock audio) {
+        // 1. Emit to reactive stream (for SSE/WebSocket consumers)
+        audioSink.tryEmitNext(audio);
+
+        // 2. Call callback if provided
+        if (audioCallback != null) {
+            audioCallback.accept(audio);
+        }
+
+        // 3. Play locally if player is configured
+        if (audioPlayer != null) {
+            audioPlayer.play(audio);
+        }
+    }


The method emitAudio calls audioSink.tryEmitNext(audio) without checking the return value. If the emission fails (e.g., due to backpressure or sink termination), the audio block will be silently dropped. Consider logging a warning when emission fails to help with debugging.

Copilot · 2026-01-13T08:58:04Z

agentscope-core/src/main/java/io/agentscope/core/model/tts/AudioPlayer.java

+
+            // Start background playback thread
+            playbackThread = new Thread(this::playbackLoop, "audio-player");
+            playbackThread.setDaemon(true);


The playbackThread is started as a daemon thread, which means it will be abruptly terminated when the JVM exits, potentially cutting off audio playback mid-stream. Consider implementing graceful shutdown or documenting this behavior, especially since the stop() method exists but may not always be called.

Suggested change

playbackThread.setDaemon(true);
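
A graceful shutdown along the lines of this comment could look like the following sketch. It keeps the daemon flag, so a forgotten `stop()` cannot keep the JVM alive, but it lets `stop` drain queued chunks through a sentinel value and wait for the worker with a bounded `join`. The class `GracefulPlayerSketch` is hypothetical and only counts chunks. A real player would write them to a Java Sound line instead.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical playback worker illustrating drain-then-stop (not the PR's AudioPlayer). */
final class GracefulPlayerSketch {
    // Sentinel compared by identity; an empty chunk from a caller is a different instance.
    private static final byte[] POISON = new byte[0];

    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger played = new AtomicInteger();
    private final Thread worker = new Thread(this::playbackLoop, "audio-player-sketch");

    void start() {
        worker.setDaemon(true);
        worker.start();
    }

    void play(byte[] chunk) {
        queue.add(chunk);
    }

    private void playbackLoop() {
        try {
            while (true) {
                byte[] chunk = queue.take();
                if (chunk == POISON) {
                    return; // everything queued before stop() has been handled
                }
                played.incrementAndGet(); // placeholder for writing to an audio line
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Lets queued chunks finish, then waits up to timeoutMillis (must be positive)
     * for the worker to exit. Returns true if the worker is no longer running.
     */
    boolean stop(long timeoutMillis) {
        queue.add(POISON);
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    int playedCount() {
        return played.get();
    }
}
```

If callers may exit without calling `stop`, registering `Runtime.getRuntime().addShutdownHook(new Thread(() -> player.stop(2000)))` is one option, since daemon threads keep running while shutdown hooks execute. The 2000 ms timeout is an arbitrary illustrative value.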

Copilot · 2026-01-13T08:58:05Z

agentscope-core/src/main/java/io/agentscope/core/tool/multimodal/DashScopeMultiModalTool.java

+                                                    URI.create(
+                                                            "https://dashscope.aliyuncs.com/api/v1/services"
+                                                                + "/aigc/multimodal-generation/generation"))
+                                            .header("Authorization", "Bearer " + this.apiKey)


The API key is passed directly to the HttpClient without any validation or sanitization. Consider adding validation to ensure the API key is not empty/null before making the HTTP request, or at least document the expected format.
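
A fail-fast check of the kind suggested here can be sketched with the JDK alone. The class `ApiKeyGuardSketch` and its method names are hypothetical. The endpoint and header names are taken from the diff above, and the `User-Agent` header is omitted because it depends on the project's `Version` class. The sketch only builds the request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that validates the API key before a request is built. */
final class ApiKeyGuardSketch {
    static final URI ENDPOINT =
            URI.create(
                    "https://dashscope.aliyuncs.com/api/v1/services"
                            + "/aigc/multimodal-generation/generation");

    /** Rejects null or blank keys and returns the key with surrounding whitespace removed. */
    static String requireApiKey(String apiKey) {
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalArgumentException("DashScope API key must not be null or blank");
        }
        return apiKey.trim();
    }

    /** Builds the POST request; throws IllegalArgumentException before any network I/O. */
    static HttpRequest buildRequest(String apiKey, String jsonBody) {
        String key = requireApiKey(apiKey);
        return HttpRequest.newBuilder()
                .uri(ENDPOINT)
                .header("Authorization", "Bearer " + key)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

Validating up front turns a confusing HTTP 401 from the service into an immediate, local error with a clear message. Whether to trim the key or reject keys with surrounding whitespace is a design choice left open here.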


codecov · 2026-01-13T08:59:45Z

Codecov Report

❌ Patch coverage is 59.31373% with 415 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...cope/core/model/tts/DashScopeRealtimeTTSModel.java	24.78%	252 Missing and 6 partials ⚠️
...java/io/agentscope/core/model/tts/AudioPlayer.java	41.50%	51 Missing and 11 partials ⚠️
...o/agentscope/core/model/tts/DashScopeTTSModel.java	69.67%	21 Missing and 26 partials ⚠️
...src/main/java/io/agentscope/core/hook/TTSHook.java	71.07%	18 Missing and 17 partials ⚠️
.../core/tool/multimodal/DashScopeMultiModalTool.java	86.41%	6 Missing and 5 partials ⚠️
...java/io/agentscope/core/model/tts/TTSResponse.java	95.65%	0 Missing and 2 partials ⚠️
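
The figures in this report can be cross-checked with a little arithmetic. The sketch below assumes that the headline count of 415 treats both "Missing" and "partial" lines as uncovered, which the per-file rows appear to support, and derives the implied size of the patch from the reported percentage. The class name `PatchCoverageSketch` is illustrative only.

```java
/** Illustrative cross-check of the Codecov numbers quoted above. */
final class PatchCoverageSketch {
    // {missing, partials} per file, in the order listed in the report.
    static final int[][] ROWS = {
        {252, 6}, {51, 11}, {21, 26}, {18, 17}, {6, 5}, {0, 2}
    };

    /** Sums missing and partial lines across all rows. */
    static int uncovered(int[][] rows) {
        int total = 0;
        for (int[] row : rows) {
            total += row[0] + row[1];
        }
        return total;
    }

    /** Derives the total patch size from the uncovered count and the coverage percentage. */
    static long totalLines(int uncovered, double coveragePercent) {
        return Math.round(uncovered / (1.0 - coveragePercent / 100.0));
    }
}
```

Under that assumption, `uncovered(ROWS)` is 415, matching the headline, and `totalLines(415, 59.31373)` is 1020, which implies 605 covered lines out of 1020 in the patch.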



flystar32 added 2 commits January 13, 2026 16:20

add tts model support.

b1cb244

Change-Id: Ie83648211c2578fa814b351f7a106be0f4387d85

add qwen tts doc.

632bb2c

Change-Id: I1bbfffc706544db8abc8188d613f2853efde4427

flystar32 requested review from a team and Copilot January 13, 2026 08:51

Copilot started reviewing on behalf of flystar32 January 13, 2026 08:51

Copilot AI reviewed Jan 13, 2026


flystar32 and others added 17 commits January 13, 2026 18:57

Merge branch 'main' into main

9212a6f

Update agentscope-core/src/main/java/io/agentscope/core/model/tts/Das…

42172b3

…hScopeRealtimeTTSModel.java Co-authored-by: Copilot <[email protected]>

Update agentscope-core/src/main/java/io/agentscope/core/model/tts/Das…

e30cbea

…hScopeRealtimeTTSModel.java Co-authored-by: Copilot <[email protected]>

Update agentscope-core/src/main/java/io/agentscope/core/model/tts/Aud…

1e0f8a5

…ioPlayer.java Co-authored-by: Copilot <[email protected]>

add unit test

d51e076

Change-Id: Ia1f57f71a15d0e7ae6c856c2e70af580d1e86c1a

add java doc and check emit result

f144ead

Change-Id: Id22dae41979765f5ac7b95bea69cdec49966bf8b

add java doc

c46d9de

Change-Id: I6d25a1ffc056b87afcf0f7ccdc50e71e357538c1

add license header

59d0f63

Change-Id: I2c2760b5268279690992b329b5a779973f7e4cc9

unit test support no audio hardware environment

02d4aec

Change-Id: I42142bfb331d73ed859f19e9a8d18b9c852d52e3

add unit test

bdcc053

Change-Id: Id92fffd580ef06f8430a9dce8923aabde6500efd

add realtime tts support

4e9d9f6

Change-Id: I1a335080359c2c4de5f14c6f46bf5eb3cf802b4b

unit test support no audio hardware environment

a1361fa

Change-Id: Ic3cd684fb3ac9d327d5e417f8ba4a8e95478abbe

add unit test

b6f9b71

Change-Id: I6a1dbc5c37ad71ddb936c9896ddfc6756192d202

update unit test

1e08adb

Change-Id: I3ca32bfc9e83d2c7f6121f86a0affc4c4433ee4f

Merge branch 'main' into main

56730c0

update DashScopeRealtimeTTSModel's webscoket

7bbc67f

Change-Id: I638f53187f4facac5568103649b3034d7269c44d

add unit test

c2f31b0

Change-Id: I6fc008d9337c6dccbf01b2d07aa177667b2dc08a

AlbumenJ requested changes Jan 16, 2026


update DashScopeRealtimeTTSModel, change objectmapper to record

e3d8b8b

Change-Id: I39283aebefa6978c6cc9829f156e345aa47f3c31
