
Commit cc18caf

Merge remote-tracking branch 'upstream/main'
2 parents 8f19e4b + 6d23ea4 commit cc18caf

7 files changed: +202, -32 lines

.github/workflows/ci.yml

Lines changed: 1 addition & 0 deletions

@@ -39,6 +39,7 @@ jobs:
         pytest -v --cov-report=xml --cov=src/sdialog

     - name: Upload coverage reports to Codecov
+      if: matrix.python-version == '3.10'
       uses: codecov/codecov-action@v5
       with:
         fail_ci_if_error: false

docs/examples/index.rst

Lines changed: 53 additions & 0 deletions

@@ -90,6 +90,59 @@ Let's start with something fun and straightforward—creating a simple dialogue

 Individual agents can be served and exposed as an OpenAI-compatible API endpoint with the :meth:`~sdialog.agents.Agent.serve` method (e.g. ``mentor.serve(port=1333)``), see :ref:`here <serving_agents>` for more details.

+.. _ex-agent-tools:
+
+Agent Tools (Function Calling)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can attach plain Python functions as tools. When the backend supports tool/function calling,
+the agent can call them during response generation.
+
+.. code-block:: python
+
+    import sdialog
+    from sdialog.agents import Agent
+
+    sdialog.config.llm("openai:gpt-4.1")
+
+    def get_weather(city: str) -> dict:
+        """Return weather information for a city."""
+        return {"city": city, "temperature_c": 21, "condition": "sunny"}
+
+    assistant = Agent(
+        name="WeatherAssistant",
+        tools=[get_weather],
+        system_prompt="Use tools when needed and answer concisely."
+    )
+
+    print(assistant("What's the weather in Geneva?"))
+
+.. _ex-final-response-tool:
+
+Direct Tool Output with ``@final_response_tool``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a tool returns a pre-formatted result (e.g., a markdown table), you can mark it with
+``@final_response_tool`` so the agent returns the tool output directly as the final response.
+
+This is especially useful when the tool already produces exactly the text you want the user to see.
+Without the decorator, the LLM would typically read the tool output and generate a new answer from it,
+which may add extra wording, reformat the content, or spend unnecessary tokens reproducing a large block
+of structured text. With ``@final_response_tool``, the tool output becomes the final answer directly.
+
+.. code-block:: python
+
+    from sdialog.agents import Agent, final_response_tool
+
+    @final_response_tool
+    def get_report_table(topic: str) -> str:
+        return "| Item | Value |\n|---|---|\n| example | 42 |"
+
+    agent = Agent(tools=[get_report_table])
+
+Notes:
+
+- Non-empty tool output is returned directly as the agent's final answer.
+- Empty tool output falls back to the regular tool flow (the LLM can continue and synthesize a response).

 Few-Shot Learning with Example Dialogs
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Now let's explore one of SDialog's most powerful features! We can guide our dialogues by providing examples that show the system what style, structure, or format we want. This technique, called few-shot learning, works by supplying ``example_dialogs`` to generation components. These exemplar dialogs are injected into the system prompt to steer tone, task format, and conversation flow.
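The fallback rule described in the notes above (non-empty decorated-tool output short-circuits the LLM, empty output falls through) can be sketched in plain Python. This is a hypothetical mimic for illustration only: `dispatch` and `fake_llm` are stand-ins and not part of sdialog's API, whose real dispatch logic lives inside `Agent`.

```python
def final_response_tool(fn):
    """Sketch of the decorator: just flags the tool (hypothetical)."""
    fn._final_response = True
    return fn

def dispatch(tool, llm_synthesize, *args):
    """Hypothetical dispatch mimicking the documented fallback rule."""
    output = tool(*args)
    if getattr(tool, "_final_response", False) and output:
        return output                  # non-empty output becomes the final answer verbatim
    return llm_synthesize(output)      # empty output: the LLM continues normally

@final_response_tool
def get_report_table(topic: str) -> str:
    return "| Item | Value |\n|---|---|\n| example | 42 |"

@final_response_tool
def empty_tool(topic: str) -> str:
    return ""

def fake_llm(tool_output):
    return "LLM answer based on: " + repr(tool_output)

print(dispatch(get_report_table, fake_llm, "sales"))  # the table, verbatim
print(dispatch(empty_tool, fake_llm, "sales"))        # falls back to the LLM
```

The point of the decorator, as the docs note, is to avoid the LLM re-reading and paraphrasing a large block of already-final text.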

requirements-audio-test.txt

Lines changed: 0 additions & 1 deletion

@@ -4,6 +4,5 @@ sox
 jams
 pyloudnorm
 pyroomacoustics
-datasets<=3.6.0
 huggingface_hub[cli]
 dscaper>=1.7.7

src/sdialog/audio/dialog.py

Lines changed: 1 addition & 1 deletion

@@ -490,7 +490,7 @@ def persona_to_voice(
     persona_to_voice_desc: Union[str, callable] = None,
     voices: dict[Role, Union[Voice, tuple[str, str]]] = None,
     keep_duplicate: bool = False,
-    tts_engine: BaseTTS | BaseVoiceCloneTTS = None,
+    tts_engine: Union[BaseTTS, BaseVoiceCloneTTS] = None,
     seed: int = None
 ) -> None:
     """

src/sdialog/audio/tts/qwen3/tts.py

Lines changed: 2 additions & 1 deletion

@@ -3,6 +3,7 @@
 # SPDX-License-Identifier: MIT
 import torch
 import numpy as np
+from typing import Optional

 from ..base import BaseTTS, BaseVoiceCloneTTS
 from sdialog.audio.normalizers import TextNormalizer, UnicodeToAsciiNormalizer, normalize_text
@@ -176,7 +177,7 @@ def __init__(

     def generate(self,
                  text: str,
-                 speaker_voice: str | object = None,
+                 speaker_voice: Optional[object] = None,
                  tts_pipeline_kwargs: dict = {}) -> tuple[np.ndarray, int]:
         """
         Generates audio from text using voice cloning.

tests/conftest.py

Lines changed: 129 additions & 0 deletions

@@ -0,0 +1,129 @@
+import sys
+import types
+import importlib.machinery
+from types import SimpleNamespace
+
+import numpy as np
+
+
+def _install_qwen_tts_stub() -> None:
+    try:
+        __import__("qwen_tts")
+        return
+    except ImportError:
+        pass
+
+    class _FakeQwen3TTSModel:
+        def __init__(self, *args, **kwargs):
+            self.args = args
+            self.kwargs = kwargs
+
+        @classmethod
+        def from_pretrained(cls, *args, **kwargs):
+            return cls(*args, **kwargs)
+
+        def generate_custom_voice(self, text, speaker=None, **kwargs):
+            return [np.zeros(24_000, dtype=np.float32)], 24_000
+
+        def generate_voice_clone(self, text, **kwargs):
+            return [np.zeros(24_000, dtype=np.float32)], 24_000
+
+        def generate_voice_design(self, text, language=None, instruct=None, **kwargs):
+            return [np.zeros(24_000, dtype=np.float32)], 24_000
+
+        def create_voice_clone_prompt(self, ref_audio=None, ref_text=None, **kwargs):
+            return {
+                "ref_audio": ref_audio,
+                "ref_text": ref_text,
+            }
+
+    qwen_tts_module = types.ModuleType("qwen_tts")
+    qwen_tts_module.Qwen3TTSModel = _FakeQwen3TTSModel
+    qwen_tts_module.__spec__ = importlib.machinery.ModuleSpec("qwen_tts", loader=None)
+    sys.modules["qwen_tts"] = qwen_tts_module
+
+
+_install_qwen_tts_stub()
+
+
+def _install_torchcodec_stub() -> None:
+    try:
+        __import__("torchcodec")
+        return
+    except ImportError:
+        pass
+
+    class _FakeTensor:
+        def __init__(self, array):
+            self._array = array
+
+        def cpu(self):
+            return self
+
+        def numpy(self):
+            return self._array
+
+    class _FakeAudioSamples:
+        def __init__(self, data=None, sample_rate: int = 16_000):
+            _arr = np.zeros((1, sample_rate), dtype=np.float32) if data is None else data
+            self.data = _FakeTensor(_arr)
+            self.sample_rate = sample_rate
+
+    class _FakeAudioDecoder:
+        def __init__(self, source=None, *args, **kwargs):
+            self.source = source
+            self.args = args
+            self.kwargs = kwargs
+            _path = None
+            if isinstance(source, dict):
+                _path = source.get("path")
+            else:
+                _path = getattr(source, "path", None)
+
+            self.metadata = SimpleNamespace(
+                sample_rate=16_000,
+                path=_path,
+            )
+
+        def __getitem__(self, key: str):
+            if key == "path":
+                return self.metadata.path
+            if key == "sampling_rate":
+                return self.metadata.sample_rate
+            if key == "array":
+                y = self.get_all_samples().data.cpu().numpy()
+                return np.mean(y, axis=tuple(range(y.ndim - 1))) if y.ndim > 1 else y
+            raise KeyError(key)
+
+        def get_all_samples(self):
+            return _FakeAudioSamples()
+
+        def get_samples_played_in_range(self, *_args, **_kwargs):
+            return SimpleNamespace(sample_rate=self.metadata.sample_rate)
+
+    torchcodec_module = types.ModuleType("torchcodec")
+    decoders_module = types.ModuleType("torchcodec.decoders")
+    decoders_module.AudioDecoder = _FakeAudioDecoder
+    torchcodec_module.decoders = decoders_module
+    torchcodec_module.__spec__ = importlib.machinery.ModuleSpec("torchcodec", loader=None)
+    decoders_module.__spec__ = importlib.machinery.ModuleSpec("torchcodec.decoders", loader=None)
+
+    sys.modules["torchcodec"] = torchcodec_module
+    sys.modules["torchcodec.decoders"] = decoders_module
+
+    # transformers.audio_utils (and others) call importlib.metadata.version("torchcodec")
+    # at module level. That call bypasses sys.modules and reads on-disk dist-info, so
+    # it raises PackageNotFoundError even though our stub is in sys.modules, causing a
+    # cascade failure across ALL tests. Patch it to return a harmless version string.
+    import importlib.metadata as _imeta
+    _real_version = _imeta.version
+
+    def _patched_version(name: str) -> str:
+        if name == "torchcodec":
+            return "0.0.0"
+        return _real_version(name)
+
+    _imeta.version = _patched_version
+
+
+_install_torchcodec_stub()
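The core pattern in the new conftest.py, registering a fake module in `sys.modules` before anything tries to import the real one, reduces to a small helper. The `install_stub` function and `fake_engine` module below are hypothetical, written only to isolate the technique:

```python
import sys
import types
import importlib.machinery

def install_stub(name, **attrs):
    """Create a fake module and register it so `import name` succeeds."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    # Attaching a ModuleSpec (as the conftest does) keeps importlib-aware
    # callers such as importlib.util.find_spec happy with the stub.
    module.__spec__ = importlib.machinery.ModuleSpec(name, loader=None)
    sys.modules[name] = module
    return module

install_stub("fake_engine", answer=lambda: 42)

import fake_engine           # resolved from sys.modules; no package on disk needed
print(fake_engine.answer())  # 42
```

As the conftest comment warns, this only fools the import system: code that checks installed distributions via `importlib.metadata.version` reads on-disk dist-info and will still fail, hence the additional monkey-patch for `torchcodec`.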

tests/test_audio.py

Lines changed: 16 additions & 29 deletions

@@ -5,35 +5,22 @@
 import numpy as np
 import pandas as pd

-# Try to import audio dependencies
-try:
-    import soundfile as sf
-
-    from sdialog.audio.turn import AudioTurn
-    from sdialog.audio.room_generator import BasicRoomGenerator
-    from sdialog.audio.utils import Role, Furniture, SpeakerSide
-    from sdialog.audio.room import Position3D, Dimensions3D, DirectivityType, Room
-    from sdialog.audio.voice_database import Voice, is_a_audio_file
-    from sdialog.audio.voice_database import BaseVoiceDatabase, LocalVoiceDatabase, VoiceDatabase
-    from sdialog.audio.tts import BaseTTS
-    from sdialog.audio.jsalt import MedicalRoomGenerator, RoomRole
-    from sdialog.audio.acoustics_simulator import AcousticsSimulator, AudioSource
-    from sdialog.audio.dialog import AudioDialog
-    from sdialog.audio.pipeline import AudioPipeline, to_audio
-    from sdialog.audio.dscaper_utils import send_utterances_to_dscaper, generate_dscaper_timeline
-    from sdialog.audio.impulse_response_database import LocalImpulseResponseDatabase, RecordingDevice
-    from sdialog.audio.processing import AudioProcessor
-except ImportError:
-    print("\n" + "=" * 80)
-    print("Audio dependencies are not installed. All audio tests will be skipped.")
-    print("=" * 80 + "\n")
-
-    # Skip the entire module - pytest will not collect any tests from this file
-    pytest.skip(
-        "Audio dependencies not installed. If you are working with audio, install them with: "
-        "pip install sdialog[audio]",
-        allow_module_level=True
-    )
+import soundfile as sf
+
+from sdialog.audio.turn import AudioTurn
+from sdialog.audio.room_generator import BasicRoomGenerator
+from sdialog.audio.utils import Role, Furniture, SpeakerSide
+from sdialog.audio.room import Position3D, Dimensions3D, DirectivityType, Room
+from sdialog.audio.voice_database import Voice, is_a_audio_file
+from sdialog.audio.voice_database import BaseVoiceDatabase, LocalVoiceDatabase, VoiceDatabase
+from sdialog.audio.tts import BaseTTS
+from sdialog.audio.jsalt import MedicalRoomGenerator, RoomRole
+from sdialog.audio.acoustics_simulator import AcousticsSimulator, AudioSource
+from sdialog.audio.dialog import AudioDialog
+from sdialog.audio.pipeline import AudioPipeline, to_audio
+from sdialog.audio.dscaper_utils import send_utterances_to_dscaper, generate_dscaper_timeline
+from sdialog.audio.impulse_response_database import LocalImpulseResponseDatabase, RecordingDevice
+from sdialog.audio.processing import AudioProcessor

 from sdialog import Turn, Dialog
 from unittest.mock import MagicMock, patch
