Conversation
- Add AzureSTTProviderConfig and AzureTTSProviderConfig Pydantic models to config.py
- Inject AZURE_SPEECH_KEY from env in config/security.py
- Implement AzureSTTFastAdapter (fast transcription REST API), AzureSTTRealtimeAdapter (one-shot STT REST), and AzureTTSAdapter in src/pipelines/azure.py
- Register azure_stt_fast, azure_stt_realtime, azure_tts factories in orchestrator; azure_stt alias routes by variant field
- Add AzureProviderForm.tsx to admin UI with full parameter coverage: region, variant (realtime/fast), language, voice name, output format, target encoding/sample rate, chunk size, timeouts, custom endpoints
- Register 'azure' in REGISTERED_PROVIDER_TYPES (providerNaming.ts)
- Add azure_stt and azure_tts quick-add templates to ProvidersPage
- Add Azure expert settings panels to PipelineForm (variant/language overrides for STT; voice name/output format overrides for TTS)
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Adds Microsoft Azure Speech Service support (STT/TTS): new provider models, Azure adapters (fast/realtime STT, TTS), orchestrator factories, frontend provider UI/forms, backend env/key handling and validation, engine transcript queuing, streaming batching, requirements, and provider templates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as Frontend UI
    participant API as Backend API
    participant Env as Env/.env
    participant Azure as Azure Speech Service
    participant Orch as Pipeline Orchestrator / Engine
    UI->>API: Request provider connection test / save config
    API->>Env: Read AZURE_SPEECH_KEY (strip quotes)
    alt key present
        API->>Azure: POST token request to region-specific token endpoint
        Azure-->>API: 200 / 401 / other
        API-->>UI: Return validated result (success/error)
    else
        API-->>UI: Return error (key not configured)
    end
    UI->>Orch: Configure provider (azure_stt / azure_tts)
    Orch->>Orch: Hydrate & validate provider config (region, key)
    Orch->>Orch: Register adapter factory (fast/realtime/tts)
    Orch-->>UI: Adapters available
```
```mermaid
sequenceDiagram
    participant Agent as Engine/Agent
    participant STT as Azure STT Adapter
    participant TranscriptQ as Pipeline Transcript Queue
    participant TTS as Azure TTS Adapter
    participant Playback as Streaming Playback Manager
    Agent->>STT: Stream audio frames
    STT->>STT: Buffer / VAD / batch frames
    STT->>Agent: Deliver transcripts (including early flush)
    Agent->>TranscriptQ: Enqueue early transcripts
    Agent->>Agent: Check aggregation thresholds (min_words/min_chars)
    Agent->>TTS: Request synthesis (SSML with prosody/lang)
    TTS->>Playback: Stream chunks (or return file) with format conversion
    Playback->>Playback: Batch session upserts every 1s
    Playback-->>Agent: Playback complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ 2 failed (2 warnings) | ✅ 1 passed
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pipelines/openai.py (1)
480-487: ⚠️ Potential issue | 🟠 Major — Avoid logging full LLM message payloads.
This logs complete conversation content at debug level, which can leak sensitive user/tool data into logs.
🔧 Proposed fix
```diff
 logger.debug(
     "OpenAI chat completion request",
     call_id=call_id,
     model=payload.get("model"),
     temperature=payload.get("temperature"),
     tools_count=len(payload.get("tools", [])),
-    messages=json.dumps(payload.get("messages", []), ensure_ascii=False),
+    messages_count=len(payload.get("messages", [])),
+    last_message_preview=str(
+        (payload.get("messages", [])[-1].get("content", "") if payload.get("messages") else "")
+    )[:120],
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/openai.py` around lines 480 - 487, The logger.debug call in openai.py is currently logging full conversation content via payload.get("messages"), which can leak sensitive data; update the logger.debug invocation (the call referencing logger.debug, call_id and payload) to avoid including full messages—replace messages=json.dumps(payload.get("messages", []), ...) with non-sensitive metadata such as messages_count=len(payload.get("messages", [])) and optionally a redacted or truncated indicator (e.g., messages_preview or messages_hash) so the full message bodies are never written to logs.
🧹 Nitpick comments (3)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
575-582: Constrain Azure STT variant override to supported values.
Line 576 currently allows arbitrary text. Invalid values can be persisted and break runtime routing for Azure STT variant selection.
💡 Suggested fix
```diff
-<FormInput
-  label="Azure STT Variant Override"
-  value={localConfig.options?.stt?.variant || ''}
-  onChange={(e) => updateRoleOptions('stt', { variant: e.target.value || undefined })}
-  placeholder="realtime (or fast)"
-  tooltip="Override the variant set on the provider: 'realtime' or 'fast'."
-  disabled={!showSttExpert}
-/>
+<FormSelect
+  label="Azure STT Variant Override"
+  value={localConfig.options?.stt?.variant || ''}
+  onChange={(e) => updateRoleOptions('stt', { variant: e.target.value || undefined })}
+  options={[
+    { value: '', label: 'Use provider default' },
+    { value: 'realtime', label: 'realtime' },
+    { value: 'fast', label: 'fast' },
+  ]}
+  tooltip="Override the variant set on the provider."
+  disabled={!showSttExpert}
+/>
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 575 - 582, The Azure STT variant input in PipelineForm allows arbitrary text and should be constrained to supported values; update the FormInput for "Azure STT Variant Override" (in PipelineForm) to only accept "realtime" or "fast" by replacing free-text entry with a controlled selection or by validating inside the onChange handler before calling updateRoleOptions('stt', ...); ensure empty value maps to undefined and that localConfig.options?.stt?.variant is set only to "realtime" or "fast" (or cleared) so invalid values cannot be persisted.

config/ai-agent.yaml (1)
1518-1521: Align Azure TTS default format with the UI template.
`azure_tts.output_format` here is `raw-8khz-16bit-mono-pcm`, while the quick-add template uses `riff-8khz-16bit-mono-pcm` (line 270 in admin_ui/frontend/src/pages/ProvidersPage.tsx). This drift causes different runtime behavior depending on where the provider was created.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/ai-agent.yaml` around lines 1518 - 1521, The azure_tts.output_format value is inconsistent with the UI quick-add template and should be changed from "raw-8khz-16bit-mono-pcm" to "riff-8khz-16bit-mono-pcm" so providers created via the UI and config behave the same; update the azure_tts.output_format entry (and verify target_encoding/target_sample_rate_hz remain correct) to match the UI template reference used in ProvidersPage.tsx.

src/pipelines/orchestrator.py (1)
629-639: Normalize and validate `variant` before alias logging.
If `variant` is invalid (e.g., a typo), the alias correctly falls back to realtime, but the log still reports `azure_stt_<invalid>`, which is misleading during troubleshooting.
Proposed fix
```diff
@@
-        variant = (self._azure_stt_provider_config.variant or "realtime").lower()
+        raw_variant = str(self._azure_stt_provider_config.variant or "realtime").strip().lower()
+        variant = raw_variant if raw_variant in {"fast", "realtime"} else "realtime"
+        if raw_variant not in {"fast", "realtime"}:
+            logger.warning(
+                "Invalid Azure STT variant configured; defaulting alias to realtime",
+                configured_variant=raw_variant,
+            )
         alias_factory = fast_factory if variant == "fast" else realtime_factory
         self.register_factory("azure_stt", alias_factory)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/orchestrator.py` around lines 629 - 639, The log uses the raw possibly-invalid provider value for stt_alias causing misleading messages; validate and normalize the variant before both selecting the factory and logging. Compute a validated string (e.g., chosen_variant = "fast" if normalized_variant == "fast" else "realtime"), use chosen_variant when selecting alias_factory and calling register_factory("azure_stt", alias_factory), and use the same chosen_variant in logger.info stt_alias (e.g., f"azure_stt_{chosen_variant}") so the log reflects the actual behaviour.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@admin_ui/backend/api/config.py`:
- Around line 1423-1426: Normalize and validate the Azure region coming from
provider_config before constructing token_url: retrieve region via
provider_config.get('region', 'eastus'), then trim surrounding whitespace (e.g.,
region = region.strip()), normalize casing if desired (e.g., lower()), and
validate against an allowlist or simple regex that permits only expected
characters (letters, numbers, hyphens) and rejects empty or malformed values; if
validation fails, fall back to the default 'eastus' or raise a clear error. Use
the normalized/validated region variable when building token_url instead of the
raw provider_config value to avoid malformed endpoints.
In `@admin_ui/frontend/src/pages/ProvidersPage.tsx`:
- Around line 255-275: The templates object includes azure_stt and azure_tts but
the modal’s "Modular Providers (Cloud)" template list rendering never creates
checkbox entries for those keys; update the checkbox rendering logic (where
template checkboxes are built—e.g., the component or loop that renders the
templates list in ProvidersPage, likely the function that maps over templates)
to include items for 'azure_stt' and 'azure_tts' with matching keys, labels, and
capabilities so they become selectable in the modal; ensure the checkbox
value/name matches the template keys (azure_stt, azure_tts) and that any handler
that collects selected templates (e.g., onToggleTemplate / selectedTemplates
state) recognizes them.
- Around line 601-603: The Azure form is being shadowed by the earlier
"realtime" branch because providerName.includes('realtime') matches names like
"azure_stt_realtime"; update the routing logic in ProvidersPage (where
providerName is checked) so Azure-specific names take precedence or the checks
are more specific: either move the azure check (the block that renders
AzureProviderForm) above the realtime/OpenAI check, or restrict the realtime
check (e.g., exclude names containing 'azure' or use a more precise pattern) so
that AzureProviderForm is rendered for Azure providers (identify the
providerName variable and the conditional that returns AzureProviderForm to
locate where to change the order/condition).
In `@src/config/security.py`:
- Around line 219-228: The current injection logic only matches provider names
starting with "azure_stt" or equal to "azure_tts", which misses custom keys;
update the loop that iterates over providers_block (using azure_speech_key,
provider_name, provider_cfg) to also detect Azure providers by inspecting
provider_cfg fields (e.g., check provider_cfg.get("type",
"").lower().startswith("azure") or provider_cfg.get("provider",
"").lower().startswith("azure")) and, when such a type/provider indicates Azure,
set provider_cfg["api_key"] = azure_speech_key and write back
providers_block[provider_name] = provider_cfg; keep the existing
isinstance(provider_cfg, dict) guard and only inject when azure_speech_key is
truthy.
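A minimal sketch of the type-based detection this comment asks for; `providers_block` and the field names (`type`, `provider`, `api_key`) follow the review comment, while the function name and signature are illustrative:

```python
def inject_azure_speech_key(providers_block: dict, azure_speech_key: str) -> dict:
    """Inject the Azure key into any provider identified as Azure by name or by type.

    Sketch only: the actual config schema may differ from these field names.
    """
    if not azure_speech_key:
        return providers_block
    for provider_name, provider_cfg in providers_block.items():
        if not isinstance(provider_cfg, dict):
            continue
        # Original name-based detection, kept for backwards compatibility.
        is_azure_by_name = (
            provider_name.startswith("azure_stt") or provider_name == "azure_tts"
        )
        # New type/provider-based detection for custom provider keys.
        is_azure_by_type = (
            str(provider_cfg.get("type", "")).lower().startswith("azure")
            or str(provider_cfg.get("provider", "")).lower().startswith("azure")
        )
        if is_azure_by_name or is_azure_by_type:
            provider_cfg["api_key"] = azure_speech_key
            providers_block[provider_name] = provider_cfg
    return providers_block
```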
In `@src/core/streaming_playback_manager.py`:
- Around line 849-858: Pending streamed bytes (bytes_since_last_upsert) are only
upserted when chunk is None, causing lost counts on other exit paths; modify the
streaming loop in streaming_playback_manager.py so that before any
break/return/exception path you flush bytes_since_last_upsert by fetching the
session via await self.session_store.get_by_call_id(call_id), adding
bytes_since_last_upsert to sess.streaming_bytes_sent, setting
sess.streaming_jitter_buffer_depth = jitter_buffer.qsize(), and calling await
self.session_store.upsert_call(sess); do this in a single shared cleanup (e.g.,
a finally block or a helper flush_pending_bytes(call_id,
bytes_since_last_upsert, jitter_buffer)) and apply the same change to the second
occurrence referenced around the lines for bytes_since_last_upsert at the other
block (873-881) so all exit paths persist the counter.
- Around line 856-857: Replace the silent except in the stats persistence path
that currently reads "except Exception: pass" with a non-fatal logging statement
so failed session upserts still surface diagnostics; catch Exception as e and
call the module's logger (e.g., logger.debug or process_logger.debug) with a
clear message about the failed session upsert and exc_info=True to preserve the
traceback, then continue (do not re-raise) so behavior stays non-fatal.
In `@src/engine.py`:
- Line 9098: The warning-level debug messages that log TTS DM override details
(the logger.warning call that prints "DEBUG: TTS Adapter DM Override evaluated
as: {_tts_dm_override} on adapter {pipeline.tts_adapter.__class__.__name__}")
are flooding alerts; change these to use logger.debug (or remove entirely) in
both occurrences (the call referencing _tts_dm_override and
pipeline.tts_adapter.__class__.__name__) so TTS hot paths no longer emit
warnings per turn/greeting—locate the two logger.warning calls and replace them
with logger.debug(...) (or delete if unnecessary) to keep debug info at the
appropriate log level.
- Around line 10039-10042: The min_words and min_chars are read directly from
pipeline.llm_options and can be non-positive, which makes threshold_met
(computed from words and chars) always true; clamp these values to positive
bounds before use (e.g., replace the direct int(...) reads for min_words and
min_chars with clamped values such as max(1, parsed_value) or another positive
minimum) so that threshold_met = words >= min_words or chars >= min_chars
behaves correctly; update the lines that set min_words, min_chars and ensure any
tests relying on aggregation thresholds reflect the minimum clamp.
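The clamping described above can be sketched as a small helper; the option keys mirror the review comment, and the defaults here are illustrative rather than taken from the code:

```python
def read_aggregation_thresholds(llm_options: dict) -> tuple:
    """Read min_words/min_chars from pipeline options, clamped to >= 1
    so `threshold_met` cannot degenerate to always-true."""
    def _clamped(key: str, default: int) -> int:
        try:
            value = int(llm_options.get(key, default))
        except (TypeError, ValueError):
            value = default
        return max(1, value)  # non-positive values would make the check vacuous

    return (_clamped("aggregation_min_words", 1),
            _clamped("aggregation_min_chars", 1))
```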
- Around line 3296-3301: The code erroneously treats _pipeline_tasks as a
collection; change the cleanup to pop a single asyncio.Task for the session
(task = getattr(self, "_pipeline_tasks", {}).pop(session.call_id, None)) then if
task and not task.done(): task.cancel(); ensure the subsequent cleanup always
runs by not relying on iteration (leave the existing pops of _pipeline_queues,
_pipeline_transcript_queues, and _pipeline_forced unchanged) and apply the same
single-Task pop/cancel fix to the other occurrence around _pipeline_tasks at the
3928–3933 block; use the symbols _pipeline_tasks, _pipeline_queues,
_pipeline_transcript_queues, _pipeline_forced, and session.call_id to locate and
update both blocks.
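The single-task pop/cancel pattern described above, as a standalone sketch (`_pipeline_tasks` maps `call_id` to one `asyncio.Task` per the review; the helper name is illustrative):

```python
import asyncio

def cancel_pipeline_task(pipeline_tasks: dict, call_id: str) -> bool:
    """Pop and cancel the single asyncio.Task tracked for a call, if any.

    Returns True when a live task was cancelled.
    """
    task = pipeline_tasks.pop(call_id, None)
    if task is not None and not task.done():
        task.cancel()
        return True
    return False
```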
- Around line 4411-4429: The background STT flush task is created with
asyncio.create_task(_) and therefore is not tracked for per-call cleanup;
replace that call with self._fire_and_forget_for_call(call_id,
_flush_and_process(), name=f"pipeline-stt-flush-{call_id}") so the task is added
to the call's _call_bg_tasks tracking. Also wrap the await of
stt.flush_speech(...) inside asyncio.wait_for(..., timeout=<reasonable_secs>)
inside _flush_and_process to avoid indefinite hangs; keep existing logging
(logger.debug / logger.warning / logger.error) and the use of
pipeline.options_summary().get("stt", {}) and stt.flush_speech to locate the
correct place.
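The timeout wrapper suggested for the flush can be sketched like this; `stt.flush_speech` is the method named in the review, while this wrapper's name, signature, and 5-second default are assumptions:

```python
import asyncio

async def flush_with_timeout(stt, call_id: str, timeout_s: float = 5.0):
    """Await an STT flush, bounded by a timeout so a stuck flush
    cannot hang the call's background task."""
    try:
        return await asyncio.wait_for(stt.flush_speech(call_id), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Caller is expected to log a warning here; we just give up cleanly.
        return None
```

In the fix itself, the coroutine that awaits this would then be handed to `self._fire_and_forget_for_call(...)` so it lands in the call's tracked background tasks.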
In `@src/pipelines/azure.py`:
- Around line 625-631: The VAD timeout keys are never exposed because
_compose_options doesn't include "vad_silence_timeout_ms" and
"vad_initial_silence_timeout_ms", so start_stream's reads are ignored; update
_compose_options to merge and return these two keys (taking values from provider
defaults and runtime overrides) so merged passed into start_stream contains
them, and make the same change for the other timeout-setting location that
mirrors this logic (the second block that sets
SpeechServiceConnection_EndSilenceTimeoutMs /
SpeechServiceConnection_InitialSilenceTimeoutMs). Locate the functions/methods
named _compose_options and start_stream and ensure _compose_options includes the
keys exactly "vad_silence_timeout_ms" and "vad_initial_silence_timeout_ms" in
its returned dict so the timeout-setting code uses the configured values.
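A sketch of a `_compose_options`-style merge that keeps the two VAD timeout keys visible in the returned dict; the key names are exactly those from the review comment, everything else is illustrative:

```python
VAD_TIMEOUT_KEYS = ("vad_silence_timeout_ms", "vad_initial_silence_timeout_ms")

def compose_options(provider_defaults: dict, runtime_overrides: dict) -> dict:
    """Merge provider defaults with runtime overrides; overrides win."""
    merged = dict(provider_defaults)
    for key, value in runtime_overrides.items():
        if value is not None:
            merged[key] = value
    # Ensure the VAD timeout keys are always present so start_stream's
    # reads see the configured values (or an explicit None) instead of
    # the keys silently disappearing from the merged dict.
    for key in VAD_TIMEOUT_KEYS:
        merged.setdefault(key, None)
    return merged
```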
- Around line 635-638: The code changes sample_rate_hz from 8000 to 16000 when
stream_format == "pcm16_16k" (in the start_stream flow) but send_audio forwards
raw PCM bytes unchanged, causing a sample-rate mismatch; either perform proper
audio resampling of the payload bytes to 16 kHz before sending (implement a
resample step where start_stream sets sample_rate_hz and call it from
send_audio), or stop mutating sample_rate_hz and instead validate/raise an error
if the payload sample rate doesn't match the declared stream_format; update the
functions start_stream and send_audio to reference the same sample rate (or call
a shared resample utility) so timing/pitch remain correct.
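If the resampling route is chosen, the shared utility could be a simple linear-interpolation resampler for mono PCM16 like the sketch below; a production path would likely use a proper DSP library instead, and the function name is illustrative:

```python
import struct

def resample_pcm16(payload: bytes, src_hz: int, dst_hz: int) -> bytes:
    """Linear-interpolation resampler for mono PCM16, so the bytes sent
    match the declared stream sample rate (e.g., 8 kHz -> 16 kHz)."""
    if src_hz == dst_hz or not payload:
        return payload
    samples = struct.unpack("<%dh" % (len(payload) // 2), payload)
    n_out = int(len(samples) * dst_hz / src_hz)
    out = []
    for i in range(n_out):
        # Map output index back onto the source sample positions.
        pos = i * (len(samples) - 1) / max(1, n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(int(samples[lo] * (1 - frac) + samples[hi] * frac))
    return struct.pack("<%dh" % len(out), *out)
```

Both `start_stream` and `send_audio` would then reference the same declared rate and call this utility on mismatch, keeping timing and pitch correct.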
- Around line 761-768: The temporary AzureSTTFastAdapter created as fast_adapter
in the realtime fallback can open an aiohttp.ClientSession and must be closed to
avoid session leaks; wrap the creation and call to
fast_adapter.transcribe(call_id, audio_pcm16, sample_rate_hz, options) in a
try/finally (or an async context manager) and close the adapter in the finally
block—prefer using an async close if available (await fast_adapter.aclose() or
await fast_adapter.close() if async) or call fast_adapter.close() if
synchronous; ensure the finally always runs even on error so AzureSTTFastAdapter
instances are consistently cleaned up.
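The try/finally shape described above, sketched with a factory standing in for `AzureSTTFastAdapter` construction; the `aclose`/`close` probing follows the review comment, the wrapper itself is hypothetical:

```python
import asyncio

async def transcribe_with_fast_fallback(make_fast_adapter, call_id, audio, sample_rate_hz, options):
    """Run a one-shot fast transcription and always close the temporary
    adapter, even when transcribe raises, so sessions never leak."""
    fast_adapter = make_fast_adapter()
    try:
        return await fast_adapter.transcribe(call_id, audio, sample_rate_hz, options)
    finally:
        # Prefer an async close when the adapter exposes one.
        closer = getattr(fast_adapter, "aclose", None) or getattr(fast_adapter, "close", None)
        if closer is not None:
            result = closer()
            if asyncio.iscoroutine(result):
                await result
```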
- Around line 381-397: The explicit flush in flush_speech currently hard-codes
16000 when calling _execute_transcription which breaks buffered audio recorded
at 8000 Hz; change flush_speech so it reads the sample rate associated with the
buffered audio (e.g., the attribute used when filling _audio_buffer such as
self._buffer_sample_rate or the pipeline's self._sample_rate) and pass that
sample rate to _execute_transcription instead of 16000, falling back to 16000
only if no buffer-specific sample rate is available; update references in
flush_speech to use that variable when calling _execute_transcription and ensure
any places that set/clear the buffer also maintain the buffer sample rate.
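One way to keep the buffer and its sample rate in lockstep is a small bookkeeping sketch like the following; `_audio_buffer` and `_buffer_sample_rate` are the attribute names the review comment proposes, and the surrounding adapter is assumed:

```python
from typing import Optional

class BufferedFlushState:
    """Track the sample rate of buffered audio so an explicit flush
    transcribes at the rate the audio was actually recorded."""

    def __init__(self):
        self._audio_buffer = bytearray()
        self._buffer_sample_rate: Optional[int] = None

    def buffer_audio(self, payload: bytes, sample_rate_hz: int) -> None:
        self._audio_buffer.extend(payload)
        self._buffer_sample_rate = sample_rate_hz

    def flush_sample_rate(self) -> int:
        # Fall back to 16000 only when no buffer-specific rate was recorded.
        return self._buffer_sample_rate or 16000

    def clear_buffer(self) -> None:
        self._audio_buffer.clear()
        self._buffer_sample_rate = None
```

`flush_speech` would then pass `self.flush_sample_rate()` to `_execute_transcription` instead of a hard-coded 16000.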
- Around line 940-1002: The streaming path currently only detects raw μ-law via
is_raw_mulaw and skips PCM handling; add detection for raw A-law (e.g.,
is_raw_alaw = "alaw" in fmt_lower and fmt_lower.startswith("raw-")) and treat it
the same as μ-law: update the branch that currently checks if is_raw_mulaw and
target_encoding == "mulaw" to also handle is_raw_alaw when target_encoding ==
"alaw" (or unify the check to if (is_raw_mulaw and target_encoding=="mulaw") or
(is_raw_alaw and target_encoding=="alaw")), so raw A-law payloads are yielded
directly instead of being processed through the PCM/resample/_to_target_format
path (refer to variables is_raw_mulaw/is_raw_alaw, fmt_lower, pcm_payload,
target_encoding, resample_audio, and _to_target_format).
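The unified passthrough check suggested above can be sketched as a pure predicate; the `raw-*-mulaw` / `raw-*-alaw` naming follows Azure's output-format strings, and the function itself is illustrative:

```python
def is_raw_passthrough(fmt: str, target_encoding: str) -> bool:
    """Decide whether a raw companded payload can be yielded as-is
    instead of going through the PCM resample/_to_target_format path."""
    fmt_lower = fmt.lower()
    is_raw = fmt_lower.startswith("raw-")
    is_raw_mulaw = is_raw and "mulaw" in fmt_lower
    is_raw_alaw = is_raw and "alaw" in fmt_lower and not is_raw_mulaw
    return (is_raw_mulaw and target_encoding == "mulaw") or (
        is_raw_alaw and target_encoding == "alaw"
    )
```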
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (17)
- admin_ui/backend/api/config.py
- admin_ui/backend/api/local_ai.py
- admin_ui/frontend/src/components/config/PipelineForm.tsx
- admin_ui/frontend/src/components/config/providers/AzureProviderForm.tsx
- admin_ui/frontend/src/pages/ProvidersPage.tsx
- admin_ui/frontend/src/utils/providerNaming.ts
- config/ai-agent.yaml
- requirements.txt
- src/config.py
- src/config/__init__.py
- src/config/security.py
- src/core/streaming_playback_manager.py
- src/engine.py
- src/pipelines/__init__.py
- src/pipelines/azure.py
- src/pipelines/openai.py
- src/pipelines/orchestrator.py
PR Review — Excellent Azure Integration! 🎉

@egorky — thank you for this comprehensive Azure Speech Service integration! I have been meaning to work on it but never got a chance to handle it. This is a major contribution. Thank you again for working on this; several community members had requested this multiple times. PR #261 brings a major new provider to AVA with both STT and TTS capabilities. The architecture follows our existing provider patterns well:
The new features are solid:
🔴 Critical: Fix SSRF Vulnerability in Azure Region Handling

Location: the token-URL construction in `admin_ui/backend/api/config.py`. The Azure region from `provider_config` is interpolated into the token endpoint URL without validation.

What to do: Validate and normalize the region before building the URL:

```python
import re

AZURE_REGION_PATTERN = re.compile(r"^[a-z0-9-]{1,50}$")
AZURE_REGIONS_ALLOWLIST = {"eastus", "eastus2", "westus", "westus2", "westeurope", "northeurope", ...}  # Add all valid regions

raw_region = provider_config.get("region", "eastus")
region = str(raw_region).strip().lower()
if not AZURE_REGION_PATTERN.match(region) or region not in AZURE_REGIONS_ALLOWLIST:
    return {"status": "error", "message": f"Invalid Azure region: {raw_region}"}
token_url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
```

🔴 Critical: Close
| Priority | Count | Items |
|---|---|---|
| 🔴 Critical | 4 | SSRF fix, session leak, sample rate mismatch, flush sample rate |
| 🟡 Major | 3 | PII logging, threshold clamping, VAD config |
| 🟡 Moderate | 2 | UI shadowing, variant validation |
| 🟡 Minor | 2 | Format alignment, A-law detection |
This is a substantial and well-architected contribution! Once the 4 critical items are addressed, we can merge. The Azure adapters follow existing patterns nicely, and the LLM aggregation thresholds are a great addition.
Looking forward to seeing this land — thanks for the thorough work! 🙌
SSRF prevention:
- Add validate_azure_region() shared validator in src/config.py
- Wire into AzureSTTProviderConfig/AzureTTSProviderConfig via Pydantic field_validator
- Guard all _build_azure_*_url() functions in azure.py
- Validate region in admin API test-connection endpoint

Correctness fixes:
- Fix _pipeline_tasks iteration bug (Dict[str, Task], not set) in engine cleanup
- Track STT flush task via _fire_and_forget_for_call with 5s timeout
- Demote "DEBUG: TTS Adapter DM Override" from warning to debug level
- Clamp aggregation min_words/min_chars to >= 1
- Add VAD timeout keys to AzureSTTRealtimeAdapter._compose_options
- Close temp AzureSTTFastAdapter in realtime fallback (session leak)
- Remove silent sample_rate mutation in start_stream; log warning on mismatch
- Track buffer sample rate in flush_speech instead of hard-coding 16000
- Handle raw A-law format in TTS streaming passthrough
- Flush pending byte counters in streaming_playback_manager finally block
- Replace silent except with debug logging for session upsert failures

Security & validation:
- Constrain STT variant to Literal["realtime", "fast"] in config
- Normalize/validate variant in orchestrator with warning on invalid input
- Add type-based Azure key injection in security.py (cfg type == "azure")
- Remove full message payload logging in OpenAI adapter

Frontend:
- Move Azure form check above realtime check to prevent routing shadow
- Add azure_stt/azure_tts checkbox entries to quick-add modal
- Replace free-text variant input with constrained select dropdown

Config:
- Align azure_tts output_format to riff-8khz-16bit-mono-pcm (matches UI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 5
♻️ Duplicate comments (1)
src/core/streaming_playback_manager.py (1)
875-882: ⚠️ Potential issue | 🟡 Minor — Only clear the periodic batch after a successful upsert.
If `get_by_call_id(call_id)` returns `None`, line 881 still zeroes `bytes_since_last_upsert`, so that second of audio is permanently undercounted. The end-of-stream path already avoids dropping the batch until it is actually persisted.
♻️ Suggested change
```diff
             bytes_since_last_upsert += len(chunk)
             if time.time() - last_upsert_time >= 1.0:
                 sess = await self.session_store.get_by_call_id(call_id)
                 if sess:
                     sess.streaming_bytes_sent += bytes_since_last_upsert
                     sess.streaming_jitter_buffer_depth = jitter_buffer.qsize()
                     await self.session_store.upsert_call(sess)
-                bytes_since_last_upsert = 0
-                last_upsert_time = time.time()
+                    bytes_since_last_upsert = 0
+                    last_upsert_time = time.time()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/streaming_playback_manager.py` around lines 875 - 882, The periodic batch is being cleared regardless of persistence success, causing lost audio counts; change the block using get_by_call_id(call_id) so that bytes_since_last_upsert and last_upsert_time are only reset after a successful fetch and await self.session_store.upsert_call(sess) completes (i.e., when sess is truthy and upsert_call has returned); leave bytes_since_last_upsert and last_upsert_time unchanged when sess is None (optionally log a warning) so the next periodic tick will include the unpersisted bytes.
🧹 Nitpick comments (5)
src/pipelines/azure.py (2)
746-761: Cleanup exception handling is acceptable but could benefit from debug logging.
The try-except-pass blocks during `push_stream.close()` and `recognizer.stop_continuous_recognition()` are reasonable for cleanup code where we want to proceed regardless of errors. However, adding debug-level logging would help diagnose issues in production.
♻️ Optional: Add debug logging for cleanup failures
```diff
         if push_stream:
             try:
                 push_stream.close()
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Failed to close Azure push stream", call_id=call_id, exc_info=exc)
         # Stop recognizer
         recognizer = session.get("recognizer")
         if recognizer:
             try:
                 logger.debug("Stopping Azure SDK continuous recognition", call_id=call_id)
                 await asyncio.to_thread(recognizer.stop_continuous_recognition)
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Failed to stop Azure recognizer", call_id=call_id, exc_info=exc)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 746 - 761, The cleanup try/except blocks silently swallow exceptions; update them to log debug-level details so failures are visible: around push_stream.close() capture the exception and call logger.debug with context (e.g., "push_stream.close failed", include call_id and the exception), and similarly around recognizer.stop_continuous_recognition (use await asyncio.to_thread(recognizer.stop_continuous_recognition) inside the try and on except call logger.debug with "recognizer.stop_continuous_recognition failed", call_id, and exception); keep the exception suppressed after logging to preserve current behavior and reference session/get("recognizer") where applicable.
353-358: Consider debug-level logging for VAD exceptions.
The silent `except: pass` block at lines 357-358 could mask unexpected VAD library issues. While this shouldn't block the flow, a debug log would aid troubleshooting.
♻️ Optional: Add debug logging
```diff
             try:
                 if self._vad.is_speech(frame, sample_rate_hz):
                     has_speech = True
                     break
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("VAD speech detection failed for frame", exc_info=exc)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 353 - 358, The silent except block around self._vad.is_speech(frame, sample_rate_hz) swallows VAD errors; change it to catch Exception as e and emit a debug-level log with the exception and minimal context (e.g., sample_rate_hz and a short identifier for frame) before continuing. Use the existing logger on the class if available (e.g., self._logger) or logging.getLogger(__name__) and call logger.debug(...) with the exception info so the code still continues but failures are visible for troubleshooting.

admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
430-447: Mirror the backend lower bound in these new inputs.
These are minimum thresholds, but the controls still accept `0` and negative values. Adding `min={1}` keeps the form aligned with the server-side clamp and avoids confusing “saved 0, got 1” round-trips.
♻️ Suggested change
```diff
 <FormInput
   label="LLM Min Words Threshold"
   type="number"
+  min={1}
   value={localConfig.options?.llm?.aggregation_min_words ?? ''}
   onChange={(e) => updateRoleOptions('llm', { aggregation_min_words: e.target.value ? parseInt(e.target.value, 10) : undefined })}
   placeholder="Auto"
   tooltip="Minimum words to wait before sending transcript to LLM."
   disabled={!showLlmExpert}
 />
 <FormInput
   label="LLM Min Chars Threshold"
   type="number"
+  min={1}
   value={localConfig.options?.llm?.aggregation_min_chars ?? ''}
   onChange={(e) => updateRoleOptions('llm', { aggregation_min_chars: e.target.value ? parseInt(e.target.value, 10) : undefined })}
   placeholder="Auto"
   tooltip="Minimum characters to wait before sending transcript to LLM."
   disabled={!showLlmExpert}
 />
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 430 - 447, The two number inputs in PipelineForm.tsx for "LLM Min Words Threshold" and "LLM Min Chars Threshold" accept 0/negative values but the backend clamps them; update the two FormInput components (the ones bound to localConfig.options?.llm?.aggregation_min_words and aggregation_min_chars and using updateRoleOptions('llm', { aggregation_min_words/aggregation_min_chars: ... })) to include min={1} so the UI prevents 0/negative values and matches server-side lower bound.src/engine.py (1)
9100-9109: Extract downstream override resolution into one helper.

The same branching is duplicated in two hot paths; centralizing it will prevent divergence.

♻️ Refactor sketch

```diff
+    def _use_streaming_for_tts_adapter(self, tts_adapter: Any) -> bool:
+        mode = (getattr(tts_adapter, "downstream_mode_override", "auto") or "auto").strip().lower()
+        if mode == "stream":
+            return True
+        if mode == "file":
+            return False
+        return self.config.downstream_mode != "file"
```

Then replace both blocks with:

```diff
-    _tts_dm_override = getattr(pipeline.tts_adapter, "downstream_mode_override", "auto") or "auto"
-    ...
-    if _tts_dm_override == "stream":
-        use_streaming_playback = True
-    elif _tts_dm_override == "file":
-        use_streaming_playback = False
-    else:
-        use_streaming_playback = self.config.downstream_mode != "file"
+    use_streaming_playback = self._use_streaming_for_tts_adapter(pipeline.tts_adapter)
```

Also applies to: 9627-9635

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/engine.py` around lines 9100 - 9109, Extract the repeated downstream-mode resolution into a single helper (e.g., Engine.resolve_downstream_playback or utils.get_effective_downstream_mode) that takes the pipeline (or tts_adapter and global config) and returns either the resolved mode string ("stream"|"file"|"auto") or the boolean use_streaming_playback; move the logic that reads getattr(pipeline.tts_adapter, "downstream_mode_override", "auto") or "auto" and the branch that maps "stream"/"file"/other to a playback decision into that helper, replace both duplicated blocks (the one using pipeline.tts_adapter.downstream_mode_override and the one at the other hot path) with a single call to the helper, and ensure the helper uses pipeline.tts_adapter.__class__.__name__ only for logging so behavior is unchanged.

src/pipelines/orchestrator.py (1)
1528-1544: Add Azure provider hint for better user experience.

The `_format_placeholder_error` method provides helpful hints for misconfigured providers (OpenAI, Google, Deepgram, etc.), but Azure is missing. When users encounter Azure placeholder errors, they won't get guidance like other providers.

✨ Proposed fix to add Azure hint

```diff
     elif provider == "minimax":
         hints.append("Set MINIMAX_API_KEY and configure providers.minimax_llm.")
+    elif provider == "azure":
+        hints.append("Set AZURE_SPEECH_KEY and configure providers.azure_stt/providers.azure_tts.")
     hint = f" Hint: {' '.join(hints)}" if hints else ""
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/pipelines/orchestrator.py` around lines 1528 - 1544, In _format_placeholder_error, the provider-specific hints list is missing guidance for Azure: locate the block that builds hints (variable hints and parameter provider) and add an elif branch for provider == "azure" that appends a clear hint (e.g., "Set AZURE_API_KEY or AZURE_TENANT_ID/AZURE_CLIENT_ID/AZURE_CLIENT_SECRET and configure providers.azure_* (llm/stt/tts) as needed"). Ensure the new branch follows the same pattern and wording style as the other provider branches so Azure users receive the same configuration guidance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 75-79: The effect in PipelineForm.tsx is watching the wrong STT
option key (vad_silence_ms) so saved Azure timeout overrides appear unset;
update the condition and dependency array to check for the actual keys used by
the backend (config?.options?.stt?.vad_silence_timeout_ms and/or
config?.options?.stt?.vad_initial_silence_timeout_ms) along with
timestamp_granularities, and keep calling setShowSttExpert(true) when either
timeout or timestamp_granularities is present so the expert STT section
correctly shows on reload.
In `@src/engine.py`:
- Line 9250: The transcript queue is being assigned to
self._pipeline_transcript_queues[call_id] too late so ChannelTalkingFinished may
call flush_speech() before the queue exists and transcripts are dropped; move
the initialization of transcript_queue into the earlier setup path (before any
events that can trigger ChannelTalkingFinished/flush_speech()), ensuring
self._pipeline_transcript_queues[call_id] is created immediately when call_id is
allocated (affecting the code around where transcript_queue is constructed and
referenced—ensure the same early initialization is applied to the code paths
touched in the region around the current _pipeline_transcript_queues assignment
and the code covered by the earlier block that spans the flush_speech() call).
- Around line 10044-10046: The current parsing of
pipeline.llm_options["aggregation_min_words"] and ["aggregation_min_chars"] can
raise on non-numeric values; update the min_words and min_chars computation in
the block that sets min_words, min_chars, and threshold_met to defensively parse
values (wrap parsing in try/except catching ValueError/TypeError), coerce valid
numeric strings to int, fall back to the existing defaults (3 for words, 12 for
chars) on error, then enforce min 1 via max(1, parsed) before computing
threshold_met = words >= min_words or chars >= min_chars; reference the
variables min_words, min_chars, pipeline.llm_options, and threshold_met when
making the change.
In `@src/pipelines/azure.py`:
- Around line 29-32: The type annotation aiohttp.ClientSession is evaluated at
import time and fails when aiohttp is None; fix by making those annotations
lazy: add "from typing import TYPE_CHECKING, Any" and either (a) wrap "import
aiohttp" inside an if TYPE_CHECKING block and keep annotations as
aiohttp.ClientSession, or (b) change the annotations that reference
aiohttp.ClientSession to the string form "aiohttp.ClientSession", or (c) fall
back to Any (e.g., session: Any) when aiohttp is not available; update every
function/method signature and attribute annotation that currently uses
aiohttp.ClientSession (the places noted in the review) so the module can import
without aiohttp installed.
In `@src/pipelines/openai.py`:
- Around line 480-488: The debug log in the OpenAI chat completion block is
exposing raw message content via last_message_preview and assumes each message
is a dict; change the logger.debug call (where logger.debug is invoked with
call_id, model, temperature, tools_count, messages_count) to stop including raw
content and instead log metadata only: for the last message, safely check type
and extract role and content length (e.g., role=str(msg.get("role")) if
isinstance(msg, dict) else "unknown", length=len(msg.get("content","")) if
isinstance(msg, dict) and isinstance(msg.get("content",""), str) else 0) so you
avoid .get crashes and eliminate PII risk.
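The early queue initialization requested for `src/engine.py` above can be sketched as follows. This is a minimal illustration of the discipline only: `PipelineRunner` is a hypothetical stand-in for the real engine class, and the method names mirror the prompt rather than the actual code.

```python
import asyncio
from typing import Dict


class PipelineRunner:
    """Illustrative sketch: create the per-call transcript queue eagerly."""

    def __init__(self) -> None:
        self._pipeline_transcript_queues: Dict[str, asyncio.Queue] = {}

    def ensure_pipeline(self, call_id: str) -> asyncio.Queue:
        # Create the queue as soon as the call_id is allocated, before any
        # event handler (e.g. ChannelTalkingFinished) can fire, so a flush
        # never observes a missing entry.
        return self._pipeline_transcript_queues.setdefault(call_id, asyncio.Queue())

    def flush_speech(self, call_id: str, transcript: str) -> bool:
        queue = self._pipeline_transcript_queues.get(call_id)
        if queue is None:
            return False  # late-created queue would previously drop this transcript
        queue.put_nowait(transcript)
        return True
```

`setdefault` keeps setup idempotent, so a repeated initialization path never replaces a queue that already holds transcripts.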
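The defensive threshold parsing requested for `src/engine.py` above can be sketched like this; the helper name `_parse_positive_int` is hypothetical, while the fallback defaults (3 words, 12 chars) come from the prompt.

```python
def _parse_positive_int(value, default):
    """Coerce a config value to a positive int, falling back on bad input."""
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        return default
    return max(1, parsed)  # enforce the minimum of 1 described in the prompt


# Hypothetical usage in the turn loop; llm_options is a plain dict here.
llm_options = {"aggregation_min_words": "abc", "aggregation_min_chars": "0"}
min_words = _parse_positive_int(llm_options.get("aggregation_min_words"), 3)
min_chars = _parse_positive_int(llm_options.get("aggregation_min_chars"), 12)

words, chars = 4, 10
threshold_met = words >= min_words or chars >= min_chars
```

Non-numeric strings and `None` fall back to the default, while a saved `0` is clamped up to 1 rather than raising mid-call.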
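The lazy-annotation pattern recommended for `src/pipelines/azure.py` above looks roughly like this minimal sketch; `ExampleAdapter` is illustrative, not the real adapter. With `from __future__ import annotations`, the `aiohttp.ClientSession` annotation is never evaluated at runtime, so the module imports cleanly even when aiohttp is absent.

```python
from __future__ import annotations  # PEP 563: defer annotation evaluation

from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    import aiohttp  # visible to type checkers only, never imported at runtime

try:
    import aiohttp as _aiohttp_runtime
except ImportError:
    _aiohttp_runtime = None  # module still imports without the dependency


class ExampleAdapter:
    def __init__(self) -> None:
        # This annotation is a plain string at runtime, so defining the
        # class does not require aiohttp to be installed.
        self._session: Optional[aiohttp.ClientSession] = None

    def available(self) -> bool:
        return _aiohttp_runtime is not None
```

The same effect can be had per-annotation with string literals (`"aiohttp.ClientSession"`), but the future import fixes every signature in the module at once.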
---
Duplicate comments:
In `@src/core/streaming_playback_manager.py`:
- Around line 875-882: The periodic batch is being cleared regardless of
persistence success, causing lost audio counts; change the block using
get_by_call_id(call_id) so that bytes_since_last_upsert and last_upsert_time are
only reset after a successful fetch and await
self.session_store.upsert_call(sess) completes (i.e., when sess is truthy and
upsert_call has returned); leave bytes_since_last_upsert and last_upsert_time
unchanged when sess is None (optionally log a warning) so the next periodic tick
will include the unpersisted bytes.
---
Nitpick comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 430-447: The two number inputs in PipelineForm.tsx for "LLM Min
Words Threshold" and "LLM Min Chars Threshold" accept 0/negative values but the
backend clamps them; update the two FormInput components (the ones bound to
localConfig.options?.llm?.aggregation_min_words and aggregation_min_chars and
using updateRoleOptions('llm', { aggregation_min_words/aggregation_min_chars:
... })) to include min={1} so the UI prevents 0/negative values and matches
server-side lower bound.
In `@src/engine.py`:
- Around line 9100-9109: Extract the repeated downstream-mode resolution into a
single helper (e.g., Engine.resolve_downstream_playback or
utils.get_effective_downstream_mode) that takes the pipeline (or tts_adapter and
global config) and returns either the resolved mode string
("stream"|"file"|"auto") or the boolean use_streaming_playback; move the logic
that reads getattr(pipeline.tts_adapter, "downstream_mode_override", "auto") or
"auto" and the branch that maps "stream"/"file"/other to a playback decision
into that helper, replace both duplicated blocks (the one using
pipeline.tts_adapter.downstream_mode_override and the one at the other hot path)
with a single call to the helper, and ensure the helper uses
pipeline.tts_adapter.__class__.__name__ only for logging so behavior is
unchanged.
In `@src/pipelines/azure.py`:
- Around line 746-761: The cleanup try/except blocks silently swallow
exceptions; update them to log debug-level details so failures are visible:
around push_stream.close() capture the exception and call logger.debug with
context (e.g., "push_stream.close failed", include call_id and the exception),
and similarly around recognizer.stop_continuous_recognition (use await
asyncio.to_thread(recognizer.stop_continuous_recognition) inside the try and on
except call logger.debug with "recognizer.stop_continuous_recognition failed",
call_id, and exception); keep the exception suppressed after logging to preserve
current behavior and reference session/get("recognizer") where applicable.
- Around line 353-358: The silent except block around self._vad.is_speech(frame,
sample_rate_hz) swallows VAD errors; change it to catch Exception as e and emit
a debug-level log with the exception and minimal context (e.g., sample_rate_hz
and a short identifier for frame) before continuing. Use the existing logger on
the class if available (e.g., self._logger) or logging.getLogger(__name__) and
call logger.debug(...) with the exception info so the code still continues but
failures are visible for troubleshooting.
In `@src/pipelines/orchestrator.py`:
- Around line 1528-1544: In _format_placeholder_error, the provider-specific
hints list is missing guidance for Azure: locate the block that builds hints
(variable hints and parameter provider) and add an elif branch for provider ==
"azure" that appends a clear hint (e.g., "Set AZURE_API_KEY or
AZURE_TENANT_ID/AZURE_CLIENT_ID/AZURE_CLIENT_SECRET and configure
providers.azure_* (llm/stt/tts) as needed"). Ensure the new branch follows the
same pattern and wording style as the other provider branches so Azure users
receive the same configuration guidance.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4a817814-2ae7-46fc-a7f0-c50ac015ad87
📒 Files selected for processing (12)
- admin_ui/backend/api/config.py
- admin_ui/frontend/src/components/config/PipelineForm.tsx
- admin_ui/frontend/src/pages/ProvidersPage.tsx
- config/ai-agent.yaml
- src/config.py
- src/config/__init__.py
- src/config/security.py
- src/core/streaming_playback_manager.py
- src/engine.py
- src/pipelines/azure.py
- src/pipelines/openai.py
- src/pipelines/orchestrator.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/config/security.py
- src/config/__init__.py
- config/ai-agent.yaml
```diff
   useEffect(() => {
-    if (Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0) {
+    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0) || config?.options?.stt?.vad_silence_ms !== undefined) {
       setShowSttExpert(true);
     }
-  }, [config?.options?.stt?.timestamp_granularities]);
+  }, [config?.options?.stt?.timestamp_granularities, config?.options?.stt?.vad_silence_ms]);
```
This hook is watching the wrong Azure STT option key.
src/config.py exposes Azure timeout overrides as vad_silence_timeout_ms and vad_initial_silence_timeout_ms, but this effect is keyed off options.stt.vad_silence_ms. Any saved Azure timeout override will be treated as “unset” when the form reloads.
♻️ Suggested change
```diff
-    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0) || config?.options?.stt?.vad_silence_ms !== undefined) {
+    if (
+      (Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0) ||
+      config?.options?.stt?.vad_silence_timeout_ms !== undefined ||
+      config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined
+    ) {
       setShowSttExpert(true);
     }
-  }, [config?.options?.stt?.timestamp_granularities, config?.options?.stt?.vad_silence_ms]);
+  }, [
+    config?.options?.stt?.timestamp_granularities,
+    config?.options?.stt?.vad_silence_timeout_ms,
+    config?.options?.stt?.vad_initial_silence_timeout_ms,
+  ]);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 75 -
79, The effect in PipelineForm.tsx is watching the wrong STT option key
(vad_silence_ms) so saved Azure timeout overrides appear unset; update the
condition and dependency array to check for the actual keys used by the backend
(config?.options?.stt?.vad_silence_timeout_ms and/or
config?.options?.stt?.vad_initial_silence_timeout_ms) along with
timestamp_granularities, and keep calling setShowSttExpert(true) when either
timeout or timestamp_granularities is present so the expert STT section
correctly shows on reload.
- Add `from __future__ import annotations` to azure.py to prevent AttributeError when aiohttp is not installed (type annotations referencing aiohttp.ClientSession are deferred by PEP 563)
- Initialize _pipeline_transcript_queues earlier in _ensure_pipeline_runner to prevent a race condition where the ChannelTalkingFinished flush drops transcripts before _pipeline_runner creates the queue
- Guard aggregation threshold parsing with try/except to handle non-numeric config values without crashing the turn loop
- Expand PipelineForm STT expert useEffect dependencies to include Azure-specific keys (variant, vad_silence_timeout_ms) so the expert panel auto-opens when Azure overrides are saved
- Remove message content from OpenAI debug log (PII risk) and make last-message extraction type-safe for non-dict message items

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 4
♻️ Duplicate comments (2)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
76-82: ⚠️ Potential issue | 🟡 Minor

Include `vad_initial_silence_timeout_ms` in STT expert auto-open checks.

Line 76 and Line 82 still miss `config?.options?.stt?.vad_initial_silence_timeout_ms`, so saved Azure initial-silence overrides may not auto-show expert mode on reload.

♻️ Suggested fix

```diff
-    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0)
-      || config?.options?.stt?.vad_silence_ms !== undefined
-      || config?.options?.stt?.variant !== undefined
-      || config?.options?.stt?.vad_silence_timeout_ms !== undefined) {
+    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0)
+      || config?.options?.stt?.vad_silence_ms !== undefined
+      || config?.options?.stt?.variant !== undefined
+      || config?.options?.stt?.vad_silence_timeout_ms !== undefined
+      || config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined) {
       setShowSttExpert(true);
     }
-  }, [config?.options?.stt?.timestamp_granularities, config?.options?.stt?.vad_silence_ms, config?.options?.stt?.variant, config?.options?.stt?.vad_silence_timeout_ms]);
+  }, [
+    config?.options?.stt?.timestamp_granularities,
+    config?.options?.stt?.vad_silence_ms,
+    config?.options?.stt?.variant,
+    config?.options?.stt?.vad_silence_timeout_ms,
+    config?.options?.stt?.vad_initial_silence_timeout_ms,
+  ]);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 76 - 82, The STT expert auto-open condition misses checking config?.options?.stt?.vad_initial_silence_timeout_ms, so saved Azure initial-silence overrides won't trigger setShowSttExpert(true) on load; update the conditional inside the effect (the expression that currently checks timestamp_granularities, vad_silence_ms, variant, vad_silence_timeout_ms) to also include config?.options?.stt?.vad_initial_silence_timeout_ms and add it to the effect dependency array so the effect re-runs when vad_initial_silence_timeout_ms changes.

src/pipelines/azure.py (1)
1014-1016: ⚠️ Potential issue | 🟠 Major

Raw A-law/μ-law streaming path still has payload-corruption edges.

On Line 1014, odd-byte carry is applied to all formats, including raw μ-law/A-law (1 byte/sample). Also, on Lines 1025-1038, raw A-law falls into PCM conversion when `target_encoding != "alaw"`.

Proposed fix

```diff
-    if len(pcm_payload) % 2 != 0:
+    is_companded_raw = is_raw_mulaw or is_raw_alaw
+    if (not is_companded_raw) and len(pcm_payload) % 2 != 0:
         leftover_byte = pcm_payload[-1:]
         pcm_payload = pcm_payload[:-1]
@@
     if (is_raw_mulaw and target_encoding == "mulaw") or (is_raw_alaw and target_encoding == "alaw"):
         # Already in target format — yield directly
         converted = pcm_payload
+    elif is_raw_alaw:
+        raise RuntimeError(
+            "Azure TTS raw A-law cannot be converted through PCM path. "
+            "Use target_encoding='alaw' or choose PCM/μ-law output_format."
+        )
     else:
         # PCM16 LE raw bytes — determine source sample rate from format string
         audio_bytes = pcm_payload
@@
     return
```

Also applies to: 1021-1038
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 1014 - 1016, The leftover-byte truncation and PCM conversion are being applied indiscriminately to raw A-law/μ-law data: restrict the odd-byte carry logic around pcm_payload/leftover_byte so it only runs for true PCM formats (e.g., when bytes_per_sample > 1 or source_encoding in a pcm_encodings set), and ensure the branch that handles raw A-law/μ-law (check target_encoding == "alaw" or source_encoding == "alaw"/"ulaw") bypasses the PCM conversion block (the code currently around pcm_payload and the conversion block using target_encoding != "alaw"); update the conditionals so raw alaw/ulaw follow their dedicated path and never fall into the PCM conversion logic.
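A minimal sketch of the guard described above, under the assumption that companded formats (μ-law/A-law) carry 1 byte per sample while PCM16 carries 2; `split_even_payload` is a hypothetical helper for illustration, not code from the PR.

```python
def split_even_payload(payload: bytes, is_companded: bool):
    """Return (usable_bytes, carry); only PCM16 needs the odd-byte carry.

    Companded formats (mu-law / A-law) are 1 byte per sample, so any byte
    count is a whole number of samples and nothing is carried forward.
    """
    if is_companded or len(payload) % 2 == 0:
        return payload, b""
    # PCM16: hold the trailing odd byte so the next chunk stays sample-aligned.
    return payload[:-1], payload[-1:]
```

Truncating an odd byte from a companded stream would silently drop a full sample, which is exactly the corruption the review flags.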
🧹 Nitpick comments (3)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
42-44: Auto-enable TTS expert mode for saved Azure TTS overrides.

`showTtsExpert` auto-open logic only checks OpenAI/Groq keys. If Azure `voice_name`/`output_format` is saved, expert mode won't auto-enable on reload.

♻️ Suggested parity update

```diff
 const [showTtsExpert, setShowTtsExpert] = useState<boolean>(
-  () => config?.options?.tts?.response_format !== undefined || config?.options?.tts?.max_input_chars !== undefined
+  () =>
+    config?.options?.tts?.response_format !== undefined ||
+    config?.options?.tts?.max_input_chars !== undefined ||
+    config?.options?.tts?.voice_name !== undefined ||
+    config?.options?.tts?.output_format !== undefined
 );
 useEffect(() => {
-  if (config?.options?.tts?.response_format !== undefined || config?.options?.tts?.max_input_chars !== undefined) {
+  if (
+    config?.options?.tts?.response_format !== undefined ||
+    config?.options?.tts?.max_input_chars !== undefined ||
+    config?.options?.tts?.voice_name !== undefined ||
+    config?.options?.tts?.output_format !== undefined
+  ) {
     setShowTtsExpert(true);
   }
-}, [config?.options?.tts?.response_format, config?.options?.tts?.max_input_chars]);
+}, [
+  config?.options?.tts?.response_format,
+  config?.options?.tts?.max_input_chars,
+  config?.options?.tts?.voice_name,
+  config?.options?.tts?.output_format,
+]);
```

Also applies to: 85-88, 641-658

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 42 - 44, The showTtsExpert initial state only checks OpenAI/Groq-related fields; update the initializer and any related toggling logic (where setShowTtsExpert is used) to also consider saved Azure TTS overrides by checking config?.options?.tts?.voice_name and config?.options?.tts?.output_format (or any Azure-specific keys used elsewhere); ensure the same parity change is applied to the other places mentioned (the other showTtsExpert checks around the component and any form load logic) so that expert mode auto-enables when Azure TTS settings are present.

src/engine.py (1)
9103-9112: Extract downstream-mode resolution into a shared helper.

This logic is duplicated in two hot paths and can drift. A small helper (e.g., `_resolve_tts_streaming_mode(tts_adapter)`) will keep behavior consistent.

Also applies to: 9631-9639
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine.py` around lines 9103 - 9112, Extract the duplicated downstream-mode resolution into a single helper function named _resolve_tts_streaming_mode(tts_adapter) that returns a boolean (True for streaming playback, False for file playback); inside the helper replicate the existing logic: read getattr(tts_adapter, "downstream_mode_override", "auto") or "auto", evaluate "stream" -> True, "file" -> False, otherwise consult self.config.downstream_mode != "file" (pass self.config or make helper a method if it needs access to self); then replace the duplicate blocks around pipeline.tts_adapter and use_streaming_playback with a call to _resolve_tts_streaming_mode(pipeline.tts_adapter) to ensure consistent behavior across both locations.

src/pipelines/azure.py (1)
748-751: Don’t silently swallow stream teardown errors.
`except ...: pass` on Lines 750-751 and 761-762 hides cleanup failures and makes stuck-stream incidents hard to debug.

Proposed fix

```diff
         if push_stream:
             try:
                 push_stream.close()
-            except Exception:
-                pass
+            except Exception:
+                logger.debug("Azure SDK push_stream.close failed", call_id=call_id, exc_info=True)
@@
         try:
             logger.debug("Stopping Azure SDK continuous recognition", call_id=call_id)
             await asyncio.to_thread(recognizer.stop_continuous_recognition)
-        except Exception:
-            pass
+        except Exception:
+            logger.debug("Azure SDK recognizer stop failed", call_id=call_id, exc_info=True)
```

Also applies to: 760-762
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 748 - 751, Replace the bare "except: pass" around push_stream.close() with explicit error handling: catch Exception as e and log the exception (including stack trace) with the module/logger used in this file (e.g., logger.exception or similar) and include contextual info (stream name, request id, etc.), and do the same for the other symmetric teardown block (the similar close() call at lines ~760-762) so teardown failures are visible instead of silently swallowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 433-450: The onChange handlers in PipelineForm.tsx for the LLM
thresholds (FormInput for aggregation_min_words and aggregation_min_chars) pass
parseInt results directly to updateRoleOptions, allowing NaN, zero or negative
values to be saved; change each onChange to: read e.target.value, if empty set
undefined, else parseInt and if the result is a finite integer > 0 keep it,
otherwise set undefined (or leave previous valid), then call
updateRoleOptions('llm', { aggregation_min_words: ... }) /
updateRoleOptions('llm', { aggregation_min_chars: ... }) accordingly; ensure you
reference the localConfig.options?.llm fields and respect showLlmExpert
disabling.
In `@src/engine.py`:
- Line 4420: The log call that currently includes transcript content —
logger.debug("Early STT flush returned transcript", call_id=call_id,
transcript_preview=transcript[:50]) — must stop emitting PII; replace the
transcript_preview field with metadata only (e.g., transcript_length =
len(transcript), is_truncated = True/False, and an optional pii_detected flag)
so logs contain size/flags instead of actual transcript text; update the
logger.debug invocation accordingly and ensure any tests/consumers use the new
metadata fields.
In `@src/pipelines/azure.py`:
- Around line 470-476: Remove the transcript preview from logs to avoid leaking
PII: in the Azure STT logging calls that use logger.info with keys call_id,
request_id, latency_ms, transcript_preview and the transcript variable, stop
including transcript_preview (or replace it with a non-sensitive field such as
transcript_length or a redacted boolean) and update the logger.info calls
accordingly; apply the exact same change to the other occurrence that mirrors
this pattern so neither location logs transcript/text previews.
- Around line 361-378: The buffer is allowed to grow during long silence because
audio is always appended; change logic in the block using _audio_buffer,
_is_speaking, has_speech, bytes_per_frame and _max_silence_frames so you only
extend _audio_buffer when speech is ongoing (self._is_speaking) or when
has_speech is True (to start capturing speech), and when neither is true
trim/limit the buffer to a small pre-roll cap (e.g., max bytes = bytes_per_frame
* self._max_silence_frames) instead of unbounded extend; update
_buffer_sample_rate as before and keep the existing silence counting
(_silence_frames) and send/clear behavior unchanged.
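The bounded pre-roll behavior requested for `src/pipelines/azure.py` above can be sketched with a `deque(maxlen=...)`; the class, frame sizes, and default cap here are illustrative, not the adapter's real internals.

```python
from collections import deque


class PreRollBuffer:
    """Keep only the last `max_frames` of audio while no speech is detected."""

    def __init__(self, max_frames: int = 10) -> None:
        self._pre_roll = deque(maxlen=max_frames)  # deque drops the oldest frame itself
        self._speech = bytearray()
        self.is_speaking = False

    def push(self, frame: bytes, has_speech: bool) -> None:
        if self.is_speaking or has_speech:
            if not self.is_speaking:
                # Speech just started: promote the capped pre-roll into the
                # speech buffer so the utterance keeps its leading context.
                for pre in self._pre_roll:
                    self._speech.extend(pre)
                self._pre_roll.clear()
                self.is_speaking = True
            self._speech.extend(frame)
        else:
            # Silence: memory stays O(max_frames) no matter how long it lasts.
            self._pre_roll.append(frame)
```

During an hour of silence the buffer never grows past `max_frames` frames, while the first speech frame still carries a short lead-in into the utterance.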
---
Duplicate comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 76-82: The STT expert auto-open condition misses checking
config?.options?.stt?.vad_initial_silence_timeout_ms, so saved Azure
initial-silence overrides won't trigger setShowSttExpert(true) on load; update
the conditional inside the effect (the expression that currently checks
timestamp_granularities, vad_silence_ms, variant, vad_silence_timeout_ms) to
also include config?.options?.stt?.vad_initial_silence_timeout_ms and add it to
the effect dependency array so the effect re-runs when
vad_initial_silence_timeout_ms changes.
In `@src/pipelines/azure.py`:
- Around line 1014-1016: The leftover-byte truncation and PCM conversion are
being applied indiscriminately to raw A-law/μ-law data: restrict the odd-byte
carry logic around pcm_payload/leftover_byte so it only runs for true PCM
formats (e.g., when bytes_per_sample > 1 or source_encoding in a pcm_encodings
set), and ensure the branch that handles raw A-law/μ-law (check target_encoding
== "alaw" or source_encoding == "alaw"/"ulaw") bypasses the PCM conversion block
(the code currently around pcm_payload and the conversion block using
target_encoding != "alaw"); update the conditionals so raw alaw/ulaw follow
their dedicated path and never fall into the PCM conversion logic.
---
Nitpick comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 42-44: The showTtsExpert initial state only checks
OpenAI/Groq-related fields; update the initializer and any related toggling
logic (where setShowTtsExpert is used) to also consider saved Azure TTS
overrides by checking config?.options?.tts?.voice_name and
config?.options?.tts?.output_format (or any Azure-specific keys used elsewhere);
ensure the same parity change is applied to the other places mentioned (the
other showTtsExpert checks around the component and any form load logic) so that
expert mode auto-enables when Azure TTS settings are present.
In `@src/engine.py`:
- Around line 9103-9112: Extract the duplicated downstream-mode resolution into
a single helper function named _resolve_tts_streaming_mode(tts_adapter) that
returns a boolean (True for streaming playback, False for file playback); inside
the helper replicate the existing logic: read getattr(tts_adapter,
"downstream_mode_override", "auto") or "auto", evaluate "stream" -> True, "file"
-> False, otherwise consult self.config.downstream_mode != "file" (pass
self.config or make helper a method if it needs access to self); then replace
the duplicate blocks around pipeline.tts_adapter and use_streaming_playback with
a call to _resolve_tts_streaming_mode(pipeline.tts_adapter) to ensure consistent
behavior across both locations.
In `@src/pipelines/azure.py`:
- Around line 748-751: Replace the bare "except: pass" around
push_stream.close() with explicit error handling: catch Exception as e and log
the exception (including stack trace) with the module/logger used in this file
(e.g., logger.exception or similar) and include contextual info (stream name,
request id, etc.), and do the same for the other symmetric teardown block (the
similar close() call at lines ~760-762) so teardown failures are visible instead
of silently swallowed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c3171372-c54a-4729-aa80-0f81cd958642
📒 Files selected for processing (4)
- admin_ui/frontend/src/components/config/PipelineForm.tsx
- src/engine.py
- src/pipelines/azure.py
- src/pipelines/openai.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/pipelines/openai.py
src/engine.py
Outdated
```python
                    timeout=5,
                )
                if transcript:
                    logger.debug("Early STT flush returned transcript", call_id=call_id, transcript_preview=transcript[:50])
```
Avoid logging transcript content in engine logs.
Line 4420 logs transcript_preview, which can leak PII/sensitive caller data into logs. Prefer metadata (length/flags) instead of content.
🔧 Proposed fix
-            logger.debug("Early STT flush returned transcript", call_id=call_id, transcript_preview=transcript[:50])
+            logger.debug(
+                "Early STT flush returned transcript",
+                call_id=call_id,
+                transcript_len=len(transcript),
+            )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-            logger.debug("Early STT flush returned transcript", call_id=call_id, transcript_preview=transcript[:50])
+            logger.debug(
+                "Early STT flush returned transcript",
+                call_id=call_id,
+                transcript_len=len(transcript),
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engine.py` at line 4420, The log call that currently includes transcript
content — logger.debug("Early STT flush returned transcript", call_id=call_id,
transcript_preview=transcript[:50]) — must stop emitting PII; replace the
transcript_preview field with metadata only (e.g., transcript_length =
len(transcript), is_truncated = True/False, and an optional pii_detected flag)
so logs contain size/flags instead of actual transcript text; update the
logger.debug invocation accordingly and ensure any tests/consumers use the new
metadata fields.
…put hardening

- Remove transcript/text preview strings from logs (PII risk): replace transcript_preview and text_preview with metadata-only fields (transcript_len, transcript_chars, text_chars) in engine.py and azure.py
- Guard pre-speech audio buffer: only extend _audio_buffer when speech is detected or speaking is active, preventing unbounded memory growth during long silences
- Harden LLM threshold inputs in PipelineForm: add min=1, step=1 attrs and parseInt guards with Number.isFinite + Math.max(1, parsed) clamping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
76-83: ⚠️ Potential issue | 🟡 Minor

Include vad_initial_silence_timeout_ms in STT expert auto-open detection.

Lines 76-83 still won't auto-open STT expert mode when only options.stt.vad_initial_silence_timeout_ms is present in saved config.

♻️ Suggested patch

-    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0)
-        || config?.options?.stt?.vad_silence_ms !== undefined
-        || config?.options?.stt?.variant !== undefined
-        || config?.options?.stt?.vad_silence_timeout_ms !== undefined) {
+    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0)
+        || config?.options?.stt?.vad_silence_ms !== undefined
+        || config?.options?.stt?.variant !== undefined
+        || config?.options?.stt?.vad_silence_timeout_ms !== undefined
+        || config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined) {
       setShowSttExpert(true);
     }
-  }, [config?.options?.stt?.timestamp_granularities, config?.options?.stt?.vad_silence_ms, config?.options?.stt?.variant, config?.options?.stt?.vad_silence_timeout_ms]);
+  }, [
+    config?.options?.stt?.timestamp_granularities,
+    config?.options?.stt?.vad_silence_ms,
+    config?.options?.stt?.variant,
+    config?.options?.stt?.vad_silence_timeout_ms,
+    config?.options?.stt?.vad_initial_silence_timeout_ms,
+  ]);

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 76 - 83, The effect that auto-opens STT expert mode is missing checks for options.stt.vad_initial_silence_timeout_ms; update the conditional that sets setShowSttExpert(true) to also check for config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined (and add that same property to the effect dependency array) so the form will auto-open when only vad_initial_silence_timeout_ms is present; refer to the existing setShowSttExpert call and the config?.options?.stt timestamp/vad checks to locate where to add the new property.
🧹 Nitpick comments (6)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
666-672: Prefer a select for Azure TTS output format override (not free text).

Free-text here is error-prone and can create invalid runtime configs. Reusing the curated format options (as in AzureProviderForm.tsx) would reduce misconfiguration risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 666 - 672, Replace the free-text input for "Azure TTS Output Format Override" with a select that reuses the curated format options from AzureProviderForm (the same list of allowed output formats), binding the selected value to localConfig.options?.tts?.output_format and calling updateRoleOptions('tts', { output_format: selectedValue || undefined }) on change; ensure an empty/cleared selection maps to undefined and keep the disabled state controlled by showTtsExpert so the field is disabled when showTtsExpert is false.

src/pipelines/azure.py (4)
754-768: Cleanup exceptions silently swallowed.

Both push_stream.close() and recognizer.stop_continuous_recognition exceptions are caught and ignored. While cleanup shouldn't raise to callers, logging at debug level aids troubleshooting.

♻️ Proposed fix

         if push_stream:
             try:
                 push_stream.close()
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Failed to close Azure push stream", call_id=call_id, exc_info=exc)
         # Stop recognizer
         recognizer = session.get("recognizer")
         if recognizer:
             try:
                 # Stop blocks until final processed segments finish, which may take seconds.
                 # In asyncio context, best to run it in thread to avoid blocking loop if slow.
                 logger.debug("Stopping Azure SDK continuous recognition", call_id=call_id)
                 await asyncio.to_thread(recognizer.stop_continuous_recognition)
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Failed to stop Azure recognizer", call_id=call_id, exc_info=exc)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 754 - 768, The cleanup currently swallows exceptions for push_stream.close() and recognizer.stop_continuous_recognition; modify both except blocks to capture the exception (e.g., except Exception as e) and log the error details at debug level using the existing logger (include context like call_id and exc_info or the exception message) instead of silently passing so failures during cleanup are recorded for troubleshooting while still not propagating.
153-160: Incomplete XML escaping in SSML builder.

The SSML text escaping handles the five XML special characters, but the voice_name, lang, lang_tag, prosody_pitch, and prosody_rate values are interpolated directly into attribute positions without escaping. While these values typically come from configuration rather than user input, a malformed voice name could break or inject into the SSML.

♻️ Consider escaping attribute values

+def _escape_attr(val: str) -> str:
+    """Escape a value for use in an XML attribute (single-quoted)."""
+    return val.replace("&", "&amp;").replace("'", "&apos;").replace("<", "&lt;")
+
 def _build_ssml(
     text: str,
     voice_name: str,
     language: str,
     prosody_pitch: Optional[str] = None,
     prosody_rate: Optional[str] = None,
     lang_tag: Optional[str] = None,
 ) -> str:
     ...
-    lang = language or ("-".join(voice_name.split("-")[:2]) if "-" in voice_name else "en-US")
+    lang = _escape_attr(language or ("-".join(voice_name.split("-")[:2]) if "-" in voice_name else "en-US"))
+    voice_name = _escape_attr(voice_name)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 153 - 160, The SSML builder escapes text content but not attribute values; update the code that constructs the SSML (where safe_text is used and where voice_name, lang, lang_tag, prosody_pitch, and prosody_rate are interpolated) to escape attribute values as well. Add or reuse an XML attribute-escaping helper (e.g., escape_xml_attr similar to safe_text but ensuring quotes and other XML entities are encoded) and apply it to voice_name, lang, lang_tag, prosody_pitch, and prosody_rate before interpolation into attributes so malformed names cannot break or inject into the SSML.
354-359: Consider logging VAD frame exceptions at debug level.

The VAD is_speech call can fail on malformed frames, but silently swallowing exceptions makes debugging audio issues harder. A debug-level log would help troubleshooting without cluttering production logs.

♻️ Proposed fix

             try:
                 if self._vad.is_speech(frame, sample_rate_hz):
                     has_speech = True
                     break
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("VAD is_speech check failed", frame_offset=offset, exc_info=exc)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 354 - 359, The code currently swallows exceptions from self._vad.is_speech; change the bare except to catch Exception as e and emit a debug-level log before continuing so malformed frames are diagnosable. Specifically, in the loop where self._vad.is_speech(frame, sample_rate_hz) is called (and has_speech is set), replace "except Exception: pass" with "except Exception as e: <logger>.debug(...)" and include the exception string and minimal context (e.g., sample_rate_hz, len(frame) or frame metadata) in the message; keep the existing control flow (do not re-raise) so has_speech behavior is unchanged. Use the module or instance logger used elsewhere in the file (e.g., logger or self._logger) to ensure consistent logging.
908-910: Unreachable code after early return.

The yield statement on line 910 is unreachable because line 909 returns early. This is a quirk needed to make the function an async generator, but could be clearer.

♻️ Clearer pattern

 async def synthesize(
     self,
     call_id: str,
     text: str,
     options: Dict[str, Any],
 ) -> AsyncIterator[bytes]:
     if not text:
-        return
-    yield  # make this an async generator
+        return  # Early exit for empty text
+    yield b""  # type: ignore[misc]  # unreachable; makes this an async generator

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 908 - 910, The early return makes the subsequent yield unreachable; replace this unclear pattern by removing the unreachable yield and explicitly preserve the async-generator signature with a clear no-op yield guard (e.g., place a conditional no-op yield like an "if False: yield" / dummy yield at function start) so the function remains an async generator while keeping the early "if not text: return" path; update the function containing the "text" parameter accordingly (replace the unreachable "yield" after the return with the no-op yield guard).

src/engine.py (1)
9103-9112: Extract duplicated TTS downstream-mode resolution into one helper.

The same override block appears in greeting and main turn synthesis paths. Centralizing it would prevent drift.

♻️ Suggested refactor

+    def _resolve_tts_use_streaming_playback(self, tts_adapter: Any) -> bool:
+        override = getattr(tts_adapter, "downstream_mode_override", "auto") or "auto"
+        logger.debug(
+            "TTS downstream mode override evaluated",
+            override=override,
+            adapter=tts_adapter.__class__.__name__,
+        )
+        if override == "stream":
+            return True
+        if override == "file":
+            return False
+        return self.config.downstream_mode != "file"

Then replace both duplicated blocks with:

-        _tts_dm_override = getattr(pipeline.tts_adapter, "downstream_mode_override", "auto") or "auto"
-        logger.debug(f"TTS Adapter DM Override evaluated as: {_tts_dm_override} on adapter {pipeline.tts_adapter.__class__.__name__}")
-        if _tts_dm_override == "stream":
-            use_streaming_playback = True
-        elif _tts_dm_override == "file":
-            use_streaming_playback = False
-        else:
-            use_streaming_playback = self.config.downstream_mode != "file"
+        use_streaming_playback = self._resolve_tts_use_streaming_playback(pipeline.tts_adapter)

Also applies to: 9631-9639
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine.py` around lines 9103 - 9112, Extract the duplicated TTS downstream-mode resolution into a single helper (e.g., resolve_tts_downstream_mode or get_effective_tts_downstream_mode) that takes the pipeline instance (or pipeline.tts_adapter) and self.config as inputs, reads pipeline.tts_adapter.downstream_mode_override using getattr with a default of "auto", logs the evaluated override (include adapter class name), and returns a boolean for use_streaming_playback (True if override == "stream", False if override == "file", else compare self.config.downstream_mode != "file"); then replace both duplicated blocks (the one using pipeline.tts_adapter.downstream_mode_override in greeting and main synthesis paths) with calls to this helper and assign its boolean return to use_streaming_playback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/pipelines/azure.py`:
- Around line 1060-1070: The non-streaming decode branch only passthroughs
"mulaw" but not "alaw", causing raw A-law audio to fall through incorrectly;
update the conditional in the non-streaming path that currently checks
native_encoding == "mulaw" to also allow "alaw" (e.g., native_encoding in
("mulaw", "alaw")) and passthrough when target_encoding matches, and mirror any
logging/handling used in the streaming path; verify related helpers
(_to_target_format, resample_audio) remain unchanged and update any tests that
expect A-law passthrough.
- Around line 696-697: The call to
recognizer.start_continuous_recognition_async() returns a ResultFuture that must
be awaited to ensure startup completed and to surface local init errors; update
the code where recognizer.start_continuous_recognition_async() is invoked to
capture its return (e.g., result_future =
recognizer.start_continuous_recognition_async()) and call result_future.get()
(synchronously) after wiring the event handlers (such as the canceled handler)
so the method blocks until continuous recognition has actually started and any
immediate errors are raised.
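The blocking-startup pattern this prompt asks for can be illustrated with stand-ins; the real azure-cognitiveservices-speech ResultFuture exposes the same blocking .get(), but everything below is a hypothetical mock, not SDK code:

```python
class FakeResultFuture:
    """Stand-in for the SDK's ResultFuture: .get() blocks until done and re-raises errors."""
    def __init__(self, error=None):
        self._error = error

    def get(self):
        if self._error:
            raise self._error

class FakeRecognizer:
    """Hypothetical recognizer that may fail during async startup."""
    def __init__(self, startup_error=None):
        self._startup_error = startup_error

    def start_continuous_recognition_async(self):
        return FakeResultFuture(self._startup_error)

def start_recognition(recognizer) -> None:
    # Wire event handlers first (omitted), then capture the future and block
    # on .get() so initialization errors surface here instead of being lost.
    result_future = recognizer.start_continuous_recognition_async()
    result_future.get()
```

Without the .get() call, a recognizer that fails to start would fail silently and the stream would simply never produce results.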
---
Duplicate comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 76-83: The effect that auto-opens STT expert mode is missing
checks for options.stt.vad_initial_silence_timeout_ms; update the conditional
that sets setShowSttExpert(true) to also check for
config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined (and add that
same property to the effect dependency array) so the form will auto-open when
only vad_initial_silence_timeout_ms is present; refer to the existing
setShowSttExpert call and the config?.options?.stt timestamp/vad checks to
locate where to add the new property.
---
Nitpick comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 666-672: Replace the free-text input for "Azure TTS Output Format
Override" with a select that reuses the curated format options from
AzureProviderForm (the same list of allowed output formats), binding the
selected value to localConfig.options?.tts?.output_format and calling
updateRoleOptions('tts', { output_format: selectedValue || undefined }) on
change; ensure an empty/cleared selection maps to undefined and keep the
disabled state controlled by showTtsExpert so the field is disabled when
showTtsExpert is false.
In `@src/engine.py`:
- Around line 9103-9112: Extract the duplicated TTS downstream-mode resolution
into a single helper (e.g., resolve_tts_downstream_mode or
get_effective_tts_downstream_mode) that takes the pipeline instance (or
pipeline.tts_adapter) and self.config as inputs, reads
pipeline.tts_adapter.downstream_mode_override using getattr with a default of
"auto", logs the evaluated override (include adapter class name), and returns a
boolean for use_streaming_playback (True if override == "stream", False if
override == "file", else compare self.config.downstream_mode != "file"); then
replace both duplicated blocks (the one using
pipeline.tts_adapter.downstream_mode_override in greeting and main synthesis
paths) with calls to this helper and assign its boolean return to
use_streaming_playback.
In `@src/pipelines/azure.py`:
- Around line 754-768: The cleanup currently swallows exceptions for
push_stream.close() and recognizer.stop_continuous_recognition; modify both
except blocks to capture the exception (e.g., except Exception as e) and log the
error details at debug level using the existing logger (include context like
call_id and exc_info or the exception message) instead of silently passing so
failures during cleanup are recorded for troubleshooting while still not
propagating.
- Around line 153-160: The SSML builder escapes text content but not attribute
values; update the code that constructs the SSML (where safe_text is used and
where voice_name, lang, lang_tag, prosody_pitch, and prosody_rate are
interpolated) to escape attribute values as well. Add or reuse an XML
attribute-escaping helper (e.g., escape_xml_attr similar to safe_text but
ensuring quotes and other XML entities are encoded) and apply it to voice_name,
lang, lang_tag, prosody_pitch, and prosody_rate before interpolation into
attributes so malformed names cannot break or inject into the SSML.
- Around line 354-359: The code currently swallows exceptions from
self._vad.is_speech; change the bare except to catch Exception as e and emit a
debug-level log before continuing so malformed frames are diagnosable.
Specifically, in the loop where self._vad.is_speech(frame, sample_rate_hz) is
called (and has_speech is set), replace "except Exception: pass" with "except
Exception as e: <logger>.debug(...)" and include the exception string and
minimal context (e.g., sample_rate_hz, len(frame) or frame metadata) in the
message; keep the existing control flow (do not re-raise) so has_speech behavior
is unchanged. Use the module or instance logger used elsewhere in the file
(e.g., logger or self._logger) to ensure consistent logging.
- Around line 908-910: The early return makes the subsequent yield unreachable;
replace this unclear pattern by removing the unreachable yield and explicitly
preserve the async-generator signature with a clear no-op yield guard (e.g.,
place a conditional no-op yield like an "if False: yield" / dummy yield at
function start) so the function remains an async generator while keeping the
early "if not text: return" path; update the function containing the "text"
parameter accordingly (replace the unreachable "yield" after the return with the
no-op yield guard).
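The no-op yield guard described in the last prompt keeps a function an async generator while still permitting an early return; a minimal, self-contained sketch with a simplified signature (not the adapter's real one):

```python
import asyncio
from typing import AsyncIterator

async def synthesize(text: str) -> AsyncIterator[bytes]:
    if False:  # never executes, but its presence makes this an async generator
        yield b""
    if not text:
        return  # early exit: the generator simply yields nothing
    yield text.encode()

async def demo():
    empty = [chunk async for chunk in synthesize("")]
    full = [chunk async for chunk in synthesize("hi")]
    return empty, full

empty, full = asyncio.run(demo())  # empty == [], full == [b"hi"]
```

Any function containing a yield statement, even an unreachable one, is compiled as a generator, so callers can always `async for` over it safely.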
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a267e24e-67df-4881-82e5-c05076b7d0a3
📒 Files selected for processing (3)
- admin_ui/frontend/src/components/config/PipelineForm.tsx
- src/engine.py
- src/pipelines/azure.py
…ssthrough

- Call .get() on start_continuous_recognition_async() ResultFuture to surface startup errors instead of silently continuing
- Add A-law passthrough branch in non-streaming TTS decode path to match the streaming path's handling of raw A-law audio

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/pipelines/azure.py (3)
485-489: Add availability check for aiohttp before use.

AzureSTTRealtimeAdapter.start_stream checks if not speechsdk: before using the SDK (line 618), but _ensure_session uses aiohttp.ClientSession without checking if aiohttp is None. If aiohttp isn't installed, this will raise an unclear AttributeError: 'NoneType' object has no attribute 'ClientSession'.

♻️ Proposed fix for clearer error handling

     async def _ensure_session(self) -> None:
         if self._session and not self._session.closed:
             return
+        if aiohttp is None:
+            raise RuntimeError(
+                "aiohttp package is not installed. Required for Azure STT Fast / Azure TTS."
+            )
         factory = self._session_factory or aiohttp.ClientSession
         self._session = factory()

Apply the same fix to AzureTTSAdapter._ensure_session at lines 1089-1093.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 485 - 489, The _ensure_session methods in AzureSTTRealtimeAdapter and AzureTTSAdapter currently call aiohttp.ClientSession without verifying aiohttp is available; update both _ensure_session implementations (the methods named _ensure_session) to check if aiohttp is None and raise a clear RuntimeError (or ValueError) with an explanatory message (e.g., "aiohttp is required for HTTP sessions; please install aiohttp") before attempting to use aiohttp.ClientSession, preserving the existing factory override logic (_session_factory) and behavior when a session already exists.
751-768: Log cleanup errors at debug level instead of silently ignoring.

Static analysis flags the try-except-pass blocks during cleanup. While errors during cleanup are typically non-fatal, logging at debug level aids troubleshooting without disrupting the flow.

♻️ Optional: add debug logging for cleanup errors

         # Close audio stream so Azure knows input is done
         push_stream = session.get("push_stream")
         if push_stream:
             try:
                 push_stream.close()
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Azure SDK push_stream.close() failed", call_id=call_id, exc_info=exc)
         # Stop recognizer
         recognizer = session.get("recognizer")
         if recognizer:
             try:
                 # Stop blocks until final processed segments finish, which may take seconds.
                 # In asyncio context, best to run it in thread to avoid blocking loop if slow.
                 logger.debug("Stopping Azure SDK continuous recognition", call_id=call_id)
                 await asyncio.to_thread(recognizer.stop_continuous_recognition)
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Azure SDK stop_continuous_recognition failed", call_id=call_id, exc_info=exc)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 751 - 768, Replace the silent cleanup except/pass blocks with debug-level logging: when closing push_stream (session["push_stream"]) or stopping recognizer (session["recognizer"] / recognizer.stop_continuous_recognition) catch Exception as e and call logger.debug with a short message including call_id and the exception (or use exc_info=True) so errors during cleanup are recorded but non-fatal; keep the existing await asyncio.to_thread(recognizer.stop_continuous_recognition) behavior and only change the except handlers to log at debug instead of pass.
354-359: Consider logging VAD frame processing errors at debug level.

The try-except-pass block silently discards any VAD errors, which could make debugging difficult if VAD consistently fails on certain audio patterns.

♻️ Optional: log at debug level

             try:
                 if self._vad.is_speech(frame, sample_rate_hz):
                     has_speech = True
                     break
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("VAD frame check failed", exc_info=exc)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 354 - 359, The try/except currently swallows VAD errors in the self._vad.is_speech(frame, sample_rate_hz) call; change the except to capture the exception (e.g., except Exception as e) and log it at debug level including context (frame/sample_rate_hz and the exception message/stack) using the existing logger (e.g., self._logger.debug or module logger) before continuing, so VAD failures are recorded for debugging without altering control flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/pipelines/azure.py`:
- Around line 1027-1044: The code currently assumes A-law or mu-law bytes can be
treated as PCM16 LE and converted or passed through, which silently corrupts
audio for unsupported alaw↔mulaw conversions; update both the streaming branch
(where is_raw_mulaw / is_raw_alaw and target_encoding are checked before using
pcm_payload, fmt_lower, resample_audio, _to_target_format) and the non-streaming
branch (where raw alaw/mulaw is passed through) to explicitly detect mismatched
encodings (e.g., is_raw_alaw and target_encoding == "mulaw" or is_raw_mulaw and
target_encoding == "alaw") and raise a clear exception (or return an error)
instead of attempting PCM resampling/formatting, ensuring only identical
passthroughs or supported conversions reach resample_audio/_to_target_format.
- Around line 142-178: The SSML builder _build_ssml currently only escapes text
content but inserts prosody_pitch, prosody_rate, lang_tag, voice_name (and
derived lang) directly into attribute values, risking malformed XML; add an XML
attribute-escaping step (e.g., a small helper like _escape_xml_attr that
replaces &, <, >, " and ' with &amp;, &lt;, &gt;, &quot;, &apos;) and apply it
to voice_name, lang, lang_tag, and each prosody attribute value before
constructing prosody_attrs and the <voice>/<lang> wrappers so all attribute
values are safe.
---
Nitpick comments:
In `@src/pipelines/azure.py`:
- Around line 485-489: The _ensure_session methods in AzureSTTRealtimeAdapter
and AzureTTSAdapter currently call aiohttp.ClientSession without verifying
aiohttp is available; update both _ensure_session implementations (the methods
named _ensure_session) to check if aiohttp is None and raise a clear
RuntimeError (or ValueError) with an explanatory message (e.g., "aiohttp is
required for HTTP sessions; please install aiohttp") before attempting to use
aiohttp.ClientSession, preserving the existing factory override logic
(_session_factory) and behavior when a session already exists.
- Around line 751-768: Replace the silent cleanup except/pass blocks with
debug-level logging: when closing push_stream (session["push_stream"]) or
stopping recognizer (session["recognizer"] /
recognizer.stop_continuous_recognition) catch Exception as e and call
logger.debug with a short message including call_id and the exception (or use
exc_info=True) so errors during cleanup are recorded but non-fatal; keep the
existing await asyncio.to_thread(recognizer.stop_continuous_recognition)
behavior and only change the except handlers to log at debug instead of pass.
- Around line 354-359: The try/except currently swallows VAD errors in the
self._vad.is_speech(frame, sample_rate_hz) call; change the except to capture
the exception (e.g., except Exception as e) and log it at debug level including
context (frame/sample_rate_hz and the exception message/stack) using the
existing logger (e.g., self._logger.debug or module logger) before continuing,
so VAD failures are recorded for debugging without altering control flow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d8e7b9e2-91e7-41e1-9671-7be2f6e34a5a
📒 Files selected for processing (1)
src/pipelines/azure.py
def _build_ssml(
    text: str,
    voice_name: str,
    language: str,
    prosody_pitch: Optional[str] = None,
    prosody_rate: Optional[str] = None,
    lang_tag: Optional[str] = None,
) -> str:
    """Build a minimal SSML document for Azure TTS, with optional prosody and lang controls."""
    # Derive xml:lang from voice_name locale prefix (e.g. "en-US-JennyNeural" -> "en-US")
    lang = language or ("-".join(voice_name.split("-")[:2]) if "-" in voice_name else "en-US")
    safe_text = (
        text
        .replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&apos;")
    )
    # Build optional <prosody> wrapper
    prosody_attrs = []
    if prosody_pitch:
        prosody_attrs.append(f"pitch='{prosody_pitch}'")
    if prosody_rate:
        prosody_attrs.append(f"rate='{prosody_rate}'")
    if prosody_attrs:
        inner = f"<prosody {' '.join(prosody_attrs)}>{safe_text}</prosody>"
    else:
        inner = safe_text
    # For multilingual voices, wrap with <lang xml:lang="..."> to specify spoken language
    if lang_tag:
        inner = f"<lang xml:lang='{lang_tag}'>{inner}</lang>"
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice name='{voice_name}'>{inner}</voice>"
        f"</speak>"
    )
SSML attribute values are not escaped, risking malformed XML or injection.
The safe_text variable properly escapes XML entities, but prosody_pitch, prosody_rate, lang_tag, and voice_name are embedded directly into single-quoted attributes without escaping. If any of these contain a single quote (e.g., pitch="default'"), the SSML becomes malformed or potentially exploitable.
🐛 Proposed fix to escape attribute values
+def _escape_attr(value: str) -> str:
+    """Escape a value for use in an XML single-quoted attribute."""
+    return (
+        value
+        .replace("&", "&amp;")
+        .replace("<", "&lt;")
+        .replace(">", "&gt;")
+        .replace("'", "&apos;")
+    )
+
 def _build_ssml(
     text: str,
     voice_name: str,
     language: str,
     prosody_pitch: Optional[str] = None,
     prosody_rate: Optional[str] = None,
     lang_tag: Optional[str] = None,
 ) -> str:
     """Build a minimal SSML document for Azure TTS, with optional prosody and lang controls."""
     # Derive xml:lang from voice_name locale prefix (e.g. "en-US-JennyNeural" -> "en-US")
     lang = language or ("-".join(voice_name.split("-")[:2]) if "-" in voice_name else "en-US")
     safe_text = (
         text
         .replace("&", "&amp;")
         .replace("<", "&lt;")
         .replace(">", "&gt;")
         .replace('"', "&quot;")
         .replace("'", "&apos;")
     )
     # Build optional <prosody> wrapper
     prosody_attrs = []
     if prosody_pitch:
-        prosody_attrs.append(f"pitch='{prosody_pitch}'")
+        prosody_attrs.append(f"pitch='{_escape_attr(prosody_pitch)}'")
     if prosody_rate:
-        prosody_attrs.append(f"rate='{prosody_rate}'")
+        prosody_attrs.append(f"rate='{_escape_attr(prosody_rate)}'")
     if prosody_attrs:
         inner = f"<prosody {' '.join(prosody_attrs)}>{safe_text}</prosody>"
     else:
         inner = safe_text
     # For multilingual voices, wrap with <lang xml:lang="..."> to specify spoken language
     if lang_tag:
-        inner = f"<lang xml:lang='{lang_tag}'>{inner}</lang>"
+        inner = f"<lang xml:lang='{_escape_attr(lang_tag)}'>{inner}</lang>"
+    safe_lang = _escape_attr(lang)
+    safe_voice = _escape_attr(voice_name)
     return (
-        f"<speak version='1.0' xml:lang='{lang}'>"
-        f"<voice name='{voice_name}'>{inner}</voice>"
+        f"<speak version='1.0' xml:lang='{safe_lang}'>"
+        f"<voice name='{safe_voice}'>{inner}</voice>"
         f"</speak>"
     )

🤖 Prompt for AI Agents
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pipelines/azure.py` around lines 142 - 178, The SSML builder _build_ssml
currently only escapes text content but inserts prosody_pitch, prosody_rate,
lang_tag, voice_name (and derived lang) directly into attribute values, risking
malformed XML; add an XML attribute-escaping step (e.g., a small helper like
_escape_xml_attr that replaces &, <, >, " and ' with &amp;, &lt;, &gt;, &quot;,
&apos;) and apply it to voice_name, lang, lang_tag, and each prosody attribute
value before constructing prosody_attrs and the <voice>/<lang> wrappers so all
attribute values are safe.
if (is_raw_mulaw and target_encoding == "mulaw") or (is_raw_alaw and target_encoding == "alaw"):
    # Already in target format — yield directly
    converted = pcm_payload
else:
    # PCM16 LE raw bytes — determine source sample rate from format string
    audio_bytes = pcm_payload
    fmt_l = fmt_lower
    if "8khz" in fmt_l:
        source_rate = 8000
    elif "16khz" in fmt_l:
        source_rate = 16000
    elif "24khz" in fmt_l:
        source_rate = 24000
    else:
        source_rate = 8000
    if source_rate != target_rate:
        audio_bytes, _ = resample_audio(audio_bytes, source_rate, target_rate)
    converted = _to_target_format(audio_bytes, target_encoding)
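As an aside, the rate-sniffing chain above can be made more robust with a regex. A sketch under that assumption (note that with plain substring checks, a hypothetical `raw-48khz-...` format would match the `"8khz"` branch first):

```python
import re


def parse_source_rate(output_format: str, default: int = 8000) -> int:
    """Infer the source sample rate from an Azure-style output-format
    string such as 'raw-16khz-16bit-mono-pcm'.

    Falls back to `default` when no '<N>khz' token is present, matching
    the else branch in the snippet above.
    """
    match = re.search(r"(\d+)khz", output_format.lower())
    return int(match.group(1)) * 1000 if match else default
```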
Unsupported encoding conversion silently corrupts audio.
Both streaming and non-streaming paths handle mulaw↔mulaw and alaw↔alaw passthrough, but mismatched combinations (e.g., output_format=raw-8khz-8bit-mono-alaw with target_encoding=mulaw) fall through incorrectly:
- Streaming (line 1030-1044): A-law bytes are treated as PCM16 LE, producing garbage.
- Non-streaming (line 1071-1073): A-law bytes are passed through as-is but interpreted as mulaw downstream.
Since there's no alaw↔mulaw converter, the code should raise an explicit error rather than silently corrupt audio.
🐛 Proposed fix for streaming path
if (is_raw_mulaw and target_encoding == "mulaw") or (is_raw_alaw and target_encoding == "alaw"):
# Already in target format — yield directly
converted = pcm_payload
+ elif is_raw_mulaw or is_raw_alaw:
+ # Source is mulaw/alaw but target doesn't match — unsupported conversion
+ raise RuntimeError(
+ f"Azure TTS: cannot convert raw {('mulaw' if is_raw_mulaw else 'alaw')} "
+ f"to target_encoding='{target_encoding}'. Use matching output_format or target_encoding."
+ )
else:
# PCM16 LE raw bytes — determine source sample rate from format string
🐛 Proposed fix for non-streaming path
if native_encoding == "mulaw" and target_encoding == "mulaw":
# Already in mulaw at 8 kHz — yield directly
converted = audio_bytes
elif native_encoding == "alaw" and target_encoding == "alaw":
# Already in A-law at 8 kHz — yield directly
converted = audio_bytes
+ elif native_encoding in ("mulaw", "alaw"):
+ # Source is mulaw/alaw but target doesn't match — unsupported conversion
+ raise RuntimeError(
+ f"Azure TTS: cannot convert {native_encoding} to target_encoding='{target_encoding}'. "
+ "Use matching output_format or target_encoding."
+ )
elif native_encoding in ("pcm16", "pcm"):
# Resample if needed, then convert to target encoding
if source_rate != target_rate:
audio_bytes, _ = resample_audio(audio_bytes, source_rate, target_rate)
converted = _to_target_format(audio_bytes, target_encoding)
else:
# Unknown encoding — pass through as-is
converted = audio_bytes
Also applies to: 1060-1073
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pipelines/azure.py` around lines 1027 - 1044, the code currently assumes
A-law or mu-law bytes can be treated as PCM16 LE and converted or passed
through, which silently corrupts audio for unsupported alaw↔mulaw conversions;
update both the streaming branch (where is_raw_mulaw / is_raw_alaw and
target_encoding are checked before using pcm_payload, fmt_lower, resample_audio,
_to_target_format) and the non-streaming branch (where raw alaw/mulaw is passed
through) to explicitly detect mismatched encodings (e.g., is_raw_alaw and
target_encoding == "mulaw" or is_raw_mulaw and target_encoding == "alaw") and
raise a clear exception (or return an error) instead of attempting PCM
resampling/formatting, ensuring only identical passthroughs or supported
conversions reach resample_audio/_to_target_format.
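The guard both prompts describe boils down to rejecting any mismatched pair involving a compressed encoding, since this code has no direct alaw↔mulaw converter. A minimal sketch, assuming the `native_encoding`/`target_encoding` vocabulary used in the diffs above:

```python
def ensure_encoding_compatible(native_encoding: str, target_encoding: str) -> None:
    """Raise instead of silently corrupting audio when the Azure output
    format's encoding cannot be converted to the requested target."""
    compressed = {"mulaw", "alaw"}
    if native_encoding in compressed and native_encoding != target_encoding:
        raise RuntimeError(
            f"Azure TTS: cannot convert {native_encoding} to "
            f"target_encoding='{target_encoding}'. "
            "Use a matching output_format or target_encoding."
        )
```

PCM sources are unaffected by the check and continue through the resample/convert path.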