Conversation
- Add AzureSTTProviderConfig and AzureTTSProviderConfig Pydantic models to config.py
- Inject AZURE_SPEECH_KEY from env in config/security.py
- Implement AzureSTTFastAdapter (fast transcription REST API), AzureSTTRealtimeAdapter (one-shot STT REST), and AzureTTSAdapter in src/pipelines/azure.py
- Register azure_stt_fast, azure_stt_realtime, azure_tts factories in orchestrator; azure_stt alias routes by variant field
- Add AzureProviderForm.tsx to admin UI with full parameter coverage: region, variant (realtime/fast), language, voice name, output format, target encoding/sample rate, chunk size, timeouts, custom endpoints
- Register 'azure' in REGISTERED_PROVIDER_TYPES (providerNaming.ts)
- Add azure_stt and azure_tts quick-add templates to ProvidersPage
- Add Azure expert settings panels to PipelineForm (variant/language overrides for STT; voice name/output format overrides for TTS)
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Adds Microsoft Azure Speech Service support (STT/TTS): new provider models, Azure adapters (fast/realtime STT, TTS), orchestrator factories, frontend provider UI/forms, backend env/key handling and validation, engine transcript queuing, streaming batching, requirements, and provider templates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as Frontend UI
    participant API as Backend API
    participant Env as Env/.env
    participant Azure as Azure Speech Service
    participant Orch as Pipeline Orchestrator / Engine
    UI->>API: Request provider connection test / save config
    API->>Env: Read AZURE_SPEECH_KEY (strip quotes)
    alt key present
        API->>Azure: POST token request to region-specific token endpoint
        Azure-->>API: 200 / 401 / other
        API-->>UI: Return validated result (success/error)
    else
        API-->>UI: Return error (key not configured)
    end
    UI->>Orch: Configure provider (azure_stt / azure_tts)
    Orch->>Orch: Hydrate & validate provider config (region, key)
    Orch->>Orch: Register adapter factory (fast/realtime/tts)
    Orch-->>UI: Adapters available
```
```mermaid
sequenceDiagram
    participant Agent as Engine/Agent
    participant STT as Azure STT Adapter
    participant TranscriptQ as Pipeline Transcript Queue
    participant TTS as Azure TTS Adapter
    participant Playback as Streaming Playback Manager
    Agent->>STT: Stream audio frames
    STT->>STT: Buffer / VAD / batch frames
    STT->>Agent: Deliver transcripts (including early flush)
    Agent->>TranscriptQ: Enqueue early transcripts
    Agent->>Agent: Check aggregation thresholds (min_words/min_chars)
    Agent->>TTS: Request synthesis (SSML with prosody/lang)
    TTS->>Playback: Stream chunks (or return file) with format conversion
    Playback->>Playback: Batch session upserts every 1s
    Playback-->>Agent: Playback complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ 2 failed (2 warnings) | ✅ 1 passed
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pipelines/openai.py (1)
480-487: ⚠️ Potential issue | 🟠 Major — Avoid logging full LLM message payloads.
This logs complete conversation content at debug level, which can leak sensitive user/tool data into logs.
🔧 Proposed fix
```diff
 logger.debug(
     "OpenAI chat completion request",
     call_id=call_id,
     model=payload.get("model"),
     temperature=payload.get("temperature"),
     tools_count=len(payload.get("tools", [])),
-    messages=json.dumps(payload.get("messages", []), ensure_ascii=False),
+    messages_count=len(payload.get("messages", [])),
+    last_message_preview=str(
+        (payload.get("messages", [])[-1].get("content", "") if payload.get("messages") else "")
+    )[:120],
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/openai.py` around lines 480 - 487, The logger.debug call in openai.py is currently logging full conversation content via payload.get("messages"), which can leak sensitive data; update the logger.debug invocation (the call referencing logger.debug, call_id and payload) to avoid including full messages—replace messages=json.dumps(payload.get("messages", []), ...) with non-sensitive metadata such as messages_count=len(payload.get("messages", [])) and optionally a redacted or truncated indicator (e.g., messages_preview or messages_hash) so the full message bodies are never written to logs.
🧹 Nitpick comments (3)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
575-582: Constrain Azure STT variant override to supported values.
Line 576 currently allows arbitrary text. Invalid values can be persisted and break runtime routing for Azure STT variant selection.
💡 Suggested fix
```diff
-<FormInput
-  label="Azure STT Variant Override"
-  value={localConfig.options?.stt?.variant || ''}
-  onChange={(e) => updateRoleOptions('stt', { variant: e.target.value || undefined })}
-  placeholder="realtime (or fast)"
-  tooltip="Override the variant set on the provider: 'realtime' or 'fast'."
-  disabled={!showSttExpert}
-/>
+<FormSelect
+  label="Azure STT Variant Override"
+  value={localConfig.options?.stt?.variant || ''}
+  onChange={(e) => updateRoleOptions('stt', { variant: e.target.value || undefined })}
+  options={[
+    { value: '', label: 'Use provider default' },
+    { value: 'realtime', label: 'realtime' },
+    { value: 'fast', label: 'fast' },
+  ]}
+  tooltip="Override the variant set on the provider."
+  disabled={!showSttExpert}
+/>
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 575 - 582, The Azure STT variant input in PipelineForm allows arbitrary text and should be constrained to supported values; update the FormInput for "Azure STT Variant Override" (in PipelineForm) to only accept "realtime" or "fast" by replacing free-text entry with a controlled selection or by validating inside the onChange handler before calling updateRoleOptions('stt', ...); ensure empty value maps to undefined and that localConfig.options?.stt?.variant is set only to "realtime" or "fast" (or cleared) so invalid values cannot be persisted.

config/ai-agent.yaml (1)
1518-1521: Align Azure TTS default format with the UI template.
`azure_tts.output_format` here is `raw-8khz-16bit-mono-pcm`, while the quick-add template uses `riff-8khz-16bit-mono-pcm` (line 270 in admin_ui/frontend/src/pages/ProvidersPage.tsx). This drift causes different runtime behavior depending on where the provider was created.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/ai-agent.yaml` around lines 1518 - 1521, The azure_tts.output_format value is inconsistent with the UI quick-add template and should be changed from "raw-8khz-16bit-mono-pcm" to "riff-8khz-16bit-mono-pcm" so providers created via the UI and config behave the same; update the azure_tts.output_format entry (and verify target_encoding/target_sample_rate_hz remain correct) to match the UI template reference used in ProvidersPage.tsx.

src/pipelines/orchestrator.py (1)
629-639: Normalize and validate `variant` before alias logging.
If `variant` is invalid (e.g., a typo), the alias correctly falls back to realtime, but the log still reports `azure_stt_<invalid>`, which is misleading during troubleshooting.
Proposed fix
```diff
@@
-        variant = (self._azure_stt_provider_config.variant or "realtime").lower()
+        raw_variant = str(self._azure_stt_provider_config.variant or "realtime").strip().lower()
+        variant = raw_variant if raw_variant in {"fast", "realtime"} else "realtime"
+        if raw_variant not in {"fast", "realtime"}:
+            logger.warning(
+                "Invalid Azure STT variant configured; defaulting alias to realtime",
+                configured_variant=raw_variant,
+            )
         alias_factory = fast_factory if variant == "fast" else realtime_factory
         self.register_factory("azure_stt", alias_factory)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/orchestrator.py` around lines 629 - 639, The log uses the raw possibly-invalid provider value for stt_alias causing misleading messages; validate and normalize the variant before both selecting the factory and logging. Compute a validated string (e.g., chosen_variant = "fast" if normalized_variant == "fast" else "realtime"), use chosen_variant when selecting alias_factory and calling register_factory("azure_stt", alias_factory), and use the same chosen_variant in logger.info stt_alias (e.g., f"azure_stt_{chosen_variant}") so the log reflects the actual behaviour.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@admin_ui/backend/api/config.py`:
- Around line 1423-1426: Normalize and validate the Azure region coming from
provider_config before constructing token_url: retrieve region via
provider_config.get('region', 'eastus'), then trim surrounding whitespace (e.g.,
region = region.strip()), normalize casing if desired (e.g., lower()), and
validate against an allowlist or simple regex that permits only expected
characters (letters, numbers, hyphens) and rejects empty or malformed values; if
validation fails, fall back to the default 'eastus' or raise a clear error. Use
the normalized/validated region variable when building token_url instead of the
raw provider_config value to avoid malformed endpoints.
In `@admin_ui/frontend/src/pages/ProvidersPage.tsx`:
- Around line 255-275: The templates object includes azure_stt and azure_tts but
the modal’s "Modular Providers (Cloud)" template list rendering never creates
checkbox entries for those keys; update the checkbox rendering logic (where
template checkboxes are built—e.g., the component or loop that renders the
templates list in ProvidersPage, likely the function that maps over templates)
to include items for 'azure_stt' and 'azure_tts' with matching keys, labels, and
capabilities so they become selectable in the modal; ensure the checkbox
value/name matches the template keys (azure_stt, azure_tts) and that any handler
that collects selected templates (e.g., onToggleTemplate / selectedTemplates
state) recognizes them.
- Around line 601-603: The Azure form is being shadowed by the earlier
"realtime" branch because providerName.includes('realtime') matches names like
"azure_stt_realtime"; update the routing logic in ProvidersPage (where
providerName is checked) so Azure-specific names take precedence or the checks
are more specific: either move the azure check (the block that renders
AzureProviderForm) above the realtime/OpenAI check, or restrict the realtime
check (e.g., exclude names containing 'azure' or use a more precise pattern) so
that AzureProviderForm is rendered for Azure providers (identify the
providerName variable and the conditional that returns AzureProviderForm to
locate where to change the order/condition).
In `@src/config/security.py`:
- Around line 219-228: The current injection logic only matches provider names
starting with "azure_stt" or equal to "azure_tts", which misses custom keys;
update the loop that iterates over providers_block (using azure_speech_key,
provider_name, provider_cfg) to also detect Azure providers by inspecting
provider_cfg fields (e.g., check provider_cfg.get("type",
"").lower().startswith("azure") or provider_cfg.get("provider",
"").lower().startswith("azure")) and, when such a type/provider indicates Azure,
set provider_cfg["api_key"] = azure_speech_key and write back
providers_block[provider_name] = provider_cfg; keep the existing
isinstance(provider_cfg, dict) guard and only inject when azure_speech_key is
truthy.
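A minimal sketch of the type-based detection this comment asks for; `providers_block` and the field names (`type`, `provider`, `api_key`) follow the review comment, while the function name and signature are illustrative:

```python
def inject_azure_speech_key(providers_block: dict, azure_speech_key: str) -> dict:
    """Inject the Azure key into any provider identified as Azure by name or by type.

    Sketch only: the actual config schema may differ from these field names.
    """
    if not azure_speech_key:
        return providers_block
    for provider_name, provider_cfg in providers_block.items():
        if not isinstance(provider_cfg, dict):
            continue
        # Original name-based detection, kept for backwards compatibility.
        is_azure_by_name = (
            provider_name.startswith("azure_stt") or provider_name == "azure_tts"
        )
        # New type/provider-based detection for custom provider keys.
        is_azure_by_type = (
            str(provider_cfg.get("type", "")).lower().startswith("azure")
            or str(provider_cfg.get("provider", "")).lower().startswith("azure")
        )
        if is_azure_by_name or is_azure_by_type:
            provider_cfg["api_key"] = azure_speech_key
            providers_block[provider_name] = provider_cfg
    return providers_block
```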
In `@src/core/streaming_playback_manager.py`:
- Around line 849-858: Pending streamed bytes (bytes_since_last_upsert) are only
upserted when chunk is None, causing lost counts on other exit paths; modify the
streaming loop in streaming_playback_manager.py so that before any
break/return/exception path you flush bytes_since_last_upsert by fetching the
session via await self.session_store.get_by_call_id(call_id), adding
bytes_since_last_upsert to sess.streaming_bytes_sent, setting
sess.streaming_jitter_buffer_depth = jitter_buffer.qsize(), and calling await
self.session_store.upsert_call(sess); do this in a single shared cleanup (e.g.,
a finally block or a helper flush_pending_bytes(call_id,
bytes_since_last_upsert, jitter_buffer)) and apply the same change to the second
occurrence referenced around the lines for bytes_since_last_upsert at the other
block (873-881) so all exit paths persist the counter.
- Around line 856-857: Replace the silent except in the stats persistence path
that currently reads "except Exception: pass" with a non-fatal logging statement
so failed session upserts still surface diagnostics; catch Exception as e and
call the module's logger (e.g., logger.debug or process_logger.debug) with a
clear message about the failed session upsert and exc_info=True to preserve the
traceback, then continue (do not re-raise) so behavior stays non-fatal.
In `@src/engine.py`:
- Line 9098: The warning-level debug messages that log TTS DM override details
(the logger.warning call that prints "DEBUG: TTS Adapter DM Override evaluated
as: {_tts_dm_override} on adapter {pipeline.tts_adapter.__class__.__name__}")
are flooding alerts; change these to use logger.debug (or remove entirely) in
both occurrences (the call referencing _tts_dm_override and
pipeline.tts_adapter.__class__.__name__) so TTS hot paths no longer emit
warnings per turn/greeting—locate the two logger.warning calls and replace them
with logger.debug(...) (or delete if unnecessary) to keep debug info at the
appropriate log level.
- Around line 10039-10042: The min_words and min_chars are read directly from
pipeline.llm_options and can be non-positive, which makes threshold_met
(computed from words and chars) always true; clamp these values to positive
bounds before use (e.g., replace the direct int(...) reads for min_words and
min_chars with clamped values such as max(1, parsed_value) or another positive
minimum) so that threshold_met = words >= min_words or chars >= min_chars
behaves correctly; update the lines that set min_words, min_chars and ensure any
tests relying on aggregation thresholds reflect the minimum clamp.
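The clamping described above can be sketched as a small helper; the option keys mirror the review comment, and the defaults here are illustrative rather than taken from the code:

```python
def read_aggregation_thresholds(llm_options: dict) -> tuple:
    """Read min_words/min_chars from pipeline options, clamped to >= 1
    so `threshold_met` cannot degenerate to always-true."""
    def _clamped(key: str, default: int) -> int:
        try:
            value = int(llm_options.get(key, default))
        except (TypeError, ValueError):
            value = default
        return max(1, value)  # non-positive values would make the check vacuous

    return (_clamped("aggregation_min_words", 1),
            _clamped("aggregation_min_chars", 1))
```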
- Around line 3296-3301: The code erroneously treats _pipeline_tasks as a
collection; change the cleanup to pop a single asyncio.Task for the session
(task = getattr(self, "_pipeline_tasks", {}).pop(session.call_id, None)) then if
task and not task.done(): task.cancel(); ensure the subsequent cleanup always
runs by not relying on iteration (leave the existing pops of _pipeline_queues,
_pipeline_transcript_queues, and _pipeline_forced unchanged) and apply the same
single-Task pop/cancel fix to the other occurrence around _pipeline_tasks at the
3928–3933 block; use the symbols _pipeline_tasks, _pipeline_queues,
_pipeline_transcript_queues, _pipeline_forced, and session.call_id to locate and
update both blocks.
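The single-task pop/cancel pattern described above, as a standalone sketch (`_pipeline_tasks` maps `call_id` to one `asyncio.Task` per the review; the helper name is illustrative):

```python
import asyncio

def cancel_pipeline_task(pipeline_tasks: dict, call_id: str) -> bool:
    """Pop and cancel the single asyncio.Task tracked for a call, if any.

    Returns True when a live task was cancelled.
    """
    task = pipeline_tasks.pop(call_id, None)
    if task is not None and not task.done():
        task.cancel()
        return True
    return False
```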
- Around line 4411-4429: The background STT flush task is created with
asyncio.create_task(_) and therefore is not tracked for per-call cleanup;
replace that call with self._fire_and_forget_for_call(call_id,
_flush_and_process(), name=f"pipeline-stt-flush-{call_id}") so the task is added
to the call's _call_bg_tasks tracking. Also wrap the await of
stt.flush_speech(...) inside asyncio.wait_for(..., timeout=<reasonable_secs>)
inside _flush_and_process to avoid indefinite hangs; keep existing logging
(logger.debug / logger.warning / logger.error) and the use of
pipeline.options_summary().get("stt", {}) and stt.flush_speech to locate the
correct place.
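The timeout wrapper suggested for the flush can be sketched like this; `stt.flush_speech` is the method named in the review, while this wrapper's name, signature, and 5-second default are assumptions:

```python
import asyncio

async def flush_with_timeout(stt, call_id: str, timeout_s: float = 5.0):
    """Await an STT flush, bounded by a timeout so a stuck flush
    cannot hang the call's background task."""
    try:
        return await asyncio.wait_for(stt.flush_speech(call_id), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Caller is expected to log a warning here; we just give up cleanly.
        return None
```

In the fix itself, the coroutine that awaits this would then be handed to `self._fire_and_forget_for_call(...)` so it lands in the call's tracked background tasks.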
In `@src/pipelines/azure.py`:
- Around line 625-631: The VAD timeout keys are never exposed because
_compose_options doesn't include "vad_silence_timeout_ms" and
"vad_initial_silence_timeout_ms", so start_stream's reads are ignored; update
_compose_options to merge and return these two keys (taking values from provider
defaults and runtime overrides) so merged passed into start_stream contains
them, and make the same change for the other timeout-setting location that
mirrors this logic (the second block that sets
SpeechServiceConnection_EndSilenceTimeoutMs /
SpeechServiceConnection_InitialSilenceTimeoutMs). Locate the functions/methods
named _compose_options and start_stream and ensure _compose_options includes the
keys exactly "vad_silence_timeout_ms" and "vad_initial_silence_timeout_ms" in
its returned dict so the timeout-setting code uses the configured values.
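A sketch of a `_compose_options`-style merge that keeps the two VAD timeout keys visible in the returned dict; the key names are exactly those from the review comment, everything else is illustrative:

```python
VAD_TIMEOUT_KEYS = ("vad_silence_timeout_ms", "vad_initial_silence_timeout_ms")

def compose_options(provider_defaults: dict, runtime_overrides: dict) -> dict:
    """Merge provider defaults with runtime overrides; overrides win."""
    merged = dict(provider_defaults)
    for key, value in runtime_overrides.items():
        if value is not None:
            merged[key] = value
    # Ensure the VAD timeout keys are always present so start_stream's
    # reads see the configured values (or an explicit None) instead of
    # the keys silently disappearing from the merged dict.
    for key in VAD_TIMEOUT_KEYS:
        merged.setdefault(key, None)
    return merged
```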
- Around line 635-638: The code changes sample_rate_hz from 8000 to 16000 when
stream_format == "pcm16_16k" (in the start_stream flow) but send_audio forwards
raw PCM bytes unchanged, causing a sample-rate mismatch; either perform proper
audio resampling of the payload bytes to 16 kHz before sending (implement a
resample step where start_stream sets sample_rate_hz and call it from
send_audio), or stop mutating sample_rate_hz and instead validate/raise an error
if the payload sample rate doesn't match the declared stream_format; update the
functions start_stream and send_audio to reference the same sample rate (or call
a shared resample utility) so timing/pitch remain correct.
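If the resampling route is chosen, the shared utility could be a simple linear-interpolation resampler for mono PCM16 like the sketch below; a production path would likely use a proper DSP library instead, and the function name is illustrative:

```python
import struct

def resample_pcm16(payload: bytes, src_hz: int, dst_hz: int) -> bytes:
    """Linear-interpolation resampler for mono PCM16, so the bytes sent
    match the declared stream sample rate (e.g., 8 kHz -> 16 kHz)."""
    if src_hz == dst_hz or not payload:
        return payload
    samples = struct.unpack("<%dh" % (len(payload) // 2), payload)
    n_out = int(len(samples) * dst_hz / src_hz)
    out = []
    for i in range(n_out):
        # Map output index back onto the source sample positions.
        pos = i * (len(samples) - 1) / max(1, n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(int(samples[lo] * (1 - frac) + samples[hi] * frac))
    return struct.pack("<%dh" % len(out), *out)
```

Both `start_stream` and `send_audio` would then reference the same declared rate and call this utility on mismatch, keeping timing and pitch correct.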
- Around line 761-768: The temporary AzureSTTFastAdapter created as fast_adapter
in the realtime fallback can open an aiohttp.ClientSession and must be closed to
avoid session leaks; wrap the creation and call to
fast_adapter.transcribe(call_id, audio_pcm16, sample_rate_hz, options) in a
try/finally (or an async context manager) and close the adapter in the finally
block—prefer using an async close if available (await fast_adapter.aclose() or
await fast_adapter.close() if async) or call fast_adapter.close() if
synchronous; ensure the finally always runs even on error so AzureSTTFastAdapter
instances are consistently cleaned up.
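The try/finally shape described above, sketched with a factory standing in for `AzureSTTFastAdapter` construction; the `aclose`/`close` probing follows the review comment, the wrapper itself is hypothetical:

```python
import asyncio

async def transcribe_with_fast_fallback(make_fast_adapter, call_id, audio, sample_rate_hz, options):
    """Run a one-shot fast transcription and always close the temporary
    adapter, even when transcribe raises, so sessions never leak."""
    fast_adapter = make_fast_adapter()
    try:
        return await fast_adapter.transcribe(call_id, audio, sample_rate_hz, options)
    finally:
        # Prefer an async close when the adapter exposes one.
        closer = getattr(fast_adapter, "aclose", None) or getattr(fast_adapter, "close", None)
        if closer is not None:
            result = closer()
            if asyncio.iscoroutine(result):
                await result
```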
- Around line 381-397: The explicit flush in flush_speech currently hard-codes
16000 when calling _execute_transcription which breaks buffered audio recorded
at 8000 Hz; change flush_speech so it reads the sample rate associated with the
buffered audio (e.g., the attribute used when filling _audio_buffer such as
self._buffer_sample_rate or the pipeline's self._sample_rate) and pass that
sample rate to _execute_transcription instead of 16000, falling back to 16000
only if no buffer-specific sample rate is available; update references in
flush_speech to use that variable when calling _execute_transcription and ensure
any places that set/clear the buffer also maintain the buffer sample rate.
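One way to keep the buffer and its sample rate in lockstep is a small bookkeeping sketch like the following; `_audio_buffer` and `_buffer_sample_rate` are the attribute names the review comment proposes, and the surrounding adapter is assumed:

```python
from typing import Optional

class BufferedFlushState:
    """Track the sample rate of buffered audio so an explicit flush
    transcribes at the rate the audio was actually recorded."""

    def __init__(self):
        self._audio_buffer = bytearray()
        self._buffer_sample_rate: Optional[int] = None

    def buffer_audio(self, payload: bytes, sample_rate_hz: int) -> None:
        self._audio_buffer.extend(payload)
        self._buffer_sample_rate = sample_rate_hz

    def flush_sample_rate(self) -> int:
        # Fall back to 16000 only when no buffer-specific rate was recorded.
        return self._buffer_sample_rate or 16000

    def clear_buffer(self) -> None:
        self._audio_buffer.clear()
        self._buffer_sample_rate = None
```

`flush_speech` would then pass `self.flush_sample_rate()` to `_execute_transcription` instead of a hard-coded 16000.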
- Around line 940-1002: The streaming path currently only detects raw μ-law via
is_raw_mulaw and skips PCM handling; add detection for raw A-law (e.g.,
is_raw_alaw = "alaw" in fmt_lower and fmt_lower.startswith("raw-")) and treat it
the same as μ-law: update the branch that currently checks if is_raw_mulaw and
target_encoding == "mulaw" to also handle is_raw_alaw when target_encoding ==
"alaw" (or unify the check to if (is_raw_mulaw and target_encoding=="mulaw") or
(is_raw_alaw and target_encoding=="alaw")), so raw A-law payloads are yielded
directly instead of being processed through the PCM/resample/_to_target_format
path (refer to variables is_raw_mulaw/is_raw_alaw, fmt_lower, pcm_payload,
target_encoding, resample_audio, and _to_target_format).
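The unified passthrough check suggested above can be sketched as a pure predicate; the `raw-*-mulaw` / `raw-*-alaw` naming follows Azure's output-format strings, and the function itself is illustrative:

```python
def is_raw_passthrough(fmt: str, target_encoding: str) -> bool:
    """Decide whether a raw companded payload can be yielded as-is
    instead of going through the PCM resample/_to_target_format path."""
    fmt_lower = fmt.lower()
    is_raw = fmt_lower.startswith("raw-")
    is_raw_mulaw = is_raw and "mulaw" in fmt_lower
    is_raw_alaw = is_raw and "alaw" in fmt_lower and not is_raw_mulaw
    return (is_raw_mulaw and target_encoding == "mulaw") or (
        is_raw_alaw and target_encoding == "alaw"
    )
```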
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (17)
- admin_ui/backend/api/config.py
- admin_ui/backend/api/local_ai.py
- admin_ui/frontend/src/components/config/PipelineForm.tsx
- admin_ui/frontend/src/components/config/providers/AzureProviderForm.tsx
- admin_ui/frontend/src/pages/ProvidersPage.tsx
- admin_ui/frontend/src/utils/providerNaming.ts
- config/ai-agent.yaml
- requirements.txt
- src/config.py
- src/config/__init__.py
- src/config/security.py
- src/core/streaming_playback_manager.py
- src/engine.py
- src/pipelines/__init__.py
- src/pipelines/azure.py
- src/pipelines/openai.py
- src/pipelines/orchestrator.py
PR Review — Excellent Azure Integration! 🎉

@egorky — thank you for this comprehensive Azure Speech Service integration! I have been meaning to work on it but never got a chance to handle it. This is a major contribution. Thank you again for working on this; several community members had requested this multiple times. PR #261 brings a major new provider to AVA with both STT and TTS capabilities. The architecture follows our existing provider patterns well:
The new features are solid:
🔴 Critical: Fix SSRF Vulnerability in Azure Region Handling

Location: the token-URL construction in `admin_ui/backend/api/config.py`. The Azure region from `provider_config` is interpolated into the token endpoint URL without validation.

What to do: Validate and normalize the region before building the URL:

```python
import re

AZURE_REGION_PATTERN = re.compile(r"^[a-z0-9-]{1,50}$")
AZURE_REGIONS_ALLOWLIST = {"eastus", "eastus2", "westus", "westus2", "westeurope", "northeurope", ...}  # Add all valid regions

raw_region = provider_config.get("region", "eastus")
region = str(raw_region).strip().lower()
if not AZURE_REGION_PATTERN.match(region) or region not in AZURE_REGIONS_ALLOWLIST:
    return {"status": "error", "message": f"Invalid Azure region: {raw_region}"}
token_url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
```

🔴 Critical: Close
| Priority | Count | Items |
|---|---|---|
| 🔴 Critical | 4 | SSRF fix, session leak, sample rate mismatch, flush sample rate |
| 🟡 Major | 3 | PII logging, threshold clamping, VAD config |
| 🟡 Moderate | 2 | UI shadowing, variant validation |
| 🟡 Minor | 2 | Format alignment, A-law detection |
This is a substantial and well-architected contribution! Once the 4 critical items are addressed, we can merge. The Azure adapters follow existing patterns nicely, and the LLM aggregation thresholds are a great addition.
Looking forward to seeing this land — thanks for the thorough work! 🙌
SSRF prevention:
- Add validate_azure_region() shared validator in src/config.py
- Wire into AzureSTTProviderConfig/AzureTTSProviderConfig via Pydantic field_validator
- Guard all _build_azure_*_url() functions in azure.py
- Validate region in admin API test-connection endpoint

Correctness fixes:
- Fix _pipeline_tasks iteration bug (Dict[str, Task], not set) in engine cleanup
- Track STT flush task via _fire_and_forget_for_call with 5s timeout
- Demote "DEBUG: TTS Adapter DM Override" from warning to debug level
- Clamp aggregation min_words/min_chars to >= 1
- Add VAD timeout keys to AzureSTTRealtimeAdapter._compose_options
- Close temp AzureSTTFastAdapter in realtime fallback (session leak)
- Remove silent sample_rate mutation in start_stream; log warning on mismatch
- Track buffer sample rate in flush_speech instead of hard-coding 16000
- Handle raw A-law format in TTS streaming passthrough
- Flush pending byte counters in streaming_playback_manager finally block
- Replace silent except with debug logging for session upsert failures

Security & validation:
- Constrain STT variant to Literal["realtime", "fast"] in config
- Normalize/validate variant in orchestrator with warning on invalid input
- Add type-based Azure key injection in security.py (cfg type == "azure")
- Remove full message payload logging in OpenAI adapter

Frontend:
- Move Azure form check above realtime check to prevent routing shadow
- Add azure_stt/azure_tts checkbox entries to quick-add modal
- Replace free-text variant input with constrained select dropdown

Config:
- Align azure_tts output_format to riff-8khz-16bit-mono-pcm (matches UI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 5
♻️ Duplicate comments (1)
src/core/streaming_playback_manager.py (1)
875-882: ⚠️ Potential issue | 🟡 Minor — Only clear the periodic batch after a successful upsert.
If `get_by_call_id(call_id)` returns `None`, line 881 still zeroes `bytes_since_last_upsert`, so that second of audio is permanently undercounted. The end-of-stream path already avoids dropping the batch until it is actually persisted.
♻️ Suggested change
```diff
             bytes_since_last_upsert += len(chunk)
             if time.time() - last_upsert_time >= 1.0:
                 sess = await self.session_store.get_by_call_id(call_id)
                 if sess:
                     sess.streaming_bytes_sent += bytes_since_last_upsert
                     sess.streaming_jitter_buffer_depth = jitter_buffer.qsize()
                     await self.session_store.upsert_call(sess)
-                bytes_since_last_upsert = 0
-                last_upsert_time = time.time()
+                    bytes_since_last_upsert = 0
+                    last_upsert_time = time.time()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/streaming_playback_manager.py` around lines 875 - 882, The periodic batch is being cleared regardless of persistence success, causing lost audio counts; change the block using get_by_call_id(call_id) so that bytes_since_last_upsert and last_upsert_time are only reset after a successful fetch and await self.session_store.upsert_call(sess) completes (i.e., when sess is truthy and upsert_call has returned); leave bytes_since_last_upsert and last_upsert_time unchanged when sess is None (optionally log a warning) so the next periodic tick will include the unpersisted bytes.
🧹 Nitpick comments (5)
src/pipelines/azure.py (2)
746-761: Cleanup exception handling is acceptable but could benefit from debug logging.
The try-except-pass blocks during `push_stream.close()` and `recognizer.stop_continuous_recognition()` are reasonable for cleanup code where we want to proceed regardless of errors. However, adding debug-level logging would help diagnose issues in production.
♻️ Optional: Add debug logging for cleanup failures
```diff
         if push_stream:
             try:
                 push_stream.close()
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Failed to close Azure push stream", call_id=call_id, exc_info=exc)
         # Stop recognizer
         recognizer = session.get("recognizer")
         if recognizer:
             try:
                 logger.debug("Stopping Azure SDK continuous recognition", call_id=call_id)
                 await asyncio.to_thread(recognizer.stop_continuous_recognition)
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Failed to stop Azure recognizer", call_id=call_id, exc_info=exc)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 746 - 761, The cleanup try/except blocks silently swallow exceptions; update them to log debug-level details so failures are visible: around push_stream.close() capture the exception and call logger.debug with context (e.g., "push_stream.close failed", include call_id and the exception), and similarly around recognizer.stop_continuous_recognition (use await asyncio.to_thread(recognizer.stop_continuous_recognition) inside the try and on except call logger.debug with "recognizer.stop_continuous_recognition failed", call_id, and exception); keep the exception suppressed after logging to preserve current behavior and reference session/get("recognizer") where applicable.
353-358: Consider debug-level logging for VAD exceptions.
The silent `except: pass` block at lines 357-358 could mask unexpected VAD library issues. While this shouldn't block the flow, a debug log would aid troubleshooting.
♻️ Optional: Add debug logging
```diff
             try:
                 if self._vad.is_speech(frame, sample_rate_hz):
                     has_speech = True
                     break
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("VAD speech detection failed for frame", exc_info=exc)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 353 - 358, The silent except block around self._vad.is_speech(frame, sample_rate_hz) swallows VAD errors; change it to catch Exception as e and emit a debug-level log with the exception and minimal context (e.g., sample_rate_hz and a short identifier for frame) before continuing. Use the existing logger on the class if available (e.g., self._logger) or logging.getLogger(__name__) and call logger.debug(...) with the exception info so the code still continues but failures are visible for troubleshooting.

admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
430-447: Mirror the backend lower bound in these new inputs.
These are minimum thresholds, but the controls still accept `0` and negative values. Adding `min={1}` keeps the form aligned with the server-side clamp and avoids confusing “saved 0, got 1” round-trips.
♻️ Suggested change
```diff
 <FormInput
   label="LLM Min Words Threshold"
   type="number"
+  min={1}
   value={localConfig.options?.llm?.aggregation_min_words ?? ''}
   onChange={(e) => updateRoleOptions('llm', { aggregation_min_words: e.target.value ? parseInt(e.target.value, 10) : undefined })}
   placeholder="Auto"
   tooltip="Minimum words to wait before sending transcript to LLM."
   disabled={!showLlmExpert}
 />
 <FormInput
   label="LLM Min Chars Threshold"
   type="number"
+  min={1}
   value={localConfig.options?.llm?.aggregation_min_chars ?? ''}
   onChange={(e) => updateRoleOptions('llm', { aggregation_min_chars: e.target.value ? parseInt(e.target.value, 10) : undefined })}
   placeholder="Auto"
   tooltip="Minimum characters to wait before sending transcript to LLM."
   disabled={!showLlmExpert}
 />
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 430 - 447, The two number inputs in PipelineForm.tsx for "LLM Min Words Threshold" and "LLM Min Chars Threshold" accept 0/negative values but the backend clamps them; update the two FormInput components (the ones bound to localConfig.options?.llm?.aggregation_min_words and aggregation_min_chars and using updateRoleOptions('llm', { aggregation_min_words/aggregation_min_chars: ... })) to include min={1} so the UI prevents 0/negative values and matches server-side lower bound.src/engine.py (1)
9100-9109: Extract downstream override resolution into one helper.

The same branching is duplicated in two hot paths; centralizing it will prevent divergence.

♻️ Refactor sketch

```diff
+    def _use_streaming_for_tts_adapter(self, tts_adapter: Any) -> bool:
+        mode = (getattr(tts_adapter, "downstream_mode_override", "auto") or "auto").strip().lower()
+        if mode == "stream":
+            return True
+        if mode == "file":
+            return False
+        return self.config.downstream_mode != "file"
```

Then replace both blocks with:

```diff
-    _tts_dm_override = getattr(pipeline.tts_adapter, "downstream_mode_override", "auto") or "auto"
-    ...
-    if _tts_dm_override == "stream":
-        use_streaming_playback = True
-    elif _tts_dm_override == "file":
-        use_streaming_playback = False
-    else:
-        use_streaming_playback = self.config.downstream_mode != "file"
+    use_streaming_playback = self._use_streaming_for_tts_adapter(pipeline.tts_adapter)
```

Also applies to: 9627-9635

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/engine.py` around lines 9100 - 9109, Extract the repeated downstream-mode resolution into a single helper (e.g., Engine.resolve_downstream_playback or utils.get_effective_downstream_mode) that takes the pipeline (or tts_adapter and global config) and returns either the resolved mode string ("stream"|"file"|"auto") or the boolean use_streaming_playback; move the logic that reads getattr(pipeline.tts_adapter, "downstream_mode_override", "auto") or "auto" and the branch that maps "stream"/"file"/other to a playback decision into that helper, replace both duplicated blocks (the one using pipeline.tts_adapter.downstream_mode_override and the one at the other hot path) with a single call to the helper, and ensure the helper uses pipeline.tts_adapter.__class__.__name__ only for logging so behavior is unchanged.

src/pipelines/orchestrator.py (1)
1528-1544: Add Azure provider hint for better user experience.

The `_format_placeholder_error` method provides helpful hints for misconfigured providers (OpenAI, Google, Deepgram, etc.), but Azure is missing. When users encounter Azure placeholder errors, they won't get guidance like other providers.

✨ Proposed fix to add Azure hint

```diff
     elif provider == "minimax":
         hints.append("Set MINIMAX_API_KEY and configure providers.minimax_llm.")
+    elif provider == "azure":
+        hints.append("Set AZURE_SPEECH_KEY and configure providers.azure_stt/providers.azure_tts.")
     hint = f" Hint: {' '.join(hints)}" if hints else ""
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/pipelines/orchestrator.py` around lines 1528 - 1544, In _format_placeholder_error, the provider-specific hints list is missing guidance for Azure: locate the block that builds hints (variable hints and parameter provider) and add an elif branch for provider == "azure" that appends a clear hint (e.g., "Set AZURE_API_KEY or AZURE_TENANT_ID/AZURE_CLIENT_ID/AZURE_CLIENT_SECRET and configure providers.azure_* (llm/stt/tts) as needed"). Ensure the new branch follows the same pattern and wording style as the other provider branches so Azure users receive the same configuration guidance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 75-79: The effect in PipelineForm.tsx is watching the wrong STT
option key (vad_silence_ms) so saved Azure timeout overrides appear unset;
update the condition and dependency array to check for the actual keys used by
the backend (config?.options?.stt?.vad_silence_timeout_ms and/or
config?.options?.stt?.vad_initial_silence_timeout_ms) along with
timestamp_granularities, and keep calling setShowSttExpert(true) when either
timeout or timestamp_granularities is present so the expert STT section
correctly shows on reload.
In `@src/engine.py`:
- Line 9250: The transcript queue is being assigned to
self._pipeline_transcript_queues[call_id] too late so ChannelTalkingFinished may
call flush_speech() before the queue exists and transcripts are dropped; move
the initialization of transcript_queue into the earlier setup path (before any
events that can trigger ChannelTalkingFinished/flush_speech()), ensuring
self._pipeline_transcript_queues[call_id] is created immediately when call_id is
allocated (affecting the code around where transcript_queue is constructed and
referenced—ensure the same early initialization is applied to the code paths
touched in the region around the current _pipeline_transcript_queues assignment
and the code covered by the earlier block that spans the flush_speech() call).
- Around line 10044-10046: The current parsing of
pipeline.llm_options["aggregation_min_words"] and ["aggregation_min_chars"] can
raise on non-numeric values; update the min_words and min_chars computation in
the block that sets min_words, min_chars, and threshold_met to defensively parse
values (wrap parsing in try/except catching ValueError/TypeError), coerce valid
numeric strings to int, fall back to the existing defaults (3 for words, 12 for
chars) on error, then enforce min 1 via max(1, parsed) before computing
threshold_met = words >= min_words or chars >= min_chars; reference the
variables min_words, min_chars, pipeline.llm_options, and threshold_met when
making the change.
In `@src/pipelines/azure.py`:
- Around line 29-32: The type annotation aiohttp.ClientSession is evaluated at
import time and fails when aiohttp is None; fix by making those annotations
lazy: add "from typing import TYPE_CHECKING, Any" and either (a) wrap "import
aiohttp" inside an if TYPE_CHECKING block and keep annotations as
aiohttp.ClientSession, or (b) change the annotations that reference
aiohttp.ClientSession to the string form "aiohttp.ClientSession", or (c) fall
back to Any (e.g., session: Any) when aiohttp is not available; update every
function/method signature and attribute annotation that currently uses
aiohttp.ClientSession (the places noted in the review) so the module can import
without aiohttp installed.
In `@src/pipelines/openai.py`:
- Around line 480-488: The debug log in the OpenAI chat completion block is
exposing raw message content via last_message_preview and assumes each message
is a dict; change the logger.debug call (where logger.debug is invoked with
call_id, model, temperature, tools_count, messages_count) to stop including raw
content and instead log metadata only: for the last message, safely check type
and extract role and content length (e.g., role=str(msg.get("role")) if
isinstance(msg, dict) else "unknown", length=len(msg.get("content","")) if
isinstance(msg, dict) and isinstance(msg.get("content",""), str) else 0) so you
avoid .get crashes and eliminate PII risk.
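The early queue initialization requested for `src/engine.py` above can be sketched as follows. This is a minimal illustration of the discipline only: `PipelineRunner` is a hypothetical stand-in for the real engine class, and the method names mirror the prompt rather than the actual code.

```python
import asyncio
from typing import Dict


class PipelineRunner:
    """Illustrative sketch: create the per-call transcript queue eagerly."""

    def __init__(self) -> None:
        self._pipeline_transcript_queues: Dict[str, asyncio.Queue] = {}

    def ensure_pipeline(self, call_id: str) -> asyncio.Queue:
        # Create the queue as soon as the call_id is allocated, before any
        # event handler (e.g. ChannelTalkingFinished) can fire, so a flush
        # never observes a missing entry.
        return self._pipeline_transcript_queues.setdefault(call_id, asyncio.Queue())

    def flush_speech(self, call_id: str, transcript: str) -> bool:
        queue = self._pipeline_transcript_queues.get(call_id)
        if queue is None:
            return False  # late-created queue would previously drop this transcript
        queue.put_nowait(transcript)
        return True
```

`setdefault` keeps setup idempotent, so a repeated initialization path never replaces a queue that already holds transcripts.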
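The defensive threshold parsing requested for `src/engine.py` above can be sketched like this; the helper name `_parse_positive_int` is hypothetical, while the fallback defaults (3 words, 12 chars) come from the prompt.

```python
def _parse_positive_int(value, default):
    """Coerce a config value to a positive int, falling back on bad input."""
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        return default
    return max(1, parsed)  # enforce the minimum of 1 described in the prompt


# Hypothetical usage in the turn loop; llm_options is a plain dict here.
llm_options = {"aggregation_min_words": "abc", "aggregation_min_chars": "0"}
min_words = _parse_positive_int(llm_options.get("aggregation_min_words"), 3)
min_chars = _parse_positive_int(llm_options.get("aggregation_min_chars"), 12)

words, chars = 4, 10
threshold_met = words >= min_words or chars >= min_chars
```

Non-numeric strings and `None` fall back to the default, while a saved `0` is clamped up to 1 rather than raising mid-call.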
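The lazy-annotation pattern recommended for `src/pipelines/azure.py` above looks roughly like this minimal sketch; `ExampleAdapter` is illustrative, not the real adapter. With `from __future__ import annotations`, the `aiohttp.ClientSession` annotation is never evaluated at runtime, so the module imports cleanly even when aiohttp is absent.

```python
from __future__ import annotations  # PEP 563: defer annotation evaluation

from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    import aiohttp  # visible to type checkers only, never imported at runtime

try:
    import aiohttp as _aiohttp_runtime
except ImportError:
    _aiohttp_runtime = None  # module still imports without the dependency


class ExampleAdapter:
    def __init__(self) -> None:
        # This annotation is a plain string at runtime, so defining the
        # class does not require aiohttp to be installed.
        self._session: Optional[aiohttp.ClientSession] = None

    def available(self) -> bool:
        return _aiohttp_runtime is not None
```

The same effect can be had per-annotation with string literals (`"aiohttp.ClientSession"`), but the future import fixes every signature in the module at once.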
---
Duplicate comments:
In `@src/core/streaming_playback_manager.py`:
- Around line 875-882: The periodic batch is being cleared regardless of
persistence success, causing lost audio counts; change the block using
get_by_call_id(call_id) so that bytes_since_last_upsert and last_upsert_time are
only reset after a successful fetch and await
self.session_store.upsert_call(sess) completes (i.e., when sess is truthy and
upsert_call has returned); leave bytes_since_last_upsert and last_upsert_time
unchanged when sess is None (optionally log a warning) so the next periodic tick
will include the unpersisted bytes.
---
Nitpick comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 430-447: The two number inputs in PipelineForm.tsx for "LLM Min
Words Threshold" and "LLM Min Chars Threshold" accept 0/negative values but the
backend clamps them; update the two FormInput components (the ones bound to
localConfig.options?.llm?.aggregation_min_words and aggregation_min_chars and
using updateRoleOptions('llm', { aggregation_min_words/aggregation_min_chars:
... })) to include min={1} so the UI prevents 0/negative values and matches
server-side lower bound.
In `@src/engine.py`:
- Around line 9100-9109: Extract the repeated downstream-mode resolution into a
single helper (e.g., Engine.resolve_downstream_playback or
utils.get_effective_downstream_mode) that takes the pipeline (or tts_adapter and
global config) and returns either the resolved mode string
("stream"|"file"|"auto") or the boolean use_streaming_playback; move the logic
that reads getattr(pipeline.tts_adapter, "downstream_mode_override", "auto") or
"auto" and the branch that maps "stream"/"file"/other to a playback decision
into that helper, replace both duplicated blocks (the one using
pipeline.tts_adapter.downstream_mode_override and the one at the other hot path)
with a single call to the helper, and ensure the helper uses
pipeline.tts_adapter.__class__.__name__ only for logging so behavior is
unchanged.
In `@src/pipelines/azure.py`:
- Around line 746-761: The cleanup try/except blocks silently swallow
exceptions; update them to log debug-level details so failures are visible:
around push_stream.close() capture the exception and call logger.debug with
context (e.g., "push_stream.close failed", include call_id and the exception),
and similarly around recognizer.stop_continuous_recognition (use await
asyncio.to_thread(recognizer.stop_continuous_recognition) inside the try and on
except call logger.debug with "recognizer.stop_continuous_recognition failed",
call_id, and exception); keep the exception suppressed after logging to preserve
current behavior and reference session/get("recognizer") where applicable.
- Around line 353-358: The silent except block around self._vad.is_speech(frame,
sample_rate_hz) swallows VAD errors; change it to catch Exception as e and emit
a debug-level log with the exception and minimal context (e.g., sample_rate_hz
and a short identifier for frame) before continuing. Use the existing logger on
the class if available (e.g., self._logger) or logging.getLogger(__name__) and
call logger.debug(...) with the exception info so the code still continues but
failures are visible for troubleshooting.
In `@src/pipelines/orchestrator.py`:
- Around line 1528-1544: In _format_placeholder_error, the provider-specific
hints list is missing guidance for Azure: locate the block that builds hints
(variable hints and parameter provider) and add an elif branch for provider ==
"azure" that appends a clear hint (e.g., "Set AZURE_API_KEY or
AZURE_TENANT_ID/AZURE_CLIENT_ID/AZURE_CLIENT_SECRET and configure
providers.azure_* (llm/stt/tts) as needed"). Ensure the new branch follows the
same pattern and wording style as the other provider branches so Azure users
receive the same configuration guidance.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4a817814-2ae7-46fc-a7f0-c50ac015ad87
📒 Files selected for processing (12)
- admin_ui/backend/api/config.py
- admin_ui/frontend/src/components/config/PipelineForm.tsx
- admin_ui/frontend/src/pages/ProvidersPage.tsx
- config/ai-agent.yaml
- src/config.py
- src/config/__init__.py
- src/config/security.py
- src/core/streaming_playback_manager.py
- src/engine.py
- src/pipelines/azure.py
- src/pipelines/openai.py
- src/pipelines/orchestrator.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/config/security.py
- src/config/__init__.py
- config/ai-agent.yaml
```diff
   useEffect(() => {
-    if (Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0) {
+    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0) || config?.options?.stt?.vad_silence_ms !== undefined) {
       setShowSttExpert(true);
     }
-  }, [config?.options?.stt?.timestamp_granularities]);
+  }, [config?.options?.stt?.timestamp_granularities, config?.options?.stt?.vad_silence_ms]);
```
This hook is watching the wrong Azure STT option key.
src/config.py exposes Azure timeout overrides as vad_silence_timeout_ms and vad_initial_silence_timeout_ms, but this effect is keyed off options.stt.vad_silence_ms. Any saved Azure timeout override will be treated as “unset” when the form reloads.
♻️ Suggested change
```diff
-    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0) || config?.options?.stt?.vad_silence_ms !== undefined) {
+    if (
+      (Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0) ||
+      config?.options?.stt?.vad_silence_timeout_ms !== undefined ||
+      config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined
+    ) {
       setShowSttExpert(true);
     }
-  }, [config?.options?.stt?.timestamp_granularities, config?.options?.stt?.vad_silence_ms]);
+  }, [
+    config?.options?.stt?.timestamp_granularities,
+    config?.options?.stt?.vad_silence_timeout_ms,
+    config?.options?.stt?.vad_initial_silence_timeout_ms,
+  ]);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 75 -
79, The effect in PipelineForm.tsx is watching the wrong STT option key
(vad_silence_ms) so saved Azure timeout overrides appear unset; update the
condition and dependency array to check for the actual keys used by the backend
(config?.options?.stt?.vad_silence_timeout_ms and/or
config?.options?.stt?.vad_initial_silence_timeout_ms) along with
timestamp_granularities, and keep calling setShowSttExpert(true) when either
timeout or timestamp_granularities is present so the expert STT section
correctly shows on reload.
- Add `from __future__ import annotations` to azure.py to prevent AttributeError when aiohttp is not installed (type annotations referencing aiohttp.ClientSession are deferred by PEP 563)
- Initialize _pipeline_transcript_queues earlier in _ensure_pipeline_runner to prevent a race condition where the ChannelTalkingFinished flush drops transcripts before _pipeline_runner creates the queue
- Guard aggregation threshold parsing with try/except to handle non-numeric config values without crashing the turn loop
- Expand PipelineForm STT expert useEffect dependencies to include Azure-specific keys (variant, vad_silence_timeout_ms) so the expert panel auto-opens when Azure overrides are saved
- Remove message content from OpenAI debug log (PII risk) and make last-message extraction type-safe for non-dict message items

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 4
♻️ Duplicate comments (2)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
76-82: ⚠️ Potential issue | 🟡 Minor

Include `vad_initial_silence_timeout_ms` in STT expert auto-open checks.

Line 76 and Line 82 still miss `config?.options?.stt?.vad_initial_silence_timeout_ms`, so saved Azure initial-silence overrides may not auto-show expert mode on reload.

♻️ Suggested fix

```diff
-    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0)
-      || config?.options?.stt?.vad_silence_ms !== undefined
-      || config?.options?.stt?.variant !== undefined
-      || config?.options?.stt?.vad_silence_timeout_ms !== undefined) {
+    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0)
+      || config?.options?.stt?.vad_silence_ms !== undefined
+      || config?.options?.stt?.variant !== undefined
+      || config?.options?.stt?.vad_silence_timeout_ms !== undefined
+      || config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined) {
       setShowSttExpert(true);
     }
-  }, [config?.options?.stt?.timestamp_granularities, config?.options?.stt?.vad_silence_ms, config?.options?.stt?.variant, config?.options?.stt?.vad_silence_timeout_ms]);
+  }, [
+    config?.options?.stt?.timestamp_granularities,
+    config?.options?.stt?.vad_silence_ms,
+    config?.options?.stt?.variant,
+    config?.options?.stt?.vad_silence_timeout_ms,
+    config?.options?.stt?.vad_initial_silence_timeout_ms,
+  ]);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 76 - 82, The STT expert auto-open condition misses checking config?.options?.stt?.vad_initial_silence_timeout_ms, so saved Azure initial-silence overrides won't trigger setShowSttExpert(true) on load; update the conditional inside the effect (the expression that currently checks timestamp_granularities, vad_silence_ms, variant, vad_silence_timeout_ms) to also include config?.options?.stt?.vad_initial_silence_timeout_ms and add it to the effect dependency array so the effect re-runs when vad_initial_silence_timeout_ms changes.

src/pipelines/azure.py (1)
1014-1016: ⚠️ Potential issue | 🟠 Major

Raw A-law/μ-law streaming path still has payload-corruption edges.

On Line 1014, odd-byte carry is applied to all formats, including raw μ-law/A-law (1 byte/sample). Also, on Lines 1025-1038, raw A-law falls into PCM conversion when `target_encoding != "alaw"`.

Proposed fix

```diff
-    if len(pcm_payload) % 2 != 0:
+    is_companded_raw = is_raw_mulaw or is_raw_alaw
+    if (not is_companded_raw) and len(pcm_payload) % 2 != 0:
         leftover_byte = pcm_payload[-1:]
         pcm_payload = pcm_payload[:-1]
@@
     if (is_raw_mulaw and target_encoding == "mulaw") or (is_raw_alaw and target_encoding == "alaw"):
         # Already in target format — yield directly
         converted = pcm_payload
+    elif is_raw_alaw:
+        raise RuntimeError(
+            "Azure TTS raw A-law cannot be converted through PCM path. "
+            "Use target_encoding='alaw' or choose PCM/μ-law output_format."
+        )
     else:
         # PCM16 LE raw bytes — determine source sample rate from format string
         audio_bytes = pcm_payload
@@
     return
```

Also applies to: 1021-1038
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 1014 - 1016, The leftover-byte truncation and PCM conversion are being applied indiscriminately to raw A-law/μ-law data: restrict the odd-byte carry logic around pcm_payload/leftover_byte so it only runs for true PCM formats (e.g., when bytes_per_sample > 1 or source_encoding in a pcm_encodings set), and ensure the branch that handles raw A-law/μ-law (check target_encoding == "alaw" or source_encoding == "alaw"/"ulaw") bypasses the PCM conversion block (the code currently around pcm_payload and the conversion block using target_encoding != "alaw"); update the conditionals so raw alaw/ulaw follow their dedicated path and never fall into the PCM conversion logic.
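A minimal sketch of the guard described above, under the assumption that companded formats (μ-law/A-law) carry 1 byte per sample while PCM16 carries 2; `split_even_payload` is a hypothetical helper for illustration, not code from the PR.

```python
def split_even_payload(payload: bytes, is_companded: bool):
    """Return (usable_bytes, carry); only PCM16 needs the odd-byte carry.

    Companded formats (mu-law / A-law) are 1 byte per sample, so any byte
    count is a whole number of samples and nothing is carried forward.
    """
    if is_companded or len(payload) % 2 == 0:
        return payload, b""
    # PCM16: hold the trailing odd byte so the next chunk stays sample-aligned.
    return payload[:-1], payload[-1:]
```

Truncating an odd byte from a companded stream would silently drop a full sample, which is exactly the corruption the review flags.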
🧹 Nitpick comments (3)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
42-44: Auto-enable TTS expert mode for saved Azure TTS overrides.

`showTtsExpert` auto-open logic only checks OpenAI/Groq keys. If Azure `voice_name`/`output_format` is saved, expert mode won't auto-enable on reload.

♻️ Suggested parity update

```diff
 const [showTtsExpert, setShowTtsExpert] = useState<boolean>(
-  () => config?.options?.tts?.response_format !== undefined || config?.options?.tts?.max_input_chars !== undefined
+  () =>
+    config?.options?.tts?.response_format !== undefined ||
+    config?.options?.tts?.max_input_chars !== undefined ||
+    config?.options?.tts?.voice_name !== undefined ||
+    config?.options?.tts?.output_format !== undefined
 );
 useEffect(() => {
-  if (config?.options?.tts?.response_format !== undefined || config?.options?.tts?.max_input_chars !== undefined) {
+  if (
+    config?.options?.tts?.response_format !== undefined ||
+    config?.options?.tts?.max_input_chars !== undefined ||
+    config?.options?.tts?.voice_name !== undefined ||
+    config?.options?.tts?.output_format !== undefined
+  ) {
     setShowTtsExpert(true);
   }
-}, [config?.options?.tts?.response_format, config?.options?.tts?.max_input_chars]);
+}, [
+  config?.options?.tts?.response_format,
+  config?.options?.tts?.max_input_chars,
+  config?.options?.tts?.voice_name,
+  config?.options?.tts?.output_format,
+]);
```

Also applies to: 85-88, 641-658

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 42 - 44, The showTtsExpert initial state only checks OpenAI/Groq-related fields; update the initializer and any related toggling logic (where setShowTtsExpert is used) to also consider saved Azure TTS overrides by checking config?.options?.tts?.voice_name and config?.options?.tts?.output_format (or any Azure-specific keys used elsewhere); ensure the same parity change is applied to the other places mentioned (the other showTtsExpert checks around the component and any form load logic) so that expert mode auto-enables when Azure TTS settings are present.

src/engine.py (1)
9103-9112: Extract downstream-mode resolution into a shared helper.

This logic is duplicated in two hot paths and can drift. A small helper (e.g., `_resolve_tts_streaming_mode(tts_adapter)`) will keep behavior consistent.

Also applies to: 9631-9639
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine.py` around lines 9103 - 9112, Extract the duplicated downstream-mode resolution into a single helper function named _resolve_tts_streaming_mode(tts_adapter) that returns a boolean (True for streaming playback, False for file playback); inside the helper replicate the existing logic: read getattr(tts_adapter, "downstream_mode_override", "auto") or "auto", evaluate "stream" -> True, "file" -> False, otherwise consult self.config.downstream_mode != "file" (pass self.config or make helper a method if it needs access to self); then replace the duplicate blocks around pipeline.tts_adapter and use_streaming_playback with a call to _resolve_tts_streaming_mode(pipeline.tts_adapter) to ensure consistent behavior across both locations.

src/pipelines/azure.py (1)
748-751: Don’t silently swallow stream teardown errors.
`except ...: pass` on Lines 750-751 and 761-762 hides cleanup failures and makes stuck-stream incidents hard to debug.

Proposed fix

```diff
         if push_stream:
             try:
                 push_stream.close()
-            except Exception:
-                pass
+            except Exception:
+                logger.debug("Azure SDK push_stream.close failed", call_id=call_id, exc_info=True)
@@
         try:
             logger.debug("Stopping Azure SDK continuous recognition", call_id=call_id)
             await asyncio.to_thread(recognizer.stop_continuous_recognition)
-        except Exception:
-            pass
+        except Exception:
+            logger.debug("Azure SDK recognizer stop failed", call_id=call_id, exc_info=True)
```

Also applies to: 760-762
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 748 - 751, Replace the bare "except: pass" around push_stream.close() with explicit error handling: catch Exception as e and log the exception (including stack trace) with the module/logger used in this file (e.g., logger.exception or similar) and include contextual info (stream name, request id, etc.), and do the same for the other symmetric teardown block (the similar close() call at lines ~760-762) so teardown failures are visible instead of silently swallowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 433-450: The onChange handlers in PipelineForm.tsx for the LLM
thresholds (FormInput for aggregation_min_words and aggregation_min_chars) pass
parseInt results directly to updateRoleOptions, allowing NaN, zero or negative
values to be saved; change each onChange to: read e.target.value, if empty set
undefined, else parseInt and if the result is a finite integer > 0 keep it,
otherwise set undefined (or leave previous valid), then call
updateRoleOptions('llm', { aggregation_min_words: ... }) /
updateRoleOptions('llm', { aggregation_min_chars: ... }) accordingly; ensure you
reference the localConfig.options?.llm fields and respect showLlmExpert
disabling.
In `@src/engine.py`:
- Line 4420: The log call that currently includes transcript content —
logger.debug("Early STT flush returned transcript", call_id=call_id,
transcript_preview=transcript[:50]) — must stop emitting PII; replace the
transcript_preview field with metadata only (e.g., transcript_length =
len(transcript), is_truncated = True/False, and an optional pii_detected flag)
so logs contain size/flags instead of actual transcript text; update the
logger.debug invocation accordingly and ensure any tests/consumers use the new
metadata fields.
In `@src/pipelines/azure.py`:
- Around line 470-476: Remove the transcript preview from logs to avoid leaking
PII: in the Azure STT logging calls that use logger.info with keys call_id,
request_id, latency_ms, transcript_preview and the transcript variable, stop
including transcript_preview (or replace it with a non-sensitive field such as
transcript_length or a redacted boolean) and update the logger.info calls
accordingly; apply the exact same change to the other occurrence that mirrors
this pattern so neither location logs transcript/text previews.
- Around line 361-378: The buffer is allowed to grow during long silence because
audio is always appended; change logic in the block using _audio_buffer,
_is_speaking, has_speech, bytes_per_frame and _max_silence_frames so you only
extend _audio_buffer when speech is ongoing (self._is_speaking) or when
has_speech is True (to start capturing speech), and when neither is true
trim/limit the buffer to a small pre-roll cap (e.g., max bytes = bytes_per_frame
* self._max_silence_frames) instead of unbounded extend; update
_buffer_sample_rate as before and keep the existing silence counting
(_silence_frames) and send/clear behavior unchanged.
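The bounded pre-roll behavior requested for `src/pipelines/azure.py` above can be sketched with a `deque(maxlen=...)`; the class, frame sizes, and default cap here are illustrative, not the adapter's real internals.

```python
from collections import deque


class PreRollBuffer:
    """Keep only the last `max_frames` of audio while no speech is detected."""

    def __init__(self, max_frames: int = 10) -> None:
        self._pre_roll = deque(maxlen=max_frames)  # deque drops the oldest frame itself
        self._speech = bytearray()
        self.is_speaking = False

    def push(self, frame: bytes, has_speech: bool) -> None:
        if self.is_speaking or has_speech:
            if not self.is_speaking:
                # Speech just started: promote the capped pre-roll into the
                # speech buffer so the utterance keeps its leading context.
                for pre in self._pre_roll:
                    self._speech.extend(pre)
                self._pre_roll.clear()
                self.is_speaking = True
            self._speech.extend(frame)
        else:
            # Silence: memory stays O(max_frames) no matter how long it lasts.
            self._pre_roll.append(frame)
```

During an hour of silence the buffer never grows past `max_frames` frames, while the first speech frame still carries a short lead-in into the utterance.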
---
Duplicate comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 76-82: The STT expert auto-open condition misses checking
config?.options?.stt?.vad_initial_silence_timeout_ms, so saved Azure
initial-silence overrides won't trigger setShowSttExpert(true) on load; update
the conditional inside the effect (the expression that currently checks
timestamp_granularities, vad_silence_ms, variant, vad_silence_timeout_ms) to
also include config?.options?.stt?.vad_initial_silence_timeout_ms and add it to
the effect dependency array so the effect re-runs when
vad_initial_silence_timeout_ms changes.
In `@src/pipelines/azure.py`:
- Around line 1014-1016: The leftover-byte truncation and PCM conversion are
being applied indiscriminately to raw A-law/μ-law data: restrict the odd-byte
carry logic around pcm_payload/leftover_byte so it only runs for true PCM
formats (e.g., when bytes_per_sample > 1 or source_encoding in a pcm_encodings
set), and ensure the branch that handles raw A-law/μ-law (check target_encoding
== "alaw" or source_encoding == "alaw"/"ulaw") bypasses the PCM conversion block
(the code currently around pcm_payload and the conversion block using
target_encoding != "alaw"); update the conditionals so raw alaw/ulaw follow
their dedicated path and never fall into the PCM conversion logic.
---
Nitpick comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 42-44: The showTtsExpert initial state only checks
OpenAI/Groq-related fields; update the initializer and any related toggling
logic (where setShowTtsExpert is used) to also consider saved Azure TTS
overrides by checking config?.options?.tts?.voice_name and
config?.options?.tts?.output_format (or any Azure-specific keys used elsewhere);
ensure the same parity change is applied to the other places mentioned (the
other showTtsExpert checks around the component and any form load logic) so that
expert mode auto-enables when Azure TTS settings are present.
In `@src/engine.py`:
- Around line 9103-9112: Extract the duplicated downstream-mode resolution into
a single helper function named _resolve_tts_streaming_mode(tts_adapter) that
returns a boolean (True for streaming playback, False for file playback); inside
the helper replicate the existing logic: read getattr(tts_adapter,
"downstream_mode_override", "auto") or "auto", evaluate "stream" -> True, "file"
-> False, otherwise consult self.config.downstream_mode != "file" (pass
self.config or make helper a method if it needs access to self); then replace
the duplicate blocks around pipeline.tts_adapter and use_streaming_playback with
a call to _resolve_tts_streaming_mode(pipeline.tts_adapter) to ensure consistent
behavior across both locations.
In `@src/pipelines/azure.py`:
- Around line 748-751: Replace the bare "except: pass" around
push_stream.close() with explicit error handling: catch Exception as e and log
the exception (including stack trace) with the module/logger used in this file
(e.g., logger.exception or similar) and include contextual info (stream name,
request id, etc.), and do the same for the other symmetric teardown block (the
similar close() call at lines ~760-762) so teardown failures are visible instead
of silently swallowed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c3171372-c54a-4729-aa80-0f81cd958642
📒 Files selected for processing (4)
- admin_ui/frontend/src/components/config/PipelineForm.tsx
- src/engine.py
- src/pipelines/azure.py
- src/pipelines/openai.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/pipelines/openai.py
src/engine.py
Outdated
```python
                    timeout=5,
                )
                if transcript:
                    logger.debug("Early STT flush returned transcript", call_id=call_id, transcript_preview=transcript[:50])
```
Avoid logging transcript content in engine logs.
Line 4420 logs transcript_preview, which can leak PII/sensitive caller data into logs. Prefer metadata (length/flags) instead of content.
🔧 Proposed fix
-            logger.debug("Early STT flush returned transcript", call_id=call_id, transcript_preview=transcript[:50])
+            logger.debug(
+                "Early STT flush returned transcript",
+                call_id=call_id,
+                transcript_len=len(transcript),
+            )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-            logger.debug("Early STT flush returned transcript", call_id=call_id, transcript_preview=transcript[:50])
+            logger.debug(
+                "Early STT flush returned transcript",
+                call_id=call_id,
+                transcript_len=len(transcript),
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engine.py` at line 4420, The log call that currently includes transcript
content — logger.debug("Early STT flush returned transcript", call_id=call_id,
transcript_preview=transcript[:50]) — must stop emitting PII; replace the
transcript_preview field with metadata only (e.g., transcript_length =
len(transcript), is_truncated = True/False, and an optional pii_detected flag)
so logs contain size/flags instead of actual transcript text; update the
logger.debug invocation accordingly and ensure any tests/consumers use the new
metadata fields.
…put hardening

- Remove transcript/text preview strings from logs (PII risk): replace transcript_preview and text_preview with metadata-only fields (transcript_len, transcript_chars, text_chars) in engine.py and azure.py
- Guard pre-speech audio buffer: only extend _audio_buffer when speech is detected or speaking is active, preventing unbounded memory growth during long silences
- Harden LLM threshold inputs in PipelineForm: add min=1, step=1 attrs and parseInt guards with Number.isFinite + Math.max(1, parsed) clamping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
76-83: ⚠️ Potential issue | 🟡 Minor

Include vad_initial_silence_timeout_ms in STT expert auto-open detection.

Lines 76-83 still won't auto-open STT expert mode when only options.stt.vad_initial_silence_timeout_ms is present in saved config.

♻️ Suggested patch

-    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0)
-        || config?.options?.stt?.vad_silence_ms !== undefined
-        || config?.options?.stt?.variant !== undefined
-        || config?.options?.stt?.vad_silence_timeout_ms !== undefined) {
+    if ((Array.isArray(config?.options?.stt?.timestamp_granularities) && config.options.stt.timestamp_granularities.length > 0)
+        || config?.options?.stt?.vad_silence_ms !== undefined
+        || config?.options?.stt?.variant !== undefined
+        || config?.options?.stt?.vad_silence_timeout_ms !== undefined
+        || config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined) {
       setShowSttExpert(true);
     }
-  }, [config?.options?.stt?.timestamp_granularities, config?.options?.stt?.vad_silence_ms, config?.options?.stt?.variant, config?.options?.stt?.vad_silence_timeout_ms]);
+  }, [
+    config?.options?.stt?.timestamp_granularities,
+    config?.options?.stt?.vad_silence_ms,
+    config?.options?.stt?.variant,
+    config?.options?.stt?.vad_silence_timeout_ms,
+    config?.options?.stt?.vad_initial_silence_timeout_ms,
+  ]);

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 76 - 83, The effect that auto-opens STT expert mode is missing checks for options.stt.vad_initial_silence_timeout_ms; update the conditional that sets setShowSttExpert(true) to also check for config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined (and add that same property to the effect dependency array) so the form will auto-open when only vad_initial_silence_timeout_ms is present; refer to the existing setShowSttExpert call and the config?.options?.stt timestamp/vad checks to locate where to add the new property.
🧹 Nitpick comments (6)
admin_ui/frontend/src/components/config/PipelineForm.tsx (1)
666-672: Prefer a select for Azure TTS output format override (not free text).

Free-text here is error-prone and can create invalid runtime configs. Reusing the curated format options (as in AzureProviderForm.tsx) would reduce misconfiguration risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@admin_ui/frontend/src/components/config/PipelineForm.tsx` around lines 666 - 672, Replace the free-text input for "Azure TTS Output Format Override" with a select that reuses the curated format options from AzureProviderForm (the same list of allowed output formats), binding the selected value to localConfig.options?.tts?.output_format and calling updateRoleOptions('tts', { output_format: selectedValue || undefined }) on change; ensure an empty/cleared selection maps to undefined and keep the disabled state controlled by showTtsExpert so the field is disabled when showTtsExpert is false.

src/pipelines/azure.py (4)
754-768: Cleanup exceptions silently swallowed.

Both push_stream.close() and recognizer.stop_continuous_recognition exceptions are caught and ignored. While cleanup shouldn't raise to callers, logging at debug level aids troubleshooting.

♻️ Proposed fix

         if push_stream:
             try:
                 push_stream.close()
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Failed to close Azure push stream", call_id=call_id, exc_info=exc)
         # Stop recognizer
         recognizer = session.get("recognizer")
         if recognizer:
             try:
                 # Stop blocks until final processed segments finish, which may take seconds.
                 # In asyncio context, best to run it in thread to avoid blocking loop if slow.
                 logger.debug("Stopping Azure SDK continuous recognition", call_id=call_id)
                 await asyncio.to_thread(recognizer.stop_continuous_recognition)
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Failed to stop Azure recognizer", call_id=call_id, exc_info=exc)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 754 - 768, The cleanup currently swallows exceptions for push_stream.close() and recognizer.stop_continuous_recognition; modify both except blocks to capture the exception (e.g., except Exception as e) and log the error details at debug level using the existing logger (include context like call_id and exc_info or the exception message) instead of silently passing so failures during cleanup are recorded for troubleshooting while still not propagating.
153-160: Incomplete XML escaping in SSML builder.

The SSML text escaping handles the five XML special characters, but the voice_name, lang, lang_tag, prosody_pitch, and prosody_rate values are interpolated directly into attribute positions without escaping. While these values typically come from configuration rather than user input, a malformed voice name could break or inject into the SSML.

♻️ Consider escaping attribute values

+def _escape_attr(val: str) -> str:
+    """Escape a value for use in an XML attribute (single-quoted)."""
+    return val.replace("&", "&amp;").replace("'", "&apos;").replace("<", "&lt;")
+
 def _build_ssml(
     text: str,
     voice_name: str,
     language: str,
     prosody_pitch: Optional[str] = None,
     prosody_rate: Optional[str] = None,
     lang_tag: Optional[str] = None,
 ) -> str:
     ...
-    lang = language or ("-".join(voice_name.split("-")[:2]) if "-" in voice_name else "en-US")
+    lang = _escape_attr(language or ("-".join(voice_name.split("-")[:2]) if "-" in voice_name else "en-US"))
+    voice_name = _escape_attr(voice_name)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 153 - 160, The SSML builder escapes text content but not attribute values; update the code that constructs the SSML (where safe_text is used and where voice_name, lang, lang_tag, prosody_pitch, and prosody_rate are interpolated) to escape attribute values as well. Add or reuse an XML attribute-escaping helper (e.g., escape_xml_attr similar to safe_text but ensuring quotes and other XML entities are encoded) and apply it to voice_name, lang, lang_tag, prosody_pitch, and prosody_rate before interpolation into attributes so malformed names cannot break or inject into the SSML.
354-359: Consider logging VAD frame exceptions at debug level.

The VAD is_speech call can fail on malformed frames, but silently swallowing exceptions makes debugging audio issues harder. A debug-level log would help troubleshooting without cluttering production logs.

♻️ Proposed fix

             try:
                 if self._vad.is_speech(frame, sample_rate_hz):
                     has_speech = True
                     break
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("VAD is_speech check failed", frame_offset=offset, exc_info=exc)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 354 - 359, The code currently swallows exceptions from self._vad.is_speech; change the bare except to catch Exception as e and emit a debug-level log before continuing so malformed frames are diagnosable. Specifically, in the loop where self._vad.is_speech(frame, sample_rate_hz) is called (and has_speech is set), replace "except Exception: pass" with "except Exception as e: <logger>.debug(...)" and include the exception string and minimal context (e.g., sample_rate_hz, len(frame) or frame metadata) in the message; keep the existing control flow (do not re-raise) so has_speech behavior is unchanged. Use the module or instance logger used elsewhere in the file (e.g., logger or self._logger) to ensure consistent logging.
908-910: Unreachable code after early return.

The yield statement on line 910 is unreachable because line 909 returns early. This is a quirk needed to make the function an async generator, but could be clearer.

♻️ Clearer pattern

 async def synthesize(
     self,
     call_id: str,
     text: str,
     options: Dict[str, Any],
 ) -> AsyncIterator[bytes]:
     if not text:
-        return
-    yield  # make this an async generator
+        return  # Early exit for empty text
+    yield b""  # type: ignore[misc]  # unreachable; makes this an async generator

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 908 - 910, The early return makes the subsequent yield unreachable; replace this unclear pattern by removing the unreachable yield and explicitly preserve the async-generator signature with a clear no-op yield guard (e.g., place a conditional no-op yield like an "if False: yield" / dummy yield at function start) so the function remains an async generator while keeping the early "if not text: return" path; update the function containing the "text" parameter accordingly (replace the unreachable "yield" after the return with the no-op yield guard).

src/engine.py (1)
9103-9112: Extract duplicated TTS downstream-mode resolution into one helper.

The same override block appears in greeting and main turn synthesis paths. Centralizing it would prevent drift.

♻️ Suggested refactor

+    def _resolve_tts_use_streaming_playback(self, tts_adapter: Any) -> bool:
+        override = getattr(tts_adapter, "downstream_mode_override", "auto") or "auto"
+        logger.debug(
+            "TTS downstream mode override evaluated",
+            override=override,
+            adapter=tts_adapter.__class__.__name__,
+        )
+        if override == "stream":
+            return True
+        if override == "file":
+            return False
+        return self.config.downstream_mode != "file"

Then replace both duplicated blocks with:

-        _tts_dm_override = getattr(pipeline.tts_adapter, "downstream_mode_override", "auto") or "auto"
-        logger.debug(f"TTS Adapter DM Override evaluated as: {_tts_dm_override} on adapter {pipeline.tts_adapter.__class__.__name__}")
-        if _tts_dm_override == "stream":
-            use_streaming_playback = True
-        elif _tts_dm_override == "file":
-            use_streaming_playback = False
-        else:
-            use_streaming_playback = self.config.downstream_mode != "file"
+        use_streaming_playback = self._resolve_tts_use_streaming_playback(pipeline.tts_adapter)

Also applies to: 9631-9639
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engine.py` around lines 9103 - 9112, Extract the duplicated TTS downstream-mode resolution into a single helper (e.g., resolve_tts_downstream_mode or get_effective_tts_downstream_mode) that takes the pipeline instance (or pipeline.tts_adapter) and self.config as inputs, reads pipeline.tts_adapter.downstream_mode_override using getattr with a default of "auto", logs the evaluated override (include adapter class name), and returns a boolean for use_streaming_playback (True if override == "stream", False if override == "file", else compare self.config.downstream_mode != "file"); then replace both duplicated blocks (the one using pipeline.tts_adapter.downstream_mode_override in greeting and main synthesis paths) with calls to this helper and assign its boolean return to use_streaming_playback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/pipelines/azure.py`:
- Around line 1060-1070: The non-streaming decode branch only passthroughs
"mulaw" but not "alaw", causing raw A-law audio to fall through incorrectly;
update the conditional in the non-streaming path that currently checks
native_encoding == "mulaw" to also allow "alaw" (e.g., native_encoding in
("mulaw", "alaw")) and passthrough when target_encoding matches, and mirror any
logging/handling used in the streaming path; verify related helpers
(_to_target_format, resample_audio) remain unchanged and update any tests that
expect A-law passthrough.
- Around line 696-697: The call to
recognizer.start_continuous_recognition_async() returns a ResultFuture that must
be awaited to ensure startup completed and to surface local init errors; update
the code where recognizer.start_continuous_recognition_async() is invoked to
capture its return (e.g., result_future =
recognizer.start_continuous_recognition_async()) and call result_future.get()
(synchronously) after wiring the event handlers (such as the canceled handler)
so the method blocks until continuous recognition has actually started and any
immediate errors are raised.
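The blocking-startup pattern this prompt asks for can be illustrated with stand-ins; the real azure-cognitiveservices-speech ResultFuture exposes the same blocking .get(), but everything below is a hypothetical mock, not SDK code:

```python
class FakeResultFuture:
    """Stand-in for the SDK's ResultFuture: .get() blocks until done and re-raises errors."""
    def __init__(self, error=None):
        self._error = error

    def get(self):
        if self._error:
            raise self._error

class FakeRecognizer:
    """Hypothetical recognizer that may fail during async startup."""
    def __init__(self, startup_error=None):
        self._startup_error = startup_error

    def start_continuous_recognition_async(self):
        return FakeResultFuture(self._startup_error)

def start_recognition(recognizer) -> None:
    # Wire event handlers first (omitted), then capture the future and block
    # on .get() so initialization errors surface here instead of being lost.
    result_future = recognizer.start_continuous_recognition_async()
    result_future.get()
```

Without the .get() call, a recognizer that fails to start would fail silently and the stream would simply never produce results.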
---
Duplicate comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 76-83: The effect that auto-opens STT expert mode is missing
checks for options.stt.vad_initial_silence_timeout_ms; update the conditional
that sets setShowSttExpert(true) to also check for
config?.options?.stt?.vad_initial_silence_timeout_ms !== undefined (and add that
same property to the effect dependency array) so the form will auto-open when
only vad_initial_silence_timeout_ms is present; refer to the existing
setShowSttExpert call and the config?.options?.stt timestamp/vad checks to
locate where to add the new property.
---
Nitpick comments:
In `@admin_ui/frontend/src/components/config/PipelineForm.tsx`:
- Around line 666-672: Replace the free-text input for "Azure TTS Output Format
Override" with a select that reuses the curated format options from
AzureProviderForm (the same list of allowed output formats), binding the
selected value to localConfig.options?.tts?.output_format and calling
updateRoleOptions('tts', { output_format: selectedValue || undefined }) on
change; ensure an empty/cleared selection maps to undefined and keep the
disabled state controlled by showTtsExpert so the field is disabled when
showTtsExpert is false.
In `@src/engine.py`:
- Around line 9103-9112: Extract the duplicated TTS downstream-mode resolution
into a single helper (e.g., resolve_tts_downstream_mode or
get_effective_tts_downstream_mode) that takes the pipeline instance (or
pipeline.tts_adapter) and self.config as inputs, reads
pipeline.tts_adapter.downstream_mode_override using getattr with a default of
"auto", logs the evaluated override (include adapter class name), and returns a
boolean for use_streaming_playback (True if override == "stream", False if
override == "file", else compare self.config.downstream_mode != "file"); then
replace both duplicated blocks (the one using
pipeline.tts_adapter.downstream_mode_override in greeting and main synthesis
paths) with calls to this helper and assign its boolean return to
use_streaming_playback.
In `@src/pipelines/azure.py`:
- Around line 754-768: The cleanup currently swallows exceptions for
push_stream.close() and recognizer.stop_continuous_recognition; modify both
except blocks to capture the exception (e.g., except Exception as e) and log the
error details at debug level using the existing logger (include context like
call_id and exc_info or the exception message) instead of silently passing so
failures during cleanup are recorded for troubleshooting while still not
propagating.
- Around line 153-160: The SSML builder escapes text content but not attribute
values; update the code that constructs the SSML (where safe_text is used and
where voice_name, lang, lang_tag, prosody_pitch, and prosody_rate are
interpolated) to escape attribute values as well. Add or reuse an XML
attribute-escaping helper (e.g., escape_xml_attr similar to safe_text but
ensuring quotes and other XML entities are encoded) and apply it to voice_name,
lang, lang_tag, prosody_pitch, and prosody_rate before interpolation into
attributes so malformed names cannot break or inject into the SSML.
- Around line 354-359: The code currently swallows exceptions from
self._vad.is_speech; change the bare except to catch Exception as e and emit a
debug-level log before continuing so malformed frames are diagnosable.
Specifically, in the loop where self._vad.is_speech(frame, sample_rate_hz) is
called (and has_speech is set), replace "except Exception: pass" with "except
Exception as e: <logger>.debug(...)" and include the exception string and
minimal context (e.g., sample_rate_hz, len(frame) or frame metadata) in the
message; keep the existing control flow (do not re-raise) so has_speech behavior
is unchanged. Use the module or instance logger used elsewhere in the file
(e.g., logger or self._logger) to ensure consistent logging.
- Around line 908-910: The early return makes the subsequent yield unreachable;
replace this unclear pattern by removing the unreachable yield and explicitly
preserve the async-generator signature with a clear no-op yield guard (e.g.,
place a conditional no-op yield like an "if False: yield" / dummy yield at
function start) so the function remains an async generator while keeping the
early "if not text: return" path; update the function containing the "text"
parameter accordingly (replace the unreachable "yield" after the return with the
no-op yield guard).
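The no-op yield guard described in the last prompt keeps a function an async generator while still permitting an early return; a minimal, self-contained sketch with a simplified signature (not the adapter's real one):

```python
import asyncio
from typing import AsyncIterator

async def synthesize(text: str) -> AsyncIterator[bytes]:
    if False:  # never executes, but its presence makes this an async generator
        yield b""
    if not text:
        return  # early exit: the generator simply yields nothing
    yield text.encode()

async def demo():
    empty = [chunk async for chunk in synthesize("")]
    full = [chunk async for chunk in synthesize("hi")]
    return empty, full

empty, full = asyncio.run(demo())  # empty == [], full == [b"hi"]
```

Any function containing a yield statement, even an unreachable one, is compiled as a generator, so callers can always `async for` over it safely.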
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a267e24e-67df-4881-82e5-c05076b7d0a3
📒 Files selected for processing (3)
- admin_ui/frontend/src/components/config/PipelineForm.tsx
- src/engine.py
- src/pipelines/azure.py
…ssthrough

- Call .get() on start_continuous_recognition_async() ResultFuture to surface startup errors instead of silently continuing
- Add A-law passthrough branch in non-streaming TTS decode path to match the streaming path's handling of raw A-law audio

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/pipelines/azure.py (3)
485-489: Add availability check for aiohttp before use.

AzureSTTRealtimeAdapter.start_stream checks if not speechsdk: before using the SDK (line 618), but _ensure_session uses aiohttp.ClientSession without checking if aiohttp is None. If aiohttp isn't installed, this will raise an unclear AttributeError: 'NoneType' object has no attribute 'ClientSession'.

♻️ Proposed fix for clearer error handling

     async def _ensure_session(self) -> None:
         if self._session and not self._session.closed:
             return
+        if aiohttp is None:
+            raise RuntimeError(
+                "aiohttp package is not installed. Required for Azure STT Fast / Azure TTS."
+            )
         factory = self._session_factory or aiohttp.ClientSession
         self._session = factory()

Apply the same fix to AzureTTSAdapter._ensure_session at lines 1089-1093.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 485 - 489, The _ensure_session methods in AzureSTTRealtimeAdapter and AzureTTSAdapter currently call aiohttp.ClientSession without verifying aiohttp is available; update both _ensure_session implementations (the methods named _ensure_session) to check if aiohttp is None and raise a clear RuntimeError (or ValueError) with an explanatory message (e.g., "aiohttp is required for HTTP sessions; please install aiohttp") before attempting to use aiohttp.ClientSession, preserving the existing factory override logic (_session_factory) and behavior when a session already exists.
751-768: Log cleanup errors at debug level instead of silently ignoring.

Static analysis flags the try-except-pass blocks during cleanup. While errors during cleanup are typically non-fatal, logging at debug level aids troubleshooting without disrupting the flow.

♻️ Optional: add debug logging for cleanup errors

         # Close audio stream so Azure knows input is done
         push_stream = session.get("push_stream")
         if push_stream:
             try:
                 push_stream.close()
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Azure SDK push_stream.close() failed", call_id=call_id, exc_info=exc)
         # Stop recognizer
         recognizer = session.get("recognizer")
         if recognizer:
             try:
                 # Stop blocks until final processed segments finish, which may take seconds.
                 # In asyncio context, best to run it in thread to avoid blocking loop if slow.
                 logger.debug("Stopping Azure SDK continuous recognition", call_id=call_id)
                 await asyncio.to_thread(recognizer.stop_continuous_recognition)
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Azure SDK stop_continuous_recognition failed", call_id=call_id, exc_info=exc)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 751 - 768, Replace the silent cleanup except/pass blocks with debug-level logging: when closing push_stream (session["push_stream"]) or stopping recognizer (session["recognizer"] / recognizer.stop_continuous_recognition) catch Exception as e and call logger.debug with a short message including call_id and the exception (or use exc_info=True) so errors during cleanup are recorded but non-fatal; keep the existing await asyncio.to_thread(recognizer.stop_continuous_recognition) behavior and only change the except handlers to log at debug instead of pass.
354-359: Consider logging VAD frame processing errors at debug level.

The try-except-pass block silently discards any VAD errors, which could make debugging difficult if VAD consistently fails on certain audio patterns.

♻️ Optional: log at debug level

             try:
                 if self._vad.is_speech(frame, sample_rate_hz):
                     has_speech = True
                     break
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("VAD frame check failed", exc_info=exc)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pipelines/azure.py` around lines 354 - 359, The try/except currently swallows VAD errors in the self._vad.is_speech(frame, sample_rate_hz) call; change the except to capture the exception (e.g., except Exception as e) and log it at debug level including context (frame/sample_rate_hz and the exception message/stack) using the existing logger (e.g., self._logger.debug or module logger) before continuing, so VAD failures are recorded for debugging without altering control flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/pipelines/azure.py`:
- Around line 1027-1044: The code currently assumes A-law or mu-law bytes can be
treated as PCM16 LE and converted or passed through, which silently corrupts
audio for unsupported alaw↔mulaw conversions; update both the streaming branch
(where is_raw_mulaw / is_raw_alaw and target_encoding are checked before using
pcm_payload, fmt_lower, resample_audio, _to_target_format) and the non-streaming
branch (where raw alaw/mulaw is passed through) to explicitly detect mismatched
encodings (e.g., is_raw_alaw and target_encoding == "mulaw" or is_raw_mulaw and
target_encoding == "alaw") and raise a clear exception (or return an error)
instead of attempting PCM resampling/formatting, ensuring only identical
passthroughs or supported conversions reach resample_audio/_to_target_format.
- Around line 142-178: The SSML builder _build_ssml currently only escapes text
content but inserts prosody_pitch, prosody_rate, lang_tag, voice_name (and
derived lang) directly into attribute values, risking malformed XML; add an XML
attribute-escaping step (e.g., a small helper like _escape_xml_attr that
replaces &, <, >, " and ' with &amp;, &lt;, &gt;, &quot;, &apos;) and apply it
to voice_name, lang, lang_tag, and each prosody attribute value before
constructing prosody_attrs and the <voice>/<lang> wrappers so all attribute
values are safe.
---
Nitpick comments:
In `@src/pipelines/azure.py`:
- Around line 485-489: The _ensure_session methods in AzureSTTRealtimeAdapter
and AzureTTSAdapter currently call aiohttp.ClientSession without verifying
aiohttp is available; update both _ensure_session implementations (the methods
named _ensure_session) to check if aiohttp is None and raise a clear
RuntimeError (or ValueError) with an explanatory message (e.g., "aiohttp is
required for HTTP sessions; please install aiohttp") before attempting to use
aiohttp.ClientSession, preserving the existing factory override logic
(_session_factory) and behavior when a session already exists.
- Around line 751-768: Replace the silent cleanup except/pass blocks with
debug-level logging: when closing push_stream (session["push_stream"]) or
stopping recognizer (session["recognizer"] /
recognizer.stop_continuous_recognition) catch Exception as e and call
logger.debug with a short message including call_id and the exception (or use
exc_info=True) so errors during cleanup are recorded but non-fatal; keep the
existing await asyncio.to_thread(recognizer.stop_continuous_recognition)
behavior and only change the except handlers to log at debug instead of pass.
- Around line 354-359: The try/except currently swallows VAD errors in the
self._vad.is_speech(frame, sample_rate_hz) call; change the except to capture
the exception (e.g., except Exception as e) and log it at debug level including
context (frame/sample_rate_hz and the exception message/stack) using the
existing logger (e.g., self._logger.debug or module logger) before continuing,
so VAD failures are recorded for debugging without altering control flow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d8e7b9e2-91e7-41e1-9671-7be2f6e34a5a
📒 Files selected for processing (1)
src/pipelines/azure.py
def _build_ssml(
    text: str,
    voice_name: str,
    language: str,
    prosody_pitch: Optional[str] = None,
    prosody_rate: Optional[str] = None,
    lang_tag: Optional[str] = None,
) -> str:
    """Build a minimal SSML document for Azure TTS, with optional prosody and lang controls."""
    # Derive xml:lang from voice_name locale prefix (e.g. "en-US-JennyNeural" -> "en-US")
    lang = language or ("-".join(voice_name.split("-")[:2]) if "-" in voice_name else "en-US")
    safe_text = (
        text
        .replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&apos;")
    )
    # Build optional <prosody> wrapper
    prosody_attrs = []
    if prosody_pitch:
        prosody_attrs.append(f"pitch='{prosody_pitch}'")
    if prosody_rate:
        prosody_attrs.append(f"rate='{prosody_rate}'")
    if prosody_attrs:
        inner = f"<prosody {' '.join(prosody_attrs)}>{safe_text}</prosody>"
    else:
        inner = safe_text
    # For multilingual voices, wrap with <lang xml:lang="..."> to specify spoken language
    if lang_tag:
        inner = f"<lang xml:lang='{lang_tag}'>{inner}</lang>"
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice name='{voice_name}'>{inner}</voice>"
        f"</speak>"
    )
SSML attribute values are not escaped, risking malformed XML or injection.
The safe_text variable properly escapes XML entities, but prosody_pitch, prosody_rate, lang_tag, and voice_name are embedded directly into single-quoted attributes without escaping. If any of these contain a single quote (e.g., pitch="default'"), the SSML becomes malformed or potentially exploitable.
🐛 Proposed fix to escape attribute values
+def _escape_attr(value: str) -> str:
+    """Escape a value for use in an XML single-quoted attribute."""
+    return (
+        value
+        .replace("&", "&amp;")
+        .replace("<", "&lt;")
+        .replace(">", "&gt;")
+        .replace("'", "&apos;")
+    )
+
 def _build_ssml(
     text: str,
     voice_name: str,
     language: str,
     prosody_pitch: Optional[str] = None,
     prosody_rate: Optional[str] = None,
     lang_tag: Optional[str] = None,
 ) -> str:
     """Build a minimal SSML document for Azure TTS, with optional prosody and lang controls."""
     # Derive xml:lang from voice_name locale prefix (e.g. "en-US-JennyNeural" -> "en-US")
     lang = language or ("-".join(voice_name.split("-")[:2]) if "-" in voice_name else "en-US")
     safe_text = (
         text
         .replace("&", "&amp;")
         .replace("<", "&lt;")
         .replace(">", "&gt;")
         .replace('"', "&quot;")
         .replace("'", "&apos;")
     )
     # Build optional <prosody> wrapper
     prosody_attrs = []
     if prosody_pitch:
-        prosody_attrs.append(f"pitch='{prosody_pitch}'")
+        prosody_attrs.append(f"pitch='{_escape_attr(prosody_pitch)}'")
     if prosody_rate:
-        prosody_attrs.append(f"rate='{prosody_rate}'")
+        prosody_attrs.append(f"rate='{_escape_attr(prosody_rate)}'")
     if prosody_attrs:
         inner = f"<prosody {' '.join(prosody_attrs)}>{safe_text}</prosody>"
     else:
         inner = safe_text
     # For multilingual voices, wrap with <lang xml:lang="..."> to specify spoken language
     if lang_tag:
-        inner = f"<lang xml:lang='{lang_tag}'>{inner}</lang>"
+        inner = f"<lang xml:lang='{_escape_attr(lang_tag)}'>{inner}</lang>"
+    safe_lang = _escape_attr(lang)
+    safe_voice = _escape_attr(voice_name)
     return (
-        f"<speak version='1.0' xml:lang='{lang}'>"
-        f"<voice name='{voice_name}'>{inner}</voice>"
+        f"<speak version='1.0' xml:lang='{safe_lang}'>"
+        f"<voice name='{safe_voice}'>{inner}</voice>"
         f"</speak>"
     )

🤖 Prompt for AI Agents
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pipelines/azure.py` around lines 142 - 178, The SSML builder _build_ssml
currently only escapes text content but inserts prosody_pitch, prosody_rate,
lang_tag, voice_name (and derived lang) directly into attribute values, risking
malformed XML; add an XML attribute-escaping step (e.g., a small helper like
_escape_xml_attr that replaces &, <, >, " and ' with &amp;, &lt;, &gt;, &quot;,
&apos;) and apply it to voice_name, lang, lang_tag, and each prosody attribute
value before constructing prosody_attrs and the <voice>/<lang> wrappers so all
attribute values are safe.
if (is_raw_mulaw and target_encoding == "mulaw") or (is_raw_alaw and target_encoding == "alaw"):
    # Already in target format — yield directly
    converted = pcm_payload
else:
    # PCM16 LE raw bytes — determine source sample rate from format string
    audio_bytes = pcm_payload
    fmt_l = fmt_lower
    if "8khz" in fmt_l:
        source_rate = 8000
    elif "16khz" in fmt_l:
        source_rate = 16000
    elif "24khz" in fmt_l:
        source_rate = 24000
    else:
        source_rate = 8000
    if source_rate != target_rate:
        audio_bytes, _ = resample_audio(audio_bytes, source_rate, target_rate)
    converted = _to_target_format(audio_bytes, target_encoding)
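As an aside, the rate-sniffing chain above can be made more robust with a regex. A sketch under that assumption (note that with plain substring checks, a hypothetical `raw-48khz-...` format would match the `"8khz"` branch first):

```python
import re


def parse_source_rate(output_format: str, default: int = 8000) -> int:
    """Infer the source sample rate from an Azure-style output-format
    string such as 'raw-16khz-16bit-mono-pcm'.

    Falls back to `default` when no '<N>khz' token is present, matching
    the else branch in the snippet above.
    """
    match = re.search(r"(\d+)khz", output_format.lower())
    return int(match.group(1)) * 1000 if match else default
```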
Unsupported encoding conversion silently corrupts audio.
Both streaming and non-streaming paths handle mulaw↔mulaw and alaw↔alaw passthrough, but mismatched combinations (e.g., output_format=raw-8khz-8bit-mono-alaw with target_encoding=mulaw) fall through incorrectly:
- Streaming (line 1030-1044): A-law bytes are treated as PCM16 LE, producing garbage.
- Non-streaming (line 1071-1073): A-law bytes are passed through as-is but interpreted as mulaw downstream.
Since there's no alaw↔mulaw converter, the code should raise an explicit error rather than silently corrupt audio.
🐛 Proposed fix for streaming path
if (is_raw_mulaw and target_encoding == "mulaw") or (is_raw_alaw and target_encoding == "alaw"):
# Already in target format — yield directly
converted = pcm_payload
+ elif is_raw_mulaw or is_raw_alaw:
+ # Source is mulaw/alaw but target doesn't match — unsupported conversion
+ raise RuntimeError(
+ f"Azure TTS: cannot convert raw {('mulaw' if is_raw_mulaw else 'alaw')} "
+ f"to target_encoding='{target_encoding}'. Use matching output_format or target_encoding."
+ )
else:
# PCM16 LE raw bytes — determine source sample rate from format string
🐛 Proposed fix for non-streaming path
if native_encoding == "mulaw" and target_encoding == "mulaw":
# Already in mulaw at 8 kHz — yield directly
converted = audio_bytes
elif native_encoding == "alaw" and target_encoding == "alaw":
# Already in A-law at 8 kHz — yield directly
converted = audio_bytes
+ elif native_encoding in ("mulaw", "alaw"):
+ # Source is mulaw/alaw but target doesn't match — unsupported conversion
+ raise RuntimeError(
+ f"Azure TTS: cannot convert {native_encoding} to target_encoding='{target_encoding}'. "
+ "Use matching output_format or target_encoding."
+ )
elif native_encoding in ("pcm16", "pcm"):
# Resample if needed, then convert to target encoding
if source_rate != target_rate:
audio_bytes, _ = resample_audio(audio_bytes, source_rate, target_rate)
converted = _to_target_format(audio_bytes, target_encoding)
else:
# Unknown encoding — pass through as-is
converted = audio_bytes
Also applies to: 1060-1073
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pipelines/azure.py` around lines 1027 - 1044, the code currently assumes
A-law or mu-law bytes can be treated as PCM16 LE and converted or passed
through, which silently corrupts audio for unsupported alaw↔mulaw conversions;
update both the streaming branch (where is_raw_mulaw / is_raw_alaw and
target_encoding are checked before using pcm_payload, fmt_lower, resample_audio,
_to_target_format) and the non-streaming branch (where raw alaw/mulaw is passed
through) to explicitly detect mismatched encodings (e.g., is_raw_alaw and
target_encoding == "mulaw" or is_raw_mulaw and target_encoding == "alaw") and
raise a clear exception (or return an error) instead of attempting PCM
resampling/formatting, ensuring only identical passthroughs or supported
conversions reach resample_audio/_to_target_format.
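The guard both prompts describe boils down to rejecting any mismatched pair involving a compressed encoding, since this code has no direct alaw↔mulaw converter. A minimal sketch, assuming the `native_encoding`/`target_encoding` vocabulary used in the diffs above:

```python
def ensure_encoding_compatible(native_encoding: str, target_encoding: str) -> None:
    """Raise instead of silently corrupting audio when the Azure output
    format's encoding cannot be converted to the requested target."""
    compressed = {"mulaw", "alaw"}
    if native_encoding in compressed and native_encoding != target_encoding:
        raise RuntimeError(
            f"Azure TTS: cannot convert {native_encoding} to "
            f"target_encoding='{target_encoding}'. "
            "Use a matching output_format or target_encoding."
        )
```

PCM sources are unaffected by the check and continue through the resample/convert path.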