New ai proxy #2397
Conversation
Caution: Review failed. The pull request is closed.

📝 Walkthrough

This PR introduces a comprehensive AI service infrastructure, replacing the legacy STT server with a new unified AI application. It adds a new LLM proxy crate for OpenRouter integrations, refactors the transcribe proxy with analytics capabilities, implements CI/CD pipelines for automated deployment to Fly.io, and integrates Sentry for error tracking and monitoring.

Changes

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant AIApp as AI App
    participant LLMProxy as LLM Proxy
    participant OpenRouter
    participant Analytics
    Client->>AIApp: POST /llm/completions
    AIApp->>AIApp: authenticate (JWKS or insecure decode)
    AIApp->>LLMProxy: forward completion request
    LLMProxy->>LLMProxy: resolve model & provider
    LLMProxy->>OpenRouter: stream POST /chat/completions
    loop streaming chunks
        OpenRouter-->>LLMProxy: delta event
        LLMProxy->>LLMProxy: extract metadata (tokens, model, id)
        LLMProxy-->>Client: SSE frame
    end
    LLMProxy->>OpenRouter: GET /generations/{id} (fetch cost)
    OpenRouter-->>LLMProxy: total_cost
    LLMProxy->>Analytics: report_generation_event
    Analytics-->>LLMProxy: ack
    LLMProxy-->>Client: stream end
```

```mermaid
sequenceDiagram
    participant Client
    participant TranscribeProxy as Transcribe Proxy
    participant Provider as STT Provider
    participant Analytics
    Client->>TranscribeProxy: WebSocket /ws
    TranscribeProxy->>TranscribeProxy: authenticate
    TranscribeProxy->>TranscribeProxy: select provider from config
    TranscribeProxy->>Provider: init upstream connection
    Provider-->>TranscribeProxy: connection established
    Note over TranscribeProxy: start_time = now()
    loop bidirectional streaming
        Client->>TranscribeProxy: audio frames
        TranscribeProxy->>Provider: forward frames
        Provider-->>TranscribeProxy: transcription results
        TranscribeProxy-->>Client: results
    end
    Client->>TranscribeProxy: close connection
    TranscribeProxy->>TranscribeProxy: elapsed = start_time.elapsed()
    TranscribeProxy->>Analytics: report_stt(provider, elapsed)
    Analytics-->>TranscribeProxy: ack
    TranscribeProxy-->>Client: close frame
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning, 2 inconclusive)

📜 Recent review details

Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)
📒 Files selected for processing (11)
Actionable comments posted: 12
🧹 Nitpick comments (13)
crates/llm-proxy/src/error.rs (1)
5-6: Consider truncating or sanitizing the error body.

The `Upstream` error variant includes the full response body. For large responses or responses containing sensitive data (API keys, PII), this could lead to memory issues or logging sensitive information.

🔎 Consider this approach:

```diff
 #[error("upstream error: {status} - {body}")]
-Upstream { status: u16, body: String },
+Upstream {
+    status: u16,
+    body: String, // Consider truncating to first 500 chars or redacting sensitive fields
+},
```

Or add a helper method to sanitize the body before creating the error.
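As a concrete illustration of the helper-method option, here is a minimal standalone sketch. The `truncate_body` name and the 500-byte cap are hypothetical choices, not part of the PR; the point is to respect UTF-8 character boundaries so the slice never panics.

```rust
// Hypothetical helper: cap the upstream body stored in the error,
// backing up to a char boundary so slicing never panics on UTF-8.
fn truncate_body(body: &str, max: usize) -> String {
    if body.len() <= max {
        return body.to_string();
    }
    let mut end = max;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}… [truncated {} bytes]", &body[..end], body.len() - end)
}

fn main() {
    // Short bodies pass through unchanged.
    assert_eq!(truncate_body("small body", 500), "small body");
    // Long bodies are capped and annotated.
    let long = "x".repeat(600);
    let short = truncate_body(&long, 500);
    assert!(short.len() < long.len());
    println!("{}", short);
}
```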
.github/workflows/ai_ci.yaml (1)
23-23: Consider adding cargo test to verify functionality.

The CI workflow only runs `cargo check`, which verifies compilation but doesn't run tests. Consider adding a test step to catch functional regressions.

```diff
-- run: cargo check -p ai
+- run: cargo test -p ai
```

This ensures both compilation and test coverage as part of CI.
apps/ai/fly.toml (1)
30-36: Consider using a dedicated health check endpoint.

Using `/llm/completions` for health checks may be problematic. LLM completion endpoints can be slow or resource-intensive, and the health check may time out (4s) under load. A lightweight `/health` or `/healthz` endpoint that validates service readiness without invoking the LLM would be more reliable.

crates/transcribe-proxy/src/service.rs (2)
539-547: Consider using `VecDeque` for O(1) front removal.

`Vec::remove(0)` is O(n) as it shifts all remaining elements. With the 5 MiB backpressure limit, the impact is bounded, but switching to `VecDeque` with `pop_front()` would provide O(1) removal for the queue-like access pattern used here.

🔎 Suggested change:

```diff
-use std::collections::HashMap;
+use std::collections::{HashMap, VecDeque};
```

And in `PendingState` and initialization:

```diff
 struct PendingState {
-    control_messages: Arc<Mutex<Vec<QueuedPayload>>>,
-    data_messages: Arc<Mutex<Vec<QueuedPayload>>>,
+    control_messages: Arc<Mutex<VecDeque<QueuedPayload>>>,
+    data_messages: Arc<Mutex<VecDeque<QueuedPayload>>>,
     bytes: Arc<Mutex<usize>>,
 }
```

Then replace `control.remove(0)` / `data_queue.remove(0)` with `control.pop_front().unwrap()` / `data_queue.pop_front().unwrap()`.
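To make the suggested access pattern concrete, here is a minimal standalone sketch of a control-first drain using `VecDeque::pop_front()`. The `next_payload` helper and the string payloads are hypothetical illustrations, not the service's actual types.

```rust
use std::collections::VecDeque;

// Control messages are drained before data messages; pop_front() is O(1),
// unlike Vec::remove(0) which shifts every remaining element.
fn next_payload(
    control: &mut VecDeque<&'static str>,
    data: &mut VecDeque<&'static str>,
) -> Option<&'static str> {
    control.pop_front().or_else(|| data.pop_front())
}

fn main() {
    let mut control = VecDeque::from(["ctrl-1"]);
    let mut data = VecDeque::from(["data-1", "data-2"]);
    // Control message drains first, then data in FIFO order.
    assert_eq!(next_payload(&mut control, &mut data), Some("ctrl-1"));
    assert_eq!(next_payload(&mut control, &mut data), Some("data-1"));
    assert_eq!(next_payload(&mut control, &mut data), Some("data-2"));
    assert_eq!(next_payload(&mut control, &mut data), None);
}
```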
484-570: Optional: Consider extracting common queue-and-send logic.

The `Message::Text` and `Message::Binary` branches share nearly identical logic for backpressure checking, queueing, draining, and sending. A helper function could reduce duplication and maintenance burden.

Also applies to: 571-646
apps/ai/src/env.rs (1)
49-51: Consider returning a reference instead of cloning.

The `api_keys()` method clones the entire `HashMap` on each call. If this is only called during initialization (as appears to be the case in `main.rs`), this is acceptable. However, if called frequently, consider returning `&HashMap<Provider, String>` instead.

🔎 Optional refactor to avoid the clone:

```diff
-    pub fn api_keys(&self) -> HashMap<Provider, String> {
-        self.api_keys.clone()
+    pub fn api_keys(&self) -> &HashMap<Provider, String> {
+        &self.api_keys
     }
```

apps/ai/src/main.rs (1)
71-76: Consider adding SIGTERM handling for container environments.

Container orchestrators like Fly.io typically send SIGTERM for graceful shutdown. Currently only CTRL+C (SIGINT) is handled. This may cause abrupt termination in production.

🔎 Apply this diff to handle both signals:

```diff
 async fn shutdown_signal() {
-    tokio::signal::ctrl_c()
-        .await
-        .expect("failed to install CTRL+C signal handler");
+    let ctrl_c = async {
+        tokio::signal::ctrl_c()
+            .await
+            .expect("failed to install CTRL+C signal handler");
+    };
+
+    #[cfg(unix)]
+    let terminate = async {
+        tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
+            .expect("failed to install SIGTERM signal handler")
+            .recv()
+            .await;
+    };
+
+    #[cfg(not(unix))]
+    let terminate = std::future::pending::<()>();
+
+    tokio::select! {
+        _ = ctrl_c => {},
+        _ = terminate => {},
+    }
     tracing::info!("shutting down");
 }
```

.github/workflows/ai_cd.yaml (2)
1-3: Add a workflow name for clarity.

The workflow is missing a `name` field at the top level, which helps identify it in the GitHub Actions UI.

🔎 Apply this diff:

```diff
+name: AI CD
+
 on:
   workflow_dispatch:
```
10-14: Redundant git fetch command.

Line 14 (`git fetch --tags --force`) appears redundant since the checkout action already specifies `fetch-tags: true` on line 13.

🔎 Consider removing the redundant fetch:

```diff
 - uses: actions/checkout@v4
   with:
     fetch-depth: 0
     fetch-tags: true
-- run: git fetch --tags --force
```

apps/ai/src/auth.rs (2)
36-39: Consider deriving Clone for JwksState.

This would simplify the cloning logic in lines 60-63 and 87-90.

🔎 Apply this diff:

```diff
+#[derive(Clone)]
 enum JwksState {
     Available(JwkSet),
     Empty,
 }
```

Then simplify the match expressions:

```diff
-    return Ok(match &cached.state {
-        JwksState::Available(jwks) => JwksState::Available(jwks.clone()),
-        JwksState::Empty => JwksState::Empty,
-    });
+    return Ok(cached.state.clone());
```
168-172: Document the security implications of this function.

While the function name includes "insecure", adding a doc comment would make the security implications clearer for future maintainers. Based on coding guidelines, comments about 'Why' are acceptable.

🔎 Suggested documentation:

```diff
+// WARNING: Skips signature verification. Only use in local development
+// when JWKS endpoint is unavailable.
 fn decode_claims_insecure(token: &str) -> Result<Claims, ()> {
     jsonwebtoken::dangerous::insecure_decode::<Claims>(token)
         .map(|data| data.claims)
         .map_err(|_| ())
 }
```

crates/llm-proxy/src/router.rs (1)
349-355: Consider handling `Response::builder()` errors.

The `.unwrap()` here (and at line 412) could panic if the builder fails. While unlikely with these static headers, returning an error response would be safer.

🔎 Suggested fix:

```diff
-    Response::builder()
+    match Response::builder()
         .status(status)
         .header("Content-Type", "text/event-stream")
         .header("Cache-Control", "no-cache")
         .body(body)
-        .unwrap()
+    {
+        Ok(resp) => resp,
+        Err(e) => {
+            tracing::error!(error = %e, "failed to build response");
+            (StatusCode::INTERNAL_SERVER_ERROR, "Internal error").into_response()
+        }
+    }
```

crates/llm-proxy/src/posthog.rs (1)
76-79: Consider checking the response status for complete error visibility.

`send().await` succeeds even if PostHog returns an error status (4xx/5xx). For better observability, you could check the response:

🔎 Suggested fix:

```diff
 let url = format!("{}/batch", config.host);
-if let Err(e) = client.post(&url).json(&request).send().await {
-    tracing::warn!(error = %e, "failed to send posthog event");
-}
+match client.post(&url).json(&request).send().await {
+    Ok(resp) if !resp.status().is_success() => {
+        tracing::warn!(status = %resp.status(), "posthog returned error status");
+    }
+    Err(e) => {
+        tracing::warn!(error = %e, "failed to send posthog event");
+    }
+    _ => {}
+}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (22)
- `.github/actions/sentry_cli/action.yaml` (1 hunks)
- `.github/workflows/ai_cd.yaml` (1 hunks)
- `.github/workflows/ai_ci.yaml` (1 hunks)
- `Cargo.toml` (2 hunks)
- `apps/ai/Cargo.toml` (2 hunks)
- `apps/ai/Dockerfile` (1 hunks)
- `apps/ai/fly.toml` (1 hunks)
- `apps/ai/src/auth.rs` (5 hunks)
- `apps/ai/src/env.rs` (2 hunks)
- `apps/ai/src/main.rs` (1 hunks)
- `apps/stt/src/main.rs` (0 hunks)
- `crates/llm-proxy/Cargo.toml` (1 hunks)
- `crates/llm-proxy/src/error.rs` (1 hunks)
- `crates/llm-proxy/src/lib.rs` (1 hunks)
- `crates/llm-proxy/src/posthog.rs` (1 hunks)
- `crates/llm-proxy/src/router.rs` (1 hunks)
- `crates/transcribe-proxy/Cargo.toml` (1 hunks)
- `crates/transcribe-proxy/src/config.rs` (1 hunks)
- `crates/transcribe-proxy/src/lib.rs` (1 hunks)
- `crates/transcribe-proxy/src/router.rs` (4 hunks)
- `crates/transcribe-proxy/src/service.rs` (6 hunks)
- `doxxer.ai.toml` (1 hunks)
💤 Files with no reviewable changes (1)
- apps/stt/src/main.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*
📄 CodeRabbit inference engine (AGENTS.md)
Format using `dprint fmt` from the root. Do not use `cargo fmt`.
Files:
- `crates/llm-proxy/src/lib.rs`
- `crates/llm-proxy/src/error.rs`
- `apps/ai/fly.toml`
- `crates/transcribe-proxy/src/config.rs`
- `crates/llm-proxy/Cargo.toml`
- `apps/ai/src/env.rs`
- `apps/ai/Cargo.toml`
- `crates/transcribe-proxy/src/service.rs`
- `apps/ai/src/main.rs`
- `crates/llm-proxy/src/router.rs`
- `crates/transcribe-proxy/src/lib.rs`
- `apps/ai/Dockerfile`
- `Cargo.toml`
- `apps/ai/src/auth.rs`
- `doxxer.ai.toml`
- `crates/transcribe-proxy/Cargo.toml`
- `crates/llm-proxy/src/posthog.rs`
- `crates/transcribe-proxy/src/router.rs`
**/*.{ts,tsx,rs,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
By default, avoid writing comments at all. If you write one, it should be about 'Why', not 'What'.
Files:
- `crates/llm-proxy/src/lib.rs`
- `crates/llm-proxy/src/error.rs`
- `crates/transcribe-proxy/src/config.rs`
- `apps/ai/src/env.rs`
- `crates/transcribe-proxy/src/service.rs`
- `apps/ai/src/main.rs`
- `crates/llm-proxy/src/router.rs`
- `crates/transcribe-proxy/src/lib.rs`
- `apps/ai/src/auth.rs`
- `crates/llm-proxy/src/posthog.rs`
- `crates/transcribe-proxy/src/router.rs`
🧬 Code graph analysis (7)
crates/llm-proxy/src/lib.rs (3)
- apps/api/src/integration/posthog.ts (1)
  - posthog (5-8)
- crates/llm-proxy/src/router.rs (1)
  - router (77-86)
- crates/transcribe-proxy/src/router.rs (1)
  - router (24-33)

crates/transcribe-proxy/src/config.rs (4)
- crates/llm-proxy/src/router.rs (1)
  - new (33-48)
- crates/transcribe-proxy/src/service.rs (1)
  - new (313-325)
- apps/ai/src/env.rs (1)
  - api_keys (49-51)
- crates/transcribe-proxy/src/router.rs (1)
  - s (42-42)

apps/ai/src/main.rs (4)
- apps/ai/src/env.rs (1)
  - env (17-22)
- crates/llm-proxy/src/router.rs (2)
  - new (33-48)
  - router (77-86)
- crates/transcribe-proxy/src/config.rs (1)
  - new (16-22)
- crates/transcribe-proxy/src/router.rs (2)
  - router (24-33)
  - s (42-42)

crates/llm-proxy/src/router.rs (1)
- crates/llm-proxy/src/posthog.rs (2)
  - capture_ai_generation (60-80)
  - fetch_generation_metadata (97-124)

crates/transcribe-proxy/src/lib.rs (2)
- crates/llm-proxy/src/router.rs (1)
  - router (77-86)
- crates/transcribe-proxy/src/router.rs (1)
  - router (24-33)

crates/llm-proxy/src/posthog.rs (1)
- crates/llm-proxy/src/router.rs (1)
  - new (33-48)

crates/transcribe-proxy/src/router.rs (2)
- crates/transcribe-proxy/src/config.rs (1)
  - new (16-22)
- crates/transcribe-proxy/src/service.rs (3)
  - new (313-325)
  - upstream_url (74-77)
  - builder (187-189)
🪛 actionlint (1.7.9)
.github/workflows/ai_ci.yaml
22-22: description is required in metadata of "" action at "/home/jailuser/git/.github/actions/rust_install/action.yaml"
(action)
22-22: name is required in action metadata "/home/jailuser/git/.github/actions/rust_install/action.yaml"
(action)
.github/workflows/ai_cd.yaml
15-15: description is required in metadata of "" action at "/home/jailuser/git/.github/actions/doxxer_install/action.yaml"
(action)
15-15: name is required in action metadata "/home/jailuser/git/.github/actions/doxxer_install/action.yaml"
(action)
24-24: label "depot-ubuntu-24.04-8" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
28-28: description is required in metadata of "" action at "/home/jailuser/git/.github/actions/rust_install/action.yaml"
(action)
28-28: name is required in action metadata "/home/jailuser/git/.github/actions/rust_install/action.yaml"
(action)
58-58: description is required in metadata of "" action at "/home/jailuser/git/.github/actions/sentry_cli/action.yaml"
(action)
58-58: name is required in action metadata "/home/jailuser/git/.github/actions/sentry_cli/action.yaml"
(action)
67-67: label "depot-ubuntu-24.04-8" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Redirect rules - hyprnote-storybook
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote-storybook
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: ci
- GitHub Check: fmt
🔇 Additional comments (23)
crates/llm-proxy/src/lib.rs (1)
1-7: LGTM! Clean module structure.

The module organization and public re-exports follow Rust conventions. The wildcard re-exports provide a convenient API surface for consumers of this crate.
crates/transcribe-proxy/src/lib.rs (1)
1-9: LGTM! Improved public API surface.

The refactoring from wildcard exports (`pub use service::*`) to explicit exports improves API clarity and reduces the risk of unintentionally exposing internal implementation details. This follows Rust best practices for library crates.

doxxer.ai.toml (1)
6-13: No changes needed: the version increment configuration is valid.

Setting all version levels (patch, minor, major) to `increment = 1` is correct and standard for semantic versioning. The doxxer tool applies the specified increment to each version component when calculating the next version; this configuration is intentional and produces the expected SemVer behavior.

crates/llm-proxy/Cargo.toml (1)
4-4: Verify Rust edition 2024 compatibility.

The crate uses `edition = "2024"`, which was stabilized in Rust 1.85 (February 2025). Ensure your CI/CD environment and all developers have Rust 1.85+ installed.

Run this script to check the Rust version in your CI environment:
```bash
#!/bin/bash
# Check Rust version supports edition 2024 (requires 1.85+)
rustc --version
cargo --version

# Verify the build succeeds with edition 2024
cargo check -p llm-proxy
```
13-13: LGTM!The workspace configuration changes properly add the new
apps/aimember andhypr-llm-proxydependency, aligning with the PR's objective to introduce the new AI proxy component.Also applies to: 51-51
apps/ai/fly.toml (1)
1-42: Deployment configuration looks well-structured.Blue-green deployment strategy, auto-scaling settings, and HTTPS enforcement are appropriate for a production service. Consider monitoring resource utilization after deployment to validate the 1GB memory allocation is sufficient for LLM proxy workloads.
crates/transcribe-proxy/Cargo.toml (1)
6-21: LGTM!

The dependency additions appropriately support the new stateful router architecture with HTTP client capabilities (`reqwest`), serialization (`serde`, `serde_json`), and provider integration (`owhisper-providers`).

apps/ai/Cargo.toml (1)
1-25: LGTM!

The package configuration properly sets up the new `ai` application with appropriate dependencies for LLM proxy (`hypr-llm-proxy`), transcription proxy (`hypr-transcribe-proxy`), and observability (`sentry` with tower integration).

crates/transcribe-proxy/src/service.rs (1)
36-41: Good refactoring of pending state management.

Consolidating the three separate `Arc<Mutex<...>>` fields into `PendingState` improves cohesion and makes it clearer that these fields are conceptually related. The `Clone` derive is correct since `Arc<Mutex<T>>` implements `Clone`.

crates/transcribe-proxy/src/router.rs (4)
18-33: LGTM! Clean stateful router setup.

The `AppState` pattern properly encapsulates configuration and a shared HTTP client. Creating `reqwest::Client::new()` once and sharing it via state is correct: the client uses connection pooling internally and is designed to be cloned cheaply.
35-63: LGTM!

The handler properly validates API key configuration upfront and provides appropriate error responses. The provider resolution from query params with fallback to default is well-implemented.
111-123: Hardcoded Gladia configuration may limit flexibility.
`encoding`, `bit_depth`, and other config values are hardcoded. If clients send audio in different formats, this could cause transcription issues. Consider deriving these from query params or documenting the expected client audio format.

Is the hardcoded `wav`/`pcm` encoding with 16-bit depth intentional? If so, clients must ensure their audio matches this format.
151-177: LGTM!

The `build_proxy` function correctly applies provider-specific authentication mechanisms and propagates configuration (`connect_timeout`). The closure in `FirstMessage` auth properly captures `api_key` by value for the move semantics required.

apps/ai/src/env.rs (1)
9-12: LGTM - New environment fields properly added.

The new `sentry_environment` and `openrouter_api_key` fields are correctly declared and follow the existing pattern.

.github/workflows/ai_cd.yaml (1)
24-25: Custom runner labels are intentional.

The static analysis warnings about `depot-ubuntu-24.04-8` being an unknown label can be ignored; this appears to be a custom/self-hosted runner (Depot). Consider adding an `actionlint.yaml` config file to suppress these warnings.

Also applies to: 67-68
crates/transcribe-proxy/src/config.rs (2)
8-13: LGTM - Clean config struct with reasonable defaults.

The struct design with builder pattern is well-structured. The 5-second default connect timeout is reasonable for external service calls.
34-36: LGTM - Efficient key lookup.

`api_key_for` correctly returns `Option<&str>`, avoiding unnecessary clones.

crates/llm-proxy/src/router.rs (2)
37-45: Verify default model identifiers are valid.

Some default models like `"openai/gpt-oss-120b:exacto"` and `"openai/gpt-5.1-chat"` don't appear to match standard OpenRouter model identifiers. Please verify these are correct or intended as placeholders to be overridden.
192-243: LGTM on handler setup and error handling.

Good use of `tokio::time::timeout` to bound the upstream request, and appropriate error responses (502 for upstream failures, 504 for timeout).
65-69: Verify distinct_id usage aligns with PostHog analytics goals.

Using `trace_id` (the generation ID) as `distinct_id` means each AI generation appears as a unique "user" in PostHog. If you want to correlate generations by actual user or session, consider passing a user/session identifier instead.
4-22: LGTM on PostHogConfig.

Clean builder pattern with sensible default host.
24-45: LGTM on AiGenerationProperties.

Well-structured with proper serde field renames matching PostHog's expected schema.
97-124: LGTM on fetch_generation_metadata overall structure.

Good error handling with early returns and appropriate warning logs. (See earlier comment about URL encoding.)
Actionable comments posted: 1
♻️ Duplicate comments (5)
crates/llm-proxy/src/router.rs (5)
85-89: ChatMessage struct is missing fields required for tool calling.

As noted in previous reviews, the struct only captures `role` and `content`, but OpenRouter tool-calling flows require additional fields like `tool_calls`, `tool_call_id`, and `name`. This will drop those fields during deserialization.
163-174: Serialize implementation drops tool-calling fields.

This impl hard-codes serialization to only `role` and `content`. If you add the missing fields to `ChatMessage` as suggested in the previous review, update this impl to include them conditionally or derive `Serialize` with `#[serde(skip_serializing_if = "Option::is_none")]`.
262-306: SSE parsing may miss metadata if events span chunk boundaries.

As noted in previous reviews, each chunk is parsed independently. If an SSE event is split across two network chunks, metadata extraction will fail. Since this only affects telemetry and raw chunks are still forwarded correctly, this is a minor reliability concern.
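One hedged way to address this, assuming the standard SSE framing of events terminated by a blank line, is to carry incomplete input across chunks. The `SseAssembler` type below is a hypothetical standalone sketch, not the PR's code:

```rust
// Buffers raw SSE text across network chunks; emits only complete events
// (terminated by "\n\n") and keeps any partial remainder for the next push.
struct SseAssembler {
    buf: String,
}

impl SseAssembler {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut events = Vec::new();
        while let Some(pos) = self.buf.find("\n\n") {
            let event = self.buf[..pos].to_string();
            self.buf = self.buf[pos + 2..].to_string();
            events.push(event);
        }
        events
    }
}

fn main() {
    let mut asm = SseAssembler::new();
    // First chunk ends mid-event: nothing is emitted yet.
    assert!(asm.push("data: {\"id\":\"gen").is_empty());
    // Second chunk completes the event, so it is emitted whole.
    let events = asm.push("-123\"}\n\n");
    assert_eq!(events, vec!["data: {\"id\":\"gen-123\"}".to_string()]);
}
```

Metadata extraction would then run only on the complete events returned by `push`, while raw chunks continue to be forwarded unchanged.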
323-345: PostHog telemetry calls have no timeout protection.

As noted in previous reviews, the `fetch_generation_metadata` and analytics event calls inside the spawned task run without a timeout. If OpenRouter or PostHog is slow/unresponsive, these tasks will hang indefinitely.
385-406: PostHog telemetry calls have no timeout protection.

Same issue as the streaming path: telemetry calls in the spawned task lack timeout protection.
🧹 Nitpick comments (3)
crates/transcribe-proxy/src/router.rs (1)
177-192: Consider using a stable string representation for the analytics provider name.

Using `format!("{:?}", provider).to_lowercase()` relies on `Debug` output, which may change if the enum's derive or variants are modified. For analytics consistency, consider adding a method like `provider.name()` or using the `Display` trait if available.

🔎 Suggested approach:

```diff
-let provider_name = format!("{:?}", provider).to_lowercase();
+let provider_name = provider.to_string(); // if Display is implemented
+// or: let provider_name = provider.name(); // if a dedicated method exists
```

crates/transcribe-proxy/src/service.rs (2)
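A minimal standalone sketch of the `Display`-based alternative. The `Provider` variants here are illustrative, not the crate's actual enum:

```rust
use std::fmt;

// A Display impl gives a stable, intentional analytics name,
// unlike Debug output which tracks the enum definition.
enum Provider {
    Deepgram,
    Gladia,
}

impl fmt::Display for Provider {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Provider::Deepgram => "deepgram",
            Provider::Gladia => "gladia",
        };
        f.write_str(name)
    }
}

fn main() {
    assert_eq!(Provider::Gladia.to_string(), "gladia");
    println!("{}", Provider::Deepgram);
}
```

Renaming an enum variant then becomes a deliberate analytics change rather than a silent one hidden in derived `Debug` output.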
586-594: Consider using `VecDeque` for O(1) front removal.

`Vec::remove(0)` is O(n) as it shifts all remaining elements. Since messages are always removed from the front, `VecDeque` would provide O(1) `pop_front()` operations. This could matter under high message throughput.

🔎 Suggested change in `PendingState`:

```diff
+use std::collections::VecDeque;
+
 #[derive(Clone)]
 struct PendingState {
-    control_messages: Arc<Mutex<Vec<QueuedPayload>>>,
-    data_messages: Arc<Mutex<Vec<QueuedPayload>>>,
+    control_messages: Arc<Mutex<VecDeque<QueuedPayload>>>,
+    data_messages: Arc<Mutex<VecDeque<QueuedPayload>>>,
     bytes: Arc<Mutex<usize>>,
 }
```

Then replace `remove(0)` with `pop_front().unwrap()`:

```diff
-    let queued = if !control.is_empty() {
-        control.remove(0)
-    } else {
-        data_queue.remove(0)
-    };
+    let queued = if !control.is_empty() {
+        control.pop_front().unwrap()
+    } else {
+        data_queue.pop_front().unwrap()
+    };
```
531-694: Consider extracting common message handling logic.

The `Message::Text` (lines 531-617) and `Message::Binary` (lines 618-694) branches share nearly identical logic for queuing, backpressure checking, and sending. Extracting a helper method could reduce duplication and maintenance burden.

🔎 Possible approach:

```rust
async fn handle_payload(
    data: Vec<u8>,
    is_text: bool,
    control_matcher: &Option<ControlMessageMatcher>,
    pending_control_messages: &Mutex<Vec<QueuedPayload>>,
    pending_data_messages: &Mutex<Vec<QueuedPayload>>,
    pending_bytes: &Mutex<usize>,
    upstream_sender: &mut UpstreamSender,
    shutdown_tx: &tokio::sync::broadcast::Sender<(u16, String)>,
) -> Result<(), ()> {
    // Unified queuing and sending logic
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- `crates/llm-proxy/Cargo.toml` (1 hunks)
- `crates/llm-proxy/src/lib.rs` (1 hunks)
- `crates/llm-proxy/src/router.rs` (1 hunks)
- `crates/transcribe-proxy/Cargo.toml` (1 hunks)
- `crates/transcribe-proxy/src/config.rs` (1 hunks)
- `crates/transcribe-proxy/src/router.rs` (4 hunks)
- `crates/transcribe-proxy/src/service.rs` (24 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/transcribe-proxy/src/config.rs
- crates/llm-proxy/Cargo.toml
🧰 Additional context used
📓 Path-based instructions (2)
**/*
📄 CodeRabbit inference engine (AGENTS.md)
Format using `dprint fmt` from the root. Do not use `cargo fmt`.
Files:
- `crates/transcribe-proxy/src/router.rs`
- `crates/llm-proxy/src/lib.rs`
- `crates/transcribe-proxy/Cargo.toml`
- `crates/llm-proxy/src/router.rs`
- `crates/transcribe-proxy/src/service.rs`
**/*.{ts,tsx,rs,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
By default, avoid writing comments at all. If you write one, it should be about 'Why', not 'What'.
Files:
- `crates/transcribe-proxy/src/router.rs`
- `crates/llm-proxy/src/lib.rs`
- `crates/llm-proxy/src/router.rs`
- `crates/transcribe-proxy/src/service.rs`
🧬 Code graph analysis (2)
crates/llm-proxy/src/lib.rs (1)
- crates/llm-proxy/src/router.rs (1)
  - router (74-83)

crates/llm-proxy/src/router.rs (2)
- crates/transcribe-proxy/src/config.rs (2)
  - new (18-25)
  - with_analytics (37-40)
- crates/transcribe-proxy/src/router.rs (2)
  - router (25-34)
  - s (43-43)
🔇 Additional comments (13)
crates/transcribe-proxy/Cargo.toml (1)
7-23: LGTM! Dependencies are well-organized and appropriate.

The added dependencies align with the PR objectives:
- `hypr-analytics` and `owhisper-providers` support the new analytics and OpenWhisper/OpenRouter integrations
- Utility dependencies (`reqwest`, `uuid`, `serde`, `serde_json`, `url`, `bytes`, `thiserror`) provide necessary HTTP, serialization, and error handling capabilities
- All dependencies use workspace-managed versions, which maintains consistency across the monorepo
- Feature selections (`json`, `v4`, `derive`) are appropriate and minimal

crates/transcribe-proxy/src/router.rs (5)
19-34: LGTM!

The `AppState` struct cleanly encapsulates configuration and HTTP client, and the router setup is idiomatic for axum. Since `reqwest::Client` uses an internal `Arc`, cloning `AppState` is efficient.
36-64: LGTM!

The handler correctly validates provider configuration, returns appropriate HTTP status codes for different failure modes (500 for missing API key, 502 for upstream resolution failures), and includes useful logging context.
66-89: LGTM!

The function handles different authentication modes cleanly. The `unwrap()` on line 77 is acceptable since `provider.default_ws_url()` should return a well-formed URL controlled by the crate.
91-150: LGTM!

The session initialization logic is well-structured with proper error handling. The Gladia-specific configuration is appropriate given this is the current use case for `Auth::SessionInit`.
197-222: LGTM!

The data structures are clean and appropriately defined for the Gladia API interaction.
crates/transcribe-proxy/src/service.rs (6)
29-43: LGTM!

The `OnCloseCallback` type alias and `PendingState` struct are well-designed. Consolidating the three pending queue fields into a single struct improves code organization and makes it easier to pass around related state.
127-133: LGTM!

The `on_close` builder method follows the established pattern and properly wraps the callback in `Arc` for thread-safe sharing.
183-189: LGTM!

Consistent implementation of `on_close` in the request-based builder variant.
256-261: LGTM!

The `on_close` callback is correctly propagated through the `preconnect()` path to `PreconnectedProxy`.
371-437: LGTM!

The timing logic correctly captures `start_time` before connection setup and invokes `on_close` with the elapsed duration after both client-to-upstream and upstream-to-client tasks complete.
446-481: LGTM!

The `run_with_upstream` method maintains the same timing and callback pattern as `run`, ensuring consistent behavior for preconnected proxies.

crates/llm-proxy/src/lib.rs (1)
1-5: LGTM!

Clean and idiomatic library structure. The module declarations and re-exports properly expose the public API surface for the LLM proxy.
No description provided.