New ai proxy #2397
Conversation
Caution: Review failed. The pull request is closed.

📝 Walkthrough

This PR introduces a comprehensive AI service infrastructure, replacing the legacy STT server with a new unified AI application. It adds a new LLM proxy crate for OpenRouter integrations, refactors the transcribe proxy with analytics capabilities, implements CI/CD pipelines for automated deployment to Fly.io, and integrates Sentry for error tracking and monitoring.

Changes

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant AIApp as AI App
    participant LLMProxy as LLM Proxy
    participant OpenRouter
    participant Analytics
    Client->>AIApp: POST /llm/completions
    AIApp->>AIApp: authenticate (JWKS or insecure decode)
    AIApp->>LLMProxy: forward completion request
    LLMProxy->>LLMProxy: resolve model & provider
    LLMProxy->>OpenRouter: stream POST /chat/completions
    loop streaming chunks
        OpenRouter-->>LLMProxy: delta event
        LLMProxy->>LLMProxy: extract metadata (tokens, model, id)
        LLMProxy-->>Client: SSE frame
    end
    LLMProxy->>OpenRouter: GET /generations/{id} (fetch cost)
    OpenRouter-->>LLMProxy: total_cost
    LLMProxy->>Analytics: report_generation_event
    Analytics-->>LLMProxy: ack
    LLMProxy-->>Client: stream end
```

```mermaid
sequenceDiagram
    participant Client
    participant TranscribeProxy as Transcribe Proxy
    participant Provider as STT Provider
    participant Analytics
    Client->>TranscribeProxy: WebSocket /ws
    TranscribeProxy->>TranscribeProxy: authenticate
    TranscribeProxy->>TranscribeProxy: select provider from config
    TranscribeProxy->>Provider: init upstream connection
    Provider-->>TranscribeProxy: connection established
    Note over TranscribeProxy: start_time = now()
    loop bidirectional streaming
        Client->>TranscribeProxy: audio frames
        TranscribeProxy->>Provider: forward frames
        Provider-->>TranscribeProxy: transcription results
        TranscribeProxy-->>Client: results
    end
    Client->>TranscribeProxy: close connection
    TranscribeProxy->>TranscribeProxy: elapsed = start_time.elapsed()
    TranscribeProxy->>Analytics: report_stt(provider, elapsed)
    Analytics-->>TranscribeProxy: ack
    TranscribeProxy-->>Client: close frame
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning, 2 inconclusive)

📜 Recent review details

Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)
📒 Files selected for processing (11)
Actionable comments posted: 12
🧹 Nitpick comments (13)
crates/llm-proxy/src/error.rs (1)
5-6: Consider truncating or sanitizing the error body.

The `Upstream` error variant includes the full response body. For large responses or responses containing sensitive data (API keys, PII), this could lead to memory issues or logging sensitive information.

🔎 Consider this approach:

```diff
 #[error("upstream error: {status} - {body}")]
-Upstream { status: u16, body: String },
+Upstream {
+    status: u16,
+    body: String, // Consider truncating to first 500 chars or redacting sensitive fields
+},
```

Or add a helper method to sanitize the body before creating the error.
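As a concrete illustration of the helper-method option, here is a minimal standalone sketch. The `truncate_body` name and the 500-byte cap are hypothetical choices, not part of the PR; the point is to respect UTF-8 character boundaries so the slice never panics.

```rust
// Hypothetical helper: cap the upstream body stored in the error,
// backing up to a char boundary so slicing never panics on UTF-8.
fn truncate_body(body: &str, max: usize) -> String {
    if body.len() <= max {
        return body.to_string();
    }
    let mut end = max;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}… [truncated {} bytes]", &body[..end], body.len() - end)
}

fn main() {
    // Short bodies pass through unchanged.
    assert_eq!(truncate_body("small body", 500), "small body");
    // Long bodies are capped and annotated.
    let long = "x".repeat(600);
    let short = truncate_body(&long, 500);
    assert!(short.len() < long.len());
    println!("{}", short);
}
```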
.github/workflows/ai_ci.yaml (1)
23-23: Consider adding cargo test to verify functionality.

The CI workflow only runs `cargo check`, which verifies compilation but doesn't run tests. Consider adding a test step to catch functional regressions.

```diff
-- run: cargo check -p ai
+- run: cargo test -p ai
```

This ensures both compilation and test coverage as part of CI.
apps/ai/fly.toml (1)
30-36: Consider using a dedicated health check endpoint.

Using `/llm/completions` for health checks may be problematic. LLM completion endpoints can be slow or resource-intensive, and the health check may time out (4s) under load. A lightweight `/health` or `/healthz` endpoint that validates service readiness without invoking the LLM would be more reliable.

crates/transcribe-proxy/src/service.rs (2)
539-547: Consider using `VecDeque` for O(1) front removal.

`Vec::remove(0)` is O(n) as it shifts all remaining elements. With the 5 MiB backpressure limit, the impact is bounded, but switching to `VecDeque` with `pop_front()` would provide O(1) removal for the queue-like access pattern used here.

🔎 Suggested change:

```diff
-use std::collections::HashMap;
+use std::collections::{HashMap, VecDeque};
```

And in `PendingState` and initialization:

```diff
 struct PendingState {
-    control_messages: Arc<Mutex<Vec<QueuedPayload>>>,
-    data_messages: Arc<Mutex<Vec<QueuedPayload>>>,
+    control_messages: Arc<Mutex<VecDeque<QueuedPayload>>>,
+    data_messages: Arc<Mutex<VecDeque<QueuedPayload>>>,
     bytes: Arc<Mutex<usize>>,
 }
```

Then replace `control.remove(0)` / `data_queue.remove(0)` with `control.pop_front().unwrap()` / `data_queue.pop_front().unwrap()`.
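To make the suggested access pattern concrete, here is a minimal standalone sketch of a control-first drain using `VecDeque::pop_front()`. The `next_payload` helper and the string payloads are hypothetical illustrations, not the service's actual types.

```rust
use std::collections::VecDeque;

// Control messages are drained before data messages; pop_front() is O(1),
// unlike Vec::remove(0) which shifts every remaining element.
fn next_payload(
    control: &mut VecDeque<&'static str>,
    data: &mut VecDeque<&'static str>,
) -> Option<&'static str> {
    control.pop_front().or_else(|| data.pop_front())
}

fn main() {
    let mut control = VecDeque::from(["ctrl-1"]);
    let mut data = VecDeque::from(["data-1", "data-2"]);
    // Control message drains first, then data in FIFO order.
    assert_eq!(next_payload(&mut control, &mut data), Some("ctrl-1"));
    assert_eq!(next_payload(&mut control, &mut data), Some("data-1"));
    assert_eq!(next_payload(&mut control, &mut data), Some("data-2"));
    assert_eq!(next_payload(&mut control, &mut data), None);
}
```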
484-570: Optional: Consider extracting common queue-and-send logic.

The `Message::Text` and `Message::Binary` branches share nearly identical logic for backpressure checking, queueing, draining, and sending. A helper function could reduce duplication and maintenance burden.

Also applies to: 571-646
apps/ai/src/env.rs (1)
49-51: Consider returning a reference instead of cloning.

The `api_keys()` method clones the entire `HashMap` on each call. If this is only called during initialization (as appears to be the case in `main.rs`), this is acceptable. However, if called frequently, consider returning `&HashMap<Provider, String>` instead.

🔎 Optional refactor to avoid the clone:

```diff
-    pub fn api_keys(&self) -> HashMap<Provider, String> {
-        self.api_keys.clone()
+    pub fn api_keys(&self) -> &HashMap<Provider, String> {
+        &self.api_keys
     }
```

apps/ai/src/main.rs (1)
71-76: Consider adding SIGTERM handling for container environments.

Container orchestrators like Fly.io typically send SIGTERM for graceful shutdown. Currently only CTRL+C (SIGINT) is handled. This may cause abrupt termination in production.

🔎 Apply this diff to handle both signals:

```diff
 async fn shutdown_signal() {
-    tokio::signal::ctrl_c()
-        .await
-        .expect("failed to install CTRL+C signal handler");
+    let ctrl_c = async {
+        tokio::signal::ctrl_c()
+            .await
+            .expect("failed to install CTRL+C signal handler");
+    };
+
+    #[cfg(unix)]
+    let terminate = async {
+        tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
+            .expect("failed to install SIGTERM signal handler")
+            .recv()
+            .await;
+    };
+
+    #[cfg(not(unix))]
+    let terminate = std::future::pending::<()>();
+
+    tokio::select! {
+        _ = ctrl_c => {},
+        _ = terminate => {},
+    }
     tracing::info!("shutting down");
 }
```

.github/workflows/ai_cd.yaml (2)
1-3: Add a workflow name for clarity.

The workflow is missing a `name` field at the top level, which helps identify it in the GitHub Actions UI.

🔎 Apply this diff:

```diff
+name: AI CD
+
 on:
   workflow_dispatch:
```
10-14: Redundant git fetch command.

Line 14 (`git fetch --tags --force`) appears redundant since the checkout action already specifies `fetch-tags: true` on line 13.

🔎 Consider removing the redundant fetch:

```diff
 - uses: actions/checkout@v4
   with:
     fetch-depth: 0
     fetch-tags: true
-- run: git fetch --tags --force
```

apps/ai/src/auth.rs (2)
36-39: Consider deriving Clone for JwksState.

This would simplify the cloning logic in lines 60-63 and 87-90.

🔎 Apply this diff:

```diff
+#[derive(Clone)]
 enum JwksState {
     Available(JwkSet),
     Empty,
 }
```

Then simplify the match expressions:

```diff
-    return Ok(match &cached.state {
-        JwksState::Available(jwks) => JwksState::Available(jwks.clone()),
-        JwksState::Empty => JwksState::Empty,
-    });
+    return Ok(cached.state.clone());
```
168-172: Document the security implications of this function.

While the function name includes "insecure", adding a doc comment would make the security implications clearer for future maintainers. Based on coding guidelines, comments about 'Why' are acceptable.

🔎 Suggested documentation:

```diff
+// WARNING: Skips signature verification. Only use in local development
+// when JWKS endpoint is unavailable.
 fn decode_claims_insecure(token: &str) -> Result<Claims, ()> {
     jsonwebtoken::dangerous::insecure_decode::<Claims>(token)
         .map(|data| data.claims)
         .map_err(|_| ())
 }
```

crates/llm-proxy/src/router.rs (1)
349-355: Consider handling `Response::builder()` errors.

The `.unwrap()` here (and at line 412) could panic if the builder fails. While unlikely with these static headers, returning an error response would be safer.

🔎 Suggested fix:

```diff
-    Response::builder()
+    match Response::builder()
         .status(status)
         .header("Content-Type", "text/event-stream")
         .header("Cache-Control", "no-cache")
         .body(body)
-        .unwrap()
+    {
+        Ok(resp) => resp,
+        Err(e) => {
+            tracing::error!(error = %e, "failed to build response");
+            (StatusCode::INTERNAL_SERVER_ERROR, "Internal error").into_response()
+        }
+    }
```

crates/llm-proxy/src/posthog.rs (1)
76-79: Consider checking the response status for complete error visibility.

`send().await` succeeds even if PostHog returns an error status (4xx/5xx). For better observability, you could check the response:

🔎 Suggested fix:

```diff
 let url = format!("{}/batch", config.host);
-if let Err(e) = client.post(&url).json(&request).send().await {
-    tracing::warn!(error = %e, "failed to send posthog event");
-}
+match client.post(&url).json(&request).send().await {
+    Ok(resp) if !resp.status().is_success() => {
+        tracing::warn!(status = %resp.status(), "posthog returned error status");
+    }
+    Err(e) => {
+        tracing::warn!(error = %e, "failed to send posthog event");
+    }
+    _ => {}
+}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (22)
- `.github/actions/sentry_cli/action.yaml` (1 hunks)
- `.github/workflows/ai_cd.yaml` (1 hunks)
- `.github/workflows/ai_ci.yaml` (1 hunks)
- `Cargo.toml` (2 hunks)
- `apps/ai/Cargo.toml` (2 hunks)
- `apps/ai/Dockerfile` (1 hunks)
- `apps/ai/fly.toml` (1 hunks)
- `apps/ai/src/auth.rs` (5 hunks)
- `apps/ai/src/env.rs` (2 hunks)
- `apps/ai/src/main.rs` (1 hunks)
- `apps/stt/src/main.rs` (0 hunks)
- `crates/llm-proxy/Cargo.toml` (1 hunks)
- `crates/llm-proxy/src/error.rs` (1 hunks)
- `crates/llm-proxy/src/lib.rs` (1 hunks)
- `crates/llm-proxy/src/posthog.rs` (1 hunks)
- `crates/llm-proxy/src/router.rs` (1 hunks)
- `crates/transcribe-proxy/Cargo.toml` (1 hunks)
- `crates/transcribe-proxy/src/config.rs` (1 hunks)
- `crates/transcribe-proxy/src/lib.rs` (1 hunks)
- `crates/transcribe-proxy/src/router.rs` (4 hunks)
- `crates/transcribe-proxy/src/service.rs` (6 hunks)
- `doxxer.ai.toml` (1 hunks)
💤 Files with no reviewable changes (1)
- apps/stt/src/main.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*
📄 CodeRabbit inference engine (AGENTS.md)
Format using `dprint fmt` from the root. Do not use `cargo fmt`.
Files:
- `crates/llm-proxy/src/lib.rs`
- `crates/llm-proxy/src/error.rs`
- `apps/ai/fly.toml`
- `crates/transcribe-proxy/src/config.rs`
- `crates/llm-proxy/Cargo.toml`
- `apps/ai/src/env.rs`
- `apps/ai/Cargo.toml`
- `crates/transcribe-proxy/src/service.rs`
- `apps/ai/src/main.rs`
- `crates/llm-proxy/src/router.rs`
- `crates/transcribe-proxy/src/lib.rs`
- `apps/ai/Dockerfile`
- `Cargo.toml`
- `apps/ai/src/auth.rs`
- `doxxer.ai.toml`
- `crates/transcribe-proxy/Cargo.toml`
- `crates/llm-proxy/src/posthog.rs`
- `crates/transcribe-proxy/src/router.rs`
**/*.{ts,tsx,rs,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
By default, avoid writing comments at all. If you write one, it should be about 'Why', not 'What'.
Files:
- `crates/llm-proxy/src/lib.rs`
- `crates/llm-proxy/src/error.rs`
- `crates/transcribe-proxy/src/config.rs`
- `apps/ai/src/env.rs`
- `crates/transcribe-proxy/src/service.rs`
- `apps/ai/src/main.rs`
- `crates/llm-proxy/src/router.rs`
- `crates/transcribe-proxy/src/lib.rs`
- `apps/ai/src/auth.rs`
- `crates/llm-proxy/src/posthog.rs`
- `crates/transcribe-proxy/src/router.rs`
🧬 Code graph analysis (7)
crates/llm-proxy/src/lib.rs (3)
- apps/api/src/integration/posthog.ts (1)
  - posthog (5-8)
- crates/llm-proxy/src/router.rs (1)
  - router (77-86)
- crates/transcribe-proxy/src/router.rs (1)
  - router (24-33)

crates/transcribe-proxy/src/config.rs (4)
- crates/llm-proxy/src/router.rs (1)
  - new (33-48)
- crates/transcribe-proxy/src/service.rs (1)
  - new (313-325)
- apps/ai/src/env.rs (1)
  - api_keys (49-51)
- crates/transcribe-proxy/src/router.rs (1)
  - s (42-42)

apps/ai/src/main.rs (4)
- apps/ai/src/env.rs (1)
  - env (17-22)
- crates/llm-proxy/src/router.rs (2)
  - new (33-48)
  - router (77-86)
- crates/transcribe-proxy/src/config.rs (1)
  - new (16-22)
- crates/transcribe-proxy/src/router.rs (2)
  - router (24-33)
  - s (42-42)

crates/llm-proxy/src/router.rs (1)
- crates/llm-proxy/src/posthog.rs (2)
  - capture_ai_generation (60-80)
  - fetch_generation_metadata (97-124)

crates/transcribe-proxy/src/lib.rs (2)
- crates/llm-proxy/src/router.rs (1)
  - router (77-86)
- crates/transcribe-proxy/src/router.rs (1)
  - router (24-33)

crates/llm-proxy/src/posthog.rs (1)
- crates/llm-proxy/src/router.rs (1)
  - new (33-48)

crates/transcribe-proxy/src/router.rs (2)
- crates/transcribe-proxy/src/config.rs (1)
  - new (16-22)
- crates/transcribe-proxy/src/service.rs (3)
  - new (313-325)
  - upstream_url (74-77)
  - builder (187-189)
🪛 actionlint (1.7.9)
.github/workflows/ai_ci.yaml
22-22: description is required in metadata of "" action at "/home/jailuser/git/.github/actions/rust_install/action.yaml"
(action)
22-22: name is required in action metadata "/home/jailuser/git/.github/actions/rust_install/action.yaml"
(action)
.github/workflows/ai_cd.yaml
15-15: description is required in metadata of "" action at "/home/jailuser/git/.github/actions/doxxer_install/action.yaml"
(action)
15-15: name is required in action metadata "/home/jailuser/git/.github/actions/doxxer_install/action.yaml"
(action)
24-24: label "depot-ubuntu-24.04-8" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
28-28: description is required in metadata of "" action at "/home/jailuser/git/.github/actions/rust_install/action.yaml"
(action)
28-28: name is required in action metadata "/home/jailuser/git/.github/actions/rust_install/action.yaml"
(action)
58-58: description is required in metadata of "" action at "/home/jailuser/git/.github/actions/sentry_cli/action.yaml"
(action)
58-58: name is required in action metadata "/home/jailuser/git/.github/actions/sentry_cli/action.yaml"
(action)
67-67: label "depot-ubuntu-24.04-8" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Redirect rules - hyprnote-storybook
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote-storybook
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: ci
- GitHub Check: fmt
🔇 Additional comments (23)
crates/llm-proxy/src/lib.rs (1)
1-7: LGTM! Clean module structure.

The module organization and public re-exports follow Rust conventions. The wildcard re-exports provide a convenient API surface for consumers of this crate.
crates/transcribe-proxy/src/lib.rs (1)
1-9: LGTM! Improved public API surface.

The refactoring from wildcard exports (`pub use service::*`) to explicit exports improves API clarity and reduces the risk of unintentionally exposing internal implementation details. This follows Rust best practices for library crates.

doxxer.ai.toml (1)
6-13: No changes needed: the version increment configuration is valid.

Setting all version levels (patch, minor, major) to `increment = 1` is correct and standard for semantic versioning. The doxxer tool applies the specified increment to each version component when calculating the next version; this configuration is intentional and produces the expected SemVer behavior.

crates/llm-proxy/Cargo.toml (1)
4-4: Verify Rust edition 2024 compatibility.

The crate uses `edition = "2024"`, which was stabilized in Rust 1.85 (February 2025). Ensure your CI/CD environment and all developers have Rust 1.85+ installed.

Run this script to check the Rust version in your CI environment:
```bash
#!/bin/bash
# Check Rust version supports edition 2024 (requires 1.85+)
rustc --version
cargo --version

# Verify the build succeeds with edition 2024
cargo check -p llm-proxy
```
13-13: LGTM!The workspace configuration changes properly add the new
apps/aimember andhypr-llm-proxydependency, aligning with the PR's objective to introduce the new AI proxy component.Also applies to: 51-51
apps/ai/fly.toml (1)
1-42: Deployment configuration looks well-structured.Blue-green deployment strategy, auto-scaling settings, and HTTPS enforcement are appropriate for a production service. Consider monitoring resource utilization after deployment to validate the 1GB memory allocation is sufficient for LLM proxy workloads.
crates/transcribe-proxy/Cargo.toml (1)
6-21: LGTM!

The dependency additions appropriately support the new stateful router architecture with HTTP client capabilities (`reqwest`), serialization (`serde`, `serde_json`), and provider integration (`owhisper-providers`).

apps/ai/Cargo.toml (1)
1-25: LGTM!

The package configuration properly sets up the new `ai` application with appropriate dependencies for LLM proxy (`hypr-llm-proxy`), transcription proxy (`hypr-transcribe-proxy`), and observability (`sentry` with tower integration).

crates/transcribe-proxy/src/service.rs (1)
36-41: Good refactoring of pending state management.

Consolidating the three separate `Arc<Mutex<...>>` fields into `PendingState` improves cohesion and makes it clearer that these fields are conceptually related. The `Clone` derive is correct since `Arc<Mutex<T>>` implements `Clone`.

crates/transcribe-proxy/src/router.rs (4)
18-33: LGTM! Clean stateful router setup.

The `AppState` pattern properly encapsulates configuration and a shared HTTP client. Creating `reqwest::Client::new()` once and sharing it via state is correct: the client uses connection pooling internally and is designed to be cloned cheaply.
35-63: LGTM!

The handler properly validates API key configuration upfront and provides appropriate error responses. The provider resolution from query params with fallback to default is well-implemented.
111-123: Hardcoded Gladia configuration may limit flexibility.
`encoding`, `bit_depth`, and other config values are hardcoded. If clients send audio in different formats, this could cause transcription issues. Consider deriving these from query params or documenting the expected client audio format.

Is the hardcoded `wav`/`pcm` encoding with 16-bit depth intentional? If so, clients must ensure their audio matches this format.
151-177: LGTM!

The `build_proxy` function correctly applies provider-specific authentication mechanisms and propagates configuration (`connect_timeout`). The closure in `FirstMessage` auth properly captures `api_key` by value for the move semantics required.

apps/ai/src/env.rs (1)
9-12: LGTM - New environment fields properly added.

The new `sentry_environment` and `openrouter_api_key` fields are correctly declared and follow the existing pattern.

.github/workflows/ai_cd.yaml (1)
24-25: Custom runner labels are intentional.

The static analysis warnings about `depot-ubuntu-24.04-8` being an unknown label can be ignored; this appears to be a custom/self-hosted runner (Depot). Consider adding an `actionlint.yaml` config file to suppress these warnings.

Also applies to: 67-68
crates/transcribe-proxy/src/config.rs (2)
8-13: LGTM - Clean config struct with reasonable defaults.

The struct design with builder pattern is well-structured. The 5-second default connect timeout is reasonable for external service calls.
34-36: LGTM - Efficient key lookup.

`api_key_for` correctly returns `Option<&str>`, avoiding unnecessary clones.

crates/llm-proxy/src/router.rs (2)
37-45: Verify default model identifiers are valid.

Some default models like `"openai/gpt-oss-120b:exacto"` and `"openai/gpt-5.1-chat"` don't appear to match standard OpenRouter model identifiers. Please verify these are correct or intended as placeholders to be overridden.
192-243: LGTM on handler setup and error handling.

Good use of `tokio::time::timeout` to bound the upstream request, and appropriate error responses (502 for upstream failures, 504 for timeout).
65-69: Verify distinct_id usage aligns with PostHog analytics goals.

Using `trace_id` (the generation ID) as `distinct_id` means each AI generation appears as a unique "user" in PostHog. If you want to correlate generations by actual user or session, consider passing a user/session identifier instead.
4-22: LGTM on PostHogConfig.

Clean builder pattern with sensible default host.
24-45: LGTM on AiGenerationProperties.

Well-structured with proper serde field renames matching PostHog's expected schema.
97-124: LGTM on fetch_generation_metadata overall structure.

Good error handling with early returns and appropriate warning logs. (See earlier comment about URL encoding.)
Actionable comments posted: 1
♻️ Duplicate comments (5)
crates/llm-proxy/src/router.rs (5)
85-89: ChatMessage struct is missing fields required for tool calling.

As noted in previous reviews, the struct only captures `role` and `content`, but OpenRouter tool-calling flows require additional fields like `tool_calls`, `tool_call_id`, and `name`. This will drop those fields during deserialization.
163-174: Serialize implementation drops tool-calling fields.

This impl hard-codes serialization to only `role` and `content`. If you add the missing fields to `ChatMessage` as suggested in the previous review, update this impl to include them conditionally or derive `Serialize` with `#[serde(skip_serializing_if = "Option::is_none")]`.
262-306: SSE parsing may miss metadata if events span chunk boundaries.

As noted in previous reviews, each chunk is parsed independently. If an SSE event is split across two network chunks, metadata extraction will fail. Since this only affects telemetry and raw chunks are still forwarded correctly, this is a minor reliability concern.
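One hedged way to address this, assuming the standard SSE framing of events terminated by a blank line, is to carry incomplete input across chunks. The `SseAssembler` type below is a hypothetical standalone sketch, not the PR's code:

```rust
// Buffers raw SSE text across network chunks; emits only complete events
// (terminated by "\n\n") and keeps any partial remainder for the next push.
struct SseAssembler {
    buf: String,
}

impl SseAssembler {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut events = Vec::new();
        while let Some(pos) = self.buf.find("\n\n") {
            let event = self.buf[..pos].to_string();
            self.buf = self.buf[pos + 2..].to_string();
            events.push(event);
        }
        events
    }
}

fn main() {
    let mut asm = SseAssembler::new();
    // First chunk ends mid-event: nothing is emitted yet.
    assert!(asm.push("data: {\"id\":\"gen").is_empty());
    // Second chunk completes the event, so it is emitted whole.
    let events = asm.push("-123\"}\n\n");
    assert_eq!(events, vec!["data: {\"id\":\"gen-123\"}".to_string()]);
}
```

Metadata extraction would then run only on the complete events returned by `push`, while raw chunks continue to be forwarded unchanged.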
323-345: PostHog telemetry calls have no timeout protection.

As noted in previous reviews, the `fetch_generation_metadata` and analytics event calls inside the spawned task run without a timeout. If OpenRouter or PostHog is slow/unresponsive, these tasks will hang indefinitely.
385-406: PostHog telemetry calls have no timeout protection.

Same issue as the streaming path: telemetry calls in the spawned task lack timeout protection.
🧹 Nitpick comments (3)
crates/transcribe-proxy/src/router.rs (1)
177-192: Consider using a stable string representation for the analytics provider name.

Using `format!("{:?}", provider).to_lowercase()` relies on `Debug` output, which may change if the enum's derive or variants are modified. For analytics consistency, consider adding a method like `provider.name()` or using the `Display` trait if available.

🔎 Suggested approach:

```diff
-let provider_name = format!("{:?}", provider).to_lowercase();
+let provider_name = provider.to_string(); // if Display is implemented
+// or: let provider_name = provider.name(); // if a dedicated method exists
```

crates/transcribe-proxy/src/service.rs (2)
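A minimal standalone sketch of the `Display`-based alternative. The `Provider` variants here are illustrative, not the crate's actual enum:

```rust
use std::fmt;

// A Display impl gives a stable, intentional analytics name,
// unlike Debug output which tracks the enum definition.
enum Provider {
    Deepgram,
    Gladia,
}

impl fmt::Display for Provider {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Provider::Deepgram => "deepgram",
            Provider::Gladia => "gladia",
        };
        f.write_str(name)
    }
}

fn main() {
    assert_eq!(Provider::Gladia.to_string(), "gladia");
    println!("{}", Provider::Deepgram);
}
```

Renaming an enum variant then becomes a deliberate analytics change rather than a silent one hidden in derived `Debug` output.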
586-594: Consider using `VecDeque` for O(1) front removal.

`Vec::remove(0)` is O(n) as it shifts all remaining elements. Since messages are always removed from the front, `VecDeque` would provide O(1) `pop_front()` operations. This could matter under high message throughput.

🔎 Suggested change in `PendingState`:

```diff
+use std::collections::VecDeque;
+
 #[derive(Clone)]
 struct PendingState {
-    control_messages: Arc<Mutex<Vec<QueuedPayload>>>,
-    data_messages: Arc<Mutex<Vec<QueuedPayload>>>,
+    control_messages: Arc<Mutex<VecDeque<QueuedPayload>>>,
+    data_messages: Arc<Mutex<VecDeque<QueuedPayload>>>,
     bytes: Arc<Mutex<usize>>,
 }
```

Then replace `remove(0)` with `pop_front().unwrap()`:

```diff
-    let queued = if !control.is_empty() {
-        control.remove(0)
-    } else {
-        data_queue.remove(0)
-    };
+    let queued = if !control.is_empty() {
+        control.pop_front().unwrap()
+    } else {
+        data_queue.pop_front().unwrap()
+    };
```
531-694: Consider extracting common message handling logic.

The `Message::Text` (lines 531-617) and `Message::Binary` (lines 618-694) branches share nearly identical logic for queuing, backpressure checking, and sending. Extracting a helper method could reduce duplication and maintenance burden.

🔎 Possible approach:

```rust
async fn handle_payload(
    data: Vec<u8>,
    is_text: bool,
    control_matcher: &Option<ControlMessageMatcher>,
    pending_control_messages: &Mutex<Vec<QueuedPayload>>,
    pending_data_messages: &Mutex<Vec<QueuedPayload>>,
    pending_bytes: &Mutex<usize>,
    upstream_sender: &mut UpstreamSender,
    shutdown_tx: &tokio::sync::broadcast::Sender<(u16, String)>,
) -> Result<(), ()> {
    // Unified queuing and sending logic
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- `crates/llm-proxy/Cargo.toml` (1 hunks)
- `crates/llm-proxy/src/lib.rs` (1 hunks)
- `crates/llm-proxy/src/router.rs` (1 hunks)
- `crates/transcribe-proxy/Cargo.toml` (1 hunks)
- `crates/transcribe-proxy/src/config.rs` (1 hunks)
- `crates/transcribe-proxy/src/router.rs` (4 hunks)
- `crates/transcribe-proxy/src/service.rs` (24 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/transcribe-proxy/src/config.rs
- crates/llm-proxy/Cargo.toml
🧰 Additional context used
📓 Path-based instructions (2)
**/*
📄 CodeRabbit inference engine (AGENTS.md)
Format using `dprint fmt` from the root. Do not use `cargo fmt`.
Files:
- `crates/transcribe-proxy/src/router.rs`
- `crates/llm-proxy/src/lib.rs`
- `crates/transcribe-proxy/Cargo.toml`
- `crates/llm-proxy/src/router.rs`
- `crates/transcribe-proxy/src/service.rs`
**/*.{ts,tsx,rs,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
By default, avoid writing comments at all. If you write one, it should be about 'Why', not 'What'.
Files:
- `crates/transcribe-proxy/src/router.rs`
- `crates/llm-proxy/src/lib.rs`
- `crates/llm-proxy/src/router.rs`
- `crates/transcribe-proxy/src/service.rs`
🧬 Code graph analysis (2)
crates/llm-proxy/src/lib.rs (1)
- crates/llm-proxy/src/router.rs (1)
  - router (74-83)

crates/llm-proxy/src/router.rs (2)
- crates/transcribe-proxy/src/config.rs (2)
  - new (18-25)
  - with_analytics (37-40)
- crates/transcribe-proxy/src/router.rs (2)
  - router (25-34)
  - s (43-43)
🔇 Additional comments (13)
crates/transcribe-proxy/Cargo.toml (1)
7-23: LGTM! Dependencies are well-organized and appropriate.

The added dependencies align with the PR objectives:
- `hypr-analytics` and `owhisper-providers` support the new analytics and OpenWhisper/OpenRouter integrations
- Utility dependencies (`reqwest`, `uuid`, `serde`, `serde_json`, `url`, `bytes`, `thiserror`) provide necessary HTTP, serialization, and error handling capabilities
- All dependencies use workspace-managed versions, which maintains consistency across the monorepo
- Feature selections (`json`, `v4`, `derive`) are appropriate and minimal

crates/transcribe-proxy/src/router.rs (5)
19-34: LGTM!

The `AppState` struct cleanly encapsulates configuration and HTTP client, and the router setup is idiomatic for axum. Since `reqwest::Client` uses an internal `Arc`, cloning `AppState` is efficient.
36-64: LGTM!

The handler correctly validates provider configuration, returns appropriate HTTP status codes for different failure modes (500 for missing API key, 502 for upstream resolution failures), and includes useful logging context.
66-89: LGTM!

The function handles different authentication modes cleanly. The `unwrap()` on line 77 is acceptable since `provider.default_ws_url()` should return a well-formed URL controlled by the crate.
91-150: LGTM!

The session initialization logic is well-structured with proper error handling. The Gladia-specific configuration is appropriate given this is the current use case for `Auth::SessionInit`.
197-222: LGTM!

The data structures are clean and appropriately defined for the Gladia API interaction.
crates/transcribe-proxy/src/service.rs (6)
29-43: LGTM!

The `OnCloseCallback` type alias and `PendingState` struct are well-designed. Consolidating the three pending queue fields into a single struct improves code organization and makes it easier to pass around related state.
127-133: LGTM!

The `on_close` builder method follows the established pattern and properly wraps the callback in `Arc` for thread-safe sharing.
183-189: LGTM!

Consistent implementation of `on_close` in the request-based builder variant.
256-261: LGTM!

The `on_close` callback is correctly propagated through the `preconnect()` path to `PreconnectedProxy`.
371-437: LGTM!

The timing logic correctly captures `start_time` before connection setup and invokes `on_close` with the elapsed duration after both client-to-upstream and upstream-to-client tasks complete.
446-481: LGTM!

The `run_with_upstream` method maintains the same timing and callback pattern as `run`, ensuring consistent behavior for preconnected proxies.

crates/llm-proxy/src/lib.rs (1)
1-5: LGTM!

Clean and idiomatic library structure. The module declarations and re-exports properly expose the public API surface for the LLM proxy.
No description provided.