@yujonglee yujonglee commented Dec 5, 2025

feat(api): add batch STT transcription endpoint

Summary

Adds a new POST /transcribe HTTP endpoint for batch speech-to-text transcription via file upload. This mirrors the existing real-time WebSocket proxy (GET /listen) but for batch processing.

New files:

  • batch-types.ts - TypeScript types matching owhisper_interface::batch::Response
  • batch-deepgram.ts - Synchronous POST to Deepgram's batch API
  • batch-assemblyai.ts - Upload → Create transcript → Poll until complete
  • batch-soniox.ts - Upload file → Create transcription → Poll → Get transcript

Usage:

POST /transcribe?provider=deepgram&language=en
Content-Type: audio/wav
Authorization: Bearer <token>

<audio data>
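
For reference, a client call matching the usage above could look roughly like the following. This is a minimal sketch: `buildTranscribeRequest`, `transcribe`, and the `baseUrl` option are hypothetical names introduced for illustration, not part of this PR, and the response is left untyped.

```typescript
// Hypothetical client-side helper; only the path, query params, and headers
// come from the usage example in this PR description.
type BatchProvider = "deepgram" | "assemblyai" | "soniox";

interface TranscribeOptions {
  baseUrl: string; // assumption: e.g. the staging API origin
  provider: BatchProvider;
  token: string;
  contentType: string; // e.g. "audio/wav"
  language?: string;
}

function buildTranscribeRequest(opts: TranscribeOptions): {
  url: string;
  headers: Record<string, string>;
} {
  const url = new URL("/transcribe", opts.baseUrl);
  url.searchParams.set("provider", opts.provider);
  if (opts.language) url.searchParams.set("language", opts.language);
  return {
    url: url.toString(),
    headers: {
      "Content-Type": opts.contentType,
      Authorization: `Bearer ${opts.token}`,
    },
  };
}

// Sends the raw audio bytes as the request body and returns the parsed JSON.
async function transcribe(
  opts: TranscribeOptions,
  audio: ArrayBuffer,
): Promise<unknown> {
  const { url, headers } = buildTranscribeRequest(opts);
  const res = await fetch(url, { method: "POST", headers, body: audio });
  if (!res.ok) throw new Error(`transcribe failed with status ${res.status}`);
  return res.json();
}
```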

Updates since last revision

  • Added requireSupabaseAuth middleware to /transcribe endpoint in index.ts (same pattern as /listen)
  • Wired params.model to AssemblyAI's speech_model parameter for model selection consistency across all providers

Review & Testing Checklist for Human

  • Test with real audio files - The provider implementations are ported from Rust adapters but haven't been tested against actual provider APIs. Test each provider (Deepgram, AssemblyAI, Soniox) with a sample audio file.
  • Verify response format compatibility - Confirm the BatchResponse TypeScript type matches what the Rust client (owhisper_interface::batch::Response) expects to deserialize.
  • Review timeout implications - AssemblyAI/Soniox polling can run up to ~10 minutes (200 attempts × 3s). Verify this won't hit infrastructure timeouts (load balancer, edge proxy, etc.).
  • Verify auth in production - Auth middleware is only applied when NODE_ENV !== "development". Confirm this matches the /listen endpoint behavior and is intentional.

Recommended test plan:

  1. Deploy to staging
  2. curl -X POST "https://api.staging.hyprnote.com/transcribe?provider=deepgram" -H "Authorization: Bearer $TOKEN" -H "Content-Type: audio/wav" --data-binary @test.wav
  3. Repeat for ?provider=assemblyai and ?provider=soniox
  4. Verify response structure matches expected schema

Notes

Add POST /transcribe endpoint for batch speech-to-text transcription
via file upload. This mirrors the existing real-time WebSocket proxy
pattern but for batch processing.

Features:
- Support for Deepgram, AssemblyAI, and Soniox providers via ?provider= query param
- Normalized BatchResponse format matching owhisper_interface::batch::Response
- Proper polling for async providers (AssemblyAI, Soniox)
- OpenAPI documentation with Zod schemas
- Sentry tracing and metrics integration

Usage:
POST /transcribe?provider=deepgram&language=en
Content-Type: audio/wav
<audio data>

Co-Authored-By: yujonglee <[email protected]>

netlify bot commented Dec 5, 2025

Deploy Preview for hyprnote-storybook ready!

Name Link
🔨 Latest commit c17d552
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote-storybook/deploys/6932b09fbecef100081e39cb
😎 Deploy Preview https://deploy-preview-2146--hyprnote-storybook.netlify.app

netlify bot commented Dec 5, 2025

Deploy Preview for hyprnote ready!

Name Link
🔨 Latest commit c17d552
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote/deploys/6932b09fda05e00008e47ef9
😎 Deploy Preview https://deploy-preview-2146--hyprnote.netlify.app

coderabbitai bot commented Dec 5, 2025

📝 Walkthrough

Walkthrough

A batch transcription feature is introduced with support for multiple providers (Deepgram, AssemblyAI, Soniox). New types define the batch response structure. Provider-specific modules implement audio transcription workflows. A dispatcher routes requests to the appropriate provider. A new POST /transcribe endpoint exposes this functionality via the public API.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Type Definitions: apps/api/src/stt/batch-types.ts | Adds TypeScript types for batch transcription: BatchWord, BatchAlternatives, BatchChannel, BatchResults, BatchResponse, BatchProvider (union of provider names), and BatchParams (languages, keywords, model options). |
| Dispatcher & Re-exports: apps/api/src/stt/index.ts | Adds transcribeBatch() function that routes to provider-specific implementations based on BatchProvider parameter. Re-exports batch types and provider-specific transcribe functions. |
| Provider Implementations: apps/api/src/stt/batch-deepgram.ts, batch-assemblyai.ts, batch-soniox.ts | Implements provider-specific transcription workflows. Deepgram: single API call with URL parameter building. AssemblyAI: upload → create transcript → poll for results → convert to response. Soniox: upload → create transcription → poll with timeout → retrieve transcript → convert to response. Each includes error handling and response mapping to internal BatchResponse format. |
| Route Handler: apps/api/src/routes.ts | Adds POST /transcribe endpoint that accepts provider, language, keyword, and model via query parameters, reads audio from request body, delegates to transcribeBatch(), and returns BatchResponseSchema or BatchErrorSchema. Includes upstream latency metrics, Sentry tagging, and differentiates upstream failures (502) from general errors (500). Returns 400 for missing audio data. |

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant API as POST /transcribe<br/>Endpoint
    participant Dispatcher as transcribeBatch<br/>Dispatcher
    participant Provider as Provider-Specific<br/>Module
    participant ExtAPI as External API<br/>(AssemblyAI/Deepgram/Soniox)
    participant DB as Response Mapping

    Client->>API: Audio + params<br/>(provider, language,<br/>keywords, model)
    API->>API: Validate audio data<br/>(400 if missing)
    API->>Dispatcher: transcribeBatch(provider,<br/>audioData, contentType, params)
    Dispatcher->>Provider: Route to provider impl
    
    alt Provider: Deepgram
        Provider->>ExtAPI: POST batch listen<br/>(audio, params)
        ExtAPI-->>Provider: BatchResponse
    else Provider: AssemblyAI
        Provider->>ExtAPI: Upload audio
        ExtAPI-->>Provider: Upload URL
        Provider->>ExtAPI: Create transcript
        ExtAPI-->>Provider: Transcript ID
        loop Poll (max attempts)
            Provider->>ExtAPI: Check status
            ExtAPI-->>Provider: Status/Result
            break On completion or error
            end
        end
        Provider->>DB: convertToResponse()
        DB-->>Provider: Mapped BatchResponse
    else Provider: Soniox
        Provider->>ExtAPI: Upload audio file
        ExtAPI-->>Provider: File ID
        Provider->>ExtAPI: Create transcription
        ExtAPI-->>Provider: Transcription ID
        loop Poll (max attempts)
            Provider->>ExtAPI: Check status
            ExtAPI-->>Provider: Status/Result
            break On completion or error
            end
        end
        Provider->>ExtAPI: Retrieve transcript
        ExtAPI-->>Provider: Full transcript
        Provider->>DB: convertToResponse()
        DB-->>Provider: Mapped BatchResponse
    end
    
    Provider-->>Dispatcher: BatchResponse or error
    Dispatcher-->>API: BatchResponse or error
    
    alt Success
        API-->>Client: 200 + BatchResponseSchema
        API->>API: Record latency metric
    else Upstream Error
        API-->>Client: 502 + BatchErrorSchema
    else Other Error
        API-->>Client: 500 + BatchErrorSchema
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Provider orchestration complexity: AssemblyAI and Soniox implementations include polling loops, timeout handling, and multi-step workflows (upload → create → poll → convert) that require careful verification of error boundaries and state management.
  • Type consistency across providers: Verify that all provider implementations correctly map their respective API responses to the unified BatchResponse structure, particularly word-level timing, speaker extraction, and alternative/channel nesting.
  • Error handling and status codes: Route handler differentiates between upstream failures (502) and general errors (500); ensure Sentry tagging context is rich and error propagation is correct across the provider-to-route call chain.
  • Parameter handling variance: Deepgram, AssemblyAI, and Soniox each interpret language, keywords, and model parameters differently; verify query parameter passing and API-specific transformations.
  • Polling robustness: Both AssemblyAI and Soniox use polling with max attempts and timeouts; confirm interval/timeout constants are appropriate and error messages are informative.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: adding a batch STT transcription endpoint to the API. |
| Description check | ✅ Passed | The pull request description clearly relates to the changeset, detailing a new batch STT transcription endpoint with specific file additions, API usage examples, and implementation notes. |
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (11)
apps/api/src/stt/batch-deepgram.ts (1)

35-43: Consider adding a timeout to prevent indefinite hangs.

The fetch call to Deepgram has no timeout configured. If the upstream service becomes unresponsive, this could block indefinitely.

-  const response = await fetch(url.toString(), {
+  const controller = new AbortController();
+  const timeoutId = setTimeout(() => controller.abort(), 120_000); // 2 min timeout for batch
+
+  const response = await fetch(url.toString(), {
     method: "POST",
     headers: {
       Authorization: `Token ${env.DEEPGRAM_API_KEY}`,
       "Content-Type": contentType,
       Accept: "application/json",
     },
     body: audioData,
+    signal: controller.signal,
-  });
+  }).finally(() => clearTimeout(timeoutId));
apps/api/src/routes.ts (3)

426-430: Provider validation is missing - invalid values silently fall through to default.

The provider query parameter is cast directly to BatchProvider without validation. An invalid provider like ?provider=invalid will silently use Deepgram (the default case in transcribeBatch). Consider validating the provider value.

-    type BatchProvider = "deepgram" | "assemblyai" | "soniox";
+    const VALID_PROVIDERS = ["deepgram", "assemblyai", "soniox"] as const;
+    type BatchProvider = (typeof VALID_PROVIDERS)[number];

     const clientUrl = new URL(c.req.url, "http://localhost");
-    const provider =
-      (clientUrl.searchParams.get("provider") as BatchProvider) ?? "deepgram";
+    const providerParam = clientUrl.searchParams.get("provider");
+    const provider: BatchProvider =
+      providerParam && VALID_PROVIDERS.includes(providerParam as BatchProvider)
+        ? (providerParam as BatchProvider)
+        : "deepgram";

Alternatively, import BatchProvider from ./stt to avoid the duplicate type declaration.


471-473: Fragile upstream error detection based on substring matching.

The check errorMessage.includes("failed:") is brittle and could misclassify errors if error message formats change. Consider using a custom error class or error codes from the provider modules instead.

// In provider modules, throw a typed error:
class UpstreamError extends Error {
  constructor(message: string, public readonly provider: string) {
    super(message);
    this.name = "UpstreamError";
  }
}

// In route handler:
const isUpstreamError = error instanceof UpstreamError;
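
Building on that suggestion, the 502 vs 500 split in the route handler could be derived from the error type rather than from the message text. A minimal sketch, assuming the provider modules throw the `UpstreamError` class proposed above; `statusForError` is a hypothetical helper name:

```typescript
// Typed error the provider modules would throw for upstream (provider) failures.
class UpstreamError extends Error {
  readonly provider: string;

  constructor(message: string, provider: string) {
    super(message);
    this.name = "UpstreamError";
    this.provider = provider;
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: upstream failures map to 502, everything else to 500.
function statusForError(error: unknown): 500 | 502 {
  return error instanceof UpstreamError ? 502 : 500;
}
```

With this in place, a generic error whose message happens to contain "failed:" is no longer classified as an upstream failure.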

58-89: Zod schemas duplicate TypeScript types from batch-types.ts.

The schemas mirror the types defined in batch-types.ts. While necessary for OpenAPI documentation, this creates a maintenance burden where changes must be synchronized manually. Consider deriving types from schemas using z.infer<> or generating schemas from types.

apps/api/src/stt/batch-soniox.ts (2)

37-39: Consider specifying MIME type for Blob.

The Blob is created without a MIME type, which could cause issues with the file upload if Soniox expects a specific content type. Consider passing the contentType parameter (currently unused) to the Blob constructor.

-const uploadFile = async (
-  audioData: ArrayBuffer,
-  fileName: string,
-): Promise<string> => {
+const uploadFile = async (
+  audioData: ArrayBuffer,
+  fileName: string,
+  contentType?: string,
+): Promise<string> => {
   const formData = new FormData();
-  const blob = new Blob([audioData]);
+  const blob = new Blob([audioData], contentType ? { type: contentType } : undefined);
   formData.append("file", blob, fileName);

175-188: Default confidence of 1.0 may be misleading.

When Soniox doesn't provide confidence values, defaulting to 1.0 (100% confidence) could mislead consumers of the API. Consider using null or a clearly marked default value, or document this behavior.

apps/api/src/stt/index.ts (1)

23-23: Consider consolidating SttProvider and BatchProvider.

Both types represent the same set of providers ("deepgram" | "assemblyai" | "soniox"). While they serve different contexts (streaming vs batch), consolidating to a single Provider type could reduce duplication.

-export type SttProvider = "deepgram" | "assemblyai" | "soniox";
+// Use BatchProvider for both streaming and batch contexts
+export type { BatchProvider as SttProvider } from "./batch-types";
apps/api/src/stt/batch-assemblyai.ts (4)

11-31: Config constants and basic AssemblyAI types look good; minor typing nit on speaker

The constants and transcript shape line up with AssemblyAI’s async STT API surface, and the 3s × 200 polling budget (≈10 minutes) is a reasonable default.

One small TypeScript nit: AssemblyAI can return speaker: null when diarization is disabled, so widening the type to include null would better match the API response and avoid surprises at call sites.

 type AssemblyAIWord = {
   text: string;
   start: number;
   end: number;
   confidence: number;
-  speaker?: string;
+  speaker?: string | null;
 };

33-52: Upload flow and error handling look solid; consider guarding against hung requests

The /upload call is straightforward and you surface rich error details from AssemblyAI, which is great for debugging.

One thing you might want to layer in (either here or at a higher level) is a fetch timeout/abort mechanism so a stuck network connection doesn’t hold onto a worker indefinitely, especially given large audio payloads.
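
One way to layer this in is a small wrapper around `AbortController` that any of the provider calls could share. This is a sketch only: `withTimeout` and `uploadWithTimeout` are hypothetical names, and the 2-minute value is an assumption rather than a recommendation from the provider.

```typescript
// Runs `run` with an AbortSignal that fires after `timeoutMs`.
// The timer is always cleared, whether the call succeeds, fails, or aborts.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Usage sketch: an upload request that is aborted if it takes too long.
// The URL and headers are supplied by the caller; nothing here is
// specific to a particular provider.
async function uploadWithTimeout(
  url: string,
  headers: Record<string, string>,
  audio: ArrayBuffer,
  timeoutMs = 120_000, // assumption: 2 minutes
): Promise<Response> {
  return withTimeout(
    (signal) => fetch(url, { method: "POST", headers, body: audio, signal }),
    timeoutMs,
  );
}
```

Note that the timeout only takes effect if the wrapped call honors the signal, which `fetch` does.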


102-138: Polling loop is correct; ensure the 10‑minute cap matches your API timeouts

The poller correctly:

  • Retrieves /v2/transcript/{id} in a loop.
  • Exits immediately on status === "completed" or throws on status === "error".
  • Throws a clear timeout error after MAX_POLL_ATTEMPTS.

Given POLL_INTERVAL_MS = 3000 and MAX_POLL_ATTEMPTS = 200, you cap a transcription at ~10 minutes. That’s reasonable, but if you expect long audio files and slower models (e.g. Slam‑1 with keyterms prompting), it’s worth double‑checking that:

  • This window is sufficient for your typical workloads, and
  • It doesn’t conflict with upstream HTTP/server timeouts for the batch endpoint.
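
For reviewers checking the budget, the loop described above can be sketched as follows. The constant values come from this PR; `pollUntilDone`, `PollState`, and the injectable `sleep` parameter are hypothetical names used for illustration, not the actual implementation.

```typescript
const POLL_INTERVAL_MS = 3000;
const MAX_POLL_ATTEMPTS = 200;

// Worst case: 200 attempts × 3 s = 600,000 ms, i.e. about 10 minutes
// (plus the time spent in the status requests themselves).
const WORST_CASE_MS = POLL_INTERVAL_MS * MAX_POLL_ATTEMPTS;

type PollState<T> =
  | { status: "queued" | "processing" }
  | { status: "completed"; result: T }
  | { status: "error"; error: string };

async function pollUntilDone<T>(
  check: () => Promise<PollState<T>>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  maxAttempts: number = MAX_POLL_ATTEMPTS,
  intervalMs: number = POLL_INTERVAL_MS,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await check();
    if (state.status === "completed") return state.result;
    if (state.status === "error") {
      throw new Error(`transcription failed: ${state.error}`);
    }
    // Still queued or processing: wait before the next status check.
    await sleep(intervalMs);
  }
  throw new Error(`transcription timed out after ${maxAttempts} attempts`);
}
```

Any load balancer or edge proxy in front of the endpoint would need an idle timeout above `WORST_CASE_MS` for the longest jobs to complete.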

180-189: End‑to‑end orchestration is clear; watch overall latency and memory usage

The transcribeWithAssemblyAI pipeline (upload → create transcript → poll → convert) is easy to follow and keeps provider‑specific logic well factored.

A couple of higher‑level considerations:

  • The function holds onto the full ArrayBuffer for the duration of the upload. If you expect very large audio files or high concurrency, streaming from disk/Blob (where possible) could reduce memory pressure.
  • Combined with the 10‑minute polling window, this endpoint can tie up an API worker for a long time per request. If batch jobs are expected to be long‑running, you may eventually want to push transcript creation/polling into a background worker and return a job handle instead of blocking the HTTP request.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 775c138 and 332efc8.

📒 Files selected for processing (6)
  • apps/api/src/routes.ts (2 hunks)
  • apps/api/src/stt/batch-assemblyai.ts (1 hunks)
  • apps/api/src/stt/batch-deepgram.ts (1 hunks)
  • apps/api/src/stt/batch-soniox.ts (1 hunks)
  • apps/api/src/stt/batch-types.ts (1 hunks)
  • apps/api/src/stt/index.ts (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.ts: Agent implementations should use TypeScript and follow the established architectural patterns defined in the agent framework
Agent communication should use defined message protocols and interfaces

Files:

  • apps/api/src/stt/batch-types.ts
  • apps/api/src/stt/batch-assemblyai.ts
  • apps/api/src/stt/batch-soniox.ts
  • apps/api/src/stt/index.ts
  • apps/api/src/stt/batch-deepgram.ts
  • apps/api/src/routes.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Avoid creating a bunch of types/interfaces if they are not shared. Especially for function props, just inline them instead.
Never do manual state management for form/mutation. Use useForm (from tanstack-form) and useQuery/useMutation (from tanstack-query) instead for 99% of cases. Avoid patterns like setError.
If there are many classNames with conditional logic, use cn (import from @hypr/utils). It is similar to clsx. Always pass an array and split by logical grouping.
Use motion/react instead of framer-motion.

Files:

  • apps/api/src/stt/batch-types.ts
  • apps/api/src/stt/batch-assemblyai.ts
  • apps/api/src/stt/batch-soniox.ts
  • apps/api/src/stt/index.ts
  • apps/api/src/stt/batch-deepgram.ts
  • apps/api/src/routes.ts
🧬 Code graph analysis (6)
apps/api/src/stt/batch-types.ts (1)
apps/api/src/stt/index.ts (3)
  • BatchResponse (15-15)
  • BatchProvider (15-15)
  • BatchParams (15-15)
apps/api/src/stt/batch-assemblyai.ts (1)
apps/api/src/stt/batch-types.ts (6)
  • BatchParams (31-35)
  • BatchResponse (24-27)
  • BatchWord (1-8)
  • BatchAlternatives (10-14)
  • BatchChannel (16-18)
  • BatchResults (20-22)
apps/api/src/stt/batch-soniox.ts (2)
apps/api/src/stt/batch-types.ts (6)
  • BatchParams (31-35)
  • BatchResponse (24-27)
  • BatchWord (1-8)
  • BatchAlternatives (10-14)
  • BatchChannel (16-18)
  • BatchResults (20-22)
apps/api/src/stt/index.ts (3)
  • BatchParams (15-15)
  • BatchResponse (15-15)
  • transcribeWithSoniox (18-18)
apps/api/src/stt/index.ts (4)
apps/api/src/stt/batch-types.ts (3)
  • BatchProvider (29-29)
  • BatchParams (31-35)
  • BatchResponse (24-27)
apps/api/src/stt/batch-assemblyai.ts (1)
  • transcribeWithAssemblyAI (180-189)
apps/api/src/stt/batch-soniox.ts (1)
  • transcribeWithSoniox (205-216)
apps/api/src/stt/batch-deepgram.ts (1)
  • transcribeWithDeepgram (6-53)
apps/api/src/stt/batch-deepgram.ts (2)
apps/api/src/stt/index.ts (3)
  • transcribeWithDeepgram (16-16)
  • BatchParams (15-15)
  • BatchResponse (15-15)
apps/api/src/stt/batch-types.ts (2)
  • BatchParams (31-35)
  • BatchResponse (24-27)
apps/api/src/routes.ts (3)
apps/api/src/stt/batch-types.ts (1)
  • BatchProvider (29-29)
apps/api/src/stt/index.ts (2)
  • BatchProvider (15-15)
  • transcribeBatch (25-41)
apps/api/src/sentry/metrics.ts (1)
  • Metrics (3-35)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: fmt
  • GitHub Check: Devin
🔇 Additional comments (9)
apps/api/src/stt/batch-deepgram.ts (2)

17-17: Verify that mip_opt_out=false is intentional.

Setting mip_opt_out to false allows Deepgram to use the audio data for model improvement. Depending on your data privacy requirements or customer agreements, you may want to set this to true to opt out.


52-52: Type assertion trusts upstream response structure.

The response is cast directly to BatchResponse without runtime validation. If Deepgram's API returns an unexpected structure, it could cause runtime errors downstream. This is acceptable if you trust the upstream API contract, but consider adding validation for critical paths.
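
If validation is wanted, a lightweight type guard before the cast may be enough. The sketch below assumes only that a batch response carries a `results.channels` array, following the Deepgram-style layout mentioned in this review; the real BatchResponse in batch-types.ts may have more required fields, and `looksLikeBatchResponse` is a hypothetical name.

```typescript
// Assumed minimal shape; the actual BatchResponse type may differ.
interface MinimalBatchResponse {
  metadata?: unknown;
  results: { channels: unknown[] };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns true only when the payload has a results.channels array.
function looksLikeBatchResponse(value: unknown): value is MinimalBatchResponse {
  if (!isRecord(value)) return false;
  const results = value.results;
  return isRecord(results) && Array.isArray(results.channels);
}
```

A failed check could then be reported as an upstream error instead of surfacing later as an undefined-property access.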

apps/api/src/stt/batch-types.ts (1)

1-35: Well-structured type definitions for batch transcription.

The types provide a clean, unified contract for all providers to map their responses to. The structure mirrors Deepgram's format while allowing flexibility for metadata.

apps/api/src/routes.ts (1)

373-423: Route documentation and structure look good.

The endpoint is well-documented with OpenAPI descriptions, appropriate status codes (400, 401, 500, 502), and security requirements. The Sentry instrumentation and metrics collection provide good observability.

apps/api/src/stt/batch-soniox.ts (2)

102-138: Polling implementation is well-structured.

The polling logic correctly handles all status transitions (completed, error, queued, processing), includes a reasonable timeout (200 attempts × 3s = 10 minutes), and throws descriptive errors for unexpected states.


205-216: Clean orchestration of the Soniox workflow.

The main function properly sequences upload → create → poll → fetch → convert, with each step's errors propagating naturally. The _contentType parameter is correctly prefixed to indicate intentional non-use.

apps/api/src/stt/index.ts (2)

25-41: Clean dispatcher implementation.

The switch statement correctly routes to provider-specific implementations. The default fallback to Deepgram is reasonable. The fileName parameter is appropriately only forwarded to Soniox, which is the only provider requiring it for file uploads.


15-18: Good public API surface design.

Re-exporting both types and individual provider functions gives consumers flexibility to either use the unified transcribeBatch dispatcher or call providers directly when needed.

apps/api/src/stt/batch-assemblyai.ts (1)

140-178: BatchResponse mapping is consistent across STT providers

The conversion to BatchWord/BatchAlternatives/BatchChannel/BatchResults is clean and aligns with other providers:

  • Converting start/end from milliseconds to seconds matches the standard used by Soniox and keeps units consistent across the application.
  • Parsing the speaker label (e.g., "A", "B") into a numeric ID while falling back to undefined on parse failure is robust.
  • Falling back to "" when result.text is absent and to 1.0 when confidence is missing matches Soniox's implementation and is the project's standard.
  • punctuated_word is consistently mapped to the raw word text across all providers.
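
The word-level mapping described in these bullets can be sketched as below. The `AssemblyAIWord` shape is taken from the review comment above; the `BatchWord` field names and the letter-to-index rule ("A" → 0, "B" → 1) are assumptions made for illustration and may not match the merged code.

```typescript
type AssemblyAIWord = {
  text: string;
  start: number; // milliseconds
  end: number; // milliseconds
  confidence: number;
  speaker?: string | null;
};

// Assumed Deepgram-style field names for the unified word type.
type BatchWord = {
  word: string;
  start: number; // seconds
  end: number; // seconds
  confidence: number;
  speaker?: number;
  punctuated_word?: string;
};

// Assumption: single-letter labels map to zero-based IDs; anything else
// (null, empty, multi-character, non-letter) yields undefined.
function speakerLabelToId(label?: string | null): number | undefined {
  if (!label || label.length !== 1) return undefined;
  const id = label.toUpperCase().charCodeAt(0) - "A".charCodeAt(0);
  return id >= 0 && id < 26 ? id : undefined;
}

function toBatchWord(word: AssemblyAIWord): BatchWord {
  return {
    word: word.text,
    start: word.start / 1000, // ms → s
    end: word.end / 1000, // ms → s
    confidence: word.confidence,
    speaker: speakerLabelToId(word.speaker),
    punctuated_word: word.text,
  };
}
```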

devin-ai-integration bot and others added 2 commits December 5, 2025 10:12
Wire params.model to AssemblyAI's speech_model parameter for model
selection, matching the behavior of Deepgram and Soniox handlers.

Addresses CodeRabbit review feedback.

Co-Authored-By: yujonglee <[email protected]>

Apply the same auth middleware pattern as /listen to ensure the
/transcribe endpoint requires Supabase authentication as documented
in the OpenAPI spec.

Addresses CodeRabbit review feedback.

Co-Authored-By: yujonglee <[email protected]>
@yujonglee yujonglee merged commit 2bf214c into main Dec 5, 2025
12 of 13 checks passed
@yujonglee yujonglee deleted the devin/1764928422-batch-stt-endpoint branch December 5, 2025 10:31