feat: initial commit for the transcription service #1206
thesecdude wants to merge 1 commit into main from
Conversation
Walkthrough
This PR introduces end-to-end automatic speech recognition capabilities, including a frontend transcription page for file uploads, backend ASR API endpoints, audio processing via Whisper and speaker diarization, job queue integration, LLM-driven transcript refinement, and multi-format result delivery.

Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant Frontend
participant Server as API Server
participant Queue as Job Queue
participant Python as Whisper/<br/>Diarization
participant LLM as LLM Refiner
User->>Frontend: Upload audio file
Frontend->>Server: POST /files/upload-simple
Server->>Server: Save file to disk
Server-->>Frontend: Return upload URL
Frontend->>Server: POST /asr/transcribe (with file URL)
Server->>Server: Validate, create temp workspace
Server->>Queue: Enqueue TranscribeJobData
Server-->>Frontend: Return jobId
Frontend->>Frontend: Start polling (1s interval)
Frontend->>Server: GET /asr/job-status?jobId
Server->>Queue: Check job status
par Async Processing
Queue->>Python: Download audio & convert to WAV
Python->>Python: Run Whisper + Diarization<br/>(parallel execution)
Python-->>Queue: Return transcript + speakers
end
Queue->>LLM: Refine transcript (if enabled)
LLM-->>Queue: Refined segments
Queue->>Queue: Write JSON/TXT/SRT outputs
Queue->>Server: Mark job complete
Frontend->>Server: GET /asr/job-status?jobId (polling)
Server-->>Frontend: Return completed status +<br/>output URLs
Frontend->>User: Display transcript, segments,<br/>speaker labels, download links
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Summary of Changes
Hello @thesecdude, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers a significant new feature: an end-to-end audio transcription service. It provides users with the ability to upload audio or video files and receive high-quality, speaker-diarized transcripts. The implementation spans both frontend and backend, establishing a robust, scalable processing workflow that leverages state-of-the-art AI models and includes intelligent refinement steps to ensure accurate and well-formatted output.
Code Review
This pull request introduces a new audio transcription service, including a frontend page for uploads, backend APIs for job management, and a queue-based processing pipeline. The implementation is comprehensive, with a robust Python script for transcription and a well-designed TypeScript service for LLM refinement. My review focuses on improving the reliability of the job polling mechanism, ensuring consistency in API calls, and fixing critical bugs in the job status and results retrieval logic.
const queueJobId = await boss.send(ASRQueue, jobData, {
  expireInHours: 24,
  retryLimit: 2,
  retryDelay: 60,
  retryBackoff: true,
})
The job status polling will fail because boss.getJobById is being called with the custom UUID (jobId), but pg-boss expects its own internal job ID. This will result in a 404 "Job not found" error for every status check.
To fix this, you should specify the jobId (UUID) as the job's unique ID when sending it to the queue. This can be done using the id option in boss.send. This way, getJobById will work as expected.
Suggested change:

const queueJobId = await boss.send(ASRQueue, jobData, {
  id: jobId, // Use the UUID as the job ID for pg-boss
  expireInHours: 24,
  retryLimit: 2,
  retryDelay: 60,
  retryBackoff: true,
})
const uploadFile = async (file: File): Promise<string> => {
  const formData = new FormData()
  formData.append("file", file)

  const response = await fetch(`${api}/files/upload-simple`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${localStorage.getItem("access_token")}`,
    },
    body: formData,
  })

  if (!response.ok) {
    let message = "File upload failed"
    try {
      const data = await response.json()
      if (data?.message) message = data.message
    } catch {
      // ignore JSON parse errors
    }
    throw new Error(message)
  }

  const data = await response.json()
  return data.url
}
The API calls in this component are made using fetch directly, with the access token retrieved from localStorage. This is inconsistent with the rest of the application, which seems to use a centralized api client from hono/client configured with authFetch.
Using fetch directly bypasses any centralized logic for handling authentication, such as token refreshing, which can lead to authentication errors and maintenance issues. You should refactor uploadFile, pollJobStatus, and handleSubmit to use the api client.
For example, uploadFile can be refactored like this:
const uploadFile = async (file: File): Promise<string> => {
  const formData = new FormData()
  formData.append("file", file)

  const response = await api.files["upload-simple"].$post({
    body: formData,
  })

  if (!response.ok) {
    let message = "File upload failed"
    try {
      const data = await response.json()
      if (data?.message) message = data.message
    } catch {
      // ignore JSON parse errors
    }
    throw new Error(message)
  }

  const data = await response.json()
  return data.url
}

Similarly, pollJobStatus and the transcription request in handleSubmit should also use the api client.
const suffix = jobData.refineWithLLM ? "_refined" : "_raw"
const format = jobData.outputFormat || "json"
There's a mismatch in the output filename suffix for non-refined transcripts. This API looks for a file with a _raw suffix, but the asrProcessor creates a file with a _merged suffix when LLM refinement is disabled. This will prevent the API from finding and returning the results for non-refined jobs.
To fix this, the suffix logic here should align with the logic in asrProcessor.ts.
Suggested change:

const suffix = jobData.refineWithLLM ? "_refined" : "_merged"
const format = jobData.outputFormat || "json"
const pollJobStatus = useCallback(
  (jobId: string) => {
    clearPolling()

    const intervalId = setInterval(async () => {
      try {
        const response = await fetch(`${api}/asr/job-status?jobId=${jobId}`, {
          headers: {
            Authorization: `Bearer ${localStorage.getItem("access_token")}`,
          },
        })

        if (!response.ok) {
          throw new Error("Failed to fetch job status")
        }

        const data = await response.json()

        if (data.status === "completed") {
          clearPolling()
          setJobStatus("completed")
          setResult(data.outputs ?? null)
        } else if (data.status === "failed") {
          clearPolling()
          setJobStatus("failed")
          setError(data.error || "Transcription failed")
        } else if (data.status === "active") {
          setJobStatus("processing")
        }
      } catch (err) {
        clearPolling()
        setJobStatus("failed")
        setError(
          err instanceof Error ? err.message : "Failed to check job status",
        )
      }
    }, 3000)

    pollIntervalRef.current = intervalId

    // Hard timeout after 30 minutes
    const timeoutId = setTimeout(() => {
      clearPolling()
      setJobStatus("failed")
      setError("Transcription timed out. Please try again.")
    }, 30 * 60 * 1000)

    pollTimeoutRef.current = timeoutId
  },
  [clearPolling],
)
The current polling mechanism uses setInterval, which can lead to issues on unreliable networks. If an API request takes longer than the interval (3 seconds), or if the network is temporarily down, multiple requests can stack up, leading to unnecessary server load and unpredictable client-side behavior.
A more robust approach is to use a recursive setTimeout. This ensures that the next poll is only scheduled after the previous one has completed (either successfully or with an error).
Additionally, when a polling request fails, the UI shows a generic "Transcription failed" message. This is misleading, as the job on the backend might still be running. The error should clarify that it was a failure to check the status, not a failure of the job itself.
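The recursive-setTimeout approach the reviewer describes can be sketched roughly as follows. This is an illustrative sketch only: the helper name pollUntilDone, the JobState union, and the callbacks are assumptions, not code from this PR.

```typescript
// Hypothetical sketch of recursive-setTimeout polling; names are illustrative.
type JobState = "active" | "completed" | "failed"

function pollUntilDone(
  checkStatus: () => Promise<JobState>,
  onDone: (state: JobState) => void,
  intervalMs = 3000,
): () => void {
  let cancelled = false
  let timer: ReturnType<typeof setTimeout> | undefined

  const tick = async () => {
    if (cancelled) return
    try {
      const state = await checkStatus()
      if (state === "completed" || state === "failed") {
        onDone(state)
        return
      }
    } catch {
      // A failed status check is not a failed job; keep polling.
    }
    // Schedule the next poll only after this one has finished,
    // so slow or failing requests can never stack up.
    timer = setTimeout(tick, intervalMs)
  }

  tick()
  // Return a cancel function for cleanup (e.g. on component unmount).
  return () => {
    cancelled = true
    if (timer !== undefined) clearTimeout(timer)
  }
}
```

Unlike setInterval, each iteration here awaits the previous request before scheduling the next, and transport errors simply retry instead of marking the job failed.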
Actionable comments posted: 9
🧹 Nitpick comments (8)
server/api/files.ts (1)
754-776: Stream ASR files instead of buffering

Audio/video responses can be hundreds of MB. Reading them with fs.readFile loads the whole file into memory per request, which tanks concurrency and risks OOM. Stream the file (e.g. via Bun.file().stream()) so memory stays constant.

- const fileBuffer = await fs.readFile(normalizedPath)
-
- return c.newResponse(fileBuffer as any, 200, {
-   "Content-Type": contentType,
-   "Content-Length": fileBuffer.length.toString(),
- })
+ const bunFile = Bun.file(normalizedPath)
+ const stat = await bunFile.stat()
+
+ return new Response(bunFile.stream(), {
+   status: 200,
+   headers: {
+     "Content-Type": contentType,
+     ...(stat?.size ? { "Content-Length": stat.size.toString() } : {}),
+   },
+ })

server/queue/asrProcessor.ts (3)
184-193: Clarify multilingual mode default behavior.

The code comment states "Default: multilingual ON unless explicitly disabled (but in your pipeline: always true)" but the implementation is data.multilingual !== false, which means multilingual is ON by default (undefined or true → multilingual enabled). However, the comment is confusing. Consider clarifying:

- // Default: multilingual ON unless explicitly disabled (but in your pipeline: always true)
+ // Default: multilingual mode is enabled unless explicitly set to false
  const multilingual = data.multilingual !== false
  if (multilingual) {
    args.push("--multilingual")
  }
205-207: Improve error message specificity.

The error message includes stderr but doesn't distinguish between different failure modes. Consider checking for common failure scenarios to provide more actionable error messages.

  if (result.exitCode !== 0) {
+   // Check for common error patterns
+   if (result.stderr.includes("CUDA out of memory")) {
+     throw new Error(`Transcription failed: GPU out of memory. Consider using a smaller model or CPU mode.`)
+   }
+   if (result.stderr.includes("HuggingFace") || result.stderr.includes("authentication")) {
+     throw new Error(`Transcription failed: HuggingFace authentication error. Verify HF_TOKEN is set correctly.`)
+   }
    throw new Error(`Transcription failed (exit ${result.exitCode}): ${result.stderr}`)
  }
283-295: Consider validating SRT timestamp ordering.

SRT format requires sequential, non-overlapping timestamps. If segments have timing issues (overlaps, out-of-order), the generated SRT may be invalid.

Add a validation step before writing SRT:

if (outputFormat === "srt" || outputFormat === "all") {
  const srtPath = data.outputPath + suffix + ".srt"

  // Validate segment ordering
  for (let i = 1; i < finalTranscript.segments.length; i++) {
    const prev = finalTranscript.segments[i - 1]
    const curr = finalTranscript.segments[i]
    if (curr.start < prev.end) {
      Logger.warn(
        { jobId: data.jobId, segmentIndex: i },
        "Overlapping segments detected in SRT output",
      )
    }
  }

  const srtContent = finalTranscript.segments
    .map((seg, idx) => {
      const start = formatTimestamp(seg.start).replace(".", ",")
      const end = formatTimestamp(seg.end).replace(".", ",")
      const text = `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`
      return `${idx + 1}\n${start} --> ${end}\n${text}\n`
    })
    .join("\n")

  await fs.writeFile(srtPath, srtContent, "utf-8")
  Logger.info({ jobId: data.jobId, path: srtPath }, "Saved SRT output")
}

server/asr-sd/whisper_diarization.py (4)
1-1: Make script executable if intended for direct CLI usage.

The shebang is present but the file isn't marked as executable. If this script is intended to be invoked directly (not just via python whisper_diarization.py), make it executable. Run this command:
chmod +x server/asr-sd/whisper_diarization.py
118-137: Consider adding timeout for future results.

While the parallel execution handles interrupts and exceptions well, there's no timeout on future.result(). If a worker thread hangs (e.g., network issue downloading model weights, deadlock), the main thread will wait indefinitely.

Add a timeout parameter:

  # Wait for completion and collect results
  for future in as_completed([whisper_future, diarization_future]):
      try:
          if future is whisper_future:
-             whisper_result, whisper_time = future.result(timeout=None)
+             whisper_result, whisper_time = future.result(timeout=7200)  # 2 hour timeout
              print(f"  Whisper completed ({whisper_time:.2f}s)")
          else:
-             diarization_result, diarization_time = future.result(timeout=None)
+             diarization_result, diarization_time = future.result(timeout=7200)
              print(f"  Pyannote completed ({diarization_time:.2f}s)")
281-304: Consider language-aware text joining for multilingual support.

The _smart_join method uses English punctuation rules (spaces around punctuation). For languages like Chinese, Japanese, or Thai that don't use spaces, or languages with different punctuation conventions, this may produce incorrect spacing.

Consider adding language-aware joining:

def _smart_join(self, prev_text: str, token: str, language: str = "en") -> str:
    """Smart spacing around punctuation when joining tokens."""
    if not prev_text:
        return token

    # Languages that don't use spaces between words
    no_space_languages = {"zh", "ja", "th", "my", "km"}
    if language in no_space_languages:
        return prev_text + token

    # English punctuation rules
    closing_punct = {",", ".", "!", "?", ":", ";", ")", "]", "}", "'"}
    opening_punct = {"(", "[", "{", "'"}

    if token in closing_punct:
        return prev_text + token
    if prev_text[-1] in opening_punct:
        return prev_text + token
    if token == "'" and prev_text and prev_text[-1].isalnum():
        return prev_text + token

    return prev_text + " " + token

Note: This would require tracking segment language throughout the pipeline.
404-413: Use specific exception types instead of bare Exception.

Catching bare Exception can hide unexpected errors and make debugging difficult. It's better to catch specific exceptions that are expected during initialization.

  # Initialize pipeline
  try:
      pipeline = ParallelWhisperDiarization(
          whisper_model=args.whisper_model,
          diarization_model=args.diarization_model,
          device=args.device,
          hf_token=args.hf_token,
      )
- except Exception as e:
+ except (OSError, RuntimeError, ValueError) as e:
      print(f"\nError initializing pipeline: {e}")
      return
+ except KeyboardInterrupt:
+     print("\nInitialization cancelled by user")
+     return
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
frontend/src/components/Sidebar.tsx (13 hunks)
frontend/src/routes/_authenticated/transcription.tsx (1 hunk)
server/api/asr.ts (1 hunk)
server/api/files.ts (2 hunks)
server/asr-sd/whisper_diarization.py (1 hunk)
server/queue/asrProcessor.ts (1 hunk)
server/queue/index.ts (4 hunks)
server/server.ts (4 hunks)
server/services/transcriptRefinement.ts (1 hunk)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-06-24T08:53:33.051Z
Learnt from: rahul1841
Repo: xynehq/xyne PR: 585
File: frontend/src/components/HistoryModal.tsx:341-351
Timestamp: 2025-06-24T08:53:33.051Z
Learning: In frontend/src/components/HistoryModal.tsx, bookmark buttons don't need the CLASS_NAMES.BOOKMARK_BUTTON class because the HistoryModal is rendered inside the sidebar area and is already protected by the sidebar's click outside handler logic.
Applied to files:
frontend/src/components/Sidebar.tsx
📚 Learning: 2025-09-16T08:57:58.762Z
Learnt from: rahul1841
Repo: xynehq/xyne PR: 843
File: server/server.ts:996-996
Timestamp: 2025-09-16T08:57:58.762Z
Learning: The DownloadFileApi handler in server/api/knowledgeBase.ts properly uses the clId route parameter to scope and validate file access within the correct collection.
Applied to files:
server/api/files.ts
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
Repo: xynehq/xyne PR: 484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/api/files.ts
📚 Learning: 2025-10-07T05:35:17.385Z
Learnt from: Himanshvarma
Repo: xynehq/xyne PR: 1046
File: server/services/fileProcessor.ts:118-131
Timestamp: 2025-10-07T05:35:17.385Z
Learning: In server/services/fileProcessor.ts, SheetProcessingResult objects intentionally use empty chunks_map and image_chunks_map arrays because each sheet is ingested as a separate Vespa document with sheet-level metadata (sheetName, sheetIndex, totalSheets, docId), making per-chunk metadata redundant for sheet files.
Applied to files:
server/api/files.ts
📚 Learning: 2025-06-10T05:40:04.427Z
Learnt from: naSim087
Repo: xynehq/xyne PR: 525
File: frontend/src/routes/_authenticated/admin/integrations/slack.tsx:141-148
Timestamp: 2025-06-10T05:40:04.427Z
Learning: In frontend/src/routes/_authenticated/admin/integrations/slack.tsx, the ConnectAction enum and related connectAction state (lines 141-148, 469-471) are intentionally kept for future development, even though they appear unused in the current implementation.
Applied to files:
frontend/src/routes/_authenticated/transcription.tsx
🧬 Code graph analysis (8)
server/api/asr.ts (3)
  server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
  server/queue/asrProcessor.ts (1): TranscribeJobData (35-50)
  server/queue/index.ts (1): ASRQueue (59-59)
server/queue/asrProcessor.ts (2)
  server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
  server/services/transcriptRefinement.ts (3): TranscriptResult (26-41), refineTranscript (416-474), mergeConsecutiveSegments (142-195)
server/queue/index.ts (2)
  server/queue/boss.ts (1): boss (4-7)
  server/queue/asrProcessor.ts (1): handleASRJob (305-327)
server/services/transcriptRefinement.ts (3)
  server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
  server/ai/types.ts (1): LLMProvider (340-350)
  server/ai/provider/index.ts (1): getProviderByModel (369-419)
server/server.ts (2)
  server/api/files.ts (2): handleSimpleFileUpload (675-723), serveASRFile (726-784)
  server/api/asr.ts (4): transcribeAudioSchema (20-33), TranscribeAudioApi (40-151), getJobStatusSchema (35-37), GetJobStatusApi (154-230)
frontend/src/components/Sidebar.tsx (2)
  frontend/src/lib/utils.ts (1): cn (5-7)
  frontend/src/components/Tooltip.tsx (1): Tip (3-28)
server/api/files.ts (1)
  server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
frontend/src/routes/_authenticated/transcription.tsx (2)
  frontend/src/api.ts (1): api (5-5)
  frontend/src/components/Sidebar.tsx (1): Sidebar (41-439)
🪛 GitHub Actions: TypeScript Build Check
frontend/src/components/Sidebar.tsx
[error] 246-246: bunx tsc -b failed: Type '"/transcription"' is not assignable to type '"/" | "/auth" | "/call" | "/oauth/success" | "/agent" | "/buzz" | "/chat" | "/dashboard" | "/knowledgeManagement" | "/search" | "/tuning" | "/workflow" | "/admin/chat-overview" | ... 20 more ... | ".."'.
frontend/src/routes/_authenticated/transcription.tsx
[error] 31-31: bunx tsc -b failed: Argument of type '"/_authenticated/transcription"' is not assignable to parameter of type 'keyof FileRoutesByPath'.
🪛 Ruff (0.14.4)
server/asr-sd/whisper_diarization.py
1-1: Shebang is present but file is not executable
(EXE001)
55-55: Use raise without specifying exception name
Remove exception name
(TRY201)
61-63: try-except-pass detected, consider logging the exception
(S110)
61-61: Do not catch blind exception: Exception
(BLE001)
145-145: Avoid specifying long messages outside the exception class
(TRY003)
300-300: Possible hardcoded password assigned to: "token"
(S105)
411-411: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (12)
server/queue/asrProcessor.ts (6)
17-17: Verify process.cwd() returns repository root in all execution contexts.

The ASR_SD_DIR path assumes process.cwd() returns the repository root. In some deployment scenarios (systemd services, Docker containers with custom working directories, or when the process is started from a different directory), this might not hold true, leading to file-not-found errors.

Consider using __dirname or an explicit environment variable for the project root:

-const ASR_SD_DIR = path.join(process.cwd(), "asr-sd")
+const ASR_SD_DIR = path.join(__dirname, "../..", "asr-sd")

Or rely on an environment variable:

const PROJECT_ROOT = process.env.PROJECT_ROOT || process.cwd()
const ASR_SD_DIR = path.join(PROJECT_ROOT, "asr-sd")
21-29: LGTM!

The timestamp formatting logic correctly converts seconds to SRT format (HH:MM:SS.mmm) with proper padding.
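For reference, a conversion matching the HH:MM:SS.mmm description might look like the following sketch; the PR's actual formatTimestamp implementation is not shown in this review, so details may differ.

```typescript
// Sketch: seconds → "HH:MM:SS.mmm" with zero padding, as the review describes.
function formatTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000)
  const h = Math.floor(ms / 3_600_000)
  const m = Math.floor((ms % 3_600_000) / 60_000)
  const s = Math.floor((ms % 60_000) / 1000)
  const frac = ms % 1000
  const pad = (n: number, w: number) => n.toString().padStart(w, "0")
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(frac, 3)}`
}

console.log(formatTimestamp(3661.5)) // "01:01:01.500"
```

SRT proper uses a comma before the milliseconds, which is why the SRT writer later does .replace(".", ",") on these values.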
31-52: LGTM!

The type definitions are well-structured with clear interfaces and appropriate use of optional fields. The enum pattern allows for future extensibility.
171-173: Potential issue with audio file extension replacement.

The regex replace(/\.[^.]+$/, "_converted.wav") assumes the input file has an extension. If data.audioPath has no extension (e.g., just "audiofile"), the regex does not match and the path is returned unchanged, so the converted file would never receive the _converted.wav name. Names with multiple dots (e.g., "my.audio.file.mp3") work correctly, since only the final extension is replaced.
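The replacement behavior can be checked directly; note the no-extension case, where the regex fails to match and the name is left untouched:

```typescript
// Quick check of the extension-replacement regex discussed above.
const toConverted = (p: string) => p.replace(/\.[^.]+$/, "_converted.wav")

console.log(toConverted("my.audio.file.mp3")) // "my.audio.file_converted.wav"
console.log(toConverted("audiofile"))         // "audiofile" — no match, unchanged
```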
228-251: LGTM - Good fallback handling for LLM refinement.

The code properly checks for LLM configuration and gracefully falls back to raw transcript if refinement fails. The warning messages are clear and informative.
305-327: LGTM!

The job router is well-structured with proper type narrowing, error handling, and extensibility for future job types. The switch statement with a default case ensures unknown job types are caught.
server/asr-sd/whisper_diarization.py (6)
107-142: LGTM - Excellent parallel execution with proper cleanup.

The parallel execution implementation is robust with:
- Proper resource cleanup via executor.shutdown
- KeyboardInterrupt handling for graceful termination
- Exception handling with future cancellation
- Clear progress logging
180-204: LGTM!

Both worker methods are correctly implemented with torch.no_grad() for inference optimization and accurate timing measurements.
206-262: LGTM - Robust merging logic with good fallbacks.

The merge implementation correctly handles edge cases (missing word timestamps, segment-level fallback) and produces a well-structured result with speaker assignments.
264-279: LGTM!

The deterministic speaker assignment uses a sensible distance-based tiebreaking strategy when a timestamp overlaps multiple speaker segments.
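The distance-based tiebreak described here can be illustrated with a small language-neutral sketch (TypeScript for readability; the PR's version is Python). The SpeakerTurn shape and function name are assumptions for illustration, not the script's actual code.

```typescript
// Illustrative sketch: when a timestamp overlaps several diarization turns,
// pick the turn whose center is nearest, which makes the choice deterministic.
interface SpeakerTurn { speaker: string; start: number; end: number }

function assignSpeaker(time: number, turns: SpeakerTurn[]): string {
  const overlapping = turns.filter((t) => time >= t.start && time <= t.end)
  // Fall back to all turns if nothing overlaps (e.g. a gap between turns).
  const candidates = overlapping.length > 0 ? overlapping : turns
  if (candidates.length === 0) return "UNKNOWN"

  // Tie-break by distance from the timestamp to each turn's center.
  let best = candidates[0]
  let bestDist = Math.abs(time - (best.start + best.end) / 2)
  for (const t of candidates.slice(1)) {
    const d = Math.abs(time - (t.start + t.end) / 2)
    if (d < bestDist) {
      best = t
      bestDist = d
    }
  }
  return best.speaker
}
```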
306-338: LGTM!

The speaker grouping logic correctly consolidates consecutive words from the same speaker while maintaining all necessary metadata (timestamps, word list).
341-463: LGTM - Well-structured CLI with comprehensive output.

The CLI implementation is thorough with:
- Clear argument definitions with defaults
- Input validation
- Detailed progress logging
- Comprehensive timing metrics
- Informative summary output
export const Route = createFileRoute("/_authenticated/transcription")({
  component: TranscriptionPage,
  errorComponent: () => <div>Error loading transcription page</div>,
  loader: async ({ context }) => {
    return {
      user: context.user,
      workspace: context.workspace,
    }
  },
})
Route path must be /transcription, not /_authenticated/transcription.
Our router typings reject "/_authenticated/transcription" (see failing bunx tsc -b), which means the page never registers under the intended /transcription URL. As a result the sidebar link can’t type-check or hit this route. Please switch the path to the actual URL segment so the generated FileRoutesByPath includes /transcription.
Apply this diff:
-export const Route = createFileRoute("/_authenticated/transcription")({
+export const Route = createFileRoute("/transcription")({

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export const Route = createFileRoute("/transcription")({
  component: TranscriptionPage,
  errorComponent: () => <div>Error loading transcription page</div>,
  loader: async ({ context }) => {
    return {
      user: context.user,
      workspace: context.workspace,
    }
  },
})
🧰 Tools
🪛 GitHub Actions: TypeScript Build Check
[error] 31-31: bunx tsc -b failed: Argument of type '"/_authenticated/transcription"' is not assignable to parameter of type 'keyof FileRoutesByPath'.
🤖 Prompt for AI Agents
In frontend/src/routes/_authenticated/transcription.tsx around lines 31 to 40,
the route is currently registered with the path "/_authenticated/transcription"
but should be "/transcription"; update the createFileRoute(...) call to use
"/transcription" so the compiled router typings include the correct
FileRoutesByPath entry and the page registers at the intended URL, then run
type-check to verify the change.
  const response = await fetch(`${api}/files/upload-simple`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${localStorage.getItem("access_token")}`,
    },
    body: formData,
  })

  if (!response.ok) {
    let message = "File upload failed"
    try {
      const data = await response.json()
      if (data?.message) message = data.message
    } catch {
      // ignore JSON parse errors
    }
    throw new Error(message)
  }

  const data = await response.json()
  return data.url
}

const pollJobStatus = useCallback(
  (jobId: string) => {
    clearPolling()

    const intervalId = setInterval(async () => {
      try {
        const response = await fetch(`${api}/asr/job-status?jobId=${jobId}`, {
          headers: {
            Authorization: `Bearer ${localStorage.getItem("access_token")}`,
          },
        })

        if (!response.ok) {
          throw new Error("Failed to fetch job status")
        }

        const data = await response.json()

        if (data.status === "completed") {
          clearPolling()
          setJobStatus("completed")
          setResult(data.outputs ?? null)
        } else if (data.status === "failed") {
          clearPolling()
          setJobStatus("failed")
          setError(data.error || "Transcription failed")
        } else if (data.status === "active") {
          setJobStatus("processing")
        }
      } catch (err) {
        clearPolling()
        setJobStatus("failed")
        setError(
          err instanceof Error ? err.message : "Failed to check job status",
        )
      }
    }, 3000)

    pollIntervalRef.current = intervalId

    // Hard timeout after 30 minutes
    const timeoutId = setTimeout(() => {
      clearPolling()
      setJobStatus("failed")
      setError("Transcription timed out. Please try again.")
    }, 30 * 60 * 1000)

    pollTimeoutRef.current = timeoutId
  },
  [clearPolling],
)

const handleSubmit = async () => {
  if (!selectedFile) {
    setError("Please select a file first")
    return
  }

  try {
    clearPolling()
    setJobStatus("uploading")
    setError(null)
    setResult(null)
    setJobId(null)
    setUploadProgress(0)

    // Upload file
    const audioUrl = await uploadFile(selectedFile)
    setUploadProgress(50)

    // Start transcription job
    setJobStatus("queued")
    const response = await fetch(`${api}/asr/transcribe`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${localStorage.getItem("access_token")}`,
      },
      body: JSON.stringify({
        audioUrl,
        whisperModel,
        refineWithLLM: true,
        outputFormat,
        numSpeakers: numSpeakers || undefined,
        multilingual: true,
      }),
    })
Fix API calls: api is not a string base URL.
api from @/api is the typed client object; interpolating it in a template literal coerces it to "[object Object]", so every fetch goes to "[object Object]/files/upload-simple" / "[object Object]/asr/*". That breaks uploads, transcription kicks, and status polling entirely. Please point these fetches at the real /api/v1/... endpoints (or use the typed client helpers) in all three spots.
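The coercion can be demonstrated in isolation. The object below is a stand-in for the typed hono client, not the app's actual export; any plain object stringifies the same way inside a template literal.

```typescript
// Demonstration: a plain object interpolated into a template literal
// falls back to the default toString(), producing "[object Object]".
const fakeApi: Record<string, unknown> = { files: {}, asr: {} }
const url = `${fakeApi}/files/upload-simple`
console.log(url) // "[object Object]/files/upload-simple"
```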
One possible fix:
- const response = await fetch(`${api}/files/upload-simple`, {
+ const response = await fetch("/api/v1/files/upload-simple", {
@@
- const response = await fetch(`${api}/asr/job-status?jobId=${jobId}`, {
+ const response = await fetch(
+ `/api/v1/asr/job-status?jobId=${jobId}`,
+ {
@@
- const response = await fetch(`${api}/asr/transcribe`, {
+ const response = await fetch("/api/v1/asr/transcribe", {

Make sure to apply the same correction anywhere else this pattern appears.
const job = await boss.getJobById(ASRQueue, jobId)

if (!job) {
  throw new HTTPException(404, {
    message: "Job not found",
  })
}

const tempDir = path.join(ASR_SD_DIR, "temp", jobId)
let outputs: Record<string, any> = {}

// If job is completed, read output files
if (job.state === "completed") {
  try {
    const jobData = job.data as TranscribeJobData
    const suffix = jobData.refineWithLLM ? "_refined" : "_raw"
    const format = jobData.outputFormat || "json"

    if (format === "json" || format === "all") {
      const jsonPath = path.join(tempDir, `transcription${suffix}.json`)
      try {
        const jsonContent = await fs.readFile(jsonPath, "utf-8")
        outputs.json = JSON.parse(jsonContent)
      } catch (error) {
        Logger.warn({ error, jsonPath }, "Failed to read JSON output")
      }
    }

    if (format === "txt" || format === "all") {
      const txtPath = path.join(tempDir, `transcription${suffix}.txt`)
      try {
        outputs.txt = await fs.readFile(txtPath, "utf-8")
      } catch (error) {
        Logger.warn({ error, txtPath }, "Failed to read TXT output")
      }
    }

    if (format === "srt" || format === "all") {
      const srtPath = path.join(tempDir, `transcription${suffix}.srt`)
      try {
        outputs.srt = await fs.readFile(srtPath, "utf-8")
      } catch (error) {
        Logger.warn({ error, srtPath }, "Failed to read SRT output")
      }
    }
  } catch (error) {
    Logger.warn({ error }, "Error reading output files")
  }
}

return c.json({
  success: true,
  jobId,
  status: job.state,
  createdOn: job.createdOn,
  startedOn: job.startedOn,
  completedOn: job.completedOn,
  outputs: Object.keys(outputs).length > 0 ? outputs : undefined,
Job status lookup always returns 404
boss.getJobById expects the pg-boss job id that boss.send returns. Here we query with our own UUID (jobId) that only lives inside the payload, so pg-boss never finds the job and the endpoint 404s even when work is running or finished.
Use the actual boss id (queueJobId) for lookups (or publish the job with that UUID). Without this, clients can never fetch status or outputs.
-export const getJobStatusSchema = z.object({
- jobId: z.string({ message: "Invalid job ID" }).uuid(),
-})
+export const getJobStatusSchema = z.object({
+ queueJobId: z.string({ message: "Invalid job ID" }).uuid(),
+})
...
- const { jobId } = c.req.valid("query" as never) as z.infer<typeof getJobStatusSchema>
+ const { queueJobId } = c.req.valid("query" as never) as z.infer<typeof getJobStatusSchema>
...
- const job = await boss.getJobById(ASRQueue, jobId)
+ const job = await boss.getJobById(ASRQueue, queueJobId)

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
server/api/asr.ts lines 161-218: the code calls boss.getJobById(jobId) using our
internal UUID (jobId) but pg-boss requires the pg-boss job id, so replace this
lookup to use the actual boss job id (queueJobId) that you must persist when
enqueuing the job; specifically, when you create/send the job, store the
returned boss id into the job payload/DB (e.g., data.queueJobId), and here read
that mapping (e.g., fetch your job record or job metadata by our UUID and
extract queueJobId) then call boss.getJobById(queueJobId); if queueJobId is
missing return 404 with a clear message. Also ensure job creation code includes
queueJobId in the stored job record so future lookups succeed.
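The mapping the prompt describes can be sketched with an in-memory stand-in for pg-boss (names are illustrative; the real `boss.send` likewise resolves to the queue-assigned job id):

```typescript
// In-memory stand-in for pg-boss, illustrating why lookups by our internal
// UUID fail: getJobById only knows the id that send() assigned.
type FakeJob = { id: string; state: string }

class FakeBoss {
  private jobs = new Map<string, FakeJob>()
  private seq = 0

  // Like pg-boss, send() resolves to the id the queue assigned the job.
  async send(_queue: string, _data: unknown): Promise<string> {
    const id = `boss-${++this.seq}`
    this.jobs.set(id, { id, state: "created" })
    return id
  }

  async getJobById(_queue: string, id: string): Promise<FakeJob | null> {
    return this.jobs.get(id) ?? null
  }
}

async function main() {
  const boss = new FakeBoss()
  const internalJobId = "11111111-1111-1111-1111-111111111111"

  // Persist queueJobId next to our UUID at enqueue time...
  const queueJobId = await boss.send("asr-queue", { jobId: internalJobId })

  // ...because only the queue-assigned id resolves on lookup.
  console.log(await boss.getJobById("asr-queue", internalJobId)) // null
  console.log((await boss.getJobById("asr-queue", queueJobId))?.state)
}
main()
```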
        if self.device == "cuda" and torch.cuda.is_available():
            try:
                self.diarization_pipeline.to(torch.device("cuda"))
            except Exception:
                # Not all components may support .to("cuda"); silently fall back
                pass
Log CUDA migration failures instead of silently ignoring.
The try-except block silently catches failures when moving the diarization pipeline to CUDA. This makes debugging difficult when GPU acceleration isn't working as expected.
Apply this diff:
if self.device == "cuda" and torch.cuda.is_available():
try:
self.diarization_pipeline.to(torch.device("cuda"))
+ print("Diarization pipeline moved to CUDA")
except Exception:
- # Not all components may support .to("cuda"); silently fall back
- pass
+ # Not all components may support .to("cuda"); fall back to CPU
+ print("Warning: Could not move diarization pipeline to CUDA, using CPU")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if self.device == "cuda" and torch.cuda.is_available(): | |
| try: | |
| self.diarization_pipeline.to(torch.device("cuda")) | |
| except Exception: | |
| # Not all components may support .to("cuda"); silently fall back | |
| pass | |
| if self.device == "cuda" and torch.cuda.is_available(): | |
| try: | |
| self.diarization_pipeline.to(torch.device("cuda")) | |
| print("Diarization pipeline moved to CUDA") | |
| except Exception: | |
| # Not all components may support .to("cuda"); fall back to CPU | |
| print("Warning: Could not move diarization pipeline to CUDA, using CPU") |
🧰 Tools
🪛 Ruff (0.14.4)
61-63: try-except-pass detected, consider logging the exception
(S110)
61-61: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In server/asr-sd/whisper_diarization.py around lines 58 to 63, the try-except
that moves the diarization pipeline to CUDA currently swallows all exceptions;
change it to catch Exception as e and log the failure (e.g.,
logger.error("Failed to move diarization pipeline to CUDA", exc_info=True) or
logger.exception(...)) before falling back to CPU so GPU migration failures are
recorded for debugging.
    # Transcribe with parallel processing
    result = pipeline.transcribe(
        args.audio,
        language=args.language,
        num_speakers=args.num_speakers,
        min_speakers=args.min_speakers,
        max_speakers=args.max_speakers,
        multilingual=args.multilingual,
    )
Add error handling for the transcribe call.
The transcribe call could fail for various reasons (audio file corrupt, out of memory, model errors), but there's no try-except block around it. This will cause an unhandled exception and an ugly stack trace instead of a clean error message.
+ # Transcribe with parallel processing
+ try:
result = pipeline.transcribe(
args.audio,
language=args.language,
num_speakers=args.num_speakers,
min_speakers=args.min_speakers,
max_speakers=args.max_speakers,
multilingual=args.multilingual,
)
+ except KeyboardInterrupt:
+ print("\nTranscription cancelled by user")
+ return
+ except Exception as e:
+ print(f"\nError during transcription: {e}")
+ return

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In server/asr-sd/whisper_diarization.py around lines 415 to 423, the
pipeline.transcribe(...) call is unprotected and can raise exceptions (corrupt
audio, OOM, model errors); wrap this call in a try-except that catches broad
exceptions (Exception), log a clear error message including the exception
details and the input file/parameters, and either re-raise a custom exception or
exit/return a controlled error value so the program doesn't crash with an ugly
stack trace; ensure the logger/error handler includes the exception type and
message and optionally the stacktrace for debugging.
async function runPythonScript(
  scriptName: string,
  args: string[],
): Promise<{ stdout: string; stderr: string; exitCode: number }> {
  return new Promise((resolve, reject) => {
    const scriptPath = path.join(ASR_SD_DIR, scriptName)
    const pythonProcess = spawn(PYTHON_EXECUTABLE, [scriptPath, ...args])

    let stdout = ""
    let stderr = ""

    pythonProcess.stdout.on("data", (data) => {
      const output = data.toString()
      stdout += output
      Logger.info({ output }, "Python stdout")
    })

    pythonProcess.stderr.on("data", (data) => {
      const output = data.toString()
      stderr += output
      Logger.warn({ output }, "Python stderr")
    })

    pythonProcess.on("close", (code, signal) => {
      const exitCode = code ?? -1 // if null (killed by signal), treat as failure
      Logger.info({ exitCode, signal }, "Python process exited")
      resolve({
        stdout,
        stderr,
        exitCode,
      })
    })

    pythonProcess.on("error", (error) => {
      Logger.error({ error }, "Failed to start Python process")
      reject(error)
    })
  })
}
Add timeout mechanism for Python process execution.
The Python process has no timeout, which could cause the job to hang indefinitely if the script freezes, encounters an infinite loop, or waits for unavailable resources. This is a reliability risk: a stuck job neither completes nor fails gracefully.
Add a timeout with cleanup:
async function runPythonScript(
scriptName: string,
args: string[],
+ timeoutMs: number = 3600000, // 1 hour default
): Promise<{ stdout: string; stderr: string; exitCode: number }> {
return new Promise((resolve, reject) => {
const scriptPath = path.join(ASR_SD_DIR, scriptName)
const pythonProcess = spawn(PYTHON_EXECUTABLE, [scriptPath, ...args])
let stdout = ""
let stderr = ""
+ let timeoutHandle: NodeJS.Timeout | null = null
+
+ // Set up timeout
+ if (timeoutMs > 0) {
+ timeoutHandle = setTimeout(() => {
+ Logger.warn({ scriptName, timeoutMs }, "Python process timeout, killing")
+ pythonProcess.kill("SIGTERM")
+ setTimeout(() => pythonProcess.kill("SIGKILL"), 5000) // Force kill after 5s
+ }, timeoutMs)
+ }
pythonProcess.stdout.on("data", (data) => {
const output = data.toString()
stdout += output
Logger.info({ output }, "Python stdout")
})
pythonProcess.stderr.on("data", (data) => {
const output = data.toString()
stderr += output
Logger.warn({ output }, "Python stderr")
})
pythonProcess.on("close", (code, signal) => {
+ if (timeoutHandle) clearTimeout(timeoutHandle)
const exitCode = code ?? -1
Logger.info({ exitCode, signal }, "Python process exited")
resolve({
stdout,
stderr,
exitCode,
})
})
pythonProcess.on("error", (error) => {
+ if (timeoutHandle) clearTimeout(timeoutHandle)
Logger.error({ error }, "Failed to start Python process")
reject(error)
})
})
}
🤖 Prompt for AI Agents
In server/queue/asrProcessor.ts around lines 55 to 93, the runPythonScript
function has no timeout and can hang indefinitely; add a configurable timeout
(e.g., parameter or constant like PYTHON_TIMEOUT_MS) and implement a timer that,
after the timeout, kills the spawned pythonProcess (process.kill), rejects the
promise with a timeout error, and logs the timeout; ensure you clear the timeout
on normal process close or error, and remove or guard duplicate event handlers
so the promise is resolved/rejected only once (cleanup listeners after
close/error/timeout) to avoid memory leaks or double resolutions.
async function downloadFile(url: string, outputPath: string): Promise<void> {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error(`Failed to download file: ${response.status} ${response.statusText}`)
  }
  const buffer = await response.arrayBuffer()
  await fs.writeFile(outputPath, Buffer.from(buffer))
}
Add timeout and stream-based download for reliability and memory efficiency.
The current implementation has several reliability and performance concerns:
- No timeout: The fetch call could hang indefinitely on slow/unresponsive servers
- Memory inefficient: Loading the entire audio file into memory before writing is problematic for large files (multi-GB audio files are common in transcription workflows)
- No retry logic: Transient network failures will cause job failure without retry
Refactor to use streaming with timeout and basic retry:
-async function downloadFile(url: string, outputPath: string): Promise<void> {
- const response = await fetch(url)
+async function downloadFile(url: string, outputPath: string, timeoutMs = 300000): Promise<void> {
+ const controller = new AbortController()
+ const timeoutHandle = setTimeout(() => controller.abort(), timeoutMs)
+
+ try {
+ const response = await fetch(url, { signal: controller.signal })
+
if (!response.ok) {
throw new Error(`Failed to download file: ${response.status} ${response.statusText}`)
}
- const buffer = await response.arrayBuffer()
- await fs.writeFile(outputPath, Buffer.from(buffer))
+
+ if (!response.body) {
+ throw new Error("Response body is null")
+ }
+
+ // Stream directly to file
+ const fileStream = require('fs').createWriteStream(outputPath)
+ const reader = response.body.getReader()
+
+ while (true) {
+ const { done, value } = await reader.read()
+ if (done) break
+ fileStream.write(Buffer.from(value))
+ }
+
+ await new Promise((resolve, reject) => {
+ fileStream.end()
+ fileStream.on('finish', resolve)
+ fileStream.on('error', reject)
+ })
+ } finally {
+ clearTimeout(timeoutHandle)
+ }
}

async function convertAudioToOptimalFormat(
  inputPath: string,
  outputPath: string
): Promise<void> {
  Logger.info(
    { inputPath, outputPath },
    "Converting audio to optimal format (mono, 16kHz, 16-bit PCM WAV)",
  )

  const ffmpegArgs = [
    "-i",
    inputPath, // Input file
    "-ac",
    "1", // Convert to mono (1 audio channel)
    "-ar",
    "16000", // Sample rate: 16 kHz
    "-sample_fmt",
    "s16", // 16-bit PCM
    "-acodec",
    "pcm_s16le", // PCM 16-bit little-endian codec
    "-y", // Overwrite output file if exists
    outputPath,
  ]

  return new Promise((resolve, reject) => {
    const ffmpegProcess = spawn("ffmpeg", ffmpegArgs)

    let stderr = ""

    ffmpegProcess.stderr.on("data", (data) => {
      stderr += data.toString()
    })

    ffmpegProcess.on("close", (code) => {
      if (code === 0) {
        Logger.info({ outputPath }, "Audio conversion completed successfully")
        resolve()
      } else {
        Logger.error({ exitCode: code, stderr }, "FFmpeg conversion failed")
        reject(new Error(`FFmpeg conversion failed with code ${code}: ${stderr}`))
      }
    })

    ffmpegProcess.on("error", (error) => {
      Logger.error({ error }, "Failed to start FFmpeg process")
      reject(new Error(`Failed to start FFmpeg: ${error.message}`))
    })
  })
}
Add timeout and cleanup on conversion failure.
Similar to the Python script execution, the FFmpeg process lacks a timeout mechanism. Additionally, if conversion fails partway through, a partial/corrupt output file may remain.
Apply similar timeout pattern and add cleanup:
async function convertAudioToOptimalFormat(
inputPath: string,
- outputPath: string
+ outputPath: string,
+ timeoutMs: number = 600000 // 10 minutes default
): Promise<void> {
Logger.info(
{ inputPath, outputPath },
"Converting audio to optimal format (mono, 16kHz, 16-bit PCM WAV)",
)
const ffmpegArgs = [
"-i",
inputPath,
"-ac",
"1",
"-ar",
"16000",
"-sample_fmt",
"s16",
"-acodec",
"pcm_s16le",
"-y",
outputPath,
]
return new Promise((resolve, reject) => {
const ffmpegProcess = spawn("ffmpeg", ffmpegArgs)
+ let timeoutHandle: NodeJS.Timeout | null = null
+ if (timeoutMs > 0) {
+ timeoutHandle = setTimeout(() => {
+ Logger.warn({ inputPath, timeoutMs }, "FFmpeg timeout, killing process")
+ ffmpegProcess.kill("SIGTERM")
+ setTimeout(() => ffmpegProcess.kill("SIGKILL"), 5000)
+ }, timeoutMs)
+ }
let stderr = ""
ffmpegProcess.stderr.on("data", (data) => {
stderr += data.toString()
})
ffmpegProcess.on("close", (code) => {
+ if (timeoutHandle) clearTimeout(timeoutHandle)
if (code === 0) {
Logger.info({ outputPath }, "Audio conversion completed successfully")
resolve()
} else {
Logger.error({ exitCode: code, stderr }, "FFmpeg conversion failed")
+ // Clean up partial output file
+ fs.unlink(outputPath).catch(() => {})
reject(new Error(`FFmpeg conversion failed with code ${code}: ${stderr}`))
}
})
ffmpegProcess.on("error", (error) => {
+ if (timeoutHandle) clearTimeout(timeoutHandle)
Logger.error({ error }, "Failed to start FFmpeg process")
+ fs.unlink(outputPath).catch(() => {})
reject(new Error(`Failed to start FFmpeg: ${error.message}`))
})
})
}

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In server/queue/asrProcessor.ts around lines 107 to 155, the FFmpeg conversion
currently has no timeout and leaves partial/corrupt output files on failure; add
a timeout similar to the Python pattern that will kill the ffmpeg process if it
exceeds a configured duration (e.g., 30s), clear the timer on normal close, and
on any non-zero exit, error, or timeout attempt to kill the child process and
unlink/remove the outputPath to clean up partial files; also ensure event
listeners are cleaned (clearTimeout and remove listeners) before
resolving/rejecting and include the stderr/timeout reason in the rejected Error
so callers get useful context.
async function handleTranscribeJob(
  boss: PgBoss, // currently unused, but kept for future extensions
  job: PgBoss.Job<TranscribeJobData>,
): Promise<void> {
  const data = job.data
  Logger.info({ jobId: data.jobId }, "Starting transcription job")

  try {
    // Download audio file
    Logger.info({ audioUrl: data.audioUrl, audioPath: data.audioPath }, "Downloading audio file")
    await downloadFile(data.audioUrl, data.audioPath)

    // Convert audio to optimal format (mono, 16kHz, 16-bit PCM WAV)
    const convertedAudioPath = data.audioPath.replace(/\.[^.]+$/, "_converted.wav")
    Logger.info({ convertedAudioPath }, "Using converted audio path")
    await convertAudioToOptimalFormat(data.audioPath, convertedAudioPath)

    // Build command arguments for simplified Python script (ASR + Diarization only)
    const args = [
      convertedAudioPath, // Use converted audio
      "--whisper-model",
      data.whisperModel || "turbo",
      "--output",
      data.outputPath + "_raw.json", // Output raw results
    ]

    if (data.language) args.push("--language", data.language)
    if (data.numSpeakers) args.push("--num-speakers", data.numSpeakers.toString())
    if (data.minSpeakers) args.push("--min-speakers", data.minSpeakers.toString())
    if (data.maxSpeakers) args.push("--max-speakers", data.maxSpeakers.toString())

    // Default: multilingual ON unless explicitly disabled (but in your pipeline: always true)
    const multilingual = data.multilingual !== false
    if (multilingual) {
      args.push("--multilingual")
    }

    // Use HF_TOKEN from environment or from job data
    const hfToken = process.env.HF_TOKEN || data.hfToken
    if (hfToken) {
      args.push("--hf-token", hfToken)
    }

    // Run simplified Python script (ASR + Diarization only)
    Logger.info({ args }, "Running whisper_diarization.py (ASR + Diarization)")
    const result = await runPythonScript("whisper_diarization.py", args)

    if (result.exitCode !== 0) {
      throw new Error(`Transcription failed (exit ${result.exitCode}): ${result.stderr}`)
    }

    Logger.info(
      { jobId: data.jobId, rawOutputPath: data.outputPath + "_raw.json" },
      "ASR + Diarization completed, starting TypeScript post-processing",
    )

    // Read the raw transcript JSON output from Python
    const rawJsonPath = data.outputPath + "_raw.json"
    const rawTranscriptData = await fs.readFile(rawJsonPath, "utf-8")
    const rawTranscript: TranscriptResult = JSON.parse(rawTranscriptData)

    // Decide whether to refine with LLM (default true if undefined)
    const shouldRefine = data.refineWithLLM !== false

    let finalTranscript: TranscriptResult = rawTranscript

    if (shouldRefine) {
      Logger.info({ jobId: data.jobId }, "Starting LLM refinement in TypeScript")

      // Check which LLM provider is configured
      const { defaultBestModel } = config

      if (!defaultBestModel) {
        Logger.warn(
          {
            jobId: data.jobId,
          },
          "No LLM provider configured for ASR refinement. Skipping refinement and using raw transcript.",
        )
      } else {
        try {
          // Run TypeScript refinement (works with any configured LLM provider)
          finalTranscript = await refineTranscript(rawTranscript, {
            maxTokens: 200000,
          })
          Logger.info({ jobId: data.jobId }, "LLM refinement completed successfully")
        } catch (error) {
          Logger.error(
            { error, jobId: data.jobId },
            "LLM refinement failed, falling back to raw transcript",
          )
          finalTranscript = rawTranscript
        }
      }
    } else {
      // Even without LLM refinement, merge consecutive segments deterministically
      Logger.info(
        { jobId: data.jobId },
        "LLM refinement disabled, merging consecutive segments without refinement",
      )
      finalTranscript = {
        ...rawTranscript,
        segments: mergeConsecutiveSegments(rawTranscript.segments),
      }
    }

    // Save final results in requested format(s)
    const outputFormat = data.outputFormat || "json"
    const suffix = shouldRefine ? "_refined" : "_merged"

    if (outputFormat === "json" || outputFormat === "all") {
      const jsonPath = data.outputPath + suffix + ".json"
      await fs.writeFile(jsonPath, JSON.stringify(finalTranscript, null, 2), "utf-8")
      Logger.info({ jobId: data.jobId, path: jsonPath }, "Saved JSON output")
    }

    if (outputFormat === "txt" || outputFormat === "all") {
      const txtPath = data.outputPath + suffix + ".txt"
      const txtContent = finalTranscript.segments
        .map((seg) => `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`)
        .join("\n")
      await fs.writeFile(txtPath, txtContent, "utf-8")
      Logger.info({ jobId: data.jobId, path: txtPath }, "Saved TXT output")
    }

    if (outputFormat === "srt" || outputFormat === "all") {
      const srtPath = data.outputPath + suffix + ".srt"
      const srtContent = finalTranscript.segments
        .map((seg, idx) => {
          const start = formatTimestamp(seg.start).replace(".", ",")
          const end = formatTimestamp(seg.end).replace(".", ",")
          const text = `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`
          return `${idx + 1}\n${start} --> ${end}\n${text}\n`
        })
        .join("\n")
      await fs.writeFile(srtPath, srtContent, "utf-8")
      Logger.info({ jobId: data.jobId, path: srtPath }, "Saved SRT output")
    }

    Logger.info({ jobId: data.jobId }, "Transcription pipeline completed successfully")
  } catch (error) {
    Logger.error({ error, jobId: data.jobId }, "Transcription job failed")
    throw error
  }
}
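For context, a minimal sketch of what mergeConsecutiveSegments is assumed to do — join adjacent segments that share a speaker, keeping the first start and the last end (the real helper is defined elsewhere in this PR, so the shape below is illustrative):

```typescript
// Assumed segment shape, matching the fields the job handler reads.
type Segment = { start: number; end: number; speaker?: string; text: string }

// Deterministic merge: adjacent segments with the same speaker collapse into
// one, concatenating text and extending the end timestamp.
function mergeConsecutiveSegments(segments: Segment[]): Segment[] {
  const merged: Segment[] = []
  for (const seg of segments) {
    const last = merged[merged.length - 1]
    if (last && last.speaker === seg.speaker) {
      last.end = seg.end
      last.text = `${last.text.trim()} ${seg.text.trim()}`
    } else {
      merged.push({ ...seg })
    }
  }
  return merged
}

const out = mergeConsecutiveSegments([
  { start: 0, end: 2, speaker: "S1", text: "Hello" },
  { start: 2, end: 4, speaker: "S1", text: "there." },
  { start: 4, end: 6, speaker: "S2", text: "Hi." },
])
console.log(out.length, out[0].text) // 2 "Hello there."
```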
Add cleanup for temporary files to prevent disk space exhaustion.
The function downloads audio files and creates converted versions but never cleans them up. Over time, this will lead to disk space exhaustion, especially for high-volume transcription workloads.
Add cleanup in a finally block:
async function handleTranscribeJob(
boss: PgBoss,
job: PgBoss.Job<TranscribeJobData>,
): Promise<void> {
const data = job.data
Logger.info({ jobId: data.jobId }, "Starting transcription job")
+
+ let convertedAudioPath: string | null = null
+ let rawJsonPath: string | null = null
try {
// Download audio file
Logger.info({ audioUrl: data.audioUrl, audioPath: data.audioPath }, "Downloading audio file")
await downloadFile(data.audioUrl, data.audioPath)
// Convert audio to optimal format
- const convertedAudioPath = data.audioPath.replace(/\.[^.]+$/, "_converted.wav")
+ convertedAudioPath = data.audioPath.replace(/\.[^.]+$/, "_converted.wav")
Logger.info({ convertedAudioPath }, "Using converted audio path")
await convertAudioToOptimalFormat(data.audioPath, convertedAudioPath)
// ... rest of the processing ...
// Read the raw transcript JSON output from Python
- const rawJsonPath = data.outputPath + "_raw.json"
+ rawJsonPath = data.outputPath + "_raw.json"
const rawTranscriptData = await fs.readFile(rawJsonPath, "utf-8")
// ... rest of the processing ...
Logger.info({ jobId: data.jobId }, "Transcription pipeline completed successfully")
} catch (error) {
Logger.error({ error, jobId: data.jobId }, "Transcription job failed")
throw error
+ } finally {
+ // Clean up temporary files
+ const filesToClean = [
+ data.audioPath,
+ convertedAudioPath,
+ rawJsonPath,
+ ].filter((path): path is string => path !== null)
+
+ for (const file of filesToClean) {
+ try {
+ await fs.unlink(file)
+ Logger.debug({ file }, "Cleaned up temporary file")
+ } catch (error) {
+ Logger.warn({ file, error }, "Failed to clean up temporary file")
+ }
+ }
}
}
| throw error | |
| } | |
| } | |
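The `formatTimestamp` helper used by the SRT branch above is not shown in this diff. A minimal sketch of what it presumably does — rendering seconds as `HH:MM:SS.mmm`, which the SRT branch then converts to comma decimals — could look like this (an assumption, not code from this PR):

```typescript
// Hypothetical sketch: format a duration in seconds as HH:MM:SS.mmm,
// the shape the SRT branch expects before its "." -> "," replacement.
function formatTimestamp(seconds: number): string {
  const hrs = Math.floor(seconds / 3600)
  const mins = Math.floor((seconds % 3600) / 60)
  const secs = seconds % 60
  const whole = Math.floor(secs)
  const millis = Math.round((secs - whole) * 1000)
  const pad = (n: number, width = 2) => n.toString().padStart(width, "0")
  return `${pad(hrs)}:${pad(mins)}:${pad(whole)}.${pad(millis, 3)}`
}
```

For example, `formatTimestamp(3661.5)` yields `"01:01:01.500"`, which the SRT writer turns into `01:01:01,500`.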
```ts
async function handleTranscribeJob(
  boss: PgBoss, // currently unused, but kept for future extensions
  job: PgBoss.Job<TranscribeJobData>,
): Promise<void> {
  const data = job.data
  Logger.info({ jobId: data.jobId }, "Starting transcription job")

  let convertedAudioPath: string | null = null
  let rawJsonPath: string | null = null

  try {
    // Download audio file
    Logger.info({ audioUrl: data.audioUrl, audioPath: data.audioPath }, "Downloading audio file")
    await downloadFile(data.audioUrl, data.audioPath)

    // Convert audio to optimal format (mono, 16kHz, 16-bit PCM WAV)
    convertedAudioPath = data.audioPath.replace(/\.[^.]+$/, "_converted.wav")
    Logger.info({ convertedAudioPath }, "Using converted audio path")
    await convertAudioToOptimalFormat(data.audioPath, convertedAudioPath)

    // Build command arguments for simplified Python script (ASR + Diarization only)
    const args = [
      convertedAudioPath, // Use converted audio
      "--whisper-model",
      data.whisperModel || "turbo",
      "--output",
      data.outputPath + "_raw.json", // Output raw results
    ]
    if (data.language) args.push("--language", data.language)
    if (data.numSpeakers) args.push("--num-speakers", data.numSpeakers.toString())
    if (data.minSpeakers) args.push("--min-speakers", data.minSpeakers.toString())
    if (data.maxSpeakers) args.push("--max-speakers", data.maxSpeakers.toString())

    // Default: multilingual ON unless explicitly disabled (but in your pipeline: always true)
    const multilingual = data.multilingual !== false
    if (multilingual) {
      args.push("--multilingual")
    }

    // Use HF_TOKEN from environment or from job data
    const hfToken = process.env.HF_TOKEN || data.hfToken
    if (hfToken) {
      args.push("--hf-token", hfToken)
    }

    // Run simplified Python script (ASR + Diarization only)
    Logger.info({ args }, "Running whisper_diarization.py (ASR + Diarization)")
    const result = await runPythonScript("whisper_diarization.py", args)
    if (result.exitCode !== 0) {
      throw new Error(`Transcription failed (exit ${result.exitCode}): ${result.stderr}`)
    }

    Logger.info(
      { jobId: data.jobId, rawOutputPath: data.outputPath + "_raw.json" },
      "ASR + Diarization completed, starting TypeScript post-processing",
    )

    // Read the raw transcript JSON output from Python
    rawJsonPath = data.outputPath + "_raw.json"
    const rawTranscriptData = await fs.readFile(rawJsonPath, "utf-8")
    const rawTranscript: TranscriptResult = JSON.parse(rawTranscriptData)

    // Decide whether to refine with LLM (default true if undefined)
    const shouldRefine = data.refineWithLLM !== false
    let finalTranscript: TranscriptResult = rawTranscript

    if (shouldRefine) {
      Logger.info({ jobId: data.jobId }, "Starting LLM refinement in TypeScript")
      // Check which LLM provider is configured
      const { defaultBestModel } = config
      if (!defaultBestModel) {
        Logger.warn(
          { jobId: data.jobId },
          "No LLM provider configured for ASR refinement. Skipping refinement and using raw transcript.",
        )
      } else {
        try {
          // Run TypeScript refinement (works with any configured LLM provider)
          finalTranscript = await refineTranscript(rawTranscript, {
            maxTokens: 200000,
          })
          Logger.info({ jobId: data.jobId }, "LLM refinement completed successfully")
        } catch (error) {
          Logger.error(
            { error, jobId: data.jobId },
            "LLM refinement failed, falling back to raw transcript",
          )
          finalTranscript = rawTranscript
        }
      }
    } else {
      // Even without LLM refinement, merge consecutive segments deterministically
      Logger.info(
        { jobId: data.jobId },
        "LLM refinement disabled, merging consecutive segments without refinement",
      )
      finalTranscript = {
        ...rawTranscript,
        segments: mergeConsecutiveSegments(rawTranscript.segments),
      }
    }

    // Save final results in requested format(s)
    const outputFormat = data.outputFormat || "json"
    const suffix = shouldRefine ? "_refined" : "_merged"

    if (outputFormat === "json" || outputFormat === "all") {
      const jsonPath = data.outputPath + suffix + ".json"
      await fs.writeFile(jsonPath, JSON.stringify(finalTranscript, null, 2), "utf-8")
      Logger.info({ jobId: data.jobId, path: jsonPath }, "Saved JSON output")
    }

    if (outputFormat === "txt" || outputFormat === "all") {
      const txtPath = data.outputPath + suffix + ".txt"
      const txtContent = finalTranscript.segments
        .map((seg) => `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`)
        .join("\n")
      await fs.writeFile(txtPath, txtContent, "utf-8")
      Logger.info({ jobId: data.jobId, path: txtPath }, "Saved TXT output")
    }

    if (outputFormat === "srt" || outputFormat === "all") {
      const srtPath = data.outputPath + suffix + ".srt"
      const srtContent = finalTranscript.segments
        .map((seg, idx) => {
          const start = formatTimestamp(seg.start).replace(".", ",")
          const end = formatTimestamp(seg.end).replace(".", ",")
          const text = `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`
          return `${idx + 1}\n${start} --> ${end}\n${text}\n`
        })
        .join("\n")
      await fs.writeFile(srtPath, srtContent, "utf-8")
      Logger.info({ jobId: data.jobId, path: srtPath }, "Saved SRT output")
    }

    Logger.info({ jobId: data.jobId }, "Transcription pipeline completed successfully")
  } catch (error) {
    Logger.error({ error, jobId: data.jobId }, "Transcription job failed")
    throw error
  } finally {
    // Clean up temporary files
    const filesToClean = [data.audioPath, convertedAudioPath, rawJsonPath].filter(
      (path): path is string => path !== null,
    )
    for (const file of filesToClean) {
      try {
        await fs.unlink(file)
        Logger.debug({ file }, "Cleaned up temporary file")
      } catch (error) {
        Logger.warn({ file, error }, "Failed to clean up temporary file")
      }
    }
  }
}
```
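`mergeConsecutiveSegments` is referenced in both versions of the handler but not defined in this hunk. Assuming it folds adjacent segments with the same speaker label into one (a guess at its contract, not code from this PR), a minimal sketch:

```typescript
interface Segment {
  start: number
  end: number
  text: string
  speaker?: string
}

// Hypothetical sketch: merge runs of adjacent segments that share a speaker,
// keeping the first segment's start time and the last segment's end time.
function mergeConsecutiveSegments(segments: Segment[]): Segment[] {
  const merged: Segment[] = []
  for (const seg of segments) {
    const last = merged[merged.length - 1]
    if (last && last.speaker === seg.speaker) {
      last.end = seg.end
      last.text = `${last.text.trim()} ${seg.text.trim()}`
    } else {
      merged.push({ ...seg }) // copy so the input array is left untouched
    }
  }
  return merged
}
```

With this contract, three segments `A, A, B` collapse into two, which is what the deterministic "merge without refinement" branch relies on.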
🤖 Prompt for AI Agents
In server/queue/asrProcessor.ts around lines 158 to 302, the function downloads
the original audio and creates a converted audio file (and writes raw JSON) but
never deletes these temporary files; declare convertedAudioPath and rawJsonPath
(and any temp paths) in an outer scope before the try, add a finally block after
the try/catch that attempts to unlink (await fs.unlink) the original downloaded
file (data.audioPath), the converted audio file (convertedAudioPath), and the
intermediate raw JSON (rawJsonPath) if they exist, wrapping each unlink in its
own try/catch so deletion failures are logged but do not throw, and ensure the
cleanup runs regardless of success or failure of the transcription to prevent
disk space exhaustion.
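The per-file try/catch cleanup described above can also be factored into a standalone helper (a hypothetical refactor, not part of this PR), which keeps the `finally` block short and makes the best-effort semantics reusable:

```typescript
import { promises as fs } from "node:fs"

// Hypothetical helper: best-effort removal of temp files. Unlink failures
// are reported through the callback but never thrown, so the helper is
// safe to call from a finally block regardless of how the job ended.
async function cleanupFiles(
  paths: Array<string | null>,
  onError: (file: string, err: unknown) => void = () => {},
): Promise<void> {
  for (const file of paths) {
    if (file === null) continue // path was never assigned (e.g. early failure)
    try {
      await fs.unlink(file)
    } catch (err) {
      onError(file, err)
    }
  }
}
```

The `finally` block would then reduce to something like `await cleanupFiles([data.audioPath, convertedAudioPath, rawJsonPath], (file, error) => Logger.warn({ file, error }, "Failed to clean up temporary file"))`.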
Summary by CodeRabbit
New Features
Bug Fixes
Performance