feat: initial commit for the transcription service #1206
thesecdude wants to merge 1 commit into main from
Conversation
Walkthrough
This PR introduces end-to-end automatic speech recognition capabilities, including a frontend transcription page for file uploads, backend ASR API endpoints, audio processing via Whisper and speaker diarization, job queue integration, LLM-driven transcript refinement, and multi-format result delivery.

Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant Frontend
participant Server as API Server
participant Queue as Job Queue
participant Python as Whisper/<br/>Diarization
participant LLM as LLM Refiner
User->>Frontend: Upload audio file
Frontend->>Server: POST /files/upload-simple
Server->>Server: Save file to disk
Server-->>Frontend: Return upload URL
Frontend->>Server: POST /asr/transcribe (with file URL)
Server->>Server: Validate, create temp workspace
Server->>Queue: Enqueue TranscribeJobData
Server-->>Frontend: Return jobId
Frontend->>Frontend: Start polling (1s interval)
Frontend->>Server: GET /asr/job-status?jobId
Server->>Queue: Check job status
par Async Processing
Queue->>Python: Download audio & convert to WAV
Python->>Python: Run Whisper + Diarization<br/>(parallel execution)
Python-->>Queue: Return transcript + speakers
end
Queue->>LLM: Refine transcript (if enabled)
LLM-->>Queue: Refined segments
Queue->>Queue: Write JSON/TXT/SRT outputs
Queue->>Server: Mark job complete
Frontend->>Server: GET /asr/job-status?jobId (polling)
Server-->>Frontend: Return completed status +<br/>output URLs
Frontend->>User: Display transcript, segments,<br/>speaker labels, download links
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Summary of Changes
Hello @thesecdude, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers a significant new feature: an end-to-end audio transcription service. It provides users with the ability to upload audio or video files and receive high-quality, speaker-diarized transcripts. The implementation spans both frontend and backend, establishing a robust, scalable processing workflow that leverages state-of-the-art AI models and includes intelligent refinement steps to ensure accurate and well-formatted output.
Code Review
This pull request introduces a new audio transcription service, including a frontend page for uploads, backend APIs for job management, and a queue-based processing pipeline. The implementation is comprehensive, with a robust Python script for transcription and a well-designed TypeScript service for LLM refinement. My review focuses on improving the reliability of the job polling mechanism, ensuring consistency in API calls, and fixing critical bugs in the job status and results retrieval logic.
const queueJobId = await boss.send(ASRQueue, jobData, {
  expireInHours: 24,
  retryLimit: 2,
  retryDelay: 60,
  retryBackoff: true,
})
The job status polling will fail because boss.getJobById is being called with the custom UUID (jobId), but pg-boss expects its own internal job ID. This will result in a 404 "Job not found" error for every status check.
To fix this, you should specify the jobId (UUID) as the job's unique ID when sending it to the queue. This can be done using the id option in boss.send. This way, getJobById will work as expected.
Suggested change:

const queueJobId = await boss.send(ASRQueue, jobData, {
  id: jobId, // Use the UUID as the job ID for pg-boss
  expireInHours: 24,
  retryLimit: 2,
  retryDelay: 60,
  retryBackoff: true,
})
const uploadFile = async (file: File): Promise<string> => {
  const formData = new FormData()
  formData.append("file", file)

  const response = await fetch(`${api}/files/upload-simple`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${localStorage.getItem("access_token")}`,
    },
    body: formData,
  })

  if (!response.ok) {
    let message = "File upload failed"
    try {
      const data = await response.json()
      if (data?.message) message = data.message
    } catch {
      // ignore JSON parse errors
    }
    throw new Error(message)
  }

  const data = await response.json()
  return data.url
}
The API calls in this component are made using fetch directly, with the access token retrieved from localStorage. This is inconsistent with the rest of the application, which seems to use a centralized api client from hono/client configured with authFetch.
Using fetch directly bypasses any centralized logic for handling authentication, such as token refreshing, which can lead to authentication errors and maintenance issues. You should refactor uploadFile, pollJobStatus, and handleSubmit to use the api client.
For example, uploadFile can be refactored like this:
const uploadFile = async (file: File): Promise<string> => {
  const formData = new FormData()
  formData.append("file", file)

  const response = await api.files["upload-simple"].$post({
    body: formData,
  })

  if (!response.ok) {
    let message = "File upload failed"
    try {
      const data = await response.json()
      if (data?.message) message = data.message
    } catch {
      // ignore JSON parse errors
    }
    throw new Error(message)
  }

  const data = await response.json()
  return data.url
}

Similarly, pollJobStatus and the transcription request in handleSubmit should also use the api client.
const suffix = jobData.refineWithLLM ? "_refined" : "_raw"
const format = jobData.outputFormat || "json"
There's a mismatch in the output filename suffix for non-refined transcripts. This API looks for a file with a _raw suffix, but the asrProcessor creates a file with a _merged suffix when LLM refinement is disabled. This will prevent the API from finding and returning the results for non-refined jobs.
To fix this, the suffix logic here should align with the logic in asrProcessor.ts.
Suggested change:

const suffix = jobData.refineWithLLM ? "_refined" : "_merged"
const format = jobData.outputFormat || "json"
const pollJobStatus = useCallback(
  (jobId: string) => {
    clearPolling()

    const intervalId = setInterval(async () => {
      try {
        const response = await fetch(`${api}/asr/job-status?jobId=${jobId}`, {
          headers: {
            Authorization: `Bearer ${localStorage.getItem("access_token")}`,
          },
        })

        if (!response.ok) {
          throw new Error("Failed to fetch job status")
        }

        const data = await response.json()

        if (data.status === "completed") {
          clearPolling()
          setJobStatus("completed")
          setResult(data.outputs ?? null)
        } else if (data.status === "failed") {
          clearPolling()
          setJobStatus("failed")
          setError(data.error || "Transcription failed")
        } else if (data.status === "active") {
          setJobStatus("processing")
        }
      } catch (err) {
        clearPolling()
        setJobStatus("failed")
        setError(
          err instanceof Error ? err.message : "Failed to check job status",
        )
      }
    }, 3000)

    pollIntervalRef.current = intervalId

    // Hard timeout after 30 minutes
    const timeoutId = setTimeout(() => {
      clearPolling()
      setJobStatus("failed")
      setError("Transcription timed out. Please try again.")
    }, 30 * 60 * 1000)

    pollTimeoutRef.current = timeoutId
  },
  [clearPolling],
)
The current polling mechanism uses setInterval, which can lead to issues on unreliable networks. If an API request takes longer than the interval (3 seconds), or if the network is temporarily down, multiple requests can stack up, leading to unnecessary server load and unpredictable client-side behavior.
A more robust approach is to use a recursive setTimeout. This ensures that the next poll is only scheduled after the previous one has completed (either successfully or with an error).
Additionally, when a polling request fails, the UI shows a generic "Transcription failed" message. This is misleading, as the job on the backend might still be running. The error should clarify that it was a failure to check the status, not a failure of the job itself.
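The recursive-setTimeout approach the reviewer describes can be sketched roughly as follows. This is an illustrative sketch only: the helper name pollUntilDone, the JobState union, and the callbacks are assumptions, not code from this PR.

```typescript
// Hypothetical sketch of recursive-setTimeout polling; names are illustrative.
type JobState = "active" | "completed" | "failed"

function pollUntilDone(
  checkStatus: () => Promise<JobState>,
  onDone: (state: JobState) => void,
  intervalMs = 3000,
): () => void {
  let cancelled = false
  let timer: ReturnType<typeof setTimeout> | undefined

  const tick = async () => {
    if (cancelled) return
    try {
      const state = await checkStatus()
      if (state === "completed" || state === "failed") {
        onDone(state)
        return
      }
    } catch {
      // A failed status check is not a failed job; keep polling.
    }
    // Schedule the next poll only after this one has finished,
    // so slow or failing requests can never stack up.
    timer = setTimeout(tick, intervalMs)
  }

  tick()
  // Return a cancel function for cleanup (e.g. on component unmount).
  return () => {
    cancelled = true
    if (timer !== undefined) clearTimeout(timer)
  }
}
```

Unlike setInterval, each iteration here awaits the previous request before scheduling the next, and transport errors simply retry instead of marking the job failed.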
Actionable comments posted: 9
🧹 Nitpick comments (8)
server/api/files.ts (1)
754-776: Stream ASR files instead of buffering

Audio/video responses can be hundreds of MB. Reading them with fs.readFile loads the whole file into memory per request, which tanks concurrency and risks OOM. Stream the file (e.g. via Bun.file().stream()) so memory stays constant.

- const fileBuffer = await fs.readFile(normalizedPath)
-
- return c.newResponse(fileBuffer as any, 200, {
-   "Content-Type": contentType,
-   "Content-Length": fileBuffer.length.toString(),
- })
+ const bunFile = Bun.file(normalizedPath)
+ const stat = await bunFile.stat()
+
+ return new Response(bunFile.stream(), {
+   status: 200,
+   headers: {
+     "Content-Type": contentType,
+     ...(stat?.size ? { "Content-Length": stat.size.toString() } : {}),
+   },
+ })

server/queue/asrProcessor.ts (3)
184-193: Clarify multilingual mode default behavior.

The code comment states "Default: multilingual ON unless explicitly disabled (but in your pipeline: always true)" but the implementation is data.multilingual !== false, which means multilingual is ON by default (undefined or true → multilingual enabled). However, the comment is confusing. Consider clarifying:

- // Default: multilingual ON unless explicitly disabled (but in your pipeline: always true)
+ // Default: multilingual mode is enabled unless explicitly set to false
  const multilingual = data.multilingual !== false
  if (multilingual) {
    args.push("--multilingual")
  }
205-207: Improve error message specificity.

The error message includes stderr but doesn't distinguish between different failure modes. Consider checking for common failure scenarios to provide more actionable error messages.

  if (result.exitCode !== 0) {
+   // Check for common error patterns
+   if (result.stderr.includes("CUDA out of memory")) {
+     throw new Error(`Transcription failed: GPU out of memory. Consider using a smaller model or CPU mode.`)
+   }
+   if (result.stderr.includes("HuggingFace") || result.stderr.includes("authentication")) {
+     throw new Error(`Transcription failed: HuggingFace authentication error. Verify HF_TOKEN is set correctly.`)
+   }
    throw new Error(`Transcription failed (exit ${result.exitCode}): ${result.stderr}`)
  }
283-295: Consider validating SRT timestamp ordering.

SRT format requires sequential, non-overlapping timestamps. If segments have timing issues (overlaps, out-of-order), the generated SRT may be invalid.

Add a validation step before writing SRT:

if (outputFormat === "srt" || outputFormat === "all") {
  const srtPath = data.outputPath + suffix + ".srt"

  // Validate segment ordering
  for (let i = 1; i < finalTranscript.segments.length; i++) {
    const prev = finalTranscript.segments[i - 1]
    const curr = finalTranscript.segments[i]
    if (curr.start < prev.end) {
      Logger.warn(
        { jobId: data.jobId, segmentIndex: i },
        "Overlapping segments detected in SRT output",
      )
    }
  }

  const srtContent = finalTranscript.segments
    .map((seg, idx) => {
      const start = formatTimestamp(seg.start).replace(".", ",")
      const end = formatTimestamp(seg.end).replace(".", ",")
      const text = `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`
      return `${idx + 1}\n${start} --> ${end}\n${text}\n`
    })
    .join("\n")

  await fs.writeFile(srtPath, srtContent, "utf-8")
  Logger.info({ jobId: data.jobId, path: srtPath }, "Saved SRT output")
}

server/asr-sd/whisper_diarization.py (4)
1-1: Make script executable if intended for direct CLI usage.

The shebang is present but the file isn't marked as executable. If this script is intended to be invoked directly (not just via python whisper_diarization.py), make it executable. Run this command:
chmod +x server/asr-sd/whisper_diarization.py
118-137: Consider adding timeout for future results.

While the parallel execution handles interrupts and exceptions well, there's no timeout on future.result(). If a worker thread hangs (e.g., network issue downloading model weights, deadlock), the main thread will wait indefinitely.

Add a timeout parameter:

  # Wait for completion and collect results
  for future in as_completed([whisper_future, diarization_future]):
      try:
          if future is whisper_future:
-             whisper_result, whisper_time = future.result(timeout=None)
+             whisper_result, whisper_time = future.result(timeout=7200)  # 2 hour timeout
              print(f"  Whisper completed ({whisper_time:.2f}s)")
          else:
-             diarization_result, diarization_time = future.result(timeout=None)
+             diarization_result, diarization_time = future.result(timeout=7200)
              print(f"  Pyannote completed ({diarization_time:.2f}s)")
281-304: Consider language-aware text joining for multilingual support.

The _smart_join method uses English punctuation rules (spaces around punctuation). For languages like Chinese, Japanese, or Thai that don't use spaces, or languages with different punctuation conventions, this may produce incorrect spacing.

Consider adding language-aware joining:

def _smart_join(self, prev_text: str, token: str, language: str = "en") -> str:
    """Smart spacing around punctuation when joining tokens."""
    if not prev_text:
        return token

    # Languages that don't use spaces between words
    no_space_languages = {"zh", "ja", "th", "my", "km"}
    if language in no_space_languages:
        return prev_text + token

    # English punctuation rules
    closing_punct = {",", ".", "!", "?", ":", ";", ")", "]", "}", "'"}
    opening_punct = {"(", "[", "{", "'"}

    if token in closing_punct:
        return prev_text + token
    if prev_text[-1] in opening_punct:
        return prev_text + token
    if token == "'" and prev_text and prev_text[-1].isalnum():
        return prev_text + token

    return prev_text + " " + token

Note: This would require tracking segment language throughout the pipeline.
404-413: Use specific exception types instead of bare Exception.

Catching bare Exception can hide unexpected errors and make debugging difficult. It's better to catch specific exceptions that are expected during initialization.

  # Initialize pipeline
  try:
      pipeline = ParallelWhisperDiarization(
          whisper_model=args.whisper_model,
          diarization_model=args.diarization_model,
          device=args.device,
          hf_token=args.hf_token,
      )
- except Exception as e:
+ except (OSError, RuntimeError, ValueError) as e:
      print(f"\nError initializing pipeline: {e}")
      return
+ except KeyboardInterrupt:
+     print("\nInitialization cancelled by user")
+     return
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
frontend/src/components/Sidebar.tsx (13 hunks)
frontend/src/routes/_authenticated/transcription.tsx (1 hunk)
server/api/asr.ts (1 hunk)
server/api/files.ts (2 hunks)
server/asr-sd/whisper_diarization.py (1 hunk)
server/queue/asrProcessor.ts (1 hunk)
server/queue/index.ts (4 hunks)
server/server.ts (4 hunks)
server/services/transcriptRefinement.ts (1 hunk)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-06-24T08:53:33.051Z
Learnt from: rahul1841
Repo: xynehq/xyne PR: 585
File: frontend/src/components/HistoryModal.tsx:341-351
Timestamp: 2025-06-24T08:53:33.051Z
Learning: In frontend/src/components/HistoryModal.tsx, bookmark buttons don't need the CLASS_NAMES.BOOKMARK_BUTTON class because the HistoryModal is rendered inside the sidebar area and is already protected by the sidebar's click outside handler logic.
Applied to files:
frontend/src/components/Sidebar.tsx
📚 Learning: 2025-09-16T08:57:58.762Z
Learnt from: rahul1841
Repo: xynehq/xyne PR: 843
File: server/server.ts:996-996
Timestamp: 2025-09-16T08:57:58.762Z
Learning: The DownloadFileApi handler in server/api/knowledgeBase.ts properly uses the clId route parameter to scope and validate file access within the correct collection.
Applied to files:
server/api/files.ts
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
Repo: xynehq/xyne PR: 484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/api/files.ts
📚 Learning: 2025-10-07T05:35:17.385Z
Learnt from: Himanshvarma
Repo: xynehq/xyne PR: 1046
File: server/services/fileProcessor.ts:118-131
Timestamp: 2025-10-07T05:35:17.385Z
Learning: In server/services/fileProcessor.ts, SheetProcessingResult objects intentionally use empty chunks_map and image_chunks_map arrays because each sheet is ingested as a separate Vespa document with sheet-level metadata (sheetName, sheetIndex, totalSheets, docId), making per-chunk metadata redundant for sheet files.
Applied to files:
server/api/files.ts
📚 Learning: 2025-06-10T05:40:04.427Z
Learnt from: naSim087
Repo: xynehq/xyne PR: 525
File: frontend/src/routes/_authenticated/admin/integrations/slack.tsx:141-148
Timestamp: 2025-06-10T05:40:04.427Z
Learning: In frontend/src/routes/_authenticated/admin/integrations/slack.tsx, the ConnectAction enum and related connectAction state (lines 141-148, 469-471) are intentionally kept for future development, even though they appear unused in the current implementation.
Applied to files:
frontend/src/routes/_authenticated/transcription.tsx
🧬 Code graph analysis (8)
server/api/asr.ts (3)
  server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
  server/queue/asrProcessor.ts (1): TranscribeJobData (35-50)
  server/queue/index.ts (1): ASRQueue (59-59)
server/queue/asrProcessor.ts (2)
  server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
  server/services/transcriptRefinement.ts (3): TranscriptResult (26-41), refineTranscript (416-474), mergeConsecutiveSegments (142-195)
server/queue/index.ts (2)
  server/queue/boss.ts (1): boss (4-7)
  server/queue/asrProcessor.ts (1): handleASRJob (305-327)
server/services/transcriptRefinement.ts (3)
  server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
  server/ai/types.ts (1): LLMProvider (340-350)
  server/ai/provider/index.ts (1): getProviderByModel (369-419)
server/server.ts (2)
  server/api/files.ts (2): handleSimpleFileUpload (675-723), serveASRFile (726-784)
  server/api/asr.ts (4): transcribeAudioSchema (20-33), TranscribeAudioApi (40-151), getJobStatusSchema (35-37), GetJobStatusApi (154-230)
frontend/src/components/Sidebar.tsx (2)
  frontend/src/lib/utils.ts (1): cn (5-7)
  frontend/src/components/Tooltip.tsx (1): Tip (3-28)
server/api/files.ts (1)
  server/logger/index.ts (2): getLogger (36-93), Subsystem (15-15)
frontend/src/routes/_authenticated/transcription.tsx (2)
  frontend/src/api.ts (1): api (5-5)
  frontend/src/components/Sidebar.tsx (1): Sidebar (41-439)
🪛 GitHub Actions: TypeScript Build Check
frontend/src/components/Sidebar.tsx
[error] 246-246: bunx tsc -b failed: Type '"/transcription"' is not assignable to type '"/" | "/auth" | "/call" | "/oauth/success" | "/agent" | "/buzz" | "/chat" | "/dashboard" | "/knowledgeManagement" | "/search" | "/tuning" | "/workflow" | "/admin/chat-overview" | ... 20 more ... | ".."'.
frontend/src/routes/_authenticated/transcription.tsx
[error] 31-31: bunx tsc -b failed: Argument of type '"/_authenticated/transcription"' is not assignable to parameter of type 'keyof FileRoutesByPath'.
🪛 Ruff (0.14.4)
server/asr-sd/whisper_diarization.py
1-1: Shebang is present but file is not executable
(EXE001)
55-55: Use raise without specifying exception name
Remove exception name
(TRY201)
61-63: try-except-pass detected, consider logging the exception
(S110)
61-61: Do not catch blind exception: Exception
(BLE001)
145-145: Avoid specifying long messages outside the exception class
(TRY003)
300-300: Possible hardcoded password assigned to: "token"
(S105)
411-411: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (12)
server/queue/asrProcessor.ts (6)
17-17: Verify process.cwd() returns repository root in all execution contexts.

The ASR_SD_DIR path assumes process.cwd() returns the repository root. In some deployment scenarios (systemd services, Docker containers with custom working directories, or when the process is started from a different directory), this might not hold true, leading to file-not-found errors.

Consider using __dirname or an explicit environment variable for the project root:

-const ASR_SD_DIR = path.join(process.cwd(), "asr-sd")
+const ASR_SD_DIR = path.join(__dirname, "../..", "asr-sd")

Or rely on an environment variable:

const PROJECT_ROOT = process.env.PROJECT_ROOT || process.cwd()
const ASR_SD_DIR = path.join(PROJECT_ROOT, "asr-sd")
21-29: LGTM!

The timestamp formatting logic correctly converts seconds to SRT format (HH:MM:SS.mmm) with proper padding.
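For reference, a conversion matching the HH:MM:SS.mmm description might look like the following sketch; the PR's actual formatTimestamp implementation is not shown in this review, so details may differ.

```typescript
// Sketch: seconds → "HH:MM:SS.mmm" with zero padding, as the review describes.
function formatTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000)
  const h = Math.floor(ms / 3_600_000)
  const m = Math.floor((ms % 3_600_000) / 60_000)
  const s = Math.floor((ms % 60_000) / 1000)
  const frac = ms % 1000
  const pad = (n: number, w: number) => n.toString().padStart(w, "0")
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(frac, 3)}`
}

console.log(formatTimestamp(3661.5)) // "01:01:01.500"
```

SRT proper uses a comma before the milliseconds, which is why the SRT writer later does .replace(".", ",") on these values.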
31-52: LGTM!

The type definitions are well-structured with clear interfaces and appropriate use of optional fields. The enum pattern allows for future extensibility.
171-173: Potential issue with audio file extension replacement.

The regex replace(/\.[^.]+$/, "_converted.wav") assumes the input file has an extension. If data.audioPath has no extension (e.g., just "audiofile"), the regex does not match and the path is returned unchanged, so the converted file would never receive the _converted.wav name. Names with multiple dots (e.g., "my.audio.file.mp3") work correctly, since only the final extension is replaced.
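The replacement behavior can be checked directly; note the no-extension case, where the regex fails to match and the name is left untouched:

```typescript
// Quick check of the extension-replacement regex discussed above.
const toConverted = (p: string) => p.replace(/\.[^.]+$/, "_converted.wav")

console.log(toConverted("my.audio.file.mp3")) // "my.audio.file_converted.wav"
console.log(toConverted("audiofile"))         // "audiofile" — no match, unchanged
```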
228-251: LGTM - Good fallback handling for LLM refinement.

The code properly checks for LLM configuration and gracefully falls back to raw transcript if refinement fails. The warning messages are clear and informative.
305-327: LGTM!

The job router is well-structured with proper type narrowing, error handling, and extensibility for future job types. The switch statement with a default case ensures unknown job types are caught.
server/asr-sd/whisper_diarization.py (6)
107-142: LGTM - Excellent parallel execution with proper cleanup.

The parallel execution implementation is robust with:
- Proper resource cleanup via executor.shutdown
- KeyboardInterrupt handling for graceful termination
- Exception handling with future cancellation
- Clear progress logging
180-204: LGTM!

Both worker methods are correctly implemented with torch.no_grad() for inference optimization and accurate timing measurements.
206-262: LGTM - Robust merging logic with good fallbacks.

The merge implementation correctly handles edge cases (missing word timestamps, segment-level fallback) and produces a well-structured result with speaker assignments.
264-279: LGTM!

The deterministic speaker assignment uses a sensible distance-based tiebreaking strategy when a timestamp overlaps multiple speaker segments.
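The distance-based tiebreak described here can be illustrated with a small language-neutral sketch (TypeScript for readability; the PR's version is Python). The SpeakerTurn shape and function name are assumptions for illustration, not the script's actual code.

```typescript
// Illustrative sketch: when a timestamp overlaps several diarization turns,
// pick the turn whose center is nearest, which makes the choice deterministic.
interface SpeakerTurn { speaker: string; start: number; end: number }

function assignSpeaker(time: number, turns: SpeakerTurn[]): string {
  const overlapping = turns.filter((t) => time >= t.start && time <= t.end)
  // Fall back to all turns if nothing overlaps (e.g. a gap between turns).
  const candidates = overlapping.length > 0 ? overlapping : turns
  if (candidates.length === 0) return "UNKNOWN"

  // Tie-break by distance from the timestamp to each turn's center.
  let best = candidates[0]
  let bestDist = Math.abs(time - (best.start + best.end) / 2)
  for (const t of candidates.slice(1)) {
    const d = Math.abs(time - (t.start + t.end) / 2)
    if (d < bestDist) {
      best = t
      bestDist = d
    }
  }
  return best.speaker
}
```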
306-338: LGTM!

The speaker grouping logic correctly consolidates consecutive words from the same speaker while maintaining all necessary metadata (timestamps, word list).
341-463: LGTM - Well-structured CLI with comprehensive output.

The CLI implementation is thorough with:
- Clear argument definitions with defaults
- Input validation
- Detailed progress logging
- Comprehensive timing metrics
- Informative summary output
export const Route = createFileRoute("/_authenticated/transcription")({
  component: TranscriptionPage,
  errorComponent: () => <div>Error loading transcription page</div>,
  loader: async ({ context }) => {
    return {
      user: context.user,
      workspace: context.workspace,
    }
  },
})
Route path must be /transcription, not /_authenticated/transcription.
Our router typings reject "/_authenticated/transcription" (see failing bunx tsc -b), which means the page never registers under the intended /transcription URL. As a result the sidebar link can’t type-check or hit this route. Please switch the path to the actual URL segment so the generated FileRoutesByPath includes /transcription.
Apply this diff:
-export const Route = createFileRoute("/_authenticated/transcription")({
+export const Route = createFileRoute("/transcription")({

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export const Route = createFileRoute("/transcription")({
  component: TranscriptionPage,
  errorComponent: () => <div>Error loading transcription page</div>,
  loader: async ({ context }) => {
    return {
      user: context.user,
      workspace: context.workspace,
    }
  },
})
🧰 Tools
🪛 GitHub Actions: TypeScript Build Check
[error] 31-31: bunx tsc -b failed: Argument of type '"/_authenticated/transcription"' is not assignable to parameter of type 'keyof FileRoutesByPath'.
🤖 Prompt for AI Agents
In frontend/src/routes/_authenticated/transcription.tsx around lines 31 to 40,
the route is currently registered with the path "/_authenticated/transcription"
but should be "/transcription"; update the createFileRoute(...) call to use
"/transcription" so the compiled router typings include the correct
FileRoutesByPath entry and the page registers at the intended URL, then run
type-check to verify the change.
  const response = await fetch(`${api}/files/upload-simple`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${localStorage.getItem("access_token")}`,
    },
    body: formData,
  })

  if (!response.ok) {
    let message = "File upload failed"
    try {
      const data = await response.json()
      if (data?.message) message = data.message
    } catch {
      // ignore JSON parse errors
    }
    throw new Error(message)
  }

  const data = await response.json()
  return data.url
}

const pollJobStatus = useCallback(
  (jobId: string) => {
    clearPolling()

    const intervalId = setInterval(async () => {
      try {
        const response = await fetch(`${api}/asr/job-status?jobId=${jobId}`, {
          headers: {
            Authorization: `Bearer ${localStorage.getItem("access_token")}`,
          },
        })

        if (!response.ok) {
          throw new Error("Failed to fetch job status")
        }

        const data = await response.json()

        if (data.status === "completed") {
          clearPolling()
          setJobStatus("completed")
          setResult(data.outputs ?? null)
        } else if (data.status === "failed") {
          clearPolling()
          setJobStatus("failed")
          setError(data.error || "Transcription failed")
        } else if (data.status === "active") {
          setJobStatus("processing")
        }
      } catch (err) {
        clearPolling()
        setJobStatus("failed")
        setError(
          err instanceof Error ? err.message : "Failed to check job status",
        )
      }
    }, 3000)

    pollIntervalRef.current = intervalId

    // Hard timeout after 30 minutes
    const timeoutId = setTimeout(() => {
      clearPolling()
      setJobStatus("failed")
      setError("Transcription timed out. Please try again.")
    }, 30 * 60 * 1000)

    pollTimeoutRef.current = timeoutId
  },
  [clearPolling],
)

const handleSubmit = async () => {
  if (!selectedFile) {
    setError("Please select a file first")
    return
  }

  try {
    clearPolling()
    setJobStatus("uploading")
    setError(null)
    setResult(null)
    setJobId(null)
    setUploadProgress(0)

    // Upload file
    const audioUrl = await uploadFile(selectedFile)
    setUploadProgress(50)

    // Start transcription job
    setJobStatus("queued")
    const response = await fetch(`${api}/asr/transcribe`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${localStorage.getItem("access_token")}`,
      },
      body: JSON.stringify({
        audioUrl,
        whisperModel,
        refineWithLLM: true,
        outputFormat,
        numSpeakers: numSpeakers || undefined,
        multilingual: true,
      }),
    })
Fix API calls: api is not a string base URL.
api from @/api is the typed client object; interpolating it in a template literal coerces it to "[object Object]", so every fetch goes to "[object Object]/files/upload-simple" / "[object Object]/asr/*". That breaks uploads, transcription kicks, and status polling entirely. Please point these fetches at the real /api/v1/... endpoints (or use the typed client helpers) in all three spots.
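The coercion can be demonstrated in isolation. The object below is a stand-in for the typed hono client, not the app's actual export; any plain object stringifies the same way inside a template literal.

```typescript
// Demonstration: a plain object interpolated into a template literal
// falls back to the default toString(), producing "[object Object]".
const fakeApi: Record<string, unknown> = { files: {}, asr: {} }
const url = `${fakeApi}/files/upload-simple`
console.log(url) // "[object Object]/files/upload-simple"
```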
One possible fix:
- const response = await fetch(`${api}/files/upload-simple`, {
+ const response = await fetch("/api/v1/files/upload-simple", {
@@
- const response = await fetch(`${api}/asr/job-status?jobId=${jobId}`, {
+ const response = await fetch(
+ `/api/v1/asr/job-status?jobId=${jobId}`,
+ {
@@
- const response = await fetch(`${api}/asr/transcribe`, {
+ const response = await fetch("/api/v1/asr/transcribe", {

Make sure to apply the same correction anywhere else this pattern appears.
const job = await boss.getJobById(ASRQueue, jobId)

if (!job) {
  throw new HTTPException(404, {
    message: "Job not found",
  })
}

const tempDir = path.join(ASR_SD_DIR, "temp", jobId)
let outputs: Record<string, any> = {}

// If job is completed, read output files
if (job.state === "completed") {
  try {
    const jobData = job.data as TranscribeJobData
    const suffix = jobData.refineWithLLM ? "_refined" : "_raw"
    const format = jobData.outputFormat || "json"

    if (format === "json" || format === "all") {
      const jsonPath = path.join(tempDir, `transcription${suffix}.json`)
      try {
        const jsonContent = await fs.readFile(jsonPath, "utf-8")
        outputs.json = JSON.parse(jsonContent)
      } catch (error) {
        Logger.warn({ error, jsonPath }, "Failed to read JSON output")
      }
    }

    if (format === "txt" || format === "all") {
      const txtPath = path.join(tempDir, `transcription${suffix}.txt`)
      try {
        outputs.txt = await fs.readFile(txtPath, "utf-8")
      } catch (error) {
        Logger.warn({ error, txtPath }, "Failed to read TXT output")
      }
    }

    if (format === "srt" || format === "all") {
      const srtPath = path.join(tempDir, `transcription${suffix}.srt`)
      try {
        outputs.srt = await fs.readFile(srtPath, "utf-8")
      } catch (error) {
        Logger.warn({ error, srtPath }, "Failed to read SRT output")
      }
    }
  } catch (error) {
    Logger.warn({ error }, "Error reading output files")
  }
}

return c.json({
  success: true,
  jobId,
  status: job.state,
  createdOn: job.createdOn,
  startedOn: job.startedOn,
  completedOn: job.completedOn,
  outputs: Object.keys(outputs).length > 0 ? outputs : undefined,
Job status lookup always returns 404
boss.getJobById expects the pg-boss job id that boss.send returns. Here we query with our own UUID (jobId) that only lives inside the payload, so pg-boss never finds the job and the endpoint 404s even when work is running or finished.
Use the actual boss id (queueJobId) for lookups (or publish the job with that UUID). Without this, clients can never fetch status or outputs.
-export const getJobStatusSchema = z.object({
- jobId: z.string({ message: "Invalid job ID" }).uuid(),
-})
+export const getJobStatusSchema = z.object({
+ queueJobId: z.string({ message: "Invalid job ID" }).uuid(),
+})
...
- const { jobId } = c.req.valid("query" as never) as z.infer<typeof getJobStatusSchema>
+ const { queueJobId } = c.req.valid("query" as never) as z.infer<typeof getJobStatusSchema>
...
- const job = await boss.getJobById(ASRQueue, jobId)
+ const job = await boss.getJobById(ASRQueue, queueJobId)

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
server/api/asr.ts lines 161-218: the code calls boss.getJobById(jobId) using our
internal UUID (jobId) but pg-boss requires the pg-boss job id, so replace this
lookup to use the actual boss job id (queueJobId) that you must persist when
enqueuing the job; specifically, when you create/send the job, store the
returned boss id into the job payload/DB (e.g., data.queueJobId), and here read
that mapping (e.g., fetch your job record or job metadata by our UUID and
extract queueJobId) then call boss.getJobById(queueJobId); if queueJobId is
missing return 404 with a clear message. Also ensure job creation code includes
queueJobId in the stored job record so future lookups succeed.
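The mapping the prompt describes can be sketched with an in-memory stand-in for pg-boss (names are illustrative; the real `boss.send` likewise resolves to the queue-assigned job id):

```typescript
// In-memory stand-in for pg-boss, illustrating why lookups by our internal
// UUID fail: getJobById only knows the id that send() assigned.
type FakeJob = { id: string; state: string }

class FakeBoss {
  private jobs = new Map<string, FakeJob>()
  private seq = 0

  // Like pg-boss, send() resolves to the id the queue assigned the job.
  async send(_queue: string, _data: unknown): Promise<string> {
    const id = `boss-${++this.seq}`
    this.jobs.set(id, { id, state: "created" })
    return id
  }

  async getJobById(_queue: string, id: string): Promise<FakeJob | null> {
    return this.jobs.get(id) ?? null
  }
}

async function main() {
  const boss = new FakeBoss()
  const internalJobId = "11111111-1111-1111-1111-111111111111"

  // Persist queueJobId next to our UUID at enqueue time...
  const queueJobId = await boss.send("asr-queue", { jobId: internalJobId })

  // ...because only the queue-assigned id resolves on lookup.
  console.log(await boss.getJobById("asr-queue", internalJobId)) // null
  console.log((await boss.getJobById("asr-queue", queueJobId))?.state)
}
main()
```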
        if self.device == "cuda" and torch.cuda.is_available():
            try:
                self.diarization_pipeline.to(torch.device("cuda"))
            except Exception:
                # Not all components may support .to("cuda"); silently fall back
                pass
Log CUDA migration failures instead of silently ignoring.
The try-except block silently catches failures when moving the diarization pipeline to CUDA. This makes debugging difficult when GPU acceleration isn't working as expected.
Apply this diff:
if self.device == "cuda" and torch.cuda.is_available():
try:
self.diarization_pipeline.to(torch.device("cuda"))
+ print("Diarization pipeline moved to CUDA")
except Exception:
- # Not all components may support .to("cuda"); silently fall back
- pass
+ # Not all components may support .to("cuda"); fall back to CPU
+ print("Warning: Could not move diarization pipeline to CUDA, using CPU")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if self.device == "cuda" and torch.cuda.is_available(): | |
| try: | |
| self.diarization_pipeline.to(torch.device("cuda")) | |
| except Exception: | |
| # Not all components may support .to("cuda"); silently fall back | |
| pass | |
| if self.device == "cuda" and torch.cuda.is_available(): | |
| try: | |
| self.diarization_pipeline.to(torch.device("cuda")) | |
| print("Diarization pipeline moved to CUDA") | |
| except Exception: | |
| # Not all components may support .to("cuda"); fall back to CPU | |
| print("Warning: Could not move diarization pipeline to CUDA, using CPU") |
🧰 Tools
🪛 Ruff (0.14.4)
61-63: try-except-pass detected, consider logging the exception
(S110)
61-61: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In server/asr-sd/whisper_diarization.py around lines 58 to 63, the try-except
that moves the diarization pipeline to CUDA currently swallows all exceptions;
change it to catch Exception as e and log the failure (e.g.,
logger.error("Failed to move diarization pipeline to CUDA", exc_info=True) or
logger.exception(...)) before falling back to CPU so GPU migration failures are
recorded for debugging.
    # Transcribe with parallel processing
    result = pipeline.transcribe(
        args.audio,
        language=args.language,
        num_speakers=args.num_speakers,
        min_speakers=args.min_speakers,
        max_speakers=args.max_speakers,
        multilingual=args.multilingual,
    )
Add error handling for the transcribe call.
The transcribe call could fail for various reasons (audio file corrupt, out of memory, model errors), but there's no try-except block around it. This will cause an unhandled exception and an ugly stack trace instead of a clean error message.
+ # Transcribe with parallel processing
+ try:
result = pipeline.transcribe(
args.audio,
language=args.language,
num_speakers=args.num_speakers,
min_speakers=args.min_speakers,
max_speakers=args.max_speakers,
multilingual=args.multilingual,
)
+ except KeyboardInterrupt:
+ print("\nTranscription cancelled by user")
+ return
+ except Exception as e:
+ print(f"\nError during transcription: {e}")
+ return

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In server/asr-sd/whisper_diarization.py around lines 415 to 423, the
pipeline.transcribe(...) call is unprotected and can raise exceptions (corrupt
audio, OOM, model errors); wrap this call in a try-except that catches broad
exceptions (Exception), log a clear error message including the exception
details and the input file/parameters, and either re-raise a custom exception or
exit/return a controlled error value so the program doesn't crash with an ugly
stack trace; ensure the logger/error handler includes the exception type and
message and optionally the stacktrace for debugging.
async function runPythonScript(
  scriptName: string,
  args: string[],
): Promise<{ stdout: string; stderr: string; exitCode: number }> {
  return new Promise((resolve, reject) => {
    const scriptPath = path.join(ASR_SD_DIR, scriptName)
    const pythonProcess = spawn(PYTHON_EXECUTABLE, [scriptPath, ...args])

    let stdout = ""
    let stderr = ""

    pythonProcess.stdout.on("data", (data) => {
      const output = data.toString()
      stdout += output
      Logger.info({ output }, "Python stdout")
    })

    pythonProcess.stderr.on("data", (data) => {
      const output = data.toString()
      stderr += output
      Logger.warn({ output }, "Python stderr")
    })

    pythonProcess.on("close", (code, signal) => {
      const exitCode = code ?? -1 // if null (killed by signal), treat as failure
      Logger.info({ exitCode, signal }, "Python process exited")
      resolve({
        stdout,
        stderr,
        exitCode,
      })
    })

    pythonProcess.on("error", (error) => {
      Logger.error({ error }, "Failed to start Python process")
      reject(error)
    })
  })
}
Add timeout mechanism for Python process execution.
The Python process has no timeout, which could cause the job to hang indefinitely if the script freezes, encounters an infinite loop, or waits for unavailable resources. This is a reliability risk: a stuck job neither completes nor fails gracefully.
Add a timeout with cleanup:
async function runPythonScript(
scriptName: string,
args: string[],
+ timeoutMs: number = 3600000, // 1 hour default
): Promise<{ stdout: string; stderr: string; exitCode: number }> {
return new Promise((resolve, reject) => {
const scriptPath = path.join(ASR_SD_DIR, scriptName)
const pythonProcess = spawn(PYTHON_EXECUTABLE, [scriptPath, ...args])
let stdout = ""
let stderr = ""
+ let timeoutHandle: NodeJS.Timeout | null = null
+
+ // Set up timeout
+ if (timeoutMs > 0) {
+ timeoutHandle = setTimeout(() => {
+ Logger.warn({ scriptName, timeoutMs }, "Python process timeout, killing")
+ pythonProcess.kill("SIGTERM")
+ setTimeout(() => pythonProcess.kill("SIGKILL"), 5000) // Force kill after 5s
+ }, timeoutMs)
+ }
pythonProcess.stdout.on("data", (data) => {
const output = data.toString()
stdout += output
Logger.info({ output }, "Python stdout")
})
pythonProcess.stderr.on("data", (data) => {
const output = data.toString()
stderr += output
Logger.warn({ output }, "Python stderr")
})
pythonProcess.on("close", (code, signal) => {
+ if (timeoutHandle) clearTimeout(timeoutHandle)
const exitCode = code ?? -1
Logger.info({ exitCode, signal }, "Python process exited")
resolve({
stdout,
stderr,
exitCode,
})
})
pythonProcess.on("error", (error) => {
+ if (timeoutHandle) clearTimeout(timeoutHandle)
Logger.error({ error }, "Failed to start Python process")
reject(error)
})
})
}
🤖 Prompt for AI Agents
In server/queue/asrProcessor.ts around lines 55 to 93, the runPythonScript
function has no timeout and can hang indefinitely; add a configurable timeout
(e.g., parameter or constant like PYTHON_TIMEOUT_MS) and implement a timer that,
after the timeout, kills the spawned pythonProcess (process.kill), rejects the
promise with a timeout error, and logs the timeout; ensure you clear the timeout
on normal process close or error, and remove or guard duplicate event handlers
so the promise is resolved/rejected only once (cleanup listeners after
close/error/timeout) to avoid memory leaks or double resolutions.
async function downloadFile(url: string, outputPath: string): Promise<void> {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error(`Failed to download file: ${response.status} ${response.statusText}`)
  }
  const buffer = await response.arrayBuffer()
  await fs.writeFile(outputPath, Buffer.from(buffer))
}
Add timeout and stream-based download for reliability and memory efficiency.
The current implementation has several reliability and performance concerns:
- No timeout: The fetch call could hang indefinitely on slow/unresponsive servers
- Memory inefficient: Loading the entire audio file into memory before writing is problematic for large files (multi-GB audio files are common in transcription workflows)
- No retry logic: Transient network failures will cause job failure without retry
Refactor to use streaming with timeout and basic retry:
-async function downloadFile(url: string, outputPath: string): Promise<void> {
- const response = await fetch(url)
+async function downloadFile(url: string, outputPath: string, timeoutMs = 300000): Promise<void> {
+ const controller = new AbortController()
+ const timeoutHandle = setTimeout(() => controller.abort(), timeoutMs)
+
+ try {
+ const response = await fetch(url, { signal: controller.signal })
+
if (!response.ok) {
throw new Error(`Failed to download file: ${response.status} ${response.statusText}`)
}
- const buffer = await response.arrayBuffer()
- await fs.writeFile(outputPath, Buffer.from(buffer))
+
+ if (!response.body) {
+ throw new Error("Response body is null")
+ }
+
+ // Stream directly to file
+ const fileStream = require('fs').createWriteStream(outputPath)
+ const reader = response.body.getReader()
+
+ while (true) {
+ const { done, value } = await reader.read()
+ if (done) break
+ fileStream.write(Buffer.from(value))
+ }
+
+ await new Promise((resolve, reject) => {
+ fileStream.end()
+ fileStream.on('finish', resolve)
+ fileStream.on('error', reject)
+ })
+ } finally {
+ clearTimeout(timeoutHandle)
+ }
}

async function convertAudioToOptimalFormat(
  inputPath: string,
  outputPath: string
): Promise<void> {
  Logger.info(
    { inputPath, outputPath },
    "Converting audio to optimal format (mono, 16kHz, 16-bit PCM WAV)",
  )

  const ffmpegArgs = [
    "-i",
    inputPath, // Input file
    "-ac",
    "1", // Convert to mono (1 audio channel)
    "-ar",
    "16000", // Sample rate: 16 kHz
    "-sample_fmt",
    "s16", // 16-bit PCM
    "-acodec",
    "pcm_s16le", // PCM 16-bit little-endian codec
    "-y", // Overwrite output file if exists
    outputPath,
  ]

  return new Promise((resolve, reject) => {
    const ffmpegProcess = spawn("ffmpeg", ffmpegArgs)

    let stderr = ""

    ffmpegProcess.stderr.on("data", (data) => {
      stderr += data.toString()
    })

    ffmpegProcess.on("close", (code) => {
      if (code === 0) {
        Logger.info({ outputPath }, "Audio conversion completed successfully")
        resolve()
      } else {
        Logger.error({ exitCode: code, stderr }, "FFmpeg conversion failed")
        reject(new Error(`FFmpeg conversion failed with code ${code}: ${stderr}`))
      }
    })

    ffmpegProcess.on("error", (error) => {
      Logger.error({ error }, "Failed to start FFmpeg process")
      reject(new Error(`Failed to start FFmpeg: ${error.message}`))
    })
  })
}
Add timeout and cleanup on conversion failure.
Similar to the Python script execution, the FFmpeg process lacks a timeout mechanism. Additionally, if conversion fails partway through, a partial/corrupt output file may remain.
Apply similar timeout pattern and add cleanup:
async function convertAudioToOptimalFormat(
inputPath: string,
- outputPath: string
+ outputPath: string,
+ timeoutMs: number = 600000 // 10 minutes default
): Promise<void> {
Logger.info(
{ inputPath, outputPath },
"Converting audio to optimal format (mono, 16kHz, 16-bit PCM WAV)",
)
const ffmpegArgs = [
"-i",
inputPath,
"-ac",
"1",
"-ar",
"16000",
"-sample_fmt",
"s16",
"-acodec",
"pcm_s16le",
"-y",
outputPath,
]
return new Promise((resolve, reject) => {
const ffmpegProcess = spawn("ffmpeg", ffmpegArgs)
+ let timeoutHandle: NodeJS.Timeout | null = null
+ if (timeoutMs > 0) {
+ timeoutHandle = setTimeout(() => {
+ Logger.warn({ inputPath, timeoutMs }, "FFmpeg timeout, killing process")
+ ffmpegProcess.kill("SIGTERM")
+ setTimeout(() => ffmpegProcess.kill("SIGKILL"), 5000)
+ }, timeoutMs)
+ }
let stderr = ""
ffmpegProcess.stderr.on("data", (data) => {
stderr += data.toString()
})
ffmpegProcess.on("close", (code) => {
+ if (timeoutHandle) clearTimeout(timeoutHandle)
if (code === 0) {
Logger.info({ outputPath }, "Audio conversion completed successfully")
resolve()
} else {
Logger.error({ exitCode: code, stderr }, "FFmpeg conversion failed")
+ // Clean up partial output file
+ fs.unlink(outputPath).catch(() => {})
reject(new Error(`FFmpeg conversion failed with code ${code}: ${stderr}`))
}
})
ffmpegProcess.on("error", (error) => {
+ if (timeoutHandle) clearTimeout(timeoutHandle)
Logger.error({ error }, "Failed to start FFmpeg process")
+ fs.unlink(outputPath).catch(() => {})
reject(new Error(`Failed to start FFmpeg: ${error.message}`))
})
})
}

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In server/queue/asrProcessor.ts around lines 107 to 155, the FFmpeg conversion
currently has no timeout and leaves partial/corrupt output files on failure; add
a timeout similar to the Python pattern that will kill the ffmpeg process if it
exceeds a configured duration (e.g., 30s), clear the timer on normal close, and
on any non-zero exit, error, or timeout attempt to kill the child process and
unlink/remove the outputPath to clean up partial files; also ensure event
listeners are cleaned (clearTimeout and remove listeners) before
resolving/rejecting and include the stderr/timeout reason in the rejected Error
so callers get useful context.
async function handleTranscribeJob(
  boss: PgBoss, // currently unused, but kept for future extensions
  job: PgBoss.Job<TranscribeJobData>,
): Promise<void> {
  const data = job.data
  Logger.info({ jobId: data.jobId }, "Starting transcription job")

  try {
    // Download audio file
    Logger.info({ audioUrl: data.audioUrl, audioPath: data.audioPath }, "Downloading audio file")
    await downloadFile(data.audioUrl, data.audioPath)

    // Convert audio to optimal format (mono, 16kHz, 16-bit PCM WAV)
    const convertedAudioPath = data.audioPath.replace(/\.[^.]+$/, "_converted.wav")
    Logger.info({ convertedAudioPath }, "Using converted audio path")
    await convertAudioToOptimalFormat(data.audioPath, convertedAudioPath)

    // Build command arguments for simplified Python script (ASR + Diarization only)
    const args = [
      convertedAudioPath, // Use converted audio
      "--whisper-model",
      data.whisperModel || "turbo",
      "--output",
      data.outputPath + "_raw.json", // Output raw results
    ]

    if (data.language) args.push("--language", data.language)
    if (data.numSpeakers) args.push("--num-speakers", data.numSpeakers.toString())
    if (data.minSpeakers) args.push("--min-speakers", data.minSpeakers.toString())
    if (data.maxSpeakers) args.push("--max-speakers", data.maxSpeakers.toString())

    // Default: multilingual ON unless explicitly disabled (but in your pipeline: always true)
    const multilingual = data.multilingual !== false
    if (multilingual) {
      args.push("--multilingual")
    }

    // Use HF_TOKEN from environment or from job data
    const hfToken = process.env.HF_TOKEN || data.hfToken
    if (hfToken) {
      args.push("--hf-token", hfToken)
    }

    // Run simplified Python script (ASR + Diarization only)
    Logger.info({ args }, "Running whisper_diarization.py (ASR + Diarization)")
    const result = await runPythonScript("whisper_diarization.py", args)

    if (result.exitCode !== 0) {
      throw new Error(`Transcription failed (exit ${result.exitCode}): ${result.stderr}`)
    }

    Logger.info(
      { jobId: data.jobId, rawOutputPath: data.outputPath + "_raw.json" },
      "ASR + Diarization completed, starting TypeScript post-processing",
    )

    // Read the raw transcript JSON output from Python
    const rawJsonPath = data.outputPath + "_raw.json"
    const rawTranscriptData = await fs.readFile(rawJsonPath, "utf-8")
    const rawTranscript: TranscriptResult = JSON.parse(rawTranscriptData)

    // Decide whether to refine with LLM (default true if undefined)
    const shouldRefine = data.refineWithLLM !== false

    let finalTranscript: TranscriptResult = rawTranscript

    if (shouldRefine) {
      Logger.info({ jobId: data.jobId }, "Starting LLM refinement in TypeScript")

      // Check which LLM provider is configured
      const { defaultBestModel } = config

      if (!defaultBestModel) {
        Logger.warn(
          {
            jobId: data.jobId,
          },
          "No LLM provider configured for ASR refinement. Skipping refinement and using raw transcript.",
        )
      } else {
        try {
          // Run TypeScript refinement (works with any configured LLM provider)
          finalTranscript = await refineTranscript(rawTranscript, {
            maxTokens: 200000,
          })
          Logger.info({ jobId: data.jobId }, "LLM refinement completed successfully")
        } catch (error) {
          Logger.error(
            { error, jobId: data.jobId },
            "LLM refinement failed, falling back to raw transcript",
          )
          finalTranscript = rawTranscript
        }
      }
    } else {
      // Even without LLM refinement, merge consecutive segments deterministically
      Logger.info(
        { jobId: data.jobId },
        "LLM refinement disabled, merging consecutive segments without refinement",
      )
      finalTranscript = {
        ...rawTranscript,
        segments: mergeConsecutiveSegments(rawTranscript.segments),
      }
    }

    // Save final results in requested format(s)
    const outputFormat = data.outputFormat || "json"
    const suffix = shouldRefine ? "_refined" : "_merged"

    if (outputFormat === "json" || outputFormat === "all") {
      const jsonPath = data.outputPath + suffix + ".json"
      await fs.writeFile(jsonPath, JSON.stringify(finalTranscript, null, 2), "utf-8")
      Logger.info({ jobId: data.jobId, path: jsonPath }, "Saved JSON output")
    }

    if (outputFormat === "txt" || outputFormat === "all") {
      const txtPath = data.outputPath + suffix + ".txt"
      const txtContent = finalTranscript.segments
        .map((seg) => `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`)
        .join("\n")
      await fs.writeFile(txtPath, txtContent, "utf-8")
      Logger.info({ jobId: data.jobId, path: txtPath }, "Saved TXT output")
    }

    if (outputFormat === "srt" || outputFormat === "all") {
      const srtPath = data.outputPath + suffix + ".srt"
      const srtContent = finalTranscript.segments
        .map((seg, idx) => {
          const start = formatTimestamp(seg.start).replace(".", ",")
          const end = formatTimestamp(seg.end).replace(".", ",")
          const text = `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`
          return `${idx + 1}\n${start} --> ${end}\n${text}\n`
        })
        .join("\n")
      await fs.writeFile(srtPath, srtContent, "utf-8")
      Logger.info({ jobId: data.jobId, path: srtPath }, "Saved SRT output")
    }

    Logger.info({ jobId: data.jobId }, "Transcription pipeline completed successfully")
  } catch (error) {
    Logger.error({ error, jobId: data.jobId }, "Transcription job failed")
    throw error
  }
}
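For context, a minimal sketch of what mergeConsecutiveSegments is assumed to do — join adjacent segments that share a speaker, keeping the first start and the last end (the real helper is defined elsewhere in this PR, so the shape below is illustrative):

```typescript
// Assumed segment shape, matching the fields the job handler reads.
type Segment = { start: number; end: number; speaker?: string; text: string }

// Deterministic merge: adjacent segments with the same speaker collapse into
// one, concatenating text and extending the end timestamp.
function mergeConsecutiveSegments(segments: Segment[]): Segment[] {
  const merged: Segment[] = []
  for (const seg of segments) {
    const last = merged[merged.length - 1]
    if (last && last.speaker === seg.speaker) {
      last.end = seg.end
      last.text = `${last.text.trim()} ${seg.text.trim()}`
    } else {
      merged.push({ ...seg })
    }
  }
  return merged
}

const out = mergeConsecutiveSegments([
  { start: 0, end: 2, speaker: "S1", text: "Hello" },
  { start: 2, end: 4, speaker: "S1", text: "there." },
  { start: 4, end: 6, speaker: "S2", text: "Hi." },
])
console.log(out.length, out[0].text) // 2 "Hello there."
```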
Add cleanup for temporary files to prevent disk space exhaustion.
The function downloads audio files and creates converted versions but never cleans them up. Over time, this will lead to disk space exhaustion, especially for high-volume transcription workloads.
Add cleanup in a finally block:
async function handleTranscribeJob(
boss: PgBoss,
job: PgBoss.Job<TranscribeJobData>,
): Promise<void> {
const data = job.data
Logger.info({ jobId: data.jobId }, "Starting transcription job")
+
+ let convertedAudioPath: string | null = null
+ let rawJsonPath: string | null = null
try {
// Download audio file
Logger.info({ audioUrl: data.audioUrl, audioPath: data.audioPath }, "Downloading audio file")
await downloadFile(data.audioUrl, data.audioPath)
// Convert audio to optimal format
- const convertedAudioPath = data.audioPath.replace(/\.[^.]+$/, "_converted.wav")
+ convertedAudioPath = data.audioPath.replace(/\.[^.]+$/, "_converted.wav")
Logger.info({ convertedAudioPath }, "Using converted audio path")
await convertAudioToOptimalFormat(data.audioPath, convertedAudioPath)
// ... rest of the processing ...
// Read the raw transcript JSON output from Python
- const rawJsonPath = data.outputPath + "_raw.json"
+ rawJsonPath = data.outputPath + "_raw.json"
const rawTranscriptData = await fs.readFile(rawJsonPath, "utf-8")
// ... rest of the processing ...
Logger.info({ jobId: data.jobId }, "Transcription pipeline completed successfully")
} catch (error) {
Logger.error({ error, jobId: data.jobId }, "Transcription job failed")
throw error
+ } finally {
+ // Clean up temporary files
+ const filesToClean = [
+ data.audioPath,
+ convertedAudioPath,
+ rawJsonPath,
+ ].filter((path): path is string => path !== null)
+
+ for (const file of filesToClean) {
+ try {
+ await fs.unlink(file)
+ Logger.debug({ file }, "Cleaned up temporary file")
+ } catch (error) {
+ Logger.warn({ file, error }, "Failed to clean up temporary file")
+ }
+ }
}
}
| throw error | |
| } | |
| } | |
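The `formatTimestamp` helper used by the SRT branch above is not shown in this diff. A minimal sketch of what it presumably does — rendering seconds as `HH:MM:SS.mmm`, which the SRT branch then converts to comma decimals — could look like this (an assumption, not code from this PR):

```typescript
// Hypothetical sketch: format a duration in seconds as HH:MM:SS.mmm,
// the shape the SRT branch expects before its "." -> "," replacement.
function formatTimestamp(seconds: number): string {
  const hrs = Math.floor(seconds / 3600)
  const mins = Math.floor((seconds % 3600) / 60)
  const secs = seconds % 60
  const whole = Math.floor(secs)
  const millis = Math.round((secs - whole) * 1000)
  const pad = (n: number, width = 2) => n.toString().padStart(width, "0")
  return `${pad(hrs)}:${pad(mins)}:${pad(whole)}.${pad(millis, 3)}`
}
```

For example, `formatTimestamp(3661.5)` yields `"01:01:01.500"`, which the SRT writer turns into `01:01:01,500`.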
```ts
async function handleTranscribeJob(
  boss: PgBoss, // currently unused, but kept for future extensions
  job: PgBoss.Job<TranscribeJobData>,
): Promise<void> {
  const data = job.data
  Logger.info({ jobId: data.jobId }, "Starting transcription job")

  let convertedAudioPath: string | null = null
  let rawJsonPath: string | null = null

  try {
    // Download audio file
    Logger.info({ audioUrl: data.audioUrl, audioPath: data.audioPath }, "Downloading audio file")
    await downloadFile(data.audioUrl, data.audioPath)

    // Convert audio to optimal format (mono, 16kHz, 16-bit PCM WAV)
    convertedAudioPath = data.audioPath.replace(/\.[^.]+$/, "_converted.wav")
    Logger.info({ convertedAudioPath }, "Using converted audio path")
    await convertAudioToOptimalFormat(data.audioPath, convertedAudioPath)

    // Build command arguments for simplified Python script (ASR + Diarization only)
    const args = [
      convertedAudioPath, // Use converted audio
      "--whisper-model",
      data.whisperModel || "turbo",
      "--output",
      data.outputPath + "_raw.json", // Output raw results
    ]
    if (data.language) args.push("--language", data.language)
    if (data.numSpeakers) args.push("--num-speakers", data.numSpeakers.toString())
    if (data.minSpeakers) args.push("--min-speakers", data.minSpeakers.toString())
    if (data.maxSpeakers) args.push("--max-speakers", data.maxSpeakers.toString())

    // Default: multilingual ON unless explicitly disabled (but in your pipeline: always true)
    const multilingual = data.multilingual !== false
    if (multilingual) {
      args.push("--multilingual")
    }

    // Use HF_TOKEN from environment or from job data
    const hfToken = process.env.HF_TOKEN || data.hfToken
    if (hfToken) {
      args.push("--hf-token", hfToken)
    }

    // Run simplified Python script (ASR + Diarization only)
    Logger.info({ args }, "Running whisper_diarization.py (ASR + Diarization)")
    const result = await runPythonScript("whisper_diarization.py", args)
    if (result.exitCode !== 0) {
      throw new Error(`Transcription failed (exit ${result.exitCode}): ${result.stderr}`)
    }

    Logger.info(
      { jobId: data.jobId, rawOutputPath: data.outputPath + "_raw.json" },
      "ASR + Diarization completed, starting TypeScript post-processing",
    )

    // Read the raw transcript JSON output from Python
    rawJsonPath = data.outputPath + "_raw.json"
    const rawTranscriptData = await fs.readFile(rawJsonPath, "utf-8")
    const rawTranscript: TranscriptResult = JSON.parse(rawTranscriptData)

    // Decide whether to refine with LLM (default true if undefined)
    const shouldRefine = data.refineWithLLM !== false
    let finalTranscript: TranscriptResult = rawTranscript

    if (shouldRefine) {
      Logger.info({ jobId: data.jobId }, "Starting LLM refinement in TypeScript")
      // Check which LLM provider is configured
      const { defaultBestModel } = config
      if (!defaultBestModel) {
        Logger.warn(
          { jobId: data.jobId },
          "No LLM provider configured for ASR refinement. Skipping refinement and using raw transcript.",
        )
      } else {
        try {
          // Run TypeScript refinement (works with any configured LLM provider)
          finalTranscript = await refineTranscript(rawTranscript, {
            maxTokens: 200000,
          })
          Logger.info({ jobId: data.jobId }, "LLM refinement completed successfully")
        } catch (error) {
          Logger.error(
            { error, jobId: data.jobId },
            "LLM refinement failed, falling back to raw transcript",
          )
          finalTranscript = rawTranscript
        }
      }
    } else {
      // Even without LLM refinement, merge consecutive segments deterministically
      Logger.info(
        { jobId: data.jobId },
        "LLM refinement disabled, merging consecutive segments without refinement",
      )
      finalTranscript = {
        ...rawTranscript,
        segments: mergeConsecutiveSegments(rawTranscript.segments),
      }
    }

    // Save final results in requested format(s)
    const outputFormat = data.outputFormat || "json"
    const suffix = shouldRefine ? "_refined" : "_merged"

    if (outputFormat === "json" || outputFormat === "all") {
      const jsonPath = data.outputPath + suffix + ".json"
      await fs.writeFile(jsonPath, JSON.stringify(finalTranscript, null, 2), "utf-8")
      Logger.info({ jobId: data.jobId, path: jsonPath }, "Saved JSON output")
    }

    if (outputFormat === "txt" || outputFormat === "all") {
      const txtPath = data.outputPath + suffix + ".txt"
      const txtContent = finalTranscript.segments
        .map((seg) => `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`)
        .join("\n")
      await fs.writeFile(txtPath, txtContent, "utf-8")
      Logger.info({ jobId: data.jobId, path: txtPath }, "Saved TXT output")
    }

    if (outputFormat === "srt" || outputFormat === "all") {
      const srtPath = data.outputPath + suffix + ".srt"
      const srtContent = finalTranscript.segments
        .map((seg, idx) => {
          const start = formatTimestamp(seg.start).replace(".", ",")
          const end = formatTimestamp(seg.end).replace(".", ",")
          const text = `[${seg.speaker || "UNKNOWN"}] ${seg.text.trim()}`
          return `${idx + 1}\n${start} --> ${end}\n${text}\n`
        })
        .join("\n")
      await fs.writeFile(srtPath, srtContent, "utf-8")
      Logger.info({ jobId: data.jobId, path: srtPath }, "Saved SRT output")
    }

    Logger.info({ jobId: data.jobId }, "Transcription pipeline completed successfully")
  } catch (error) {
    Logger.error({ error, jobId: data.jobId }, "Transcription job failed")
    throw error
  } finally {
    // Clean up temporary files
    const filesToClean = [data.audioPath, convertedAudioPath, rawJsonPath].filter(
      (path): path is string => path !== null,
    )
    for (const file of filesToClean) {
      try {
        await fs.unlink(file)
        Logger.debug({ file }, "Cleaned up temporary file")
      } catch (error) {
        Logger.warn({ file, error }, "Failed to clean up temporary file")
      }
    }
  }
}
```
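`mergeConsecutiveSegments` is referenced in both versions of the handler but not defined in this hunk. Assuming it folds adjacent segments with the same speaker label into one (a guess at its contract, not code from this PR), a minimal sketch:

```typescript
interface Segment {
  start: number
  end: number
  text: string
  speaker?: string
}

// Hypothetical sketch: merge runs of adjacent segments that share a speaker,
// keeping the first segment's start time and the last segment's end time.
function mergeConsecutiveSegments(segments: Segment[]): Segment[] {
  const merged: Segment[] = []
  for (const seg of segments) {
    const last = merged[merged.length - 1]
    if (last && last.speaker === seg.speaker) {
      last.end = seg.end
      last.text = `${last.text.trim()} ${seg.text.trim()}`
    } else {
      merged.push({ ...seg }) // copy so the input array is left untouched
    }
  }
  return merged
}
```

With this contract, three segments `A, A, B` collapse into two, which is what the deterministic "merge without refinement" branch relies on.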
🤖 Prompt for AI Agents
In server/queue/asrProcessor.ts around lines 158 to 302, the function downloads
the original audio and creates a converted audio file (and writes raw JSON) but
never deletes these temporary files; declare convertedAudioPath and rawJsonPath
(and any temp paths) in an outer scope before the try, add a finally block after
the try/catch that attempts to unlink (await fs.unlink) the original downloaded
file (data.audioPath), the converted audio file (convertedAudioPath), and the
intermediate raw JSON (rawJsonPath) if they exist, wrapping each unlink in its
own try/catch so deletion failures are logged but do not throw, and ensure the
cleanup runs regardless of success or failure of the transcription to prevent
disk space exhaustion.
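The per-file try/catch cleanup described above can also be factored into a standalone helper (a hypothetical refactor, not part of this PR), which keeps the `finally` block short and makes the best-effort semantics reusable:

```typescript
import { promises as fs } from "node:fs"

// Hypothetical helper: best-effort removal of temp files. Unlink failures
// are reported through the callback but never thrown, so the helper is
// safe to call from a finally block regardless of how the job ended.
async function cleanupFiles(
  paths: Array<string | null>,
  onError: (file: string, err: unknown) => void = () => {},
): Promise<void> {
  for (const file of paths) {
    if (file === null) continue // path was never assigned (e.g. early failure)
    try {
      await fs.unlink(file)
    } catch (err) {
      onError(file, err)
    }
  }
}
```

The `finally` block would then reduce to something like `await cleanupFiles([data.audioPath, convertedAudioPath, rawJsonPath], (file, error) => Logger.warn({ file, error }, "Failed to clean up temporary file"))`.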
Summary by CodeRabbit
New Features
Bug Fixes
Performance