vogelcodes/render-whisper-captions

Whisper Captions Generator

A complete solution for generating animated captions from Whisper speech-to-text JSON files. This tool creates word-by-word animated captions with precise timing and exports them to FCPXML for import into video editors such as DaVinci Resolve or Final Cut Pro.

Features

  • Convert Whisper JSON files to SRT subtitles and structured caption data
  • Generate static caption PNGs (one per caption block)
  • Generate word-level animated caption PNGs
  • Convert PNGs to ProRes 4444 MOVs with alpha channel
  • Generate FCPXML timelines for import into video editors
  • Multiple render modes:
    • Static captions (one PNG per caption)
    • Word-by-word animation (one PNG/MOV per word)
    • Frame-by-frame animation (one PNG per video frame)
  • Configurable styling options:
    • Text case: uppercase, lowercase, or preserve
    • Highlight specific words
    • Remove periods
    • Hide unspoken words
  • Utilities for diagnosing and fixing rendering issues with the first word
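The styling options listed above (text case, period removal, word highlighting) can be illustrated as a small per-word transform. This is a hypothetical sketch, not this package's code: it assumes each Whisper word arrives as a string that may carry a leading space and trailing punctuation, and `styleWord` and its option names are illustrative only. `--hide-unspoken` is left out because its exact behavior isn't documented here.

```javascript
// Hypothetical sketch: applies caption styling options to a single word.
// Option names mirror the CLI flags but are NOT this package's API.
function styleWord(word, { textCase = "uppercase", removePeriods = false, highlightWords = [] } = {}) {
  // Whisper words often start with a space; strip surrounding whitespace.
  let text = word.trim();
  if (removePeriods) {
    text = text.replace(/\./g, "");
  }
  // Match highlights case-insensitively, ignoring punctuation around the word.
  const bare = text.replace(/[^\p{L}\p{N}']/gu, "").toLowerCase();
  const highlighted = highlightWords.some((w) => w.toLowerCase() === bare);
  if (textCase === "uppercase") text = text.toUpperCase();
  else if (textCase === "lowercase") text = text.toLowerCase();
  // "preserve" leaves the text unchanged.
  return { text, highlighted };
}

console.log(styleWord(" important.", { removePeriods: true, highlightWords: ["important"] }));
// → { text: 'IMPORTANT', highlighted: true }
```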

Requirements

  • Node.js 14 or higher
  • FFmpeg (required for MOV generation)
  • DaVinci Resolve or Final Cut Pro (for importing FCPXML)
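FFmpeg is what turns the rendered PNGs into ProRes 4444 MOVs with an alpha channel. As a sketch of the kind of command involved (an assumption; the tool's actual FFmpeg invocation may differ), the hypothetical helpers below loop one transparent PNG for a given duration and encode it with FFmpeg's `prores_ks` encoder in the `yuva444p10le` pixel format, which keeps the alpha plane.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: FFmpeg arguments that turn one transparent PNG into a
// ProRes 4444 MOV of the given duration while keeping the alpha channel.
function buildProresArgs(pngPath, durationSec, fps, movPath) {
  return [
    "-y",                          // overwrite existing output
    "-loop", "1",                  // repeat the still image
    "-framerate", String(fps),     // input frame rate
    "-i", pngPath,
    "-t", durationSec.toFixed(3),  // clip length in seconds
    "-c:v", "prores_ks",
    "-profile:v", "4",             // 4 = ProRes 4444 in the prores_ks encoder
    "-pix_fmt", "yuva444p10le",    // 4:4:4 chroma plus an alpha plane
    movPath,
  ];
}

// Runs FFmpeg (must be on PATH) and resolves with the MOV path when done.
function pngToProres(pngPath, durationSec, fps, movPath) {
  return new Promise((resolve, reject) => {
    execFile("ffmpeg", buildProresArgs(pngPath, durationSec, fps, movPath), (err) =>
      err ? reject(err) : resolve(movPath)
    );
  });
}
```

For example, `pngToProres("word_001.png", 0.5, 30, "word_001.mov")` would produce a half-second clip; the file names are placeholders.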

Installation

# Clone the repository
git clone https://github.com/vogelcodes/render-whisper-captions.git
cd render-whisper-captions

# Install dependencies
npm install

# Optional: Install globally
npm install -g .

Usage

Basic Usage

node src/cli/main.js --input path/to/whisper.json --output-dir ./output

Or if installed globally:

whisper-captions --input path/to/whisper.json --output-dir ./output

Command-line Options

Options:
  --input, -i            Input JSON file from speech-to-text [string]
  --output-dir, -o       Output directory [string] [default: "./output"]
  --input-dir, -d        Process all JSON files in this directory [string]
  --fps, -f              Frames per second [number] [default: 30]
  --max-chars, -l        Maximum characters per line [number] [default: 26]
  --case, -c             Text case: uppercase, lowercase, or preserve
                         [string] [choices: "uppercase", "lowercase", "preserve"]
                         [default: "uppercase"]
  --remove-periods       Remove periods from text [boolean] [default: false]
  --hide-unspoken        Hide unspoken words [boolean] [default: false]
  --highlight            Comma-separated list of words to highlight [string]
  --highlight-file       JSON file with words to highlight [string]
  --generate-mov         Generate MOV files from PNGs [boolean] [default: true]
  --render-all-frames    Render all frames instead of one PNG per caption
                         [boolean] [default: false]
  --start                Start time in seconds [number] [default: 0]
  --duration             Duration in seconds [number]
  --regenerate-first-word  Regenerate only the first word's PNG (for fixing first-word issues)
                         [boolean] [default: false]
  --check-first-word     Check for issues with the first word [boolean] [default: false]
  --help, -h             Show help [boolean]
  --version, -v          Show version number [boolean]
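The `--max-chars` option caps the characters per caption line. One common way to honor such a limit is greedy wrapping: keep adding words until the next one would overflow, then start a new line. The sketch below assumes that approach; whether this tool wraps exactly this way is not documented, and `wrapWords` is a hypothetical name.

```javascript
// Hypothetical sketch: greedy line wrapping under a character limit,
// similar in spirit to --max-chars (default 26). Not this package's code.
function wrapWords(words, maxChars = 26) {
  const lines = [];
  let current = "";
  for (const word of words) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= maxChars || !current) {
      // The word fits, or the line is empty (an over-long word gets its own line).
      current = candidate;
    } else {
      // The word would overflow: close the current line and start a new one.
      lines.push(current);
      current = word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

console.log(wrapWords("the quick brown fox jumps over the lazy dog".split(" "), 15));
// → [ 'the quick brown', 'fox jumps over', 'the lazy dog' ]
```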

Examples

Generate captions with default settings:

whisper-captions --input transcript.json --output-dir ./captions

Generate lowercase captions with specific words highlighted:

whisper-captions --input transcript.json --output-dir ./captions --case lowercase --highlight "important,keyword,phrase"

Generate only PNG files without MOV conversion:

whisper-captions --input transcript.json --output-dir ./captions --generate-mov false

Generate frame-by-frame PNGs for direct video creation:

whisper-captions --input transcript.json --output-dir ./captions --render-all-frames --generate-mov false
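In frame-by-frame mode every video frame gets its own PNG, so word timings in seconds have to be mapped to frame indices at the chosen `--fps`. A minimal sketch, assuming each word is visible from `Math.round(start * fps)` up to (but not including) `Math.round(end * fps)`; the rounding rule and the `{ word, start, end }` shape are assumptions, and `wordFrameRanges` is a hypothetical helper.

```javascript
// Hypothetical sketch: maps word timings (seconds) to video frame ranges.
// Assumes words look like { word, start, end } with times in seconds.
function wordFrameRanges(words, fps = 30) {
  return words.map(({ word, start, end }) => {
    const firstFrame = Math.round(start * fps);
    // Guarantee at least one frame, even for very short words.
    const lastFrameExclusive = Math.max(firstFrame + 1, Math.round(end * fps));
    return { word, firstFrame, frameCount: lastFrameExclusive - firstFrame };
  });
}

console.log(wordFrameRanges([{ word: "hello", start: 1.0, end: 1.5 }], 30));
// → firstFrame 30, frameCount 15 (frames 30 through 44)
```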

Process all JSON files in a directory:

whisper-captions --input-dir ./transcripts --output-dir ./captions
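The same batch behavior can be reproduced in your own scripts. A minimal sketch, assuming a non-recursive scan and sequential processing (neither is documented for `--input-dir`); `listJsonFiles` and `processDirectory` are hypothetical helpers, and `processFile` could be a wrapper around `processJsonFile` from the Programmatic Usage section.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: lists .json files directly inside `dir` (non-recursive).
function listJsonFiles(dir) {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && entry.name.toLowerCase().endsWith(".json"))
    .map((entry) => path.join(dir, entry.name))
    .sort();
}

// Processes files one at a time; `processFile` is any async function taking a path.
async function processDirectory(dir, processFile) {
  const results = [];
  for (const file of listJsonFiles(dir)) {
    results.push(await processFile(file));
  }
  return results;
}
```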

Check for issues with the first word:

whisper-captions --input transcript.json --output-dir ./captions --check-first-word

Regenerate the first word (if there are issues):

whisper-captions --input transcript.json --output-dir ./captions --regenerate-first-word

Output Files

When processing completes, the output directory contains the following files, where filename is the base name of the input JSON file:

  • output/filename.srt - SRT subtitle file
  • output/filename.group.json - Structured caption data
  • output/filename_frames/ - Directory with caption block PNGs
  • output/filename_word_frames/ - Directory with word PNGs and MOVs
  • output/filename_word_timeline.fcpxml - FCPXML file for importing into video editors
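The `.srt` file uses the standard SubRip layout: a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line between entries. A minimal sketch of producing that layout, assuming caption blocks shaped like `{ start, end, text }` with times in seconds (the real `.group.json` structure is not documented here); `srtTimestamp` and `toSrt` are hypothetical names.

```javascript
// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical sketch: turns caption blocks ({ start, end, text }) into SRT text.
function toSrt(blocks) {
  return blocks
    .map((b, i) => `${i + 1}\n${srtTimestamp(b.start)} --> ${srtTimestamp(b.end)}\n${b.text}\n`)
    .join("\n");
}

console.log(toSrt([{ start: 0, end: 1.25, text: "HELLO WORLD" }]));
```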

Importing into Video Editors

DaVinci Resolve

  1. Open your DaVinci Resolve project
  2. Go to File > Import > Timeline > Import AAF, EDL, XML...
  3. Select the generated FCPXML file
  4. Make sure "Automatically import source clips into media pool" is checked
  5. Click "Import"
  6. If media appears offline, right-click the clips and use "Relink Selected Clips" to locate the MOV files

Final Cut Pro

  1. Open Final Cut Pro
  2. Go to File > Import > XML
  3. Select the generated FCPXML file
  4. The captions will be imported as a new project
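Both editors read clip positions from the FCPXML file, where times are written as rational numbers of seconds (for example `"1001/30000s"`). A minimal sketch of snapping a caption time to a whole frame and writing it in that form, assuming an integer frame rate such as the default 30 fps; `toFcpxmlTime` is a hypothetical helper and does not handle drop-frame rates like 29.97.

```javascript
// Hypothetical sketch: converts seconds to an FCPXML rational time string,
// snapped to the nearest whole frame at an integer frame rate.
function toFcpxmlTime(seconds, fps = 30) {
  if (!Number.isInteger(fps)) {
    throw new Error("this sketch only handles integer frame rates");
  }
  const frames = Math.round(seconds * fps);
  // FCPXML writes a zero time as "0s"; otherwise use frames over fps.
  return frames === 0 ? "0s" : `${frames}/${fps}s`;
}

console.log(toFcpxmlTime(1.5, 30)); // prints 45/30s
```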

Programmatic Usage

You can also use this library programmatically in your own Node.js projects:

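const {
  processJsonFile,
  CaptionRenderer, // not used below; listed as an available export
  RenderMode, // not used below; listed as an available export
} = require("whisper-captions-generator");

async function generateCaptions() {
  try {
    const result = await processJsonFile({
      inputFile: "path/to/whisper.json", // Whisper JSON to process
      outputDir: "./output", // where SRT, PNG, MOV and FCPXML output is written
      fps: 30, // frames per second
      textCase: "uppercase", // "uppercase", "lowercase", or "preserve"
      removePeriods: false, // strip periods from caption text
      hideUnspoken: false, // hide words that have not been spoken yet
      highlightWords: ["important", "words"], // words to highlight
      maxCharsPerLine: 26, // maximum characters per caption line
      generateMov: true, // convert PNGs to ProRes 4444 MOVs (requires FFmpeg)
      renderAllFrames: false, // true renders one PNG per video frame
    });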

    console.log("Caption generation completed!", result);
  } catch (error) {
    console.error("Error generating captions:", error);
  }
}

generateCaptions();

License

This project is licensed under the MIT License - see the LICENSE file for details.
