Whisper Captions Generator

Generates animated captions from Whisper speech-to-text JSON files. The tool renders word-by-word captions timed to the speech and exports an FCPXML timeline that can be imported into video editors such as DaVinci Resolve or Final Cut Pro.

Features

- Convert Whisper JSON files to SRT subtitles and structured caption data
- Generate static caption PNGs (one per caption block)
- Generate word-level animated caption PNGs
- Convert PNGs to ProRes 4444 MOVs with alpha channel
- Generate FCPXML timelines for import into video editors
- Multiple render modes:
  - Static captions (one PNG per caption)
  - Word-by-word animation (one PNG/MOV per word)
  - Frame-by-frame animation (one PNG per video frame)
- Configurable styling options:
  - Text case: uppercase, lowercase, or preserve
  - Highlight specific words
  - Remove periods
  - Hide unspoken words
- Utilities for checking the first word for issues and regenerating its PNG (--check-first-word, --regenerate-first-word)
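
To make the JSON-to-SRT step concrete, here is a minimal sketch of how such a conversion can work. It assumes the transcript has a `segments` array whose items carry `start` and `end` (in seconds) and `text`, as in the standard Whisper JSON output; the helper names `toSrtTimestamp` and `segmentsToSrt` are hypothetical, and the tool's own converter may group and time captions differently.

```javascript
// Sketch only: turn Whisper-style segments into SRT text.
// Assumption: each segment has `start`, `end` (seconds) and `text`.

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build the SRT document: cue index, time range, text, blank line between cues.
function segmentsToSrt(segments) {
  const cues = segments.map((seg, i) =>
    [
      String(i + 1),
      `${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}`,
      seg.text.trim(),
    ].join("\n")
  );
  return cues.join("\n\n") + "\n";
}

// Made-up sample data for illustration.
const sample = {
  segments: [
    { start: 0, end: 1.5, text: " Hello there." },
    { start: 1.5, end: 3.25, text: " Welcome back." },
  ],
};
console.log(segmentsToSrt(sample.segments));
```

Keeping the timestamp math in integer milliseconds avoids rounding drift when seconds, minutes, and hours are split apart.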

Requirements

- Node.js 14 or higher
- FFmpeg (required for MOV generation)
- DaVinci Resolve or Final Cut Pro (for importing FCPXML)
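
FFmpeg is needed because a still PNG has to be encoded as a short video clip that keeps its transparency. The sketch below shows one way to build FFmpeg arguments for a ProRes 4444 MOV with an alpha channel. The helper `buildProResArgs` and the file names are hypothetical, and the exact command this tool runs may differ; the FFmpeg options themselves (`prores_ks`, `-profile:v 4444`, `yuva444p10le`) are standard.

```javascript
// Sketch only: FFmpeg arguments for turning one transparent PNG into a
// ProRes 4444 MOV. Hypothetical helper, not this tool's actual command.
function buildProResArgs(pngPath, movPath, durationSeconds, fps) {
  return [
    "-y",                          // overwrite the output file if it exists
    "-loop", "1",                  // repeat the single input image
    "-i", pngPath,
    "-t", String(durationSeconds), // stop after this many seconds
    "-r", String(fps),             // output frame rate
    "-c:v", "prores_ks",           // FFmpeg ProRes encoder that supports alpha
    "-profile:v", "4444",          // ProRes 4444 profile
    "-pix_fmt", "yuva444p10le",    // 10-bit 4:4:4 with an alpha plane
    movPath,
  ];
}

const args = buildProResArgs("word_0001.png", "word_0001.mov", 0.4, 30);
console.log(["ffmpeg", ...args].join(" "));
```

The array can be passed to `child_process.spawn("ffmpeg", args)`. Passing arguments as an array rather than one shell string avoids quoting problems with paths that contain spaces.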

Installation

# Clone the repository
git clone https://github.com/vogelcodes/render-whisper-captions.git
cd render-whisper-captions

# Install dependencies
npm install

# Optional: Install globally
npm install -g .

Usage

Basic Usage

node src/cli/main.js --input path/to/whisper.json --output-dir ./output

Or if installed globally:

whisper-captions --input path/to/whisper.json --output-dir ./output

Command-line Options

Options:
  --input, -i            Input JSON file from speech-to-text [string]
  --output-dir, -o       Output directory [string] [default: "./output"]
  --input-dir, -d        Process all JSON files in this directory [string]
  --fps, -f              Frames per second [number] [default: 30]
  --max-chars, -l        Maximum characters per line [number] [default: 26]
  --case, -c             Text case: uppercase, lowercase, or preserve
                         [string] [choices: "uppercase", "lowercase", "preserve"]
                         [default: "uppercase"]
  --remove-periods       Remove periods from text [boolean] [default: false]
  --hide-unspoken        Hide unspoken words [boolean] [default: false]
  --highlight            Comma-separated list of words to highlight [string]
  --highlight-file       JSON file with words to highlight [string]
  --generate-mov         Generate MOV files from PNGs [boolean] [default: true]
  --render-all-frames    Render all frames instead of one PNG per caption
                         [boolean] [default: false]
  --start                Start time in seconds [number] [default: 0]
  --duration             Duration in seconds [number]
  --regenerate-first-word  Regenerate only the first word PNG (usually for fixing issues)
                         [boolean] [default: false]
  --check-first-word     Check for issues with the first word [boolean] [default: false]
  --help, -h             Show help [boolean]
  --version, -v          Show version number [boolean]
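
As an illustration of what a character limit such as --max-chars implies, the following sketch wraps text greedily so that each line stays within the limit. This is an assumption about the general technique, not a description of the tool's implementation; `wrapWords` is a hypothetical helper.

```javascript
// Sketch only: greedy word wrap. Each line holds as many words as fit
// within maxChars. A single word longer than maxChars stays whole on
// its own line rather than being split.
function wrapWords(text, maxChars = 26) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= maxChars || current === "") {
      current = candidate;
    } else {
      lines.push(current); // line is full, start a new one
      current = word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

console.log(wrapWords("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG", 26));
// → [ 'THE QUICK BROWN FOX JUMPS', 'OVER THE LAZY DOG' ]
```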

Examples

Generate captions with default settings:

whisper-captions --input transcript.json --output-dir ./captions

Generate lowercase captions with specific words highlighted:

whisper-captions --input transcript.json --output-dir ./captions --case lowercase --highlight "important,keyword,phrase"
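
The styling flags used above can be thought of as small text transforms applied to each word. The sketch below is one possible reading of --case, --remove-periods, and --highlight; the helpers `applyTextCase`, `normalizeWord`, and `styleWords` are hypothetical, and the tool may match highlighted words differently (for example, with or without punctuation).

```javascript
// Sketch only: the styling options as pure functions (hypothetical helpers).
function applyTextCase(text, textCase) {
  if (textCase === "uppercase") return text.toUpperCase();
  if (textCase === "lowercase") return text.toLowerCase();
  return text; // "preserve"
}

// Lowercase and strip punctuation so "Keyword," still matches "keyword".
function normalizeWord(word) {
  return word.toLowerCase().replace(/[^\p{L}\p{N}']/gu, "");
}

function styleWords(words, options = {}) {
  const { textCase = "uppercase", removePeriods = false, highlight = "" } = options;
  // --highlight is a comma-separated list of words.
  const highlightSet = new Set(
    highlight.split(",").map((w) => normalizeWord(w)).filter(Boolean)
  );
  return words.map((word) => {
    const shown = removePeriods ? word.replace(/\./g, "") : word;
    return {
      text: applyTextCase(shown, textCase),
      highlighted: highlightSet.has(normalizeWord(word)),
    };
  });
}

const styled = styleWords(["This", "is", "Important."], {
  textCase: "lowercase",
  removePeriods: true,
  highlight: "important,keyword",
});
console.log(styled);
```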

Generate only PNG files without MOV conversion:

whisper-captions --input transcript.json --output-dir ./captions --generate-mov false

Generate frame-by-frame PNGs for direct video creation:

whisper-captions --input transcript.json --output-dir ./captions --render-all-frames --generate-mov false
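
Frame-by-frame rendering comes down to asking, for each frame, which words have been spoken at that moment. The sketch below shows that mapping under two assumptions: each word carries `start` and `end` times in seconds, and --hide-unspoken means words that have not started yet are omitted from the frame. Both are guesses about the tool's behavior; `wordsAtFrame` and `frameCount` are hypothetical helpers.

```javascript
// Sketch only: decide what a single video frame should show.
// Assumption: each word has `text`, `start`, and `end` (seconds).
function wordsAtFrame(words, frameIndex, fps, hideUnspoken = false) {
  const t = frameIndex / fps; // time at the start of this frame
  return words
    .filter((w) => !hideUnspoken || w.start <= t)
    .map((w) => ({
      text: w.text,
      spoken: w.start <= t,              // word has started
      active: w.start <= t && t < w.end, // word is being spoken right now
    }));
}

// Number of frames needed to cover a duration. The small epsilon guards
// against floating-point products such as 0.1 * 30 = 3.0000000000000004.
function frameCount(durationSeconds, fps) {
  return Math.ceil(durationSeconds * fps - 1e-9);
}

// Made-up sample data for illustration.
const demoWords = [
  { text: "HELLO", start: 0, end: 0.5 },
  { text: "WORLD", start: 0.5, end: 1.0 },
];
console.log(wordsAtFrame(demoWords, 0, 30));
console.log(wordsAtFrame(demoWords, 15, 30));
console.log(frameCount(1.0, 30));
```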

Process all JSON files in a directory:

whisper-captions --input-dir ./transcripts --output-dir ./captions

Check for issues with the first word:

whisper-captions --input transcript.json --output-dir ./captions --check-first-word

Regenerate only the first word's PNG (for example, after the check reports a problem):

whisper-captions --input transcript.json --output-dir ./captions --regenerate-first-word

Output Files

When processing completes, the output directory contains:

- output/filename.srt - SRT subtitle file
- output/filename.group.json - Structured caption data
- output/filename_frames/ - Directory with caption block PNGs
- output/filename_word_frames/ - Directory with word PNGs and MOVs
- output/filename_word_timeline.fcpxml - FCPXML file for importing into video editors
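
FCPXML expresses times as rational numbers of seconds with an `s` suffix (for example `45/30s`), or as whole seconds (for example `2s`). The sketch below shows how a word's start time could be snapped to a frame boundary and written in that form. `toFcpxmlTime` is a hypothetical helper, it assumes an integer frame rate, and the timeline this tool writes may use a different timescale.

```javascript
// Sketch only: express seconds as an FCPXML time string, snapped to a
// whole number of frames. Assumes an integer frame rate such as 30.
function toFcpxmlTime(seconds, fps = 30) {
  const frames = Math.round(seconds * fps); // nearest frame boundary
  if (frames % fps === 0) return `${frames / fps}s`; // whole seconds
  return `${frames}/${fps}s`;
}

console.log(toFcpxmlTime(1.5, 30)); // → 45/30s
console.log(toFcpxmlTime(2, 30));   // → 2s
console.log(toFcpxmlTime(0.4, 30)); // → 12/30s
```

Snapping to frames before formatting keeps clip offsets and durations aligned to frame boundaries, which editors generally expect. Non-integer rates such as 29.97 fps need a different frame duration (1001/30000s) and are not handled here.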

Importing into Video Editors

DaVinci Resolve

1. Open your DaVinci Resolve project
2. Go to File > Import > Timeline > Import AAF, EDL, XML...
3. Select the generated FCPXML file
4. Make sure "Automatically import source clips into media pool" is checked
5. Click "Import"
6. If media appears offline, right-click the clips and use "Relink Selected Clips" to locate the MOV files

Final Cut Pro

1. Open Final Cut Pro
2. Go to File > Import > XML
3. Select the generated FCPXML file
4. The captions will be imported as a new project

Programmatic Usage

You can also use this library programmatically in your own Node.js projects:

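// CaptionRenderer and RenderMode are also exported; this example only
// uses processJsonFile.
const {
  processJsonFile,
  CaptionRenderer,
  RenderMode,
} = require("whisper-captions-generator");

async function generateCaptions() {
  try {
    // The options mirror the command-line flags listed above.
    const result = await processJsonFile({
      inputFile: "path/to/whisper.json", // --input
      outputDir: "./output", // --output-dir
      fps: 30, // --fps
      textCase: "uppercase", // --case: "uppercase", "lowercase", or "preserve"
      removePeriods: false, // --remove-periods
      hideUnspoken: false, // --hide-unspoken
      highlightWords: ["important", "words"], // --highlight
      maxCharsPerLine: 26, // --max-chars
      generateMov: true, // --generate-mov (requires FFmpeg)
      renderAllFrames: false, // --render-all-frames
    });

    console.log("Caption generation completed!", result);
  } catch (error) {
    console.error("Error generating captions:", error);
  }
}

generateCaptions();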

License

This project is licensed under the MIT License - see the LICENSE file for details.
