A complete solution for generating beautiful animated captions from Whisper speech-to-text JSON files. This tool creates word-by-word animated captions with precise timing and exports them to FCPXML for easy import into video editors like DaVinci Resolve or Final Cut Pro.
- Convert Whisper JSON files to SRT subtitles and structured caption data
- Generate static caption PNGs (one per caption block)
- Generate word-level animated caption PNGs
- Convert PNGs to ProRes 4444 MOVs with alpha channel
- Generate FCPXML timelines for import into video editors
- Multiple render modes:
  - Static captions (one PNG per caption)
  - Word-by-word animation (one PNG/MOV per word)
  - Frame-by-frame animation (one PNG per video frame)
- Configurable styling options:
  - Text case: uppercase, lowercase, or preserve
  - Highlight specific words
  - Remove periods
  - Hide unspoken words
- Special utilities for diagnosing and fixing issues with first words
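
To make the JSON-to-SRT step concrete, here is a minimal sketch of how timed segments can be turned into SRT text. It is an illustration, not this tool's actual code: the function names `toSrtTime` and `segmentsToSrt` are hypothetical, and the input shape (objects with `start`, `end` in seconds and a `text` string) is an assumption based on typical Whisper output.

```javascript
// Hypothetical helper: format a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
function toSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical helper: build SRT text from segments shaped like
// { start: number, end: number, text: string } (assumed input shape).
function segmentsToSrt(segments) {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.text.trim()}\n`
    )
    .join("\n");
}

console.log(segmentsToSrt([{ start: 0, end: 1.2, text: " Hello world" }]));
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the last digit.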
- Node.js 14 or higher
- FFmpeg (required for MOV generation)
- DaVinci Resolve or Final Cut Pro (for importing FCPXML)
# Clone the repository
git clone https://github.com/yourusername/whisper-captions-generator.git
cd whisper-captions-generator
# Install dependencies
npm install
# Optional: Install globally
npm install -g .

node src/cli/main.js --input path/to/whisper.json --output-dir ./output

Or if installed globally:

whisper-captions --input path/to/whisper.json --output-dir ./output

Options:
--input, -i              Input JSON file from speech-to-text  [string]
--output-dir, -o         Output directory  [string] [default: "./output"]
--input-dir, -d          Process all JSON files in this directory  [string]
--fps, -f                Frames per second  [number] [default: 30]
--max-chars, -l          Maximum characters per line  [number] [default: 26]
--case, -c               Text case: uppercase, lowercase, or preserve  [string] [choices: "uppercase", "lowercase", "preserve"] [default: "uppercase"]
--remove-periods         Remove periods from text  [boolean] [default: false]
--hide-unspoken          Hide unspoken words  [boolean] [default: false]
--highlight              Comma-separated list of words to highlight  [string]
--highlight-file         JSON file with words to highlight  [string]
--generate-mov           Generate MOV files from PNGs  [boolean] [default: true]
--render-all-frames      Render all frames instead of one PNG per caption  [boolean] [default: false]
--start                  Start time in seconds  [number] [default: 0]
--duration               Duration in seconds  [number]
--regenerate-first-word  Regenerate only the first word PNG (usually for fixing issues)  [boolean] [default: false]
--check-first-word       Check for issues with the first word  [boolean] [default: false]
--help, -h               Show help  [boolean]
--version, -v            Show version number  [boolean]
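
As a rough illustration of how `--case`, `--remove-periods`, and `--max-chars` could interact, the sketch below applies the text options and then wraps words greedily. This is an assumption about the behavior, not the tool's actual implementation: `formatCaption` and its option names are hypothetical, and it assumes `--remove-periods` strips every `.` character.

```javascript
// Hypothetical sketch: apply text-case and period options, then wrap greedily
// so that no line exceeds maxChars (a single over-long word gets its own line).
function formatCaption(text, { textCase = "uppercase", removePeriods = false, maxChars = 26 } = {}) {
  let out = text;
  if (removePeriods) out = out.replace(/\./g, ""); // assumption: remove all periods
  if (textCase === "uppercase") out = out.toUpperCase();
  else if (textCase === "lowercase") out = out.toLowerCase();
  // "preserve" leaves the text unchanged

  const lines = [];
  let current = "";
  for (const word of out.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (current === "" || candidate.length <= maxChars) {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

console.log(formatCaption("Hello world. This is a test.", { removePeriods: true, maxChars: 12 }));
```

Greedy wrapping keeps the logic simple; a real renderer might instead balance line lengths or measure rendered pixel width rather than character count.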
whisper-captions --input transcript.json --output-dir ./captions

whisper-captions --input transcript.json --output-dir ./captions --case lowercase --highlight "important,keyword,phrase"

whisper-captions --input transcript.json --output-dir ./captions --generate-mov false

whisper-captions --input transcript.json --output-dir ./captions --render-all-frames --generate-mov false

whisper-captions --input-dir ./transcripts --output-dir ./captions

whisper-captions --input transcript.json --output-dir ./captions --check-first-word

whisper-captions --input transcript.json --output-dir ./captions --regenerate-first-word

When processing completes, you'll have several output files:
- output/filename.srt - SRT subtitle file
- output/filename.group.json - Structured caption data
- output/filename_frames/ - Directory with caption block PNGs
- output/filename_word_frames/ - Directory with word PNGs and MOVs
- output/filename_word_timeline.fcpxml - FCPXML file for importing into video editors
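
If you need to locate these files from a script, the naming pattern above can be sketched as follows. The helper `expectedOutputs` is hypothetical (it is not part of this package), and it assumes `filename` is the input file's base name without its extension.

```javascript
const path = require("path");

// Hypothetical helper: derive the output paths listed above from an input file
// and output directory. Assumes "filename" is the input base name sans extension.
function expectedOutputs(inputFile, outputDir = "./output") {
  const base = path.basename(inputFile, path.extname(inputFile));
  return {
    srt: path.join(outputDir, `${base}.srt`),
    groupJson: path.join(outputDir, `${base}.group.json`),
    framesDir: path.join(outputDir, `${base}_frames`),
    wordFramesDir: path.join(outputDir, `${base}_word_frames`),
    timeline: path.join(outputDir, `${base}_word_timeline.fcpxml`),
  };
}

console.log(expectedOutputs("transcripts/interview.json", "./captions"));
```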
- Open your DaVinci Resolve project
- Go to File > Import > Timeline > Import AAF, EDL, XML...
- Select the generated FCPXML file
- Make sure "Automatically import source clips into media pool" is checked
- Click "Import"
- If media appears offline, right-click the clips and use "Relink Selected Clips" to locate the MOV files
- Open Final Cut Pro
- Go to File > Import > XML
- Select the generated FCPXML file
- The captions will be imported as a new project
You can also use this library programmatically in your own Node.js projects:
const {
  processJsonFile,
  CaptionRenderer,
  RenderMode,
} = require("whisper-captions-generator");

async function generateCaptions() {
  try {
    const result = await processJsonFile({
      inputFile: "path/to/whisper.json",
      outputDir: "./output",
      fps: 30,
      textCase: "uppercase",
      removePeriods: false,
      hideUnspoken: false,
      highlightWords: ["important", "words"],
      maxCharsPerLine: 26,
      generateMov: true,
      renderAllFrames: false,
    });
    console.log("Caption generation completed!", result);
  } catch (error) {
    console.error("Error generating captions:", error);
  }
}

generateCaptions();

This project is licensed under the MIT License - see the LICENSE file for details.