An audio transcription tool built with Next.js and the OpenAI Whisper API, supporting audio file transcription and intelligent summary generation.
- 🎯 Support both file upload and URL input
- 🎙️ Support for Xiaoyuzhou podcast transcription
- 📝 High-quality audio transcription using OpenAI Whisper API
- 📊 AI-powered content summarization
- 🎨 Modern UI design
- 💾 Download transcripts and summaries
- 🎵 Built-in audio player
- 🖥️ CLI tool support (`pt` command)
- 📋 SRT subtitle format output
- 🔄 Chunked processing for large audio files
- ⚡ Parallel transcription for better performance
- 📤 Multiple output formats (text, JSON, markdown, SRT)
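The chunked/parallel approach in the feature list can be sketched with plain ffmpeg: split a long recording into fixed-length segments, then transcribe each segment independently. This is only an illustration of the idea (file names are hypothetical; `pt` handles chunking internally):

```bash
# Split a long recording into 10-minute chunks, copying the codec (no re-encode)
ffmpeg -i long-podcast.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3

# Each chunk_NNN.mp3 can then be transcribed concurrently
# and the resulting transcripts concatenated in order.
```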
```bash
npm install -g @winterfx/pt
```

Choose one of the following methods:
Option 1: Environment Variable (Recommended)
```bash
# Add to ~/.zshrc or ~/.bashrc
export API_KEY="your-api-key"
export BASE_URL="https://api.openai.com/v1"  # optional
```

Option 2: Config File (~/.pt/.env)
```bash
mkdir -p ~/.pt
cat > ~/.pt/.env << 'EOF'
API_KEY=your-api-key
BASE_URL=https://api.openai.com/v1
EOF
```

Option 3: Current Directory (.env)
```bash
cat > .env << 'EOF'
API_KEY=your-api-key
BASE_URL=https://api.openai.com/v1
EOF
```

- Node.js 18+
- OpenAI API Key
- FFmpeg (required for audio processing)
```bash
# macOS
brew install ffmpeg

# Linux (Ubuntu/Debian)
sudo apt-get install ffmpeg

# Windows
choco install ffmpeg
```

- Clone the repository:
```bash
git clone https://github.com/yourusername/podcast-transcription.git
cd podcast-transcription
```

- Install dependencies:
```bash
npm install
# or
yarn install
# or
pnpm install
```

- Configure environment variables:
Create a `.env.local` file and add:

```
API_KEY=your_openai_api_key
BASE_URL=your_endpoint
```

- Start the development server:
```bash
npm run dev
# or
yarn dev
# or
pnpm dev
```

Visit http://localhost:3000 to view the app.
- Build the Docker image:

```bash
docker build -t podcast-transcription .
```

- Run the container:

```bash
docker run -p 3000:3000 podcast-transcription
```

The project includes a command-line tool `pt` for transcribing audio files directly from the terminal.
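Assuming the containerized app reads the same `API_KEY`/`BASE_URL` variables described in the configuration section above, you can pass them through standard `docker run -e` flags (values below are placeholders):

```bash
docker run -p 3000:3000 \
  -e API_KEY="your-api-key" \
  -e BASE_URL="https://api.openai.com/v1" \
  podcast-transcription
```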
```bash
pt <input> [options]
```

Arguments:
- `<input>` - Local file path or audio URL

Options:
- `-s, --summary` - Generate AI summary after transcription
- `-l, --language <lang>` - Language code: `auto`, `en`, `zh`, etc. (default: `auto`)
- `-o, --output <file>` - Output file path (default: stdout)
- `--output-format <format>` - Output format: `text`, `json`, `markdown`, `srt` (default: `text`)
- `-q, --quiet` - Suppress progress output
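For reference, `srt` output follows the standard SubRip layout: a sequence number, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the subtitle text. The content below is illustrative only:

```
1
00:00:00,000 --> 00:00:04,500
Welcome to the show.

2
00:00:04,500 --> 00:00:09,000
Today we're talking about audio transcription.
```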
```bash
# Transcribe a local audio file
pt /path/to/podcast.mp3

# Transcribe with AI summary
pt podcast.mp3 --summary

# Generate SRT subtitles
pt podcast.mp3 --output-format srt -o subtitles.srt

# JSON output with summary
pt podcast.mp3 --summary --output-format json -o result.json

# Transcribe from URL
pt https://example.com/audio.mp3 --summary
```

```bash
# Via npm script
npm run pt <input> [options]

# Or after global install
npm link
pt <input> [options]
```

Pull Requests and Issues are welcome!
MIT License - See LICENSE file for details.
