A comprehensive collection of Python demo scripts showcasing Amazon IVS (Interactive Video Service) capabilities across both Real-Time Stages and Channels (low-latency HLS). The project covers publishing, subscribing, transcription, AI video analysis, AI-powered speech-to-speech, and timed metadata publishing.
This project is intended for educational purposes only and is not for production use.
- Overview
- Project Structure
- Prerequisites
- Installation
- Configuration
- Sub-Projects
- Usage Examples
- Troubleshooting
- Dependencies
- Contributing
- License
- Support
This project demonstrates how to integrate Amazon IVS services with various AI and media processing capabilities:
- WebRTC Publishing: Stream video/audio content to IVS stages
- WebRTC Subscribing: Receive and process streams from IVS stages
- AI Speech-to-Speech: Integrate Amazon Nova Sonic for conversational AI
- SEI Publishing: Embed metadata directly into H.264 video streams using SEI NAL units
- Event Handling: Process real-time stage events via WebSocket connections
- Audio Visualization: Generate dynamic audio visualizations
- Channel Subscription: Subscribe to and analyze IVS channel streams
- Frame Analysis: AI-powered video frame analysis using Amazon Bedrock Claude
- Video Analysis: Comprehensive video segment analysis using TwelveLabs Pegasus
- Real-time Transcription: Convert speech to text using OpenAI Whisper
- Timed Metadata Publishing: Publish analysis results back to IVS as timed metadata
- Rendition Selection: Automatic or manual selection of stream quality
Important
Using these demos with your AWS account will create and consume AWS resources, which will cost money.
amazon-ivs-python-demos/
├── README.md # This file
├── requirements.txt # Python dependencies
├── channels-subscribe/ # IVS Channel analysis tools
│ ├── README.md # Channel tools documentation
│ ├── ivs-channel-subscribe-analyze-frames.py # Frame analysis with Claude
│ ├── ivs-channel-subscribe-analyze-video.py # Video analysis with Pegasus
│ ├── ivs-channel-subscribe-analyze-audio-video.py # Combined audio/video analysis
│ ├── ivs-channel-subscribe-transcribe.py # Real-time transcription
│ └── ivs_metadata_publisher.py # Timed metadata publisher
├── stages-publish/ # Real-Time Stages publishing
│ ├── ivs-stage-publish.py # Basic media publishing
│ ├── ivs-stage-publish-events.py # Publishing with event handling
│ └── ivs-stage-pub-sub.py # Simultaneous publish/subscribe
├── stages-subscribe/ # Real-Time Stages subscribing
│ ├── ivs-stage-subscribe-transcribe.py # Subscribe with transcription
│ ├── ivs-stage-subscribe-analyze-frames.py # Subscribe with AI frame analysis
│ └── ivs-stage-subscribe-analyze-video.py # Subscribe with AI video analysis
├── stages-nova-s2s/ # AI Speech-to-Speech
│ └── ivs-stage-nova-s2s.py # Nova Sonic integration
└── stages_sei/ # SEI Publishing System
├── SEI.md # SEI documentation and usage guide
├── sei_publisher.py # High-level SEI message publishing
└── h264_sei_patch.py # Low-level H.264 encoder patching
- Python 3.8 or higher
- AWS CLI configured with appropriate credentials
- Amazon IVS Real-Time Stage ARN and participant tokens
- FFmpeg (required for media processing in the transcription demos; not necessary otherwise)
- Audio input/output devices (for speech-to-speech functionality)
Your AWS credentials need the following permissions:
For IVS Real-Time Stages:
- ivs:CreateParticipantToken
- bedrock:InvokeModel (for video frame analysis with Claude)
- bedrock:InvokeModelWithBidirectionalStream (for Nova Sonic)
- Access to Amazon IVS Real-Time Stages
For IVS Channels:
- ivs:PutMetadata (for publishing timed metadata)
- bedrock:InvokeModel (for Claude frame analysis and TwelveLabs Pegasus video analysis)
- Access to Amazon IVS Channels
1. Clone and navigate to the project directory:
cd /amazon-ivs-aiortc-demos
2. Create and activate a virtual environment:
python3 -m venv .venv
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate  # On Windows
3. Install dependencies:
pip install -r requirements.txt
4. Install system dependencies:
macOS:
brew install ffmpeg portaudio
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install ffmpeg portaudio19-dev
Windows:
# Install FFmpeg: download from https://ffmpeg.org/download.html and add to PATH,
# or use Chocolatey: choco install ffmpeg
# PortAudio is typically installed automatically with pyaudio.
# If you encounter issues, you may need to install Microsoft Visual C++ Build Tools.
Set the following environment variables or ensure AWS CLI is configured:
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
# Optional: For weather functionality in Nova speech-to-speech
export WEATHER_API_KEY=your_weather_api_key
The Nova speech-to-speech script supports weather queries through WeatherAPI.com:
- Sign up at WeatherAPI.com for a free account
- Get your API key from the dashboard
- Set the WEATHER_API_KEY environment variable
- The AI assistant will then be able to answer weather-related questions
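For illustration, the weather lookup amounts to a call like the following (a minimal sketch against WeatherAPI.com's current-conditions endpoint; the demo's own tool handler adds forecast support and error handling):
import os
import requests

def get_current_weather(location: str) -> dict:
    """Fetch current conditions from WeatherAPI.com (sketch, not the demo's exact code)."""
    resp = requests.get(
        "https://api.weatherapi.com/v1/current.json",
        params={"key": os.environ["WEATHER_API_KEY"], "q": location},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "location": data["location"]["name"],
        "condition": data["current"]["condition"]["text"],
        "temp_c": data["current"]["temp_c"],
    }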
The channels-subscribe/ directory contains scripts for subscribing to and analyzing Amazon IVS Channels (low-latency HLS streams).
- Frame Analysis: Analyze individual video frames using Amazon Bedrock Claude models
- Video Analysis: Process video segments using TwelveLabs Pegasus for comprehensive content analysis
- Audio/Video Analysis: Combined audio and video processing with proper synchronization using PyAV
- Real-Time Transcription: Live speech-to-text using OpenAI Whisper with multi-language support
- Timed Metadata Publishing: Publish analysis results back to IVS channels as timed metadata
- Rendition Selection: Automatic or manual selection of stream quality
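To make rendition selection concrete, the sketch below shows one way to pick a variant from an HLS master playlist by its BANDWIDTH attribute (illustrative only; the scripts' actual selection logic may differ):
import re
import requests
from urllib.parse import urljoin

def select_rendition(master_url: str, highest: bool = True) -> str:
    """Pick a variant playlist URI from an HLS master playlist by bandwidth."""
    lines = requests.get(master_url, timeout=10).text.splitlines()
    variants = []  # (bandwidth, uri) pairs
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF"):
            match = re.search(r"BANDWIDTH=(\d+)", line)
            if match and i + 1 < len(lines):
                variants.append((int(match.group(1)), lines[i + 1].strip()))
    if not variants:
        return master_url  # already a media playlist, nothing to select
    _, uri = max(variants) if highest else min(variants)
    return urljoin(master_url, uri)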
ivs-channel-subscribe-analyze-frames.py
- Analyzes individual video frames at configurable intervals using Amazon Bedrock Claude
- Supports multiple Claude models (Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku)
- Configurable analysis intervals for cost control
- Optional video display and rendition quality selection
ivs-channel-subscribe-analyze-video.py
- Records and analyzes video segments using TwelveLabs Pegasus
- Encodes video chunks to MP4 for comprehensive analysis
- OpenCV-based video capture with configurable recording duration
ivs-channel-subscribe-analyze-audio-video.py
- Advanced script using PyAV for proper audio/video stream handling
- Native audio capture and encoding with H.264 video and AAC audio
- Complete media analysis with TwelveLabs Pegasus
ivs-channel-subscribe-transcribe.py
- Real-time audio transcription using OpenAI Whisper
- Support for 99+ languages with auto-detection
- Multiple Whisper models from tiny to large-v3
- Optional publishing of transcripts as IVS timed metadata
ivs_metadata_publisher.py
- Reusable module for publishing timed metadata to IVS channels
- Automatic channel ARN extraction from M3U8 playlist URLs
- Rate limiting compliance and automatic payload splitting
- Support for transcripts, events, and custom metadata
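At its simplest, publishing timed metadata is a single boto3 call, as in the sketch below; ivs_metadata_publisher.py layers channel ARN extraction, rate limiting, and payload splitting on top of this (note the 1 KB PutMetadata payload limit):
import json
import boto3

ivs = boto3.client("ivs", region_name="us-east-1")

def publish_timed_metadata(channel_arn: str, payload: dict) -> None:
    """Send a JSON payload to viewers as IVS timed metadata (1 KB max per request)."""
    body = json.dumps(payload)
    if len(body.encode("utf-8")) > 1024:
        raise ValueError("Payload exceeds the 1 KB PutMetadata limit; split it first")
    ivs.put_metadata(channelArn=channel_arn, metadata=body)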
# Frame analysis with Claude Sonnet 4
python channels-subscribe/ivs-channel-subscribe-analyze-frames.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality
# Real-time transcription with metadata publishing
python channels-subscribe/ivs-channel-subscribe-transcribe.py \
--playlist-url "https://example.com/playlist.m3u8" \
--language en \
--whisper-model base \
--publish-transcript-as-timed-metadata
# Video analysis with TwelveLabs Pegasus
python channels-subscribe/ivs-channel-subscribe-analyze-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--analysis-duration 15 \
--show-video
For detailed documentation, see channels-subscribe/README.md.
The stages-publish/ directory contains scripts for publishing media content to IVS Real-Time Stages.
Basic media publishing script that streams video/audio content to an IVS stage.
Features:
- Publishes video and audio tracks from MP4 files to IVS Real-Time Stages
- JWT token validation and capability checking
- WebRTC connection management
- Option to publish video-only streams
Usage:
cd stages-publish
python ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/video.mp4"
Command-line Arguments:
- --token: JWT participant token with publish capabilities (required)
- --path-to-mp4: Path to the MP4 file to publish (required)
- --video-only: Publish video only, no audio (optional flag)
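At its core, publishing with aiortc follows the WebRTC offer/answer exchange sketched below. This is not the demo's exact code, and the WHIP-style endpoint URL is an assumption to verify against the IVS Real-Time Streaming documentation:
import requests
from aiortc import RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaPlayer

WHIP_ENDPOINT = "https://global.whip.live-video.net"  # assumed IVS WHIP endpoint

async def publish(token: str, path_to_mp4: str) -> RTCPeerConnection:
    pc = RTCPeerConnection()
    player = MediaPlayer(path_to_mp4)  # decodes the MP4 into audio/video tracks
    pc.addTrack(player.video)
    if player.audio:
        pc.addTrack(player.audio)
    await pc.setLocalDescription(await pc.createOffer())
    resp = requests.post(  # blocking HTTP call, kept simple for the sketch
        WHIP_ENDPOINT,
        data=pc.localDescription.sdp,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/sdp"},
        timeout=10,
    )
    resp.raise_for_status()
    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=resp.text, type="answer"))
    return pc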
Enhanced publishing script with real-time event handling via WebSocket connections.
Features:
- All features of basic publisher
- Real-time stage event monitoring via WebSocket
- Participant join/leave notifications
- Stage state change handling
Usage:
cd stages-publish
python ivs-stage-publish-events.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/video.mp4"
Command-line Arguments:
- --token: JWT participant token with publish capabilities (required)
- --path-to-mp4: Path to the MP4 file to publish (required)
- --video-only: Publish video only, no audio (optional flag)
Advanced script that demonstrates simultaneous publishing and subscribing capabilities.
Features:
- Publishes audio from MP4 file while subscribing to other participants
- Demonstrates bidirectional communication
- Audio/video track management
- SDP (Session Description Protocol) handling
Usage:
cd stages-publish
python ivs-stage-pub-sub.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/audio.mp4"
Command-line Arguments:
- --token: JWT participant token with both publish and subscribe capabilities (required)
- --path-to-mp4: Path to the MP4 file to publish audio from (required)
- --video-only: Publish video only, no audio (optional flag)
- --subscribe-to: List of participant IDs to subscribe to (optional)
The stages-subscribe/ directory contains scripts for receiving and processing streams from IVS Real-Time Stages.
Subscribes to IVS stage audio streams and provides real-time speech-to-text transcription using OpenAI Whisper.
Features:
- Subscribes to audio tracks from specific participants in IVS Real-Time Stages
- Real-time speech transcription using Whisper
- Audio chunk processing and buffering
- Multiple language support
- Audio format conversion and normalization
Usage:
cd stages-subscribe
python ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..."
Command-line Arguments:
- --participant-id: ID of the participant to subscribe to (required)
- --token: JWT participant token with subscribe capabilities (required)
- --whisper-model: Whisper model size - "tiny", "base", "small", "medium", "large" (default: "tiny")
- --fp16: Enable FP16 precision for faster processing (default: true)
- --language: Language code for transcription (default: "en")
- --chunk-duration: Audio chunk duration in seconds (default: 5)
Supported Languages:
- English ("en")
- Spanish ("es")
- French ("fr")
- German ("de")
- Italian ("it")
- Portuguese ("pt")
- And many more supported by Whisper
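Internally, each buffered audio chunk is handed to Whisper roughly as follows (a simplified sketch; Whisper expects mono float32 PCM at 16 kHz):
import numpy as np
import whisper

model = whisper.load_model("tiny")  # "tiny", "base", "small", "medium", "large"

def transcribe_chunk(pcm16: np.ndarray, language: str = "en") -> str:
    """Transcribe one chunk of 16 kHz mono int16 samples."""
    audio = pcm16.astype(np.float32) / 32768.0  # normalize to [-1.0, 1.0]
    result = model.transcribe(audio, language=language, fp16=False)
    return result["text"].strip()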
Subscribes to IVS stage video streams and provides AI-powered video frame analysis using Amazon Bedrock Claude models for content discovery, moderation, and accessibility.
Features:
- Subscribes to video tracks from specific participants in IVS Real-Time Stages
- AI-powered video frame analysis using Claude Sonnet 4
- Configurable analysis intervals to control costs
- Support for multiple Claude models (Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku)
- Detailed frame descriptions for content moderation and accessibility
- Background processing to avoid blocking video streams
- Cost-conscious design with smart frame sampling
Usage:
cd stages-subscribe
python ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
Command-line Arguments:
- --token: JWT participant token with subscribe capabilities (required)
- --subscribe-to: Participant ID to subscribe to (required)
- --analysis-interval: Time in seconds between frame analyses (default: 30.0)
- --bedrock-region: AWS region for Bedrock service (default: "us-east-1")
- --bedrock-model-id: Bedrock model ID for analysis (default: "us.anthropic.claude-sonnet-4-20250514-v1:0")
- --disable-analysis: Disable video frame analysis, just subscribe to video (optional flag)
Supported Models:
- Claude Sonnet 4 (default): us.anthropic.claude-sonnet-4-20250514-v1:0 - most capable, best for complex analysis
- Claude 3.5 Sonnet: anthropic.claude-3-5-sonnet-20241022-v2:0 - very capable, good balance of performance and cost
- Claude 3.5 Haiku: anthropic.claude-3-5-haiku-20241022-v1:0 - fastest and cheapest, good for basic content moderation
Use Cases:
- Content Moderation: Automatically detect inappropriate content in live streams
- Content Discovery: Generate descriptions and tags for video content
- Accessibility: Create detailed descriptions for visually impaired users
- Analytics: Track objects, activities, and engagement in video streams
- Compliance: Monitor streams for regulatory compliance
Cost Control Features:
- Configurable analysis intervals (default 30 seconds to minimize costs)
- Background processing doesn't block video streaming
- Option to disable analysis entirely for testing
- Smart error handling prevents failed analyses from crashing streams
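The frame analysis itself reduces to a Bedrock call like the sketch below, using the Anthropic Messages request format; the script's actual prompt and error handling are more elaborate:
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def analyze_frame(jpeg_bytes: bytes,
                  model_id: str = "us.anthropic.claude-sonnet-4-20250514-v1:0") -> str:
    """Ask Claude to describe a single JPEG-encoded video frame."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode(),
                }},
                {"type": "text",
                 "text": "Describe this video frame for content moderation."},
            ],
        }],
    }
    resp = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    return json.loads(resp["body"].read())["content"][0]["text"]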
Subscribes to IVS stage audio and video streams and provides AI-powered video analysis using Amazon Bedrock TwelveLabs Pegasus for comprehensive video understanding.
Features:
- Subscribes to both audio and video tracks from specific participants
- Records short video clips (configurable duration) for analysis
- Encodes audio and video to MP4 format in memory
- AI-powered video analysis using TwelveLabs Pegasus model
- Detailed video content descriptions including people, objects, activities, and text
- Asynchronous processing to maintain stream performance
- Configurable analysis duration and frequency
Usage:
cd stages-subscribe
python ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
Command-line Arguments:
- --token: JWT participant token with subscribe capabilities (required)
- --subscribe-to: Participant ID to subscribe to (required)
- --analysis-duration: Duration in seconds for video recording before analysis (default: 10.0)
- --bedrock-region: AWS region for Bedrock service (default: "us-west-2")
- --bedrock-model-id: Bedrock model ID for analysis (default: "us.twelvelabs.pegasus-1-2-v1:0")
- --disable-analysis: Disable video analysis, just subscribe to video (optional flag)
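A rough sketch of the underlying Bedrock call follows. The request shape (inputPrompt plus mediaSource.base64String) and the response field are assumptions based on our reading of the Bedrock TwelveLabs model documentation; verify the current schema before relying on it:
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

def analyze_clip(mp4_bytes: bytes, prompt: str = "Describe this video segment.") -> str:
    """Send an in-memory MP4 clip to TwelveLabs Pegasus (request schema assumed)."""
    body = {
        "inputPrompt": prompt,
        "mediaSource": {"base64String": base64.b64encode(mp4_bytes).decode()},
    }
    resp = bedrock.invoke_model(
        modelId="us.twelvelabs.pegasus-1-2-v1:0", body=json.dumps(body)
    )
    return json.loads(resp["body"].read())["message"]  # response key assumed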
The stages-nova-s2s/ directory contains the most advanced script, integrating Amazon Nova Sonic for AI-powered speech-to-speech functionality.
A comprehensive script that combines IVS Real-Time Stages with Amazon Nova Sonic for conversational AI experiences.
Features:
- Bidirectional audio streaming with IVS participants
- Amazon Nova Sonic integration for AI responses
- Real-time waveform visualization
- Audio resampling and format conversion
- WebRTC track management for both publishing and subscribing
- Dynamic audio visualization with gradient colormaps
- AI-powered video frame analysis using Amazon Bedrock Claude models
- Built-in tools for date/time, weather, and visual analysis
- Configurable frame analysis with multiple Claude model options
Usage:
cd stages-nova-s2s
python ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
Command-line Arguments:
- --token: JWT participant token with both publish and subscribe capabilities (required)
- --subscribe-to: Participant ID to subscribe to (required)
- --nova-model-id: Amazon Nova model identifier (default: "amazon.nova-sonic-v1:0")
- --nova-region: AWS region for Nova service (default: "us-east-1")
- --disable-frame-analysis: Disable video frame analysis (default: enabled)
- --bedrock-model-id: Bedrock model ID for frame analysis (default: "us.anthropic.claude-sonnet-4-20250514-v1:0")
- --bedrock-region: AWS region for Bedrock service (default: "us-east-1")
- --ice-timeout: ICE gathering timeout in seconds (default: 1, original: 5) - lower values speed up connection establishment
Key Components:
- AgentAudioTrack: Custom audio track for streaming Nova responses
- AgentVideoTrack: Dynamic waveform visualization with thinking states
- BedrockStreamManager: Manages bidirectional Nova Sonic streaming
- Audio Processing: Handles resampling between IVS (48kHz) and Nova (16kHz) - see the resampling sketch below
- Tool Support: Built-in tools for date/time, weather, and video frame analysis
- Frame Analysis: Non-blocking AI-powered video frame analysis using Claude models
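The 48 kHz to 16 kHz conversion mentioned above can be handled with PyAV's AudioResampler, as in this minimal sketch (the demo additionally manages buffering and chunk sizing):
import av

# Convert 48 kHz WebRTC audio to the 16 kHz mono s16 format Nova expects
resampler = av.AudioResampler(format="s16", layout="mono", rate=16000)

def resample_frame(frame: av.AudioFrame) -> bytes:
    """Resample one incoming audio frame; returns raw 16 kHz mono PCM."""
    out = b""
    for resampled in resampler.resample(frame):  # may yield zero or more frames
        out += bytes(resampled.planes[0])
    return out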
Available Tools:
- Date/Time Tool: Get current date and time information with timezone support
- Weather Tool: Get current weather and a 5-day forecast (requires WEATHER_API_KEY)
- Frame Analysis Tool: Analyze video frames for visual assistance and content description
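As an illustration, the date/time tool handler can be as small as the sketch below (the demo's actual handler is wired into Nova Sonic's tool-use events; the function shape here is hypothetical):
from datetime import datetime
from typing import Optional

import pytz
from tzlocal import get_localzone

def handle_datetime_tool(timezone_name: Optional[str] = None) -> dict:
    """Return the current date/time, optionally in a requested timezone."""
    tz = pytz.timezone(timezone_name) if timezone_name else get_localzone()
    now = datetime.now(tz)
    return {
        "iso": now.isoformat(),
        "timezone": str(tz),
        "readable": now.strftime("%A, %B %d %Y, %H:%M:%S"),
    }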
The stages_sei/ directory contains a comprehensive SEI (Supplemental Enhancement Information) publishing system for embedding metadata directly into H.264 video streams.
What is SEI?
SEI NAL units are part of the H.264/AVC video compression standard that allow embedding additional metadata within the video stream itself. This metadata travels with the video frames, ensuring perfect synchronization between video content and associated data.
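For intuition, a user_data_unregistered SEI message (payload type 5) is a small byte structure inserted alongside a frame's other NAL units. The sketch below shows the wire format only; h264_sei_patch.py handles the real details such as emulation prevention:
def build_sei_nal(uuid16: bytes, payload: bytes) -> bytes:
    """Build an Annex B user_data_unregistered SEI NAL unit (no emulation prevention)."""
    assert len(uuid16) == 16, "user_data_unregistered starts with a 16-byte UUID"
    body = uuid16 + payload
    size, size_bytes = len(body), b""
    while size >= 255:          # payload size is coded in 255-byte increments
        size_bytes += b"\xff"
        size -= 255
    size_bytes += bytes([size])
    return (
        b"\x00\x00\x00\x01"     # Annex B start code
        + b"\x06"               # NAL unit type 6 = SEI
        + b"\x05"               # SEI payload type 5 = user_data_unregistered
        + size_bytes
        + body
        + b"\x80"               # rbsp_trailing_bits (stop bit + alignment)
    )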
Key Features:
- Perfect Synchronization: Metadata is embedded directly in video frames
- Low Latency: No separate data channels needed
- Standards Compliant: Uses official H.264 specification
- Multi-format Support: Handles Annex B, AVCC, and RTP H.264 formats
- Automatic Integration: Patches aiortc and PyAV encoders automatically
- Reliable Delivery: 3x repetition with client-side deduplication
Components:
- sei_publisher.py: High-level interface for publishing SEI messages
- h264_sei_patch.py: Low-level H.264 encoder patching system
- SEI.md: Comprehensive documentation and usage guide
Usage Example:
import time

from stages_sei import SeiPublisher, patch_h264_encoder, set_global_sei_publisher

# Apply the H.264 encoder patch (do this early in your application)
patch_h264_encoder()

# Create and configure the SEI publisher
sei_publisher = SeiPublisher()
set_global_sei_publisher(sei_publisher)

# Publish metadata (await this from within an async function)
await sei_publisher.publish_json({
    "type": "chat_message",
    "user": "alice",
    "message": "Hello world!",
    "timestamp": time.time()
})
Integration:
The Nova speech-to-speech script (stages-nova-s2s/ivs-stage-nova-s2s.py) demonstrates SEI publishing in action, embedding AI assistant responses directly into the video stream for synchronized delivery.
For detailed documentation, see stages_sei/SEI.md.
Note: Utility scripts are excluded from this documentation as they are development/testing tools.
# Subscribe to IVS channel and analyze frames with Claude
python channels-subscribe/ivs-channel-subscribe-analyze-frames.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality \
--analysis-interval 30
# Real-time transcription of IVS channel audio
python channels-subscribe/ivs-channel-subscribe-transcribe.py \
--playlist-url "https://example.com/playlist.m3u8" \
--language en \
--whisper-model base \
--publish-transcript-as-timed-metadata
# Comprehensive video analysis with TwelveLabs Pegasus
python channels-subscribe/ivs-channel-subscribe-analyze-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--analysis-duration 10 \
--bedrock-region us-west-2
# Combined audio/video analysis using PyAV
python channels-subscribe/ivs-channel-subscribe-analyze-audio-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality \
--analysis-duration 15
# Publish MP4 file to IVS stage
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "sample-video.mp4"
# Publish with real-time event monitoring
python stages-publish/ivs-stage-publish-events.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "sample-video.mp4"
# Subscribe and transcribe audio in Spanish
python stages-subscribe/ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--language "es" \
--whisper-model "medium"
# Basic video frame analysis (every 30 seconds with Claude Sonnet 4)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Frequent analysis for real-time moderation (every 5 seconds)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--analysis-interval 5.0
# Cost-effective analysis using Claude 3.5 Haiku (every 60 seconds)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-model-id "anthropic.claude-3-5-haiku-20241022-v1:0" \
--analysis-interval 60.0
# Analysis in different AWS region
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-region "eu-west-1"
# Subscribe to video without analysis (testing connectivity)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--disable-analysis
# Basic video analysis with TwelveLabs Pegasus
python stages-subscribe/ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Shorter video clips for more frequent analysis
python stages-subscribe/ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--analysis-duration 5.0
# Start Nova Sonic conversation with frame analysis
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--nova-model-id "amazon.nova-sonic-v1:0" \
--nova-region "us-east-1"
# Nova conversation without frame analysis
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--disable-frame-analysis
# Nova conversation with custom Bedrock model and region
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-model-id "anthropic.claude-3-5-sonnet-20241022-v2:0" \
--bedrock-region "us-west-2"
# Nova conversation with fast connection setup
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--ice-timeout 1
# Simultaneously publish and subscribe
python stages-publish/ivs-stage-pub-sub.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "audio-file.mp4" \
--subscribe-to "participant1" "participant2"
Use the AWS CLI to create participant tokens:
# Create a token with publish capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user123" \
--capabilities PUBLISH \
--duration 720
# Create a token with subscribe capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user456" \
--capabilities SUBSCRIBE \
--duration 720
# Create a token with both publish and subscribe capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user789" \
--capabilities PUBLISH SUBSCRIBE \
--duration 720
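The same tokens can be minted programmatically with boto3 (duration is in minutes, matching the CLI examples above):
import boto3

ivs_realtime = boto3.client("ivs-realtime", region_name="us-east-1")

def create_token(stage_arn: str, user_id: str, capabilities: list) -> str:
    """Create a participant token, e.g. capabilities=["PUBLISH", "SUBSCRIBE"]."""
    resp = ivs_realtime.create_participant_token(
        stageArn=stage_arn,
        userId=user_id,
        capabilities=capabilities,
        duration=720,  # minutes
    )
    return resp["participantToken"]["token"]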
- "No audio stream found"
  - Check if the M3U8 stream contains audio using ffprobe
  - Try different rendition quality options
  - Verify stream accessibility with curl
- "Unable to open video stream"
  - Verify the M3U8 URL is accessible
  - Check network connectivity and firewall settings
  - Try different rendition selections
- Whisper Model Issues
  - Clear the Whisper cache: rm -rf ~/.cache/whisper/
  - Use smaller models for memory-constrained environments
  - Enable FP16 for faster processing
- Timed Metadata Publishing Issues
  - Verify AWS credentials have ivs:PutMetadata permissions
  - Check rate limiting (5 RPS per channel, 155 RPS per account)
  - Ensure channel ARN extraction is working correctly
- Audio Quality Problems
  - Ensure consistent chunk sizes (512 samples recommended)
  - Check audio resampling configuration
  - Verify WebRTC connection stability
- WebRTC Connection Failures
  - Verify the JWT token has the correct capabilities
  - Check network connectivity and firewall settings
  - Ensure SDP munging is applied correctly
- Nova Sonic Issues
  - Verify AWS credentials have Bedrock permissions
  - Check model availability in your region
  - Ensure the proper event sequence (START_SESSION → START_PROMPT → content)
- Video Frame Analysis Issues
  - Verify AWS credentials have bedrock:InvokeModel permissions
  - Check Claude/Pegasus model availability in your region
  - Monitor analysis costs with appropriate intervals
  - Ensure the video track is receiving frames before analysis begins
- Transcription Accuracy
  - Use an appropriate Whisper model size for your use case
  - Ensure clean audio input
  - Consider language-specific models
Enable debug logging for detailed troubleshooting. Note that logging must be configured inside the script's own process (running logging.basicConfig via a separate python -c command has no effect on the command that follows), so add this near the top of the script you are running:
import logging
logging.basicConfig(level=logging.DEBUG)
Then run the script as usual:
export PYTHONPATH=$PYTHONPATH:.
python your-script.py --your-args
- For Channel Transcription:
  - Use --whisper-model tiny or --whisper-model base for real-time processing
  - Enable FP16: --fp16 true
  - Use shorter chunks: --chunk-duration 3
  - Specify the language: --language en (faster than auto-detect)
- For Channel Video Analysis:
  - Use --lowest-quality for faster processing
  - Adjust --analysis-duration based on content complexity
  - Run without --show-video for headless operation
- For Channel Frame Analysis:
  - Increase --analysis-interval for less frequent analysis (cost control)
  - Use --lowest-quality for faster frame processing
  - Choose an appropriate Claude model for your use case
- Connection Speed:
  - Use --ice-timeout 1 for faster WebRTC connection establishment (the default)
  - The stock WebRTC ICE gathering timeout is 5 seconds; these demos lower it to 1 second, reducing startup time from ~11 seconds to ~3 seconds
  - Increase the timeout if you experience connection issues on poor networks
- For Nova Sonic:
  - Use consistent 1 ms delays between audio chunks
  - Implement proper buffering strategies
  - Monitor memory usage during long sessions
- For Stage Transcription:
  - Choose an appropriate chunk duration (5-10 seconds)
  - Use smaller Whisper models for real-time processing
  - Consider GPU acceleration for large models
- For Video Frame Analysis:
  - Use longer analysis intervals (30+ seconds) to control costs
  - Choose an appropriate Claude model for your use case:
    - Claude 3.5 Haiku for basic content moderation
    - Claude 3.5 Sonnet for balanced performance
    - Claude Sonnet 4 for complex analysis requiring the highest accuracy
  - Monitor Bedrock usage and costs in the AWS console
  - Consider regional model availability and latency
- aiortc>=1.12.0 - WebRTC implementation
- av>=10.0.0 - Media processing
- requests>=2.28.0 - HTTP client
- websockets>=11.0.0 - WebSocket client
- numpy>=1.21.0 - Numerical computing
- whisper (from GitHub) - Speech recognition
- boto3>=1.34.0 - AWS SDK for Bedrock and IVS
- aws-sdk-bedrock-runtime - Amazon Bedrock client
- smithy-aws-core>=0.0.1 - AWS SDK core
- pyaudio>=0.2.13 - Audio I/O
- rx>=3.2.0 - Reactive extensions
- Pillow>=10.0.0 - Image processing for video frame analysis
- opencv-python>=4.8.0 - Computer vision for video processing
- pytz - Timezone handling
- tzlocal - Local timezone detection
- Python 3.8+
- FFmpeg
- PortAudio (for audio I/O)
- Sufficient bandwidth for WebRTC streams
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.
For issues related to:
- Amazon IVS Real-Time Stages: Check the IVS Real-Time Streaming documentation
- Amazon IVS Channels: Check the IVS Low-Latency Streaming documentation
- Amazon Nova: Check the Bedrock documentation
- Amazon Bedrock: Check the Bedrock User Guide
- aiortc: Check the aiortc documentation
- OpenAI Whisper: Check the Whisper repository
This project demonstrates advanced integration patterns between Amazon IVS services and AI capabilities. From real-time conversational AI with Nova Sonic to comprehensive video analysis with Claude and TwelveLabs Pegasus, these demos showcase the power of combining live video streaming with cutting-edge AI services.