A production-ready voice cloning agent built on Amazon Bedrock AgentCore, integrating state-of-the-art open-source voice cloning models with enterprise-grade infrastructure.
- Voice Cloning: Microsoft SpeechT5 with SpeechBrain speaker encoder for personalized voice synthesis
- Full AgentCore Integration: Runtime, Memory (voice profiles), Observability
- AppSync GraphQL API: Type-safe API with Lambda resolver connecting to AgentCore Runtime
- Modern UI: Svelte-based responsive interface with Cognito authentication
- Enterprise Security: Encryption at rest/transit, OAuth2, RBAC, audit logging
- Fully Automated Deployment: CDK-managed infrastructure with zero manual steps
- Amplify Hosting: Production-ready frontend with automatic deployments
Autonomous LLM-Powered Voice Cloning:
User (GraphQL API) → Amplify UI → AppSync GraphQL → Lambda Resolver
↓
AgentCore Runtime (Strands Agent)
↓
Amazon Nova Premier (LLM Reasoning)
↓
Voice Tools (Clone/Create/List)
↓
Voice Models (SpeechT5/SpeechBrain/HiFiGAN)
↓
S3 Storage (Audio + Profiles)
Key Innovation: Uses Strands Agents framework with Amazon Bedrock Nova Premier for autonomous natural language understanding and tool execution.
Note: Voice profiles and audio are stored directly in S3, not using AgentCore Memory primitive.
- Python 3.10 or newer
- Node.js 18+ and npm (for CDK and UI)
- AWS CLI v2 configured (aws configure)
- AWS CDK CLI installed (npm install -g aws-cdk)
- Amazon Bedrock model access enabled (AWS Console → Bedrock → Model access)
- AWS Account with AgentCore permissions
Important: Ensure your AWS CLI region, Bedrock model access region, and deployment region are all the same.
Complete deployment in 2 commands:
# 1. Deploy infrastructure (S3, Cognito, AppSync, Lambda, Amplify) and backend (AgentCore runtime with voice models)
cd .. && ./scripts/deploy_complete.sh
# 2. Deploy frontend (Amplify UI)
./scripts/deploy_amplify.sh
That's it! Your voice cloning system is now live.
After deployment completes, you'll get:
- Frontend URL: https://main.{app-id}.amplifyapp.com
- GraphQL API: AppSync endpoint for programmatic access
- AgentCore Runtime: Deployed and configured
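For programmatic access, the AppSync endpoint can be called over plain HTTPS. The following is a minimal sketch, not code from this repository: `build_execute_agent_request` is a hypothetical helper, the `$prompt: String!` variable type is an assumption about the schema, and the token is assumed to be a Cognito User Pool JWT passed in the `Authorization` header. The field names come from the `executeAgent` mutation documented in this README.

```python
import json
import urllib.request

# Mutation text mirrors the executeAgent example in this README.
# The variable type (String!) is an assumption about the schema.
EXECUTE_AGENT_MUTATION = """
mutation ExecuteAgent($prompt: String!) {
  executeAgent(prompt: $prompt) {
    status
    data
    llmMessage
    model
  }
}
"""


def build_execute_agent_request(endpoint: str, id_token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the executeAgent mutation."""
    body = json.dumps(
        {"query": EXECUTE_AGENT_MUTATION, "variables": {"prompt": prompt}}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed: Cognito User Pool token sent as-is in this header.
            "Authorization": id_token,
        },
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen` with your real endpoint and token, then decode the JSON response body. Building the request separately from sending it keeps the payload easy to inspect and test.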
.
├── voice_cloning_agent.py # Main AgentCore agent (entrypoint)
├── voice_models.py # Voice cloning models
├── voice_storage.py # S3 storage operations
├── requirements.txt # Python dependencies
├── Dockerfile # Container build for AgentCore Runtime
├── .bedrock_agentcore.yaml # AgentCore config (auto-generated, gitignored)
├── .bedrock_agentcore.yaml.example # Template for new deployments
├── architecture_diagram.png # System architecture diagram
├── scripts/ # Deployment scripts
│ ├── deploy_complete.sh # Backend deployment
│ └── deploy_amplify.sh # Frontend deployment
├── cdk/ # Infrastructure as Code (CDK)
│ ├── lib/voice-cloning-stack.ts # Complete infrastructure stack
│ ├── lambda/graphql-resolver/ # AppSync Lambda resolver
│ └── package.json
├── ui/ # Svelte frontend application
│ ├── src/
│ ├── package.json
│ └── README.md # UI-specific documentation
└── docs/ # Additional documentation
    ├── Q_CLI_DELEGATE_GUIDE.md
    ├── THINKING_TOOL_USAGE.md
    └── VOICE_CLONING_REQUIREMENTS.md
All infrastructure is managed through AWS CDK:
- S3 Bucket: voice-cloning-audio-{account}-{region} (audio storage with CORS)
- Cognito User Pool: User authentication with web and M2M clients
- AppSync GraphQL API: Type-safe API with Lambda resolver
- Lambda Functions: GraphQL resolver, upload/download audio handlers
- Amplify App: Frontend hosting with automatic deployments
- AgentCore Runtime: Voice cloning agent with Docker container
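If a script needs the audio bucket name before the CDK outputs are available, it can be derived from the naming pattern in the list above. This is a small sketch under that assumption: `audio_bucket_name` is a hypothetical helper, and the regular expression is a simplified check of the general S3 naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens), not the full rule set.

```python
import re

# Simplified S3 bucket name check: 3-63 chars, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def audio_bucket_name(account_id: str, region: str) -> str:
    """Build the bucket name from the voice-cloning-audio-{account}-{region} pattern."""
    name = f"voice-cloning-audio-{account_id}-{region}"
    if not _BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return name


if __name__ == "__main__":
    # Placeholder account ID, used purely for illustration.
    print(audio_bucket_name("123456789012", "us-east-1"))
```

The bucket name reported in the CDK outputs remains the authoritative value; prefer it whenever it is available.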
After CDK deployment, you'll get:
- Amplify App ID and URL
- GraphQL API endpoint
- Cognito User Pool ID and Client IDs
- S3 Bucket name
- Lambda function ARNs
The system uses natural language prompts for all operations:
# Execute any voice cloning operation with natural language
mutation ExecuteAgent {
  executeAgent(prompt: "Clone voice using profile abc123 with text 'Hello world'") {
    status
    data
    llmMessage
    model
  }
}
# Create voice profile
mutation ExecuteAgent {
  executeAgent(prompt: "Create voice profile named 'My Voice' with audio data: [base64]") {
    status
    data
    llmMessage
  }
}
# List profiles
mutation ExecuteAgent {
  executeAgent(prompt: "List all voice profiles") {
    status
    data
    llmMessage
  }
}
# List profiles
query ListProfiles {
  listProfiles {
    profiles {
      profileId
      profileName
      createdAt
    }
  }
}
# Get profile details
query GetProfile {
  getProfile(profileId: "profile-id") {
    profileId
    profileName
    createdAt
    audioSizeBytes
  }
}
cd ui
npm install
npm run dev
Access at http://localhost:5173
agentcore invoke '{"operation": "list_profiles"}'
# Update just infrastructure
cd cdk && npm run deploy
# Update backend + infra (also includes cdk deploy)
cd .. && ./scripts/deploy_complete.sh
# Update frontend
./scripts/deploy_amplify.sh
- Create profiles from audio samples (WAV, MP3, MP4)
- Automatic audio format conversion
- Profile audio and metadata stored in S3
- Text-to-speech with custom voice profiles using SpeechT5
- SpeechBrain speaker encoder for voice embedding extraction
- HiFiGAN vocoder for high-quality audio generation
- Automatic text chunking for long inputs (50 words per chunk)
- Audio concatenation for seamless output
- Presigned S3 URLs for efficient audio delivery
- Authentication: Cognito User Pools with OAuth2
- Authorization: AppSync with Cognito integration
- Encryption: S3 SSE-S3 at rest, TLS 1.3 in transit
- CORS: Configured for localhost (dev) and Amplify (prod)
- AgentCore built-in observability
- CloudWatch logs for all Lambda functions
- AppSync query logging
- Request tracing with session IDs
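The automatic text chunking listed among the features above (50 words per chunk) can be illustrated with a short sketch. This is not the code from voice_models.py: `chunk_text` is a hypothetical helper that splits on whitespace only, whereas the real implementation may also respect sentence boundaries or punctuation.

```python
def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into chunks of at most `max_words` whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    sample = "word " * 120
    for index, chunk in enumerate(chunk_text(sample)):
        # Each chunk would be synthesized separately, then the audio concatenated.
        print(index, len(chunk.split()))
```

With a 120-word input this yields three chunks of 50, 50, and 20 words. Each chunk would be synthesized on its own and the resulting audio segments concatenated in order.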
Issue: CDK deployment fails
Solution: Ensure the AWS CLI is configured and you have the necessary permissions
Issue: AgentCore deployment fails
Solution: Check that Bedrock model access is enabled in your region
Issue: UI can't connect to API
Solution: Verify the GraphQL endpoint and Cognito configuration in ui/src/config.js
Issue: Audio upload fails
Solution: Check that the S3 CORS configuration includes your Amplify URL
# AgentCore runtime logs
aws logs tail /aws/bedrock-agentcore/runtimes/{runtime-id} --follow
# Lambda resolver logs
aws logs tail /aws/lambda/voice-cloning-graphql-resolver --follow
# AppSync logs
# Check CloudWatch Logs in AWS Console
- Latency: Sub-500ms for voice cloning operations
- Throughput: Handles concurrent requests via AgentCore auto-scaling
- Storage: Efficient S3 storage with presigned URLs
- Caching: AgentCore Memory for profile metadata
- architecture_diagram.png: System architecture visualization
- ui/README.md: Frontend-specific documentation
- docs/Q_CLI_DELEGATE_GUIDE.md: Q CLI autonomous operations guide
- docs/: Additional technical documentation
- .bedrock_agentcore.yaml: Auto-generated AgentCore configuration (gitignored)
- .bedrock_agentcore.yaml.example: Template for new deployments
- .env.example: Environment variable template
MIT License
For issues and questions:
- GitHub Issues: https://github.com/awsdataarchitect/voice-cloning-agentcore/issues
- AWS Documentation: AgentCore Docs
