AgentCore Voice Cloning Agent

A production-ready voice cloning agent built on Amazon Bedrock AgentCore, integrating state-of-the-art open-source voice cloning models with enterprise-grade infrastructure.

Features

Voice Cloning: Microsoft SpeechT5 with SpeechBrain speaker encoder for personalized voice synthesis
AgentCore Integration: Runtime and Observability (voice profiles are stored in S3, not in AgentCore Memory)
AppSync GraphQL API: Type-safe API with Lambda resolver connecting to AgentCore Runtime
Modern UI: Svelte-based responsive interface with Cognito authentication
Enterprise Security: Encryption at rest/transit, OAuth2, RBAC, audit logging
Fully Automated Deployment: CDK-managed infrastructure with no manual steps once the prerequisites are met
Amplify Hosting: Production-ready frontend with automatic deployments

Architecture

Autonomous LLM-Powered Voice Cloning:

User (GraphQL API) → Amplify UI → AppSync GraphQL → Lambda Resolver
                                                        ↓
                                      AgentCore Runtime (Strands Agent)
                                                        ↓
                                      Amazon Nova Premier (LLM Reasoning)
                                                        ↓
                                      Voice Tools (Clone/Create/List)
                                                        ↓
                                      Voice Models (SpeechT5/SpeechBrain/HiFiGAN)
                                                        ↓
                                      S3 Storage (Audio + Profiles)

Key Innovation: Uses the Strands Agents framework with Amazon Nova Premier on Amazon Bedrock for autonomous natural language understanding and tool execution.

Note: Voice profiles and audio are stored directly in S3; the AgentCore Memory primitive is not used.
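
The flow above (the LLM selects a voice tool, the runtime executes it) can be illustrated with a framework-free sketch. This is not the Strands Agents API and not the project's code: the `register_tool` decorator, the tool names `create_profile` and `list_profiles`, the shape of the tool-call dict, and the in-memory store standing in for S3 are all assumptions made for illustration.

```python
from typing import Any, Callable

# In-memory stand-in for the S3-backed profile store (assumption for illustration).
_PROFILES: dict[str, dict[str, str]] = {}
# Registry mapping tool names to callables, filled by the decorator below.
_TOOLS: dict[str, Callable[..., dict[str, Any]]] = {}


def register_tool(func: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Register a function under its own name so the dispatcher can find it."""
    _TOOLS[func.__name__] = func
    return func


@register_tool
def create_profile(profile_id: str, profile_name: str) -> dict[str, Any]:
    """Hypothetical tool: store a profile record and return it."""
    _PROFILES[profile_id] = {"profileId": profile_id, "profileName": profile_name}
    return {"status": "success", "data": _PROFILES[profile_id]}


@register_tool
def list_profiles() -> dict[str, Any]:
    """Hypothetical tool: return every stored profile record."""
    return {"status": "success", "data": list(_PROFILES.values())}


def dispatch(tool_call: dict[str, Any]) -> dict[str, Any]:
    """Run the tool named in `tool_call`, which mimics a model's tool-use output."""
    name = tool_call.get("name")
    if name not in _TOOLS:
        return {"status": "error", "data": f"unknown tool: {name}"}
    return _TOOLS[name](**tool_call.get("input", {}))
```

In the deployed system the model, not the caller, decides which tool to invoke from the natural language prompt; the dispatcher here only shows the step that follows that decision.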

Quick Start

Prerequisites

Python 3.10 or newer
Node.js 18+ and npm (for CDK and UI)
AWS CLI v2 configured (aws configure)
AWS CDK CLI installed (npm install -g aws-cdk)
Amazon Bedrock model access enabled (AWS Console → Bedrock → Model access)
AWS Account with AgentCore permissions

Important: Ensure your AWS CLI region, Bedrock model access region, and deployment region are all the same.

Deployment

Deploy everything with two commands, run from the repository root:

# 1. Deploy infrastructure (S3, Cognito, AppSync, Lambda, Amplify) and backend (AgentCore runtime with voice models)
./scripts/deploy_complete.sh

# 2. Deploy frontend (Amplify UI)
./scripts/deploy_amplify.sh

That's it! Your voice cloning system is now live.

Access Your Application

After deployment completes, you'll get:

Frontend URL: https://main.{app-id}.amplifyapp.com
GraphQL API: AppSync endpoint for programmatic access
AgentCore Runtime: Deployed and configured

Project Structure

.
├── voice_cloning_agent.py       # Main AgentCore agent (entrypoint)
├── voice_models.py              # Voice cloning models
├── voice_storage.py             # S3 storage operations
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Container build for AgentCore Runtime
├── .bedrock_agentcore.yaml      # AgentCore config (auto-generated, gitignored)
├── .bedrock_agentcore.yaml.example  # Template for new deployments
├── architecture_diagram.png     # System architecture diagram
├── scripts/                     # Deployment scripts
│   ├── deploy_complete.sh       # Infrastructure + backend deployment
│   └── deploy_amplify.sh        # Frontend deployment
├── cdk/                         # Infrastructure as Code (CDK)
│   ├── lib/voice-cloning-stack.ts  # Complete infrastructure stack
│   ├── lambda/graphql-resolver/    # AppSync Lambda resolver
│   └── package.json
├── ui/                          # Svelte frontend application
│   ├── src/
│   ├── package.json
│   └── README.md                # UI-specific documentation
└── docs/                        # Additional documentation
    ├── Q_CLI_DELEGATE_GUIDE.md
    ├── THINKING_TOOL_USAGE.md
    └── VOICE_CLONING_REQUIREMENTS.md

Infrastructure

All infrastructure is managed through AWS CDK:

Core Resources

S3 Bucket: voice-cloning-audio-{account}-{region} (audio storage with CORS)
Cognito User Pool: User authentication with web and M2M clients
AppSync GraphQL API: Type-safe API with Lambda resolver
Lambda Functions: GraphQL resolver, upload/download audio handlers
Amplify App: Frontend hosting with automatic deployments
AgentCore Runtime: Voice cloning agent with Docker container
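
When scripting against the stack, the audio bucket name can be derived from the naming pattern listed above. The following is a minimal sketch; the helper name `audio_bucket_name` and its validation rules are assumptions for illustration, and the CDK stack remains the source of truth for the actual name.

```python
import re

# Simple format checks (assumptions): 12-digit account ID, region such as us-east-1.
_ACCOUNT_ID = re.compile(r"\d{12}")
_REGION = re.compile(r"[a-z]{2}(-[a-z]+)+-\d")


def audio_bucket_name(account_id: str, region: str) -> str:
    """Build the bucket name from the voice-cloning-audio-{account}-{region} pattern."""
    if not _ACCOUNT_ID.fullmatch(account_id):
        raise ValueError("account_id must be a 12-digit string")
    if not _REGION.fullmatch(region):
        raise ValueError(f"unexpected region format: {region}")
    return f"voice-cloning-audio-{account_id}-{region}"
```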

Deployment Outputs

After CDK deployment, you'll get:

Amplify App ID and URL
GraphQL API endpoint
Cognito User Pool ID and Client IDs
S3 Bucket name
Lambda function ARNs

API Operations

Autonomous Agent Interface

The system uses natural language prompts for all operations:

# Execute any voice cloning operation with natural language
mutation ExecuteAgent {
  executeAgent(prompt: "Clone voice using profile abc123 with text 'Hello world'") {
    status
    data
    llmMessage
    model
  }
}

# Create voice profile
mutation ExecuteAgent {
  executeAgent(prompt: "Create voice profile named 'My Voice' with audio data: [base64]") {
    status
    data
    llmMessage
  }
}

# List profiles
mutation ExecuteAgent {
  executeAgent(prompt: "List all voice profiles") {
    status
    data
    llmMessage
  }
}
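
For programmatic access, the mutations above can be sent as a plain HTTPS POST. The sketch below uses only the Python standard library and passes the prompt as a GraphQL variable so that quotes inside the prompt need no manual escaping. It assumes the `prompt` argument is typed `String!` in the schema and that the API authorizes requests with a Cognito user pool token in the `Authorization` header; the function names are hypothetical, and the endpoint and token must come from your own deployment.

```python
import json
import urllib.request

# Assumed argument type (String!); check the deployed AppSync schema.
EXECUTE_AGENT_MUTATION = """
mutation ExecuteAgent($prompt: String!) {
  executeAgent(prompt: $prompt) {
    status
    data
    llmMessage
    model
  }
}
"""


def build_execute_agent_request(prompt: str) -> dict:
    """Return the JSON body for an executeAgent call, with the prompt as a variable."""
    return {"query": EXECUTE_AGENT_MUTATION, "variables": {"prompt": prompt}}


def execute_agent(endpoint: str, id_token: str, prompt: str) -> dict:
    """POST the mutation to the GraphQL endpoint and return the decoded response.

    `endpoint` is the GraphQL API endpoint from the deployment outputs and
    `id_token` is a Cognito JWT for a signed-in user; both are caller-supplied.
    """
    body = json.dumps(build_execute_agent_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": id_token},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `execute_agent(endpoint, token, "List all voice profiles")` would then return the GraphQL response as a dictionary, with the selected fields under `data.executeAgent` if the request succeeds.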

Legacy GraphQL Queries (Backward Compatible)

# List profiles
query ListProfiles {
  listProfiles {
    profiles {
      profileId
      profileName
      createdAt
    }
  }
}

# Get profile details
query GetProfile {
  getProfile(profileId: "profile-id") {
    profileId
    profileName
    createdAt
    audioSizeBytes
  }
}

Development

Local UI Development

cd ui
npm install
npm run dev

Access at http://localhost:5173

Testing AgentCore Locally

agentcore invoke '{"operation": "list_profiles"}'

Update Deployment

# Update just infrastructure
cd cdk && npm run deploy

# Update backend + infra (also includes cdk deploy)
cd .. && ./scripts/deploy_complete.sh

# Update frontend
./scripts/deploy_amplify.sh

Features in Detail

Voice Profile Management

Create profiles from audio samples (WAV, MP3, MP4)
Automatic audio format conversion
Profile audio and metadata stored in S3
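
The "Create voice profile" prompt shown under API Operations embeds the audio sample as base64. A minimal sketch of building that prompt from a local file follows; the function names are hypothetical, and the exact wording the agent accepts is an assumption based on the example prompt in this README.

```python
import base64
from pathlib import Path


def build_create_profile_prompt(profile_name: str, audio_bytes: bytes) -> str:
    """Embed base64-encoded audio in a natural language create-profile prompt."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"Create voice profile named '{profile_name}' with audio data: {encoded}"


def build_prompt_from_file(profile_name: str, audio_path: str) -> str:
    """Read an audio sample (for example a WAV file) and build the prompt from it."""
    return build_create_profile_prompt(profile_name, Path(audio_path).read_bytes())
```

Base64 inflates the payload by roughly a third, so short samples are the natural fit for this path.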

Voice Cloning

Text-to-speech with custom voice profiles using SpeechT5
SpeechBrain speaker encoder for voice embedding extraction
HiFiGAN vocoder for high-quality audio generation
Automatic text chunking for long inputs (50 words per chunk)
Audio concatenation for seamless output
Presigned S3 URLs for efficient audio delivery
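
The chunking and concatenation steps listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in voice_models.py: it assumes chunks are split on whitespace at a fixed word count, that each chunk is synthesized to WAV bytes elsewhere, and that all segments share the same channel count, sample width, and sample rate. The function names are hypothetical.

```python
import io
import wave


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into chunks of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def concat_wav(segments: list[bytes]) -> bytes:
    """Concatenate WAV byte strings that share the same audio parameters."""
    if not segments:
        raise ValueError("no audio segments to concatenate")
    output = io.BytesIO()
    expected = None
    with wave.open(output, "wb") as writer:
        for segment in segments:
            with wave.open(io.BytesIO(segment), "rb") as reader:
                params = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
                if expected is None:
                    # The first segment defines the output format.
                    expected = params
                    writer.setnchannels(params[0])
                    writer.setsampwidth(params[1])
                    writer.setframerate(params[2])
                elif params != expected:
                    raise ValueError("segments have mismatched audio parameters")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return output.getvalue()
```

Splitting on a fixed word count can cut a sentence in half; splitting at sentence boundaries first and then capping at 50 words is a possible refinement if chunk seams are audible.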

Security

Authentication: Cognito User Pools with OAuth2
Authorization: AppSync with Cognito integration
Encryption: S3 SSE-S3 at rest, TLS 1.3 in transit
CORS: Configured for localhost (dev) and Amplify (prod)

Observability

AgentCore built-in observability
CloudWatch logs for all Lambda functions
AppSync query logging
Request tracing with session IDs

Troubleshooting

Common Issues

Issue: CDK deployment fails
Solution: Ensure AWS CLI is configured and you have necessary permissions

Issue: AgentCore deployment fails
Solution: Check Bedrock model access is enabled in your region

Issue: UI can't connect to API
Solution: Verify GraphQL endpoint and Cognito configuration in ui/src/config.js

Issue: Audio upload fails
Solution: Check S3 CORS configuration includes your Amplify URL

Logs

# AgentCore runtime logs
aws logs tail /aws/bedrock-agentcore/runtimes/{runtime-id} --follow

# Lambda resolver logs
aws logs tail /aws/lambda/voice-cloning-graphql-resolver --follow

# AppSync logs
# Check CloudWatch Logs in AWS Console

Performance

Latency: Sub-500ms for voice cloning operations
Throughput: Handles concurrent requests via AgentCore auto-scaling
Storage: Efficient S3 storage with presigned URLs
Profile metadata: Stored in S3 alongside the audio (AgentCore Memory is not used)

Documentation

architecture_diagram.png: System architecture visualization
ui/README.md: Frontend-specific documentation
docs/Q_CLI_DELEGATE_GUIDE.md: Q CLI autonomous operations guide
docs/: Additional technical documentation

Configuration Files

.bedrock_agentcore.yaml: Auto-generated AgentCore configuration (gitignored)
.bedrock_agentcore.yaml.example: Template for new deployments
.env.example: Environment variable template

License

MIT License

Support

For issues and questions:

GitHub Issues: https://github.com/awsdataarchitect/voice-cloning-agentcore/issues
AWS Documentation: AgentCore Docs
