
An autonomous voice cloning agent powered by Amazon Bedrock AgentCore, Strands framework, and Nova Premier Model that captures your voice profile & synthesizes speech with intelligent tool execution.


awsdataarchitect/voice-cloning-agentcore


AgentCore Voice Cloning Agent

A production-ready voice cloning agent built on Amazon Bedrock AgentCore, integrating state-of-the-art open-source voice cloning models with enterprise-grade infrastructure.

Features

  • Voice Cloning: Microsoft SpeechT5 with SpeechBrain speaker encoder for personalized voice synthesis
  • AgentCore Integration: Runtime and Observability (voice profiles are stored in S3, not in AgentCore Memory)
  • AppSync GraphQL API: Type-safe API with Lambda resolver connecting to AgentCore Runtime
  • Modern UI: Svelte-based responsive interface with Cognito authentication
  • Enterprise Security: Encryption at rest/transit, OAuth2, RBAC, audit logging
  • Fully Automated Deployment: CDK-managed infrastructure with zero manual steps
  • Amplify Hosting: Production-ready frontend with automatic deployments

Architecture

Architecture diagram: see architecture_diagram.png

Autonomous LLM-Powered Voice Cloning:

User (GraphQL API) → Amplify UI → AppSync GraphQL → Lambda Resolver
                                                        ↓
                                      AgentCore Runtime (Strands Agent)
                                                        ↓
                                      Amazon Nova Premier (LLM Reasoning)
                                                        ↓
                                      Voice Tools (Clone/Create/List)
                                                        ↓
                                      Voice Models (SpeechT5/SpeechBrain/HiFiGAN)
                                                        ↓
                                      S3 Storage (Audio + Profiles)

Key Innovation: The agent uses the Strands Agents framework with Amazon Nova Premier on Amazon Bedrock to interpret natural-language requests and decide autonomously which voice tool to run.

Note: Voice profiles and audio are stored directly in S3, not using AgentCore Memory primitive.

Quick Start

Prerequisites

  • Python 3.10 or newer
  • Node.js 18+ and npm (for CDK and UI)
  • AWS CLI v2 configured (aws configure)
  • AWS CDK CLI installed (npm install -g aws-cdk)
  • Amazon Bedrock model access enabled (AWS Console → Bedrock → Model access)
  • AWS Account with AgentCore permissions

Important: Ensure your AWS CLI region, Bedrock model access region, and deployment region are all the same.
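A quick way to catch a region mismatch before deploying is to compare the region-related environment variables against the region where you enabled Bedrock model access. The sketch below is illustrative only: `find_region_mismatch` is a hypothetical helper, not part of this repository, and it only inspects the `AWS_REGION`, `AWS_DEFAULT_REGION`, and `CDK_DEFAULT_REGION` environment variables. It does not read the region stored by `aws configure`, so check that separately with `aws configure get region`.

```python
import os
from typing import Dict, Mapping

# Environment variables commonly consulted by the AWS CLI, the SDKs, and the CDK CLI.
REGION_VARS = ("AWS_REGION", "AWS_DEFAULT_REGION", "CDK_DEFAULT_REGION")


def find_region_mismatch(env: Mapping[str, str], expected: str) -> Dict[str, str]:
    """Return {variable: value} for each region variable that is set but differs from `expected`."""
    return {
        name: env[name]
        for name in REGION_VARS
        if env.get(name) and env[name] != expected
    }


def describe_mismatch(env: Mapping[str, str], expected: str) -> str:
    """Build a human-readable summary of the check."""
    mismatches = find_region_mismatch(env, expected)
    if not mismatches:
        return f"All region variables that are set match {expected}."
    details = ", ".join(f"{name}={value}" for name, value in sorted(mismatches.items()))
    return f"Region mismatch (expected {expected}): {details}"
```

For example, `print(describe_mismatch(os.environ, "us-east-1"))` reports any variable that points at a different region than the one passed in; replace `us-east-1` with the region where model access is enabled.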

Deployment

Complete deployment in 2 commands:

# 1. Deploy infrastructure (S3, Cognito, AppSync, Lambda, Amplify) and backend (AgentCore Runtime with voice models)
#    Run from the repository root.
./scripts/deploy_complete.sh

# 2. Deploy frontend (Amplify UI)
./scripts/deploy_amplify.sh

That's it! Your voice cloning system is now live.

Access Your Application

After deployment completes, you'll get:

  • Frontend URL: https://main.{app-id}.amplifyapp.com
  • GraphQL API: AppSync endpoint for programmatic access
  • AgentCore Runtime: Deployed and configured

Project Structure

.
├── voice_cloning_agent.py       # Main AgentCore agent (entrypoint)
├── voice_models.py              # Voice cloning models
├── voice_storage.py             # S3 storage operations
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Container build for AgentCore Runtime
├── .bedrock_agentcore.yaml      # AgentCore config (auto-generated, gitignored)
├── .bedrock_agentcore.yaml.example  # Template for new deployments
├── architecture_diagram.png     # System architecture diagram
├── scripts/                     # Deployment scripts
│   ├── deploy_complete.sh       # Infrastructure + backend deployment
│   └── deploy_amplify.sh        # Frontend deployment
├── cdk/                         # Infrastructure as Code (CDK)
│   ├── lib/voice-cloning-stack.ts  # Complete infrastructure stack
│   ├── lambda/graphql-resolver/    # AppSync Lambda resolver
│   └── package.json
├── ui/                          # Svelte frontend application
│   ├── src/
│   ├── package.json
│   └── README.md                # UI-specific documentation
└── docs/                        # Additional documentation
    ├── Q_CLI_DELEGATE_GUIDE.md
    ├── THINKING_TOOL_USAGE.md
    └── VOICE_CLONING_REQUIREMENTS.md

Infrastructure

All infrastructure is managed through AWS CDK:

Core Resources

  • S3 Bucket: voice-cloning-audio-{account}-{region} (audio storage with CORS)
  • Cognito User Pool: User authentication with web and M2M clients
  • AppSync GraphQL API: Type-safe API with Lambda resolver
  • Lambda Functions: GraphQL resolver, upload/download audio handlers
  • Amplify App: Frontend hosting with automatic deployments
  • AgentCore Runtime: Voice cloning agent with Docker container

Deployment Outputs

After CDK deployment, you'll get:

  • Amplify App ID and URL
  • GraphQL API endpoint
  • Cognito User Pool ID and Client IDs
  • S3 Bucket name
  • Lambda function ARNs

API Operations

Autonomous Agent Interface

The system uses natural language prompts for all operations:

# Execute any voice cloning operation with natural language
mutation ExecuteAgent {
  executeAgent(prompt: "Clone voice using profile abc123 with text 'Hello world'") {
    status
    data
    llmMessage
    model
  }
}

# Create voice profile
mutation ExecuteAgent {
  executeAgent(prompt: "Create voice profile named 'My Voice' with audio data: [base64]") {
    status
    data
    llmMessage
  }
}

# List profiles
mutation ExecuteAgent {
  executeAgent(prompt: "List all voice profiles") {
    status
    data
    llmMessage
  }
}
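For programmatic access, the same mutation can be sent to the AppSync endpoint as a standard GraphQL-over-HTTP POST. The following is a minimal sketch using only the Python standard library. It assumes the `prompt` argument is declared as `String!` in the schema and that the API uses Cognito User Pools authorization, in which case the JWT goes in the `Authorization` header. The function names are hypothetical, and the endpoint and token are placeholders you must supply from your deployment outputs.

```python
import json
import urllib.request

# Mutation shape taken from the examples above; the `String!` type is an assumption.
EXECUTE_AGENT_MUTATION = """
mutation ExecuteAgent($prompt: String!) {
  executeAgent(prompt: $prompt) {
    status
    data
    llmMessage
    model
  }
}
"""


def build_execute_agent_payload(prompt: str) -> dict:
    """Build the GraphQL request body, passing the prompt as a variable to avoid manual escaping."""
    return {"query": EXECUTE_AGENT_MUTATION, "variables": {"prompt": prompt}}


def build_request(endpoint: str, id_token: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST for the AppSync endpoint."""
    body = json.dumps(build_execute_agent_payload(prompt)).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": id_token},
        method="POST",
    )


def execute_agent(endpoint: str, id_token: str, prompt: str, timeout: float = 60.0) -> dict:
    """Send the mutation and return the decoded JSON response."""
    request = build_request(endpoint, id_token, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing the prompt as a GraphQL variable keeps quotes inside the prompt (such as `'Hello world'`) from breaking the query string. A call would look like `execute_agent(endpoint, id_token, "List all voice profiles")`, with both values taken from your own deployment.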

Legacy GraphQL Queries (Backward Compatible)

# List profiles
query ListProfiles {
  listProfiles {
    profiles {
      profileId
      profileName
      createdAt
    }
  }
}

# Get profile details
query GetProfile {
  getProfile(profileId: "profile-id") {
    profileId
    profileName
    createdAt
    audioSizeBytes
  }
}
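On the client side, the `listProfiles` response can be mapped to typed objects. This is a sketch under assumptions: it relies on the standard GraphQL response envelope (`data` and optional `errors`), treats `createdAt` as a string, and uses a hypothetical `VoiceProfile` class that does not exist in the repository.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class VoiceProfile:
    """Client-side view of one profile returned by the listProfiles query."""
    profile_id: str
    profile_name: str
    created_at: str  # assumed to arrive as a string; the schema type is not shown in this README


def parse_list_profiles(response: Mapping[str, Any]) -> List[VoiceProfile]:
    """Convert a decoded listProfiles response into VoiceProfile objects."""
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    profiles = response["data"]["listProfiles"]["profiles"]
    return [
        VoiceProfile(
            profile_id=item["profileId"],
            profile_name=item["profileName"],
            created_at=item["createdAt"],
        )
        for item in profiles
    ]
```

Raising on a non-empty `errors` field surfaces authorization or resolver failures early instead of failing later with a `KeyError` on missing data.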

Development

Local UI Development

cd ui
npm install
npm run dev

Access at http://localhost:5173

Testing AgentCore Locally

agentcore invoke '{"operation": "list_profiles"}'

Update Deployment

# Update just infrastructure
cd cdk && npm run deploy

# Update backend + infra (also includes cdk deploy)
cd .. && ./scripts/deploy_complete.sh

# Update frontend
./scripts/deploy_amplify.sh

Features in Detail

Voice Profile Management

  • Create profiles from audio samples (WAV, MP3, MP4)
  • Automatic audio format conversion
  • Profile storage and metadata in S3 (AgentCore Memory is not used)

Voice Cloning

  • Text-to-speech with custom voice profiles using SpeechT5
  • SpeechBrain speaker encoder for voice embedding extraction
  • HiFiGAN vocoder for high-quality audio generation
  • Automatic text chunking for long inputs (50 words per chunk)
  • Audio concatenation for seamless output
  • Presigned S3 URLs for efficient audio delivery
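The 50-word chunking step mentioned above can be illustrated with a short function. This is a minimal sketch of fixed-size word windows, not the code in `voice_models.py`; the actual implementation may split differently (for example on sentence boundaries), and `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk would then be synthesized separately and the resulting audio segments concatenated in order. Keeping chunks short bounds the input length per synthesis call, at the cost of possible prosody breaks at chunk boundaries.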

Security

  • Authentication: Cognito User Pools with OAuth2
  • Authorization: AppSync with Cognito integration
  • Encryption: S3 SSE-S3 at rest, TLS 1.3 in transit
  • CORS: Configured for localhost (dev) and Amplify (prod)
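To make the CORS item concrete, the sketch below builds an S3 CORS configuration for the two origins this README mentions: the local dev server on port 5173 and the Amplify URL pattern `https://main.{app-id}.amplifyapp.com`. The dictionary follows the shape S3 expects for a bucket CORS configuration, but the allowed methods, headers, and max age are assumptions. In this project the CDK stack manages CORS, so treat this as a reference for comparison rather than something to apply by hand.

```python
from typing import Any, Dict


def build_cors_configuration(amplify_app_id: str, dev_port: int = 5173) -> Dict[str, Any]:
    """Build an S3 CORS configuration allowing the local dev server and the Amplify frontend."""
    origins = [
        f"http://localhost:{dev_port}",
        f"https://main.{amplify_app_id}.amplifyapp.com",
    ]
    return {
        "CORSRules": [
            {
                "AllowedOrigins": origins,
                "AllowedMethods": ["GET", "PUT", "POST"],  # assumed: browser upload and download
                "AllowedHeaders": ["*"],                   # assumed: permissive for presigned requests
                "MaxAgeSeconds": 3000,                     # assumed preflight cache lifetime
            }
        ]
    }
```

If audio uploads fail in the browser, comparing the bucket's current rules (for example via `aws s3api get-bucket-cors --bucket <bucket-name>`) against the origins produced here is a quick way to spot a missing Amplify URL.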

Observability

  • AgentCore built-in observability
  • CloudWatch logs for all Lambda functions
  • AppSync query logging
  • Request tracing with session IDs

Troubleshooting

Common Issues

Issue: CDK deployment fails
Solution: Ensure AWS CLI is configured and you have necessary permissions

Issue: AgentCore deployment fails
Solution: Check Bedrock model access is enabled in your region

Issue: UI can't connect to API
Solution: Verify GraphQL endpoint and Cognito configuration in ui/src/config.js

Issue: Audio upload fails
Solution: Check S3 CORS configuration includes your Amplify URL

Logs

# AgentCore runtime logs
aws logs tail /aws/bedrock-agentcore/runtimes/{runtime-id} --follow

# Lambda resolver logs
aws logs tail /aws/lambda/voice-cloning-graphql-resolver --follow

# AppSync logs
# Check CloudWatch Logs in AWS Console

Performance

  • Latency: Sub-500ms for voice cloning operations
  • Throughput: Handles concurrent requests via AgentCore auto-scaling
  • Storage: Efficient S3 storage with presigned URLs
  • Profile metadata: Stored in S3 alongside the audio (AgentCore Memory is not used)

Documentation

Configuration Files

  • .bedrock_agentcore.yaml: Auto-generated AgentCore configuration (gitignored)
  • .bedrock_agentcore.yaml.example: Template for new deployments
  • .env.example: Environment variable template

License

MIT License

Support

For issues and questions:
