VoiceMorph is a modular, AI-powered voice conversion platform that allows users to record and send their own voice, train a model based on a target speaker, and transform new voice inputs into the target voice.
The system routes conversion tasks to external RVC APIs (currently Replicate) and prioritizes them by user subscription tier. The plan is to migrate eventually to an in-house, cost-efficient inference service.
- FastAPI for API server
- Beanie for MongoDB ODM
- Docker / Docker Compose for containerization
- fastapi-mongo-base Python framework (published on PyPI) for scaffolding and development structure
- Replicate (temporary) for voice conversion backend
- Modular app structure supporting task-based execution
.
├── app/
│   ├── apps/
│   │   ├── neda/       # Voice conversion logic
│   │   └── voice/      # Voice-related APIs
│   ├── server/         # App server and configuration
│   ├── utils/          # Utilities: finance logic, media handling, etc.
│   └── Dockerfile
├── docker-compose.yml
├── README.md
└── sample.env
- apps.voice: Handles management of available target voice models.
  - schemas.py: Data structures for request/response validation
  - models.py: MongoDB voice model definitions
  - services.py: Business logic for managing voice models
  - routes.py: REST endpoints for adding and listing models
- apps.neda: Contains the core voice conversion APIs.
  - Accepts voice conversion requests from users
  - Sends tasks to the external RVC backend (Replicate)
  - Selects the appropriate target voice model
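
The apps.neda flow described above (accept a request, select the target voice model, hand a task to the backend) might look roughly like the sketch below. It uses only the standard library. The names `VoiceModel`, `ConversionRequest`, `select_voice_model`, and `build_backend_payload`, as well as the payload keys, are hypothetical and are not taken from the VoiceMorph code or from Replicate's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceModel:
    """Hypothetical record for a target voice model."""
    slug: str
    model_url: str


@dataclass(frozen=True)
class ConversionRequest:
    """Hypothetical shape of an incoming conversion request."""
    user_id: str
    audio_url: str
    target_voice: str


def select_voice_model(registry: dict, slug: str) -> VoiceModel:
    """Look up the requested target voice, failing clearly if it is unknown."""
    try:
        return registry[slug]
    except KeyError:
        raise ValueError(f"unknown target voice: {slug}") from None


def build_backend_payload(request: ConversionRequest, registry: dict) -> dict:
    """Build the task payload for the external backend.

    The key names here are placeholders, not a real backend schema.
    """
    model = select_voice_model(registry, request.target_voice)
    return {"input_audio": request.audio_url, "model_url": model.model_url}
```

Rejecting an unknown target voice before any task is sent keeps invalid requests from consuming backend calls.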
- utils.finance: Connects to an external finance microservice
  - Fetches user subscription info
  - Prioritizes tasks based on subscription tier
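
One way to picture tier-based prioritization is a priority queue in which higher tiers are dequeued first and tasks within the same tier keep their arrival order. The sketch below is an illustration only: the tier names, their ranking, and the `TaskQueue` class are assumptions, since the actual tiers come from the finance microservice.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Assumed tier names and ranks (lower number = served first).
TIER_PRIORITY = {"premium": 0, "standard": 1, "free": 2}


@dataclass(order=True)
class QueuedTask:
    priority: int
    sequence: int                      # tie-breaker: preserves arrival order
    task_id: str = field(compare=False)


class TaskQueue:
    """Hypothetical in-memory queue ordered by subscription tier."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()

    def push(self, task_id: str, tier: str) -> None:
        # Unknown tiers fall back to the lowest priority.
        priority = TIER_PRIORITY.get(tier, max(TIER_PRIORITY.values()))
        heapq.heappush(self._heap, QueuedTask(priority, next(self._counter), task_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap).task_id

    def __len__(self) -> int:
        return len(self._heap)
```

The sequence counter matters: without it, two tasks in the same tier would have no defined order relative to each other.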
- server.worker: Manages periodic background checks on tasks
  - Detects and retries incomplete conversions
  - Handles failure recovery in case of webhook errors
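
The periodic check performed by server.worker can be sketched as a function that finds tasks stuck in a processing state (for example because a webhook never arrived) and resubmits them up to a retry limit. Everything here is an assumption made for illustration: the `ConversionTask` fields, the status strings, the 10-minute staleness threshold, and the limit of 3 retries.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)  # assumed threshold
MAX_RETRIES = 3                      # assumed retry limit


@dataclass
class ConversionTask:
    """Hypothetical task record."""
    task_id: str
    status: str          # "processing", "completed", or "failed"
    updated_at: datetime
    retries: int = 0


def find_stale_tasks(tasks, now):
    """Return tasks still processing with no update for longer than STALE_AFTER."""
    return [
        t for t in tasks
        if t.status == "processing" and now - t.updated_at > STALE_AFTER
    ]


def recover(tasks, now, resubmit):
    """Resubmit stale tasks; mark those that exhausted their retries as failed.

    `resubmit` is a caller-supplied callable that sends the task to the backend.
    Returns the IDs of the tasks that were retried.
    """
    retried = []
    for task in find_stale_tasks(tasks, now):
        if task.retries >= MAX_RETRIES:
            task.status = "failed"
            continue
        resubmit(task)
        task.retries += 1
        task.updated_at = now
        retried.append(task.task_id)
    return retried
```

A scheduler would call `recover(tasks, datetime.now(timezone.utc), resubmit)` at a fixed interval; passing `now` explicitly keeps the function easy to test.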
git clone https://github.com/your-org/voicemorph.git
cd voicemorph
Copy the example environment file and fill in required values:
cp sample.env .env
docker-compose up --build
The app should be available at http://localhost:8000.
- 🔄 Migrate from Replicate to self-hosted GPU inference
- 📊 Add user dashboard for voice model and task management
MIT License