Minimal voice cloning experiment. This repository contains a small Python app that runs a voice-cloning workflow (load model, accept input, synthesize audio). The project is lightweight and uses the uv helper/runner (see run instructions) to sync dependencies and run the app.
Clone the repo to your machine:
```shell
git clone https://github.com/arifulislamat/local-voice-cloning-app.git
cd local-voice-cloning-app
```

This project uses uv as the local helper for syncing dependencies and running the app. If you prefer a different tool, the normal Python tooling (pip, poetry, etc.) also works.
To sync dependencies with uv:

```shell
uv sync
```

If you don't have uv available, use your normal environment setup (for example, create a venv and install the packages listed in pyproject.toml).
Run locally (default, not publicly exposed):

```shell
uv run main.py
```

Run in "public" mode (if the app supports exposing an endpoint or external access):

```shell
uv run main.py --public
```

Fallback (if you prefer running directly with Python):

```shell
source .venv/bin/activate  # linux/mac
python main.py
```

Note: the app will automatically use your CUDA GPU (NVIDIA), if available, for faster inference. If no compatible GPU is found, it falls back to CPU mode automatically.
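As a rough sketch of how the `--public` flag and the GPU/CPU fallback could fit together in `main.py` (the function names and the detection heuristic here are assumptions for illustration, not the app's actual code):

```python
import argparse
import shutil


def pick_device() -> str:
    """Heuristic device choice: "cuda" when an NVIDIA driver is visible, else "cpu".

    The real app likely asks its ML framework directly (e.g.
    torch.cuda.is_available()); probing for nvidia-smi keeps this
    sketch dependency-free.
    """
    return "cuda" if shutil.which("nvidia-smi") else "cpu"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="local voice-cloning demo (sketch)")
    parser.add_argument("--public", action="store_true",
                        help="expose the UI externally instead of localhost only")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    device = pick_device()
    # With a Gradio UI, --public would typically map to demo.launch(share=args.public).
    print(f"device={device} public={args.public}")
```

The `action="store_true"` pattern means the flag defaults to `False` and flips to `True` only when `--public` is passed, matching the two run modes above.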
A .env file is already included in the repository for your convenience. Review and update as needed before running the app. Do not share sensitive information from .env publicly.
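For illustration, here is a minimal way such `KEY=VALUE # comment` lines could be read at startup. This is a hand-rolled sketch; the real app may use a library like python-dotenv or read `os.environ` directly:

```python
import os


def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, ignoring blanks and `#` comments."""
    values = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()  # drop trailing comments
            if "=" not in line:
                continue  # skip blank and comment-only lines
            key, _, val = line.partition("=")
            values[key.strip()] = val.strip()
            # Variables already set in the real environment take precedence.
            os.environ.setdefault(key.strip(), val.strip())
    return values
```

`os.environ.setdefault` ensures values exported in your shell are not silently overwritten by the file.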
Example `.env`:

```env
# Internal environment variables (for transformers/diffusers)
TRANSFORMERS_ATTN_IMPLEMENTATION=eager  # Use eager attention implementation for HuggingFace Transformers (improves compatibility)
TOKENIZERS_PARALLELISM=false            # Disable parallelism in tokenizers to avoid warning spam
TRANSFORMERS_VERBOSITY=error            # Only show error logs from HuggingFace Transformers
DIFFUSERS_VERBOSITY=error               # Only show error logs from HuggingFace Diffusers
```

The following Mermaid diagram shows the core flow of the app. Keep it in this README or render it in a Markdown viewer that supports Mermaid.
```mermaid
flowchart TD
    A[User / Client] -->|Text/Audio Input| B[Gradio UI]
    B --> C[main.py Handler]
    C --> D[Validate & Preprocess Input]
    D --> E[Load TTS Model]
    E --> F[Generate Audio]
    F --> G[Return Audio to User]
```
This diagram is intentionally generic. If you want a more detailed sequence diagram (for async tasks, queues, or third-party API calls), tell me which modules to include and I will expand it.
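The stages in the diagram can be sketched in Python. Everything below is a placeholder (stub model, assumed function names) meant only to mirror the diagram's flow, not the app's actual implementation:

```python
def validate_input(text: str) -> str:
    """"Validate & Preprocess Input" stage: reject empty requests early."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("input text must not be empty")
    return cleaned


def load_tts_model():
    """"Load TTS Model" stage — a stub standing in for the real model load."""
    return lambda text: f"<waveform for: {text}>".encode()


def generate_audio(text: str) -> bytes:
    """"Generate Audio" stage: validate, load, synthesize, return bytes to the UI."""
    model = load_tts_model()
    return model(validate_input(text))
```

In the real app the handler in `main.py` would wire these stages to the Gradio UI, and the model would typically be loaded once at startup rather than per request.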
We welcome contributions. A minimal workflow:

- Fork the repository.
- Create a branch for your change: `git checkout -b feat/your-feature`.
- Make changes and add tests where applicable.
- Run any project linters/tests and ensure they pass.
- Commit with clear messages and push your branch: `git push origin feat/your-feature`.
- Open a Pull Request against the `main` branch, describe the change, and reference any related issues.
- Address any feedback and iterate as needed.

