Syndicate Smashers

Prodigal.AI <> KodeKurrent
AI Voice Cloning Model Development Challenge

<a href="https://www.python.org/downloads/release/python-3120/">
<img src="https://img.shields.io/badge/Python-3.12+-orange" alt="Python">
</a>

Syndicate Smashers 🔥

Overview

Our model is an advanced text-to-speech system that uses a large language model (LLM) for accurate, natural-sounding voice synthesis. It is designed to be efficient and flexible for both research and production use.

Key Features

Simplicity and Efficiency: Built entirely on Qwen2.5, the Syndicate Smashers model needs no additional generation model such as flow matching. Instead of relying on a separate model to generate acoustic features, it reconstructs audio directly from the codes predicted by the LLM, which improves efficiency and reduces complexity.
High-Quality Voice Cloning: Supports zero-shot voice cloning, replicating a speaker's voice without training data specific to that speaker. This suits cross-lingual and code-switching scenarios, allowing seamless transitions between languages and voices.
Controllable Speech Generation: Allows customization of gender, pitch, and speaking rate, making it easier to create virtual speakers.
User Authentication & Security:
- Sign-up/Login System for secure access.
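
As an illustration of the sign-up/login feature, here is a minimal sketch of one common way to store credentials: salted PBKDF2 hashes in a JSON file. Everything in it is an assumption made for the example (the function names `sign_up` and `log_in`, the file layout, the iteration count); it is not taken from this repository, and the actual login code may work differently.

```python
import hashlib
import hmac
import json
import secrets
from pathlib import Path

ITERATIONS = 200_000  # assumed PBKDF2 work factor, not taken from the repo


def _hash(password: str, salt: bytes) -> str:
    """Derive a hex-encoded PBKDF2-HMAC-SHA256 hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS).hex()


def _load(store: Path) -> dict:
    """Read the user table, or return an empty one if the file is missing."""
    return json.loads(store.read_text()) if store.exists() else {}


def sign_up(store: Path, username: str, password: str) -> bool:
    """Register a user; return False if the name is already taken."""
    users = _load(store)
    if username in users:
        return False
    salt = secrets.token_bytes(16)  # fresh random salt per user
    users[username] = {"salt": salt.hex(), "hash": _hash(password, salt)}
    store.write_text(json.dumps(users, indent=2))
    return True


def log_in(store: Path, username: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    record = _load(store).get(username)
    if record is None:
        return False
    candidate = _hash(password, bytes.fromhex(record["salt"]))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(record["hash"], candidate)
```

Only the salt and the derived hash are written to disk, so the plain-text password is never stored.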

Inference Overview of Voice Cloning

Inference Overview of Controlled Generation

Install

Clone and Install

Clone the repo

git clone https://github.com/bharathgaddam1712/SyndicateSmashers.git
cd SyndicateSmashers

Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
Create Conda env:

conda create -n venv -y python=3.12
conda activate venv
pip install -r requirements.txt
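
After activating the environment, it can help to confirm that the interpreter matches the Python 3.12 requirement before going further. The check below is a small sketch; the helper name `python_is_supported` is made up for this example and is not part of the repository.

```python
import sys

MIN_PYTHON = (3, 12)  # matches the python=3.12 used when creating the env


def python_is_supported(version=None) -> bool:
    """Return True when the (major, minor) version is at least MIN_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {found}: {status}")
```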

Model Download

Download via python:

from huggingface_hub import snapshot_download

snapshot_download("SparkAudio/Spark-TTS-0.5B", local_dir="pretrained_models/Spark-TTS-0.5B")
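
If the download step runs more than once (for example from a setup script), a small guard can skip it when the checkpoint directory is already populated. This is a sketch only: `model_is_present` and `ensure_model` are hypothetical helpers, and a non-empty directory does not prove the download completed.

```python
from pathlib import Path

REPO_ID = "SparkAudio/Spark-TTS-0.5B"
LOCAL_DIR = Path("pretrained_models/Spark-TTS-0.5B")


def model_is_present(local_dir: Path) -> bool:
    """True if the directory exists and contains at least one entry."""
    return local_dir.is_dir() and any(local_dir.iterdir())


def ensure_model(local_dir: Path = LOCAL_DIR, repo_id: str = REPO_ID) -> Path:
    """Download the checkpoint only when the local directory is missing or empty."""
    if not model_is_present(local_dir):
        # Imported lazily so the presence check works without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `ensure_model()` with no arguments uses the same repository ID and target directory as the command above.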

Download via git clone:

mkdir -p pretrained_models

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B

Basic Usage

You can simply run the demo with the following commands:

cd example
bash infer.sh

Web UI Usage

You can start the web UI by running python webui2.py --device 0. It lets you perform Voice Cloning and Voice Creation. Voice Cloning accepts either an uploaded reference audio file or audio recorded directly in the browser.
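
The sketch below shows one plausible reading of the --device flag: an integer GPU index that becomes a device string, with a fallback to CPU when no GPU is available. That interpretation is an assumption, and `parse_args` and `resolve_device` are hypothetical names; check webui2.py for the actual handling.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a hypothetical --device option (assumed to be a GPU index)."""
    parser = argparse.ArgumentParser(description="Hypothetical web UI launcher options")
    parser.add_argument("--device", type=int, default=0, help="GPU index (assumed meaning)")
    return parser.parse_args(argv)


def resolve_device(index: int, cuda_available: bool) -> str:
    """Map a GPU index to a device string, falling back to CPU."""
    if cuda_available and index >= 0:
        return f"cuda:{index}"
    return "cpu"
```

In a real launcher, `cuda_available` would typically come from `torch.cuda.is_available()`; it is passed in here so the mapping can be exercised without PyTorch installed.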

Voice Cloning	Voice Creation

Utkarsh Raj	Bharath Gaddam
utkarsh.mp3	Bharath_Gaddam.mp3

Sunny Kumar	Shivam Jogdand
Sunny_Kumar.mp3	Shivam_Jogdand.mp3

🎥 Demo Video

Demo_Video (Replace with your actual demo video link)

👥 Team Details

Name	Role
Utkarsh Raj	Deep Learning
Bharath Gaddam	AI Engineer
Sunny Kumar	Machine Learning
Shivam Jogdand	Machine Learning
