ChatterCast is a project that generates a podcast from uploaded files, created for the Neural Networks Architectures course at AGH University. It is not meant to be a fully functional podcast generator, but rather a proof of concept that demonstrates the capabilities of the ChatterBox model for generating audio content.
The project is built with Streamlit for the (vibe-coded) web interface and the ChatterBox Turbo model for podcast generation.
Examples of generated podcasts can be found in the examples folder. The generated audio files are in .wav format
and cover the topic of Spatio-Temporal Graph Convolutional Networks (ST-GCN) for hand gesture recognition.
To use ChatterCast, build and run the application with Docker. First, fill in the HF_TOKEN environment variable in
the compose.yml file with your HuggingFace token. Then run the following command in the terminal:

docker compose up --build

This will build and start two containers: one for the Streamlit application and one for the ChatterBox model. The
Streamlit application will be available at http://localhost:8501.
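For orientation, the relevant part of compose.yml might look like the fragment below. Service names, build paths, and the exact layout are illustrative sketches, not the file's actual contents; only the HF_TOKEN variable and the 8501 port mapping are taken from the instructions above.

```yaml
services:
  chatterbox:
    build: ./chatterbox            # illustrative path to the model service
    environment:
      - HF_TOKEN=hf_xxxxxxxx       # paste your HuggingFace token here
  streamlit:
    build: ./app                   # illustrative path to the UI service
    ports:
      - "8501:8501"                # Streamlit UI exposed at localhost:8501
```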
Then, fill in your OpenAI API key in the input field on the settings page of the application.
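The OpenAI key is presumably used to turn the uploaded files into a two-speaker podcast script before synthesis. The sketch below shows one way such a prompt could be built; the function name, speaker-prefix convention, and model choice are assumptions for illustration, not the app's actual internals.

```python
# Hypothetical sketch: build a prompt asking an LLM for an alternating
# two-speaker podcast script. Names and conventions are illustrative.

def build_podcast_prompt(document_text: str, speakers=("Alice", "Frank")) -> str:
    """Return a prompt requesting a dialogue with 'Name: line' formatting."""
    a, b = speakers
    return (
        f"Write a podcast dialogue between {a} and {b} that explains the "
        f"document below. Prefix every line with the speaker's name and a "
        f"colon, e.g. '{a}: ...'. Alternate speakers naturally.\n\n"
        f"Document:\n{document_text}"
    )

# The prompt would then be sent with the key from the settings page,
# e.g. via the official openai client (sketch only, not executed here):
#   from openai import OpenAI
#   client = OpenAI(api_key=user_key)
#   resp = client.chat.completions.create(
#       model="gpt-4o-mini",  # illustrative model choice
#       messages=[{"role": "user", "content": build_podcast_prompt(text)}],
#   )
```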
To generate the podcast audio, we use the ChatterBox Turbo multilingual text-to-speech model. The model can generate high-quality audio in multiple languages from the input text and a specified reference voice to clone. Both voices, Alice and Frank, are taken from open-source TTS datasets available online.
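Before synthesis, a generated script has to be split into per-speaker segments so each line is synthesized with the matching reference voice. A minimal sketch, assuming the "Name: text" line convention; the parsing function and the TTS call shown in comments are illustrative, not the project's actual code.

```python
# Hypothetical sketch: split a "Name: text" script into (speaker, text)
# segments so each line can be voiced with that speaker's reference clip.

def split_script(script: str, speakers=("Alice", "Frank")) -> list[tuple[str, str]]:
    segments = []
    for line in script.splitlines():
        line = line.strip()
        for name in speakers:
            prefix = f"{name}:"
            if line.startswith(prefix):
                segments.append((name, line[len(prefix):].strip()))
                break
    return segments

# Each segment would then go to the TTS model with that speaker's
# reference audio, conceptually:
#   wav = tts_model.generate(text, audio_prompt_path=voice_clips[name])
# (generate/audio_prompt_path mirror the open-source ChatterBox API,
#  but the Turbo variant used here may differ.)
```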