GitHub - mryt66/ATTA: ATTA is a voice chatbot. It takes audio input and returns an LLM-generated reply as TTS audio.

API Audio to text to audio

ATTA is a voice chatbot. It takes audio input and returns an audio reply generated by an LLM. For now, everything runs on FastAPI.
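The audio-in, audio-out flow can be pictured as three stages chained together. The sketch below is only an illustration under assumptions: `run_pipeline` and the three stage callables are hypothetical names, not the functions used in this repository, and a FastAPI route would simply call something like this with the uploaded bytes.

```python
from typing import Callable


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # speech recognition stage (hypothetical signature)
    generate: Callable[[str], str],       # LLM stage (hypothetical signature)
    synthesize: Callable[[str], bytes],   # text-to-speech stage (hypothetical signature)
) -> bytes:
    """Audio in, audio out: speech-to-text, then LLM, then text-to-speech."""
    question = transcribe(audio)
    answer = generate(question)
    return synthesize(answer)
```

Passing the stages in as callables keeps the flow easy to test with stubs before the real models are wired in.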

The program currently uses my fine-tuned version of the whisper-base model, "marcsixtysix/whisper-base-pl", for speech recognition. You can see the model in my GitHub repository: https://github.com/mryt66/Speech-recognition-pl
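A minimal sketch of how the model could be loaded and called, assuming it is used through the Hugging Face `transformers` pipeline (the repository may load it differently). The `transcribe` helper and its `asr` parameter are hypothetical; decoding an audio file from a path also requires ffmpeg to be installed.

```python
from typing import Callable, Optional

MODEL_ID = "marcsixtysix/whisper-base-pl"  # model name taken from this README


def transcribe(audio_path: str, asr: Optional[Callable[[str], dict]] = None) -> str:
    """Return the recognized text for an audio file."""
    if asr is None:
        # Imported lazily so the heavy dependency is only needed when the
        # real model is used; this downloads the model on first call.
        from transformers import pipeline

        asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # the ASR pipeline returns a dict with a "text" key
    return result["text"].strip()
```

In a long-running server it would make sense to build the pipeline once at startup and pass it in as `asr`, rather than reloading the model on every request.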

The output of the speech recognition model is processed by a language tool that formats the text before it is sent to the large language model "marcsixtysix/gemma-3-4b-it-pl-polqa" via Ollama. For this project, I fine-tuned the Gemma-3-1B-IT model to work as a Polish-language Q&A system. You can find it here: https://huggingface.co/marcsixtysix/gemma-3-1b-it-pl-polqa-GGUF
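As a rough illustration of the Ollama step, the sketch below sends a prompt to Ollama's REST endpoint `/api/generate` on its default local port, using only the standard library. It makes several assumptions: `MODEL_NAME` is a hypothetical local tag (use whatever name the model was registered under in your Ollama install), and `build_request` and `ask_llm` are illustrative helpers, not code from this repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
MODEL_NAME = "gemma-3-pl-polqa"  # hypothetical tag; replace with your local model name


def build_request(prompt: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a non-streaming generate request for Ollama."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_llm(prompt: str, model: str = MODEL_NAME, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With `"stream": False`, Ollama returns a single JSON object whose `response` field holds the full answer, which is convenient when the whole text is handed to the speech stage at once.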

Once the LLM has generated a response, it is passed to edge_tts, which converts the text into speech.
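The text-to-speech step might look roughly like this, assuming the `edge_tts.Communicate` class and its async `save` method are used. The voice name is an assumption (a Polish neural voice), and `synthesize` with its `communicate_cls` parameter is a hypothetical helper, not the repository's own code.

```python
import asyncio

VOICE = "pl-PL-ZofiaNeural"  # assumed Polish voice; the project may use another one


async def synthesize(text: str, out_path: str, voice: str = VOICE, communicate_cls=None) -> str:
    """Convert text to speech and write the audio to out_path."""
    if communicate_cls is None:
        # Imported lazily so edge_tts is only required when speech is produced.
        import edge_tts

        communicate_cls = edge_tts.Communicate
    await communicate_cls(text, voice).save(out_path)
    return out_path
```

From synchronous code it can be run with `asyncio.run(synthesize("Dzień dobry", "reply.mp3"))`; inside an async FastAPI route it can be awaited directly.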

Example of program usage

API endpoints
