mryt66/ATTA

API: Audio to text to audio

ATTA is a voice chatbot. It takes audio as input and returns a spoken response generated by an LLM. For now, everything runs on FastAPI.

The program currently uses my fine-tuned version of the whisper-base model, "marcsixtysix/whisper-base-pl", for speech recognition. You can see the model in my GitHub repository: https://github.com/mryt66/Speech-recognition-pl
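A minimal sketch of the speech recognition step, assuming the model can be loaded through the Hugging Face `transformers` ASR pipeline under the id quoted above. The helper names `load_recognizer` and `transcribe` are made up for illustration and are not taken from the ATTA code.

```python
from functools import lru_cache

# Model id as quoted in this README; that it loads via transformers is an assumption.
MODEL_ID = "marcsixtysix/whisper-base-pl"


@lru_cache(maxsize=1)
def load_recognizer():
    """Load the ASR pipeline once and reuse it for later requests."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_ID)


def transcribe(audio_path: str) -> str:
    """Return the recognized text for one audio file (hypothetical helper)."""
    result = load_recognizer()(audio_path)
    return result["text"].strip()
```

Calling `transcribe("question.wav")` would download the model on first use. Decoding a file path this way also needs ffmpeg available on the system.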

The output of the speech recognition model is processed by a language tool that formats the text before it is sent to the large language model "marcsixtysix/gemma-3-4b-it-pl-polqa" via Ollama. For this project, I fine-tuned the Gemma-3-1B-IT model to work as a Polish-language Q&A system. You can find it here: https://huggingface.co/marcsixtysix/gemma-3-1b-it-pl-polqa-GGUF
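As a rough illustration of the Ollama step, the sketch below sends a prompt to a local Ollama server through its `/api/generate` REST endpoint using only the standard library. The URL is Ollama's default address and is an assumption about the setup. The model name is passed in as a parameter because it depends on how the model was registered locally. `build_request` and `ask_llm` are hypothetical helper names, not part of ATTA.

```python
import json
import urllib.request

# Default local Ollama address; an assumption, adjust to your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str) -> urllib.request.Request:
    """Build a non-streaming generate request for Ollama."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_llm(prompt: str, model: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the generated answer text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False, Ollama returns one JSON object with a "response" field.
    return body["response"]
```

With a running server, `ask_llm("Jaka jest stolica Polski?", model="<your local model name>")` would return the model's answer as a string.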

Once the LLM has generated a response, it is passed to edge_tts, which converts the text into voice output.
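The whole flow described above (speech recognition, text formatting, LLM, text-to-speech) could be wired together roughly as follows. This is only a sketch: `VoicePipeline` and `synthesize_with_edge_tts` are illustrative names, each stage is passed in as a plain callable, and the voice `pl-PL-ZofiaNeural` is an assumed choice rather than the one ATTA uses.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class VoicePipeline:
    """Chains the four stages; each stage is injected, so it can be swapped or stubbed."""

    transcribe: Callable[[bytes], str]            # audio -> raw text
    format_text: Callable[[str], str]             # raw text -> cleaned prompt
    generate: Callable[[str], str]                # prompt -> LLM answer
    synthesize: Callable[[str], Awaitable[bytes]]  # answer -> audio bytes

    async def run(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        prompt = self.format_text(text)
        answer = self.generate(prompt)
        return await self.synthesize(answer)


async def synthesize_with_edge_tts(text: str, voice: str = "pl-PL-ZofiaNeural") -> bytes:
    """Collect the audio produced by edge_tts into one bytes object."""
    # Imported lazily so the rest of the sketch works without edge_tts installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    chunks = []
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            chunks.append(chunk["data"])
    return b"".join(chunks)
```

Keeping the stages as injected callables makes it easy to test the flow with stubs before plugging in the real models, for example `asyncio.run(pipeline.run(audio_bytes))` inside a test.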


Example of program usage


API endpoints
