VoiceLint is a fast, local-first C++ application that turns raw speech into clean, structured text, including grammar correction and summarization, using offline ASR and lightweight large language models. It provides a Dear ImGui-based user interface for real-time interaction and review.
VoiceLint provides a full offline pipeline for processing spoken language:
- ASR Transcription: Transcribe audio using FunASR.
- Text Correction: Fix recognition errors, restore punctuation, and clean up grammar using local LLMs (LLaMA/Qwen3 via llama.cpp).
- Summarization: Generate clear summaries (paragraph-style or bullet points) from corrected text.
- GUI Interface: Built with Dear ImGui for easy, cross-platform visual interaction.
| Component | Description |
|---|---|
| Language | C++17 / C++20 |
| ASR | FunASR — fast offline transcription |
| LLMs | llama.cpp, Qwen3 — local LLM inference |
| UI | Dear ImGui — minimal, cross-platform GUI |
| Build | CMake, fully portable |
graph LR
A[Audio File] --> B[ASR *FunASR*]
B --> C[Raw Transcript]
C --> D[LLM Correction *LLaMA/Qwen3*]
D --> E[Cleaned Text]
E --> F[LLM Summarization]
F --> G[Summary Output]
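
The flow in the diagram above can be sketched as three chained stages. This is a minimal illustration, not VoiceLint's actual code: `AsrFn`, `LlmFn`, `PipelineResult`, and `run_pipeline` are hypothetical names, and the stages are passed in as callables so that stubs can stand in for the FunASR and llama.cpp calls.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stage signatures; the real VoiceLint interfaces may differ.
using AsrFn = std::function<std::string(const std::string& audio_path)>;
using LlmFn = std::function<std::string(const std::string& text)>;

// Keeps every intermediate output so a UI could show each step for review.
struct PipelineResult {
    std::string raw_transcript;  // output of the ASR stage
    std::string cleaned_text;    // output of the LLM correction stage
    std::string summary;         // output of the LLM summarization stage
};

// Runs the stages in order; each stage consumes the previous stage's output.
PipelineResult run_pipeline(const std::string& audio_path,
                            const AsrFn& transcribe,
                            const LlmFn& correct,
                            const LlmFn& summarize) {
    // All three stages must be callable before the pipeline starts.
    assert(transcribe && correct && summarize);

    PipelineResult result;
    result.raw_transcript = transcribe(audio_path);
    result.cleaned_text = correct(result.raw_transcript);
    result.summary = summarize(result.cleaned_text);
    return result;
}
```

Passing the stages as `std::function` values keeps the sketch independent of any particular ASR or LLM backend: a caller would wrap the real transcription and inference calls in lambdas and hand them to `run_pipeline`.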
git clone https://github.com/szsteven008/VoiceLint.git
cd VoiceLint
cmake -B build
cmake --build build --config release -j 8
build/bin/voicelint -c config/config.json
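
The run command above passes the configuration file through a `-c` option. As a rough sketch of how such an option might be read (the helper `find_config_path` is hypothetical and is not taken from the VoiceLint sources), the argument list can be scanned for `-c` and the following value returned:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the value that follows "-c", or
// std::nullopt when "-c" is absent or is the last argument.
std::optional<std::string> find_config_path(const std::vector<std::string>& args) {
    // Stop one short of the end so args[i + 1] is always valid.
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "-c") {
            return args[i + 1];
        }
    }
    return std::nullopt;
}
```

In a `main(int argc, char** argv)`, the arguments could be collected with `std::vector<std::string> args(argv, argv + argc);` before calling the helper; returning `std::optional` lets the caller decide whether a missing `-c` is an error or a cue to fall back to a default path.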
MIT License.
• FunASR by DAMO Academy
• llama.cpp by Georgi Gerganov
• Qwen3 by Alibaba Group
• Dear ImGui by Omar Cornut
• json by Niels Lohmann
• stb by Sean Barrett
