A single-speaker profile detector powered by Kimi-Audio. Provide an audio clip that contains only one speaker to get a JSON profile with design_text and design_instruct fields. It offers three interfaces:
- CLI usage
- FastAPI service + Web UI
- MCP Server (stdio / SSE)
Installation:
cd InnoFranceSpeakerDetect
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Kimi-Audio dependency (required):
git clone https://github.com/MoonshotAI/Kimi-Audio
pip install -e Kimi-Audio

Configure the model path:
cp env.example .env
# edit .env

CLI usage:
python3 -m app.cli /path/to/speaker.wav -o speaker.json

FastAPI service + Web UI:
uvicorn app.main:app --host 0.0.0.0 --port 8012

Then open http://localhost:8012 for the Web UI.
The Web UI also accepts an audio URL pointing to a .wav or .mp3 file.
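The CLI step can also be driven from a Python script, for example to batch several clips. This is a minimal sketch, assuming it runs from the repository root inside the activated virtualenv and that the CLI writes the JSON array to the path given with -o; build_cli_command and run_detector are hypothetical helper names, not part of the project.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Dict, List


def build_cli_command(audio_path: str, output_path: str) -> List[str]:
    """Build the argument list for `python3 -m app.cli <audio> -o <output>`."""
    # sys.executable reuses the interpreter of the currently active virtualenv.
    return [sys.executable, "-m", "app.cli", audio_path, "-o", output_path]


def run_detector(audio_path: str, output_path: str = "speaker.json") -> List[Dict]:
    """Run the CLI (assumed cwd: repo root) and return the parsed JSON profile."""
    cmd = build_cli_command(audio_path, output_path)
    # check=True raises subprocess.CalledProcessError if the CLI exits non-zero.
    subprocess.run(cmd, check=True)
    # Assumption: the CLI writes the JSON array to output_path as UTF-8 text.
    return json.loads(Path(output_path).read_text(encoding="utf-8"))
```

Usage: run_detector("/path/to/speaker.wav") returns the parsed list, so run_detector("/path/to/speaker.wav")[0]["design_instruct"] gives the voice description. Calling the CLI through subprocess keeps the script decoupled from the app's internal Python API, which this README does not document.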
MCP server:

stdio mode:
python3 -m app.mcp_server --transport stdio

SSE mode:
python3 -m app.mcp_server --transport sse --host 127.0.0.1 --port 8013

Tools:
- detect_speaker(audio_path, output_path=None, model_path=None)
- detect_speaker_from_url(audio_url, output_path=None, model_path=None)
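MCP clients invoke tools through the protocol's JSON-RPC 2.0 tools/call method. The sketch below only builds that request body, assuming the server registers the tools under the names listed above; make_tool_call is a hypothetical helper, and in practice an MCP client SDK also handles the initialize handshake and the stdio or SSE transport for you.

```python
import json
from typing import Optional


def make_tool_call(request_id: int, tool_name: str, **arguments: Optional[str]) -> str:
    """Serialize an MCP `tools/call` request for one of the detector tools."""
    # Drop unset optional arguments (e.g. output_path, model_path) so the
    # server-side defaults of None apply instead of sending explicit nulls.
    args = {key: value for key, value in arguments.items() if value is not None}
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": args},
    }
    return json.dumps(request)
```

For example, make_tool_call(1, "detect_speaker_from_url", audio_url="https://example.com/clip.mp3") produces a request whose arguments object contains only audio_url, because the unset optional parameters are omitted.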
The output is a JSON array. Example:
[
{
"design_text": "Host responsible for leading the conversation.",
"design_instruct": "Female, around 30, medium pace, clear timbre, friendly tone."
}
]
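Downstream code that consumes this output can check its shape before use. This is a minimal sketch based only on the example above: it assumes every entry is an object with string design_text and design_instruct fields, and parse_profiles is a hypothetical helper name.

```python
import json
from typing import Dict, List


def parse_profiles(raw: str) -> List[Dict]:
    """Parse detector output and check each entry has the two text fields."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of profiles")
    for i, entry in enumerate(data):
        for key in ("design_text", "design_instruct"):
            # The isinstance check short-circuits before .get on non-dicts.
            if not isinstance(entry, dict) or not isinstance(entry.get(key), str):
                raise ValueError(f"entry {i} is missing string field {key!r}")
    return data
```

Failing fast with a ValueError makes a malformed or truncated output file visible immediately, rather than surfacing later as a KeyError deep in the consuming code.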