InnoFranceSpeakerDetect

A single-speaker profile detector powered by Kimi-Audio. Provide an audio clip that contains only one speaker, and it returns a JSON profile with two fields: design_text and design_instruct.

Features

  • CLI usage
  • FastAPI service + Web UI
  • MCP Server (stdio / SSE)

Installation

cd InnoFranceSpeakerDetect
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Kimi-Audio dependency (required):

git clone https://github.com/MoonshotAI/Kimi-Audio
pip install -e Kimi-Audio

Configure the model path:

cp env.example .env
# edit .env to set the model path

CLI

python3 -m app.cli /path/to/speaker.wav -o speaker.json
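
To drive the CLI from another Python script, something like the following sketch may help. It assumes only the invocation shown above (a positional audio path plus `-o`), and that it is run from the repository root with the virtual environment active. The helper names `build_cli_command` and `run_detection` are hypothetical and not part of this project.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_cli_command(audio_path: str, output_path: str) -> list[str]:
    """Build the argv list for the CLI invocation shown above.

    sys.executable is used in place of `python3` so the command runs
    with the interpreter of the active virtual environment.
    """
    return [sys.executable, "-m", "app.cli", audio_path, "-o", output_path]


def run_detection(audio_path: str, output_path: str = "speaker.json") -> list:
    """Run the CLI, then load the JSON file it writes (hypothetical helper)."""
    # check=True raises CalledProcessError if the CLI exits with an error
    subprocess.run(build_cli_command(audio_path, output_path), check=True)
    return json.loads(Path(output_path).read_text(encoding="utf-8"))
```

Calling `run_detection("/path/to/speaker.wav")` would then return the parsed profile list, provided the CLI writes its result to the `-o` path as the example suggests.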

Run the service

uvicorn app.main:app --host 0.0.0.0 --port 8012

Open http://localhost:8012 for the Web UI.

The Web UI also supports an audio URL (.wav / .mp3).

MCP Server

stdio mode:

python3 -m app.mcp_server --transport stdio

SSE mode:

python3 -m app.mcp_server --transport sse --host 127.0.0.1 --port 8013

Tools:

  • detect_speaker(audio_path, output_path=None, model_path=None)
  • detect_speaker_from_url(audio_url, output_path=None, model_path=None)
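
For orientation, MCP tool invocations travel as JSON-RPC 2.0 requests with the method `tools/call`. The sketch below only shows the shape of such a request for the two tools listed above; the helper `build_tool_call` is hypothetical, and a real client library would also handle the initialization handshake and the transport framing.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request
_request_ids = count(1)


def build_tool_call(name: str, **arguments) -> str:
    """Serialize an MCP tools/call request (hypothetical helper)."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": name,
            # Omit None-valued optional arguments so server defaults apply
            "arguments": {k: v for k, v in arguments.items() if v is not None},
        },
    }
    return json.dumps(payload)


request = build_tool_call(
    "detect_speaker",
    audio_path="/path/to/speaker.wav",
    output_path=None,  # optional, left to the server default
)
```

Leaving out `output_path` and `model_path` relies on the `None` defaults shown in the tool signatures.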

Output

The output is a JSON array. Example:

[
  {
    "design_text": "Host responsible for leading the conversation.",
    "design_instruct": "Female, around 30, medium pace, clear timbre, friendly tone."
  }
]
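
A consumer of this output may want to check its shape before using it. The following is a minimal sketch that assumes only what the example above shows: a JSON array of objects, each with string fields `design_text` and `design_instruct`. The function name `parse_profiles` is hypothetical.

```python
import json

REQUIRED_KEYS = ("design_text", "design_instruct")


def parse_profiles(raw: str) -> list[dict]:
    """Parse detector output and verify each profile has the expected fields."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of profiles")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"profile {index} is not a JSON object")
        # A key counts as missing if it is absent or not a string
        missing = [k for k in REQUIRED_KEYS if not isinstance(item.get(k), str)]
        if missing:
            raise ValueError(f"profile {index} is missing fields: {missing}")
    return data


example = """
[
  {
    "design_text": "Host responsible for leading the conversation.",
    "design_instruct": "Female, around 30, medium pace, clear timbre, friendly tone."
  }
]
"""
profiles = parse_profiles(example)
```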