
Commit bd3345b

Update config examples

Signed-off-by: Michael Yuan <[email protected]>

1 parent 42325a4 commit bd3345b

7 files changed: +68 -20 lines changed

doc/docs/config/asr.md (11 additions, 0 deletions)

@@ -6,15 +6,23 @@ sidebar_position: 2
 
 The EchoKit server supports popular ASR providers.
 
+| Platform | URL example | Notes |
+| ------------- | ------------- | ---- |
+| `openai` | `https://api.openai.com/v1/audio/transcriptions` | Supports endpoint URLs from any OpenAI-compatible service, such as Groq and Open Router. |
+| `paraformer_v2` | `wss://dashscope.aliyuncs.com/api-ws/v1/inference` | A WebSocket streaming ASR service endpoint supported by Ali Cloud. |
+
 
 ## OpenAI and compatible services
 
 The OpenAI `/v1/audio/transcriptions` API is supported by OpenAI, Open Router, Groq, Azure, AWS and many other providers.
+This is a non-streaming service endpoint, meaning that the EchoKit server must determine when the user is done
+talking (via a VAD service), and then submit the entire audio to get a transcription.
 
 OpenAI example
 
 ```toml
 [asr]
+platform = "openai"
 url = "https://api.openai.com/v1/audio/transcriptions"
 api_key = "sk_ABCD"
 model = "gpt-4o-mini-transcribe"
@@ -26,6 +34,7 @@ Groq example
 
 ```toml
 [asr]
+platform = "openai"
 url = "https://api.groq.com/openai/v1/audio/transcriptions"
 api_key = "gsk_ABCD"
 model = "whisper-large-v3"
@@ -44,6 +53,8 @@ send back text and voice activity events as they happen. There is no need to a s
 
 ```toml
 [asr]
+platform = "paraformer_v2"
+url = "wss://dashscope.aliyuncs.com/api-ws/v1/inference"
 paraformer_token = "sk-API-KEY"
 ```
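To illustrate the claim that the `openai` platform accepts endpoint URLs from any OpenAI-compatible service, here is a minimal sketch that points the `[asr]` section at a hypothetical self-hosted transcription server. The `localhost` URL, API key, and model name are placeholders, not values from this commit.

```toml
# Hypothetical self-hosted, OpenAI-compatible transcription server
# (URL, key, and model below are placeholders)
[asr]
platform = "openai"
url = "http://localhost:8000/v1/audio/transcriptions"
api_key = "EMPTY"
model = "whisper-large-v3"
```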

doc/docs/config/gemini-live.md (1 addition, 1 deletion)

@@ -65,7 +65,7 @@ hello_wav = "hello.wav"
 api_key = "your_api_key_here"
 
 [tts]
-platform = "StreamGSV"
+platform = "stream_gsv"
 url = "http://localhost:9094/v1/audio/stream_speech"
 speaker = "cooper"

doc/docs/config/intro.md (13 additions, 5 deletions)

@@ -45,6 +45,12 @@ The rest of the `config.toml` specifies how to use different AI services. Each s
 * The `[llm]` section configures the [large language model](llm.md) services, including [tools](llm-tools.md) and [MCP actions](mcp.md).
 * The `[tts]` section configures the [text-to-voice](tts.md) services.
 
+It is important to note that each of these sections has the following fields.
+
+* A `platform` field that designates the service protocol. A common example is `openai` for OpenAI-compatible API endpoints.
+* A `url` field for the service endpoint URL. It is typically an `https://` or `wss://` URL. The latter is the WebSocket address for streaming services.
+* Optional fields that are specific to the `platform`. These include `api_key`, `model`, and others.
+
 ## Complete Configuration Example
 
 You will need a free [API key from Groq](https://console.groq.com/keys).
@@ -54,23 +60,25 @@ You will need a free [API key from Groq](https://console.groq.com/keys).
 addr = "0.0.0.0:8080"
 hello_wav = "hello.wav"
 
-# Speech recognition
+# Speech recognition using the OpenAI transcriptions API, but hosted by Groq (instead of OpenAI)
 [asr]
+platform = "openai"
 url = "https://api.groq.com/openai/v1/audio/transcriptions"
 lang = "en"
 api_key = "gsk_your_api_key_here"
 model = "whisper-large-v3-turbo"
 
-# Language model
+# Language model using the OpenAI chat completions API, but hosted by Groq (instead of OpenAI)
 [llm]
-llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
+platform = "openai_chat"
+url = "https://api.groq.com/openai/v1/chat/completions"
 api_key = "gsk_your_api_key_here"
 model = "gpt-oss-20b"
 history = 10
 
-# Text-to-speech
+# Text-to-speech using the OpenAI speech API, but hosted by Groq (instead of OpenAI)
 [tts]
-platform = "Groq"
+platform = "openai"
 url = "https://api.groq.com/openai/v1/audio/speech"
 api_key = "gsk_your_api_key_here"
 model = "playai-tts"

doc/docs/config/llm-tools.md (10 additions, 5 deletions)

@@ -15,7 +15,8 @@ Since it is a stateful API, the EchoKit server only needs to send the last user
 
 ```toml
 [llm]
-llm_chat_url = "https://api.openai.com/v1/responses"
+platform = "openai_responses"
+url = "https://api.openai.com/v1/responses"
 api_key = "sk_ABCD"
 model = "gpt-5-nano"
 
@@ -43,7 +44,8 @@ The actual implementation of the `web_search_preview` tool is provided by OpenAI
 
 ```toml
 [llm]
-llm_chat_url = "https://api.openai.com/v1/responses"
+platform = "openai_responses"
+url = "https://api.openai.com/v1/responses"
 api_key = "sk_ABCD"
 model = "gpt-5-nano"
 
@@ -69,7 +71,8 @@ provides a `x_search` tool to specifically search for posts in x.com.
 
 ```toml
 [llm]
-llm_chat_url = "https://api.x.ai/v1/responses"
+platform = "openai_responses"
+url = "https://api.x.ai/v1/responses"
 api_key = "xai_ABCD"
 model = "grok-4-1-fast-non-reasoning"
 
@@ -95,7 +98,8 @@ Again the name of the build-in search tool is different. It is called `browser_s
 
 ```toml
 [llm]
-llm_chat_url = "https://api.groq.com/openai/v1/chat/responses"
+platform = "openai_responses"
+url = "https://api.groq.com/openai/v1/chat/responses"
 api_key = "gsk_ABCD"
 model = "openai/gpt-oss-20b"
 
@@ -127,7 +131,8 @@ a response based on those tool call results.
 
 ```toml
 [llm]
-llm_chat_url = "https://api.x.ai/v1/responses"
+platform = "openai_responses"
+url = "https://api.x.ai/v1/responses"
 api_key = "xai_ABCD"
 model = "grok-4-1-fast-non-reasoning"

doc/docs/config/llm.md (12 additions, 4 deletions)

@@ -5,15 +5,22 @@ sidebar_position: 3
 # LLM services
 
 The EchoKit server utilizes LLM services to generate responses to user queries.
-Most popular LLM services support OpenAI's `/v1/chat/completions` API.
+Most popular LLM services support the OpenAI API.
+
+| Platform | URL example | Notes |
+| ------------- | ------------- | ---- |
+| `openai_chat` | `https://api.openai.com/v1/chat/completions` | The stateless `/chat/completions` API. It is the most widely supported LLM API. |
+| `openai_responses` | `https://api.openai.com/v1/responses` | The stateful `/responses` API. Alpha feature. |
+
 
 ## Simple example
 
 The following example configures the EchoKit server to use the OpenAI LLM service.
 
 ```toml
 [llm]
-llm_chat_url = "https://api.openai.com/v1/chat/completions"
+platform = "openai_chat"
+url = "https://api.openai.com/v1/chat/completions"
 api_key = "sk_ABCD"
 model = "gpt-5-nano"
 history = 5
@@ -78,7 +85,8 @@ We also tells the LLM to use the search tool when needed in the system prompt.
 
 ```toml
 [llm]
-llm_chat_url = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
+platform = "openai_chat"
+url = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
 api_key = "sk-API-KEY"
 model = "qwen-plus"
 history = 5
@@ -105,6 +113,6 @@ You can pass any JSON parameter supported by the LLM API provider in the `[llm.e
 
 While the stateless `/v1/chat/completions` API is widely supported,
 OpenAI and many providers in the ecosystem have shifted their focus to the new stateful
-`/v1/responses` API. The new responses API makes it easier to support tools, icnluding web searches,
+`/v1/responses` API. The new responses API makes it easier to support tools, including web searches,
 in LLM applications.
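For comparison with the `openai_chat` example above, a minimal `[llm]` sketch for the stateful `openai_responses` platform, mirroring the responses-API examples in doc/docs/config/llm-tools.md from this same commit (the API key is a placeholder), could look like this:

```toml
# Stateful /v1/responses variant of the [llm] section (placeholder API key)
[llm]
platform = "openai_responses"
url = "https://api.openai.com/v1/responses"
api_key = "sk_ABCD"
model = "gpt-5-nano"
```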

doc/docs/config/tts.md (19 additions, 5 deletions)

@@ -9,6 +9,16 @@ For interactive applications, you should select a TTS service that supports stre
 Streaming allows the TTS to "speak" as the LLM returns text, instead of waiting for the LLM
 to complete and then for the TTS to synthesize the whole text.
 
+
+| Platform | URL example | Notes |
+| ------------- | ------------- | ---- |
+| `openai` | `https://api.openai.com/v1/audio/speech` | Supports endpoint URLs from any OpenAI-compatible service, such as Groq and Open Router. |
+| `elevenlabs` | `wss://api.elevenlabs.io/v1/text-to-speech` | Supports the ElevenLabs TTS endpoint URL. |
+| `fish` | `https://api.fish.audio/v1/tts` | Supports the Fish Audio TTS endpoint URL. |
+| `stream_gsv` | `http://localhost:9094/v1/audio/stream_speech` | Supports a self-hosted GPT-SoVITS model API server. This is a streaming TTS endpoint. |
+| `gsv` | `http://localhost:9094/v1/audio/speech` | Supports a self-hosted GPT-SoVITS model API server. |
+| `cosyvoice` | `wss://dashscope.aliyuncs.com/api-ws/v1/inference` | A WebSocket streaming TTS service endpoint supported by Ali Cloud. |
+
 ## ElevenLabs streaming service
 
 ElevenLabs provide state-of-the-art TTS models for many languages. It also provides a large library
@@ -20,7 +30,8 @@ With an [API key from ElevenLabs](https://elevenlabs.io/app/developers/api-keys)
 
 ```toml
 [tts]
-platform = "Elevenlabs"
+platform = "elevenlabs"
+url = "wss://api.elevenlabs.io/v1/text-to-speech/"
 token = "sk_1234"
 voice = "YOUR-VOICE-ID"
 ```
@@ -38,7 +49,7 @@ The example below shows a streaming GTP-SoVITS server running at local host port
 
 ```toml
 [tts]
-platform = "StreamGSV"
+platform = "stream_gsv"
 url = "http://localhost:9094/v1/audio/stream_speech"
 speaker = "texan"
 ```
@@ -49,7 +60,8 @@ The [CosyVoice service](https://bailian.console.aliyun.com/) from Ali Cloud is a
 
 ```toml
 [tts]
-platform = "CosyVoice"
+platform = "cosyvoice"
+url = "wss://dashscope.aliyuncs.com/api-ws/v1/inference"
 token = "sk-API-KEY"
 speaker = "longhua_v2"
 ```
@@ -63,7 +75,8 @@ OpenAI example
 
 ```toml
 [tts]
-platform = "OpenAI"
+platform = "openai"
+url = "https://api.openai.com/v1/audio/speech"
 model = "gpt-4o-mini-tts"
 api_key = "sk_ABCD"
 voice = "ash"
@@ -73,7 +86,8 @@ Groq example
 
 ```toml
 [tts]
-platform = "Groq"
+platform = "openai"
+url = "https://api.groq.com/openai/v1/audio/speech"
 model = "playai-tts"
 api_key = "gsk_ABCD"
 voice = "Fritz-PlayAI"

doc/docs/get-started/echokit-server.md (2 additions, 0 deletions)

@@ -19,6 +19,8 @@ docker run --rm \
 The required `config.toml` file for the local EchoKit server could be the following. You will need
 free [Groq](https://console.groq.com/keys) and [ElevenLabs](https://elevenlabs.io/app/settings/api-keys) API keys.
 
+> The `platform = "openai"` in the configuration refers to OpenAI-compatible service endpoints. Groq provides its inference service in the OpenAI protocol.
+
 ```
 addr = "0.0.0.0:8080"
 hello_wav = "hello.wav"
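The config block in this hunk is shown only through `hello_wav`. A sketch of the Groq and ElevenLabs sections it refers to, assembled from the `[asr]`, `[llm]`, and `[tts]` examples elsewhere in this commit (keys and voice ID are placeholders, not the actual file contents), could be:

```toml
# Sketch assembled from examples in this commit; credentials and voice ID are placeholders
[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_your_api_key_here"
model = "whisper-large-v3-turbo"

[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_your_api_key_here"
model = "gpt-oss-20b"
history = 10

[tts]
platform = "elevenlabs"
url = "wss://api.elevenlabs.io/v1/text-to-speech/"
token = "sk_1234"
voice = "YOUR-VOICE-ID"
```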
