Commit cfb5c7e

Merge pull request #107 from second-state/docs-config
Add config section
2 parents 5322b26 + e1a68da

21 files changed: +752 -481 lines

doc/docs/config/_category_.json

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@

```json
{
  "label": "Config guide",
  "position": 3,
  "link": {
    "type": "generated-index",
    "description": "In this chapter, you'll learn how to configure the EchoKit server."
  }
}
```

doc/docs/config/asr.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@

---
sidebar_position: 2
---

# Voice to text services (ASR)

The EchoKit server supports popular ASR providers.

| Platform | URL example | Notes |
| ------------- | ------------- | ---- |
| `openai` | `https://api.openai.com/v1/audio/transcriptions` | Supports endpoint URLs from any OpenAI-compatible service, such as Groq and Open Router. |
| `paraformer_v2` | `wss://dashscope.aliyuncs.com/api-ws/v1/inference` | A WebSocket streaming ASR service endpoint provided by Ali Cloud |

## OpenAI and compatible services

The OpenAI `/v1/audio/transcriptions` API is supported by OpenAI, Open Router, Groq, Azure, AWS, and many other providers.
This is a non-streaming service endpoint, meaning that the EchoKit server must determine when the user is done
talking (via a VAD service), and then submit the entire audio to get a transcription.
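The non-streaming flow can be sketched in Python. This is only an illustrative sketch, not EchoKit's actual code: the helper name, the `utterance.wav` file name, and the dummy audio bytes are assumptions. It assembles the pieces of an OpenAI-style multipart request; the commented `requests.post` line shows how they would be sent.

```python
def build_transcription_request(url: str, api_key: str, model: str,
                                lang: str, audio: bytes) -> dict:
    """Assemble an OpenAI-style /v1/audio/transcriptions request."""
    return {
        "url": url,
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Multipart form fields understood by the transcriptions endpoint.
        "data": {"model": model, "language": lang},
        # The whole utterance is uploaded at once -- no streaming.
        "files": {"file": ("utterance.wav", audio, "audio/wav")},
    }

req = build_transcription_request(
    "https://api.groq.com/openai/v1/audio/transcriptions",
    "gsk_ABCD", "whisper-large-v3", "en", b"\x00" * 16)

# With the `requests` package installed, this would perform the upload:
#   requests.post(req["url"], headers=req["headers"],
#                 data=req["data"], files=req["files"]).json()
```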
OpenAI example:

```toml
[asr]
platform = "openai"
url = "https://api.openai.com/v1/audio/transcriptions"
api_key = "sk_ABCD"
model = "gpt-4o-mini-transcribe"
lang = "en"
vad_url = "http://localhost:9093/v1/audio/vad"
```

Groq example:

```toml
[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_ABCD"
model = "whisper-large-v3"
lang = "en"
prompt = "Hello\n你好\n(noise)\n(bgm)\n(silence)\n"
vad_url = "http://localhost:9093/v1/audio/vad"
```

Notice that in both examples, we are using a locally hosted VAD service to detect when the user is finished speaking. The VAD service is optional, and you can [learn about it here](../server/vad.md).
## Ali Cloud streaming ASR

The [Bailian service](https://bailian.console.aliyun.com/) from Ali Cloud provides excellent ASR models for Chinese language recognition.
It is also a streaming ASR service -- it takes an audio stream as input and
sends back text and voice activity events as they happen. There is no need for a separate VAD service in this case.

```toml
[asr]
platform = "paraformer_v2"
url = "wss://dashscope.aliyuncs.com/api-ws/v1/inference"
paraformer_token = "sk-API-KEY"
```
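To make the contrast with the non-streaming OpenAI endpoint concrete, here is a toy Python loop over stand-in streaming events. The event dictionaries and field names are invented for illustration and do not reflect DashScope's actual wire protocol; the point is only that text and voice-activity events arrive interleaved, so the service's own end-of-speech signal replaces a separate VAD pass.

```python
def handle_stream(events):
    """Collect committed text; stop when the service signals end of speech."""
    text = []
    for ev in events:
        if ev["type"] == "partial":
            continue                      # intermediate hypothesis, ignored
        elif ev["type"] == "final":
            text.append(ev["text"])       # committed transcript segment
        elif ev["type"] == "speech_end":  # service-side VAD fired
            break
    return " ".join(text)

# Stand-in events such as a WebSocket client might yield.
events = [
    {"type": "partial", "text": "你"},
    {"type": "final", "text": "你好"},
    {"type": "speech_end"},
]
print(handle_stream(events))  # → 你好
```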
## ElevenLabs streaming ASR

Coming soon ...
Lines changed: 8 additions & 18 deletions
@@ -4,9 +4,9 @@ sidebar_position: 5

# Configure an End-to-End Pipeline for EchoKit

-In addition to the classic [ASR-LLM-TTS pipeline](./configure-echokit-server.md), EchoKit supports real-time models that can reduce latency. However, this approach has several limitations:
+EchoKit supports real-time models that can reduce latency. However, this approach has several limitations:

-* **High API costs** – OpenAI's real-time API can cost up to $25 per 100 tokens
+* **High API costs** – Real-time APIs can cost up to $25 per million tokens
* **No voice customization** – You cannot modify the generated voice
* **Limited knowledge integration** – External knowledge bases cannot be added to the model
* **No MCP support** – Model Control Protocol is not supported in most cases

@@ -15,9 +15,9 @@ In addition to the classic [ASR-LLM-TTS pipeline](./configure-echokit-server.md)

Before setting up your end-to-end pipeline, ensure you have:

-* **EchoKit server source code** – Follow the [guide](./echokit-server.md) if you haven't already
+* **EchoKit server source code** – Follow the [guide](../get-started/echokit-server.md) if you haven't already
* **Gemini API key** – Obtain from [Google AI Studio](https://aistudio.google.com/)
-* **TTS service running** (optional) – If using custom voice synthesis
+* **TTS service** (optional)

## Gemini API Setup

@@ -36,7 +36,7 @@ Google's Gemini is one of the most advanced models supporting voice-to-voice int

Here's the complete configuration file for Gemini:

```toml
-addr = "0.0.0.0:9090"
+addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[gemini]
@@ -49,16 +49,6 @@ You are a helpful assistant. Please answer user questions as concisely as possib

"""
```

-### Starting the Server
-
-After editing the configuration file, restart the EchoKit server to apply the changes.
-
-Since you're using a different `config.toml` file in a custom path, your restart command should look like this:
-
-```bash
-./target/release/echokit_server ./examples/gemini/chat/config.toml
-```
-
## Gemini + TTS (Custom Voice)

While real-time models typically don't allow voice customization, EchoKit enables you to customize the voice even when using Gemini!

@@ -68,14 +58,14 @@ While real-time models typically don't allow voice customization, EchoKit enable

Simply add TTS-related parameters to your `config.toml` file:

```toml
-addr = "0.0.0.0:9090"
+addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[gemini]
api_key = "your_api_key_here"

[tts]
-platform = "StreamGSV"
+platform = "stream_gsv"
url = "http://localhost:9094/v1/audio/stream_speech"
speaker = "cooper"

@@ -86,4 +76,4 @@

"""
```

-With these TTS settings configured, you can now use your preferred custom voice.
+With these TTS settings configured, you can now use your preferred custom voice.

doc/docs/config/intro.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@

---
sidebar_position: 1
---

# EchoKit server config options

The EchoKit server orchestrates multiple AI services to turn user voice input into voice responses.
It generally takes one of two approaches.

* The pipeline approach. It divides the task into multiple steps and uses a different AI service for each step.
  * The [ASR service](asr.md) turns the user's voice audio into text.
  * The [LLM service](llm.md) generates a text response to the user input. The LLM can be aided by [built-in tools, such as web searches](llm-tools.md) and [custom tools in MCP servers](mcp.md).
  * The [TTS service](tts.md) converts the response text to voice.
* The end-to-end real-time model approach. It uses multimodal models that can directly ingest voice input and generate voice output, such as [Google Gemini Live](gemini-live.md).
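The pipeline approach is, at its core, a simple composition of three stages. The Python sketch below illustrates the data flow; the three stand-in functions are hypothetical placeholders returning canned values, not EchoKit's APIs, and only the composition order mirrors the steps above.

```python
def asr(audio: bytes) -> str:
    """Voice -> text (stand-in transcription)."""
    return "what's the weather?"

def llm(prompt: str) -> str:
    """Text -> text (stand-in response)."""
    return f"You asked: {prompt}"

def tts(text: str) -> bytes:
    """Text -> voice (stand-in audio: just the encoded text)."""
    return text.encode("utf-8")

def pipeline(audio: bytes) -> bytes:
    # Each stage can be backed by a different provider.
    return tts(llm(asr(audio)))

print(pipeline(b"\x00\x01"))  # → b"You asked: what's the weather?"
```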
The pipeline approach offers greater flexibility and customization - you can choose any voice, control costs by mixing different providers, integrate external knowledge, and run components locally for privacy. While end-to-end models can reduce latency, the classic pipeline gives you full control over each component.

You can configure how those AI services work together through the EchoKit server's `config.toml` file.

## Prerequisites

* Start an EchoKit server. Follow [the quick start guide](../get-started/echokit-server.md) if needed
* Obtain **API keys** for your favorite AI API providers (OpenAI, Groq, xAI, Open Router, ElevenLabs, Gemini, etc.)

## Configure server address and welcome audio

```toml
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"
```

* `addr`: The server's listening address and port
  * Use `0.0.0.0` to accept connections from any network interface
  * Make sure that your firewall allows incoming connections to the port (`8080` in this example)
* `hello_wav`: Optional welcome audio file played when a device connects
  * Supports 16kHz WAV format
  * Make sure that the file is in the same folder as `config.toml`
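A quick way to verify the welcome audio is a 16kHz WAV is Python's standard `wave` module. This is an illustrative sketch, not part of EchoKit: the `check_hello_wav` helper and the generated test tone are assumptions, included only so the check can be demonstrated offline.

```python
import math
import struct
import wave

def check_hello_wav(path: str) -> bool:
    """True if the file is a mono 16 kHz PCM WAV."""
    with wave.open(path, "rb") as w:
        return w.getframerate() == 16000 and w.getnchannels() == 1

# Generate a 0.1 s, 440 Hz test tone so the check can run without a real file.
frames = b"".join(
    struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * 440 * i / 16000)))
    for i in range(1600)
)
with wave.open("hello.wav", "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(16000)  # required sample rate
    w.writeframes(frames)

print(check_hello_wav("hello.wav"))  # → True
```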
## Configure AI services

The rest of `config.toml` specifies how to use the different AI services. Each service is covered in its own chapter.

* The `[asr]` section configures the [voice-to-text](asr.md) services.
* The `[llm]` section configures the [large language model](llm.md) services, including [tools](llm-tools.md) and [MCP actions](mcp.md).
* The `[tts]` section configures the [text-to-voice](tts.md) services.

Note that each of these sections has the following fields.

* A `platform` field that designates the service protocol. A common example is `openai` for OpenAI-compatible API endpoints.
* A `url` field for the service endpoint. It is typically an `https://` or `wss://` URL. The latter is a WebSocket address for streaming services.
* Optional fields that are specific to the `platform`, such as `api_key` and `model`.
## Complete Configuration Example

You will need a free [API key from Groq](https://console.groq.com/keys).

```toml
# Server settings
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

# Speech recognition using the OpenAI transcriptions API, but hosted by Groq (instead of OpenAI)
[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
lang = "en"
api_key = "gsk_your_api_key_here"
model = "whisper-large-v3-turbo"

# Language model using the OpenAI chat completions API, but hosted by Groq (instead of OpenAI)
[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_your_api_key_here"
model = "gpt-oss-20b"
history = 10

# Text-to-speech using the OpenAI speech API, but hosted by Groq (instead of OpenAI)
[tts]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/speech"
api_key = "gsk_your_api_key_here"
model = "playai-tts"
voice = "Cooper-PlayAI"

# System personality
[[llm.sys_prompts]]
role = "system"
content = """
Your name is EchoKit, a helpful AI assistant. Provide clear, concise responses and maintain a friendly, professional tone. Keep answers brief but informative.
"""
```
