**doc/docs/config/gemini-live.md** (5 additions, 15 deletions)
@@ -4,9 +4,9 @@ sidebar_position: 5

# Configure an End-to-End Pipeline for EchoKit

- In addition to the classic [ASR-LLM-TTS pipeline](./configure-echokit-server.md), EchoKit supports real-time models that can reduce latency. However, this approach has several limitations:
+ EchoKit supports real-time models that can reduce latency. However, this approach has several limitations:

- * **High API costs** – OpenAI's real-time API can cost up to $25 per 100 tokens
+ * **High API costs** – Real-time API can cost up to $25 per million tokens
* **No voice customization** – You cannot modify the generated voice
* **Limited knowledge integration** – External knowledge bases cannot be added to the model
* **No MCP support** – Model Context Protocol is not supported in most cases
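To make the cost figure above concrete, here is a back-of-envelope calculation using the $25-per-million-tokens price. The tokens-per-minute throughput is an illustrative assumption, not a measured value:

```python
# Rough cost estimate for a one-hour real-time session.
# PRICE_PER_MILLION comes from the figure above; tokens_per_minute is assumed.
PRICE_PER_MILLION = 25.0
tokens_per_minute = 800  # assumed audio token throughput
minutes = 60

cost = tokens_per_minute * minutes / 1_000_000 * PRICE_PER_MILLION
print(round(cost, 2))  # → 1.2
```

Even at modest throughput, long-running voice sessions accumulate cost quickly compared with a pipeline where each stage is priced separately.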
@@ -36,7 +36,7 @@ Google's Gemini is one of the most advanced models supporting voice-to-voice int

Here's the complete configuration file for Gemini:

```toml
- addr = "0.0.0.0:9090"
+ addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[gemini]
```

@@ -49,16 +49,6 @@ You are a helpful assistant. Please answer user questions as concisely as possib

```toml
"""
```

- ### Starting the Server
-
- After editing the configuration file, restart the EchoKit server to apply the changes.
-
- Since you're using a different `config.toml` file in a custom path, your restart command should look like this:
**doc/docs/config/intro.md** (1 addition, 1 deletion)
@@ -10,7 +10,7 @@ It generally takes two approaches.

* The pipeline approach. It divides the task into multiple steps and uses a different AI service to process each step.
  * The [ASR service](asr.md) turns the user's input voice audio into text.
  * The [LLM service](llm.md) generates a text response to the user input. The LLM can be aided by [built-in tools, such as web searches](llm-tools.md) and [custom tools in MCP servers](mcp.md).
- * The [TTS service](ttd.md) converts the response text to voice.
+ * The [TTS service](tts.md) converts the response text to voice.
* The end-to-end real-time model approach. It uses multimodal models that can directly ingest voice input and generate voice output, such as [Google Gemini Live](gemini-live.md).

The pipeline approach offers greater flexibility and customization - you can choose any voice, control costs by mixing different providers, integrate external knowledge, and run components locally for privacy. While end-to-end models can reduce latency, the classic pipeline gives you full control over each component.
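The three-stage pipeline can be sketched as composed functions. The bodies below are stubs for illustration only, not the actual EchoKit services or API:

```python
# Conceptual sketch of the ASR -> LLM -> TTS pipeline; each stage could be
# a different provider, which is the flexibility advantage described above.

def asr(audio: bytes) -> str:
    """Speech-to-text stage (stubbed transcription)."""
    return "what's the weather today?"

def llm(prompt: str) -> str:
    """Text-response stage (stubbed model reply)."""
    return f"You asked: {prompt}"

def tts(text: str) -> bytes:
    """Text-to-speech stage (stubbed audio bytes)."""
    return text.encode("utf-8")

def pipeline(audio_in: bytes) -> bytes:
    # Swapping any single stage (e.g. a local TTS for privacy) leaves
    # the other two untouched.
    return tts(llm(asr(audio_in)))

print(pipeline(b"...").decode())  # → You asked: what's the weather today?
```

An end-to-end model collapses all three calls into one, which is where the latency win comes from, at the cost of per-stage control.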