Commit 21e827e

Merge pull request #10 from second-state/alabulei1-patch-6
Update README.md
2 parents 9d30eaf + ad43fc5

README.md

Lines changed: 41 additions & 9 deletions
@@ -1,4 +1,4 @@
- # Set up the EchoKit server
+ # EchoKit Server

EchoKit Server is the central component that manages communication between the [EchoKit device](https://echokit.dev/) and AI services. It can be deployed locally or connected to preset servers, allowing developers to customize LLM endpoints, plan the LLM prompt, configure speech models, and integrate additional AI features like MCP servers.

@@ -14,8 +14,40 @@ EchoKit Server is the central component that manages communication between the [
You will need an [EchoKit device](https://echokit.dev/), or create your own ESP32 device with the [EchoKit firmware](https://github.com/second-state/echokit_box).

+ ## Features

- ## Build
+ EchoKit Server powers the full voice–AI interaction loop, making it easy for developers to run end-to-end speech pipelines with flexible model choices and custom integrations.
+
+ ### ASR → LLM → TTS Pipeline
+
+ Seamlessly connect **ASR → LLM → TTS** for real-time, natural conversations.
+ Each stage can be configured independently with your preferred models or APIs.
+
+ #### Model Compatibility
+
+ * **ASR (Speech Recognition):** Works with any API that's *OpenAI-compatible*.
+ * **LLM (Language Model):** Connect to any *OpenAI-spec* endpoint — local or cloud.
+ * **TTS (Text-to-Speech):** Use any *OpenAI-spec* voice model for flexible deployment.
+   * ElevenLabs (Streaming Mode)
+
+ ### End-to-End Model Pipelines
+
+ Out-of-the-box support for:
+
+ * **Gemini** — Google's multimodal model
+ * **Qwen Real-Time** — Alibaba's powerful open LLM
+
+ ### Developer Customization
+
+ * Deploy **locally** or connect to **remote inference servers**
+ * Define your own **LLM prompts** and **response workflows**
+ * Configure **speech and voice models** for different personas or use cases
+ * Integrate **MCP servers** for extended functionality
+
+ ## Set up the EchoKit server
+
+ ### Build

```
git clone https://github.com/second-state/echokit_server
@@ -27,7 +59,7 @@ Edit `config.toml` to customize the VAD, ASR, LLM, TTS services, as well as prom
cargo build --release
```

- ## Configure AI services
+ ### Configure AI services

The `config.toml` can use any combination of open-source or proprietary AI services, as long as they offer OpenAI-compatible API endpoints. Here are instructions to start open source AI servers for the EchoKit server.
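
For orientation, a `config.toml` that wires the ASR → LLM → TTS stages to OpenAI-compatible endpoints might look roughly like the sketch below. The key names and values are illustrative guesses, not the project's actual schema; copy a real configuration from the `examples/` directory in this repo instead.

```
# Hypothetical sketch only -- the real key names come from the configs in examples/.
[asr]
url     = "http://localhost:8000/v1/audio/transcriptions"  # any OpenAI-compatible ASR endpoint
model   = "whisper-large-v3"
api_key = "EMPTY"

[llm]
url     = "http://localhost:8080/v1/chat/completions"      # any OpenAI-spec LLM endpoint, local or cloud
model   = "llama-3.1-8b-instruct"
prompt  = "You are a friendly voice assistant. Keep answers short."

[tts]
url     = "http://localhost:9880/v1/audio/speech"          # any OpenAI-spec TTS endpoint
model   = "tts-1"
voice   = "alloy"
```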

@@ -40,18 +72,18 @@ Alternatively, you could use Google Gemini Live services for VAD + ASR + LLM, an

You can also [configure MCP servers](examples/gaia/mcp/config.toml) to give the EchoKit server tool use capabilities.

- ## Configure the voice prompt
+ ### Configure the voice prompt

The `hello.wav` file on the server is sent to the EchoKit device when it connects. It is the voice prompt the device will say to tell the user that it is ready.
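
To supply your own greeting, any short WAV clip should work in its place; for example, an existing recording can be converted with `ffmpeg -i greeting.mp3 -ac 1 -ar 16000 hello.wav` (the mono, 16 kHz settings here are an assumption, so match whatever format your device and server actually expect).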

- ## Run the EchoKit server
+ ### Run the EchoKit server

```
export RUST_LOG=debug
nohup target/release/echokit_server &
```
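
Because the server is started with `nohup`, its output lands in `nohup.out` by default, so `tail -f nohup.out` is a quick way to follow the debug logs enabled by `RUST_LOG=debug`.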

- ## Test on a web page
+ ### Test on a web page

Go here: https://echokit.dev/chat/

@@ -61,7 +93,7 @@ Double click the local `index.html` file and open it in your browser.

In the web page, set the URL to your own EchoKit server address, and start chatting!

- ## Configure a new device
+ ### Configure a new device

Go to web page: https://echokit.dev/setup/ and use Bluetooth to connect to the `GAIA ESP332` device.

@@ -77,9 +109,9 @@ Configure WiFi and server

![Configure Wifi](https://hackmd.io/_uploads/HJkh5ZjVee.png)

- ## Use the device
+ ### Use the device

- **Chat:** press the `K0` button once or multiple times util the screen shows "Listening ...". You can now speak and it will answer.
+ **Chat:** press the `K0` button once or multiple times until the screen shows "Listening ...". You can now speak and it will answer.

**Record:** long press the `K0` until the screen shows "Recording ...". You can now speak and the audio will be recorded on the server.
