---
sidebar_position: 1
---

# EchoKit server config options

The EchoKit server orchestrates multiple AI services to turn a user's voice input into a voice response. It supports two general approaches.

* The pipeline approach. It divides the task into multiple steps and uses a different AI service for each step.
  * The [ASR service](asr.md) turns the user's voice audio into text.
  * The [LLM service](llm.md) generates a text response to the user input. The LLM can be aided by [built-in tools, such as web search](llm-tools.md), and by [custom tools in MCP servers](mcp.md).
  * The [TTS service](tts.md) converts the response text to voice.
* The end-to-end real-time model approach. It uses multimodal models, such as [Google Gemini Live](gemini-live.md), that directly ingest voice input and generate voice output.

The pipeline approach offers greater flexibility and customization: you can choose any voice, control costs by mixing different providers, integrate external knowledge, and run components locally for privacy. End-to-end models can reduce latency, but the classic pipeline gives you full control over each component.

You configure how these AI services work together in the EchoKit server's `config.toml` file.

## Prerequisites

* A running EchoKit server. Follow [the quick start guide](../get-started/echokit-server.md) if needed.
* **API keys** for your favorite AI API providers (OpenAI, Groq, xAI, OpenRouter, ElevenLabs, Gemini, etc.).

## Configure server address and welcome audio

```toml
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"
```

* `addr`: The address and port the server listens on.
  * Use `0.0.0.0` to accept connections on any network interface.
  * Make sure your firewall allows incoming connections to the port (`8080` in this example).
* `hello_wav`: An optional welcome audio file that is played when a device connects.
  * The file should be in 16 kHz WAV format.
  * Place the file in the same folder as `config.toml`.

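If you are unsure of your welcome file's sample rate, Python's standard `wave` module can report it. This is a minimal sketch and not part of EchoKit. The `check_hello_wav` helper is hypothetical, and it only verifies the 16 kHz sample rate mentioned above. It assumes an uncompressed PCM WAV file, because `wave` cannot read compressed formats. Channel count and sample width are printed for reference only, since their requirements are not stated here.

```python
import wave


def check_hello_wav(path: str, expected_rate: int = 16000) -> bool:
    """Hypothetical helper: return True if `path` is a readable WAV file
    sampled at `expected_rate` Hz. Only the sample rate is checked."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            width = wav.getsampwidth()
    except (wave.Error, EOFError, FileNotFoundError) as err:
        # wave.Error: not a RIFF/WAVE file or unsupported (non-PCM) encoding
        # EOFError: empty or truncated file
        print(f"{path}: cannot read as WAV ({err})")
        return False
    # Report what was found so a mismatch is easy to diagnose.
    print(f"{path}: {rate} Hz, {channels} channel(s), {8 * width}-bit samples")
    return rate == expected_rate
```

For example, run `check_hello_wav("hello.wav")` from the folder that holds `config.toml`. A `False` result means the file needs resampling to 16 kHz with an audio tool of your choice.
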
## Configure AI services

The rest of `config.toml` specifies how to use the different AI services. Each service is covered in its own chapter.

* The `[asr]` section configures the [voice-to-text](asr.md) services.
* The `[llm]` section configures the [large language model](llm.md) services, including [tools](llm-tools.md) and [MCP actions](mcp.md).
* The `[tts]` section configures the [text-to-voice](tts.md) services.

Each of these sections has the following fields.

* A `platform` field that designates the service protocol. A common example is `openai` for OpenAI-compatible API endpoints.
* A `url` field for the service endpoint. It is typically an `https://` or `wss://` URL; the latter is a WebSocket address for streaming services.
* Optional fields that are specific to the `platform`, such as `api_key`, `model`, and others.

## Complete configuration example

The example below uses Groq to host all three services, so you will need a free [API key from Groq](https://console.groq.com/keys).

```toml
# Server settings
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

# Replace gsk_your_api_key_here with your own Groq API key in each section below.

# Speech recognition using the OpenAI transcriptions API, but hosted by Groq (instead of OpenAI)
[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
lang = "en"
api_key = "gsk_your_api_key_here"
model = "whisper-large-v3-turbo"

# Language model using the OpenAI chat completions API, but hosted by Groq (instead of OpenAI)
[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_your_api_key_here"
model = "openai/gpt-oss-20b"
history = 10

# Text-to-speech using the OpenAI speech API, but hosted by Groq (instead of OpenAI)
[tts]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/speech"
api_key = "gsk_your_api_key_here"
model = "playai-tts"
voice = "Cooper-PlayAI"

# System personality
[[llm.sys_prompts]]
role = "system"
content = """
Your name is EchoKit, a helpful AI assistant. Provide clear, concise responses and maintain a friendly, professional tone. Keep answers brief but informative.
"""
```
