---
slug: echokit-30-days-day-13-local-llm
title: "Day 13 — Running an LLM Locally for EchoKit | The First 30 Days with EchoKit"
tags: [echokit30days]
---

Over the last few days, we explored several cloud-based LLM providers — OpenAI, OpenRouter, and Grok. Each offers unique advantages, but today we’re doing something completely different: we’re running the open-source **Qwen3-4B** model *locally* and using it as EchoKit’s LLM provider.

There’s no shortage of great open-source LLMs — Llama, Mistral, DeepSeek, Qwen, and many others — and you can pick whichever model best matches your use case.

Likewise, you can run a local model in several different ways. For today’s walkthrough, though, we’ll focus on a clean, lightweight, and portable setup: **Qwen3-4B (GGUF) running inside a WASM LLM server powered by WasmEdge.** This setup exposes an OpenAI-compatible API, which makes integrating it with EchoKit simple and seamless.

## Run the Qwen3-4B Model Locally

### Step 1 — Install WasmEdge

WasmEdge is a lightweight, secure WebAssembly runtime capable of running LLM workloads through the LlamaEdge extension.

Install it:

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s
```

Verify the installation:

```bash
wasmedge --version
```

You should see a version number printed.
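If your shell can’t find `wasmedge`, the installer’s environment changes may not be loaded yet. A quick fix, assuming the default install location under `$HOME/.wasmedge`:

```bash
# Load the environment variables written by the WasmEdge installer,
# or simply open a new terminal session.
source $HOME/.wasmedge/env
```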

### Step 2 — Download Qwen3-4B in GGUF Format

We’ll use a quantized version of Qwen3-4B, which keeps memory usage manageable while delivering strong performance.

```bash
curl -Lo Qwen3-4B-Q5_K_M.gguf https://huggingface.co/second-state/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q5_K_M.gguf
```
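Optionally, confirm the download completed before moving on. The Q5_K_M file should be a few gigabytes (roughly 3 GB, though the exact size depends on the quantization):

```bash
# A truncated download will fail later at model-load time,
# so check that the file size looks plausible.
ls -lh Qwen3-4B-Q5_K_M.gguf
```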

### Step 3 — Download the LlamaEdge API Server (WASM)

This small `.wasm` application loads GGUF models and exposes an **OpenAI-compatible chat API**, which EchoKit can connect to directly.

```bash
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm
```

### Step 4 — Start the Local LLM Server

Now let’s launch the Qwen3-4B model locally and expose the `/v1/chat/completions` endpoint:

```bash
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Qwen3-4B-Q5_K_M.gguf \
  llama-api-server.wasm \
  --model-name Qwen3-4B \
  --prompt-template qwen3-no-think \
  --ctx-size 4096
```
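
A quick note on the flags: `--dir .:.` gives the WASM app access to the current directory (where the model file lives), `--nn-preload` loads the GGUF file through WasmEdge’s GGML backend, and the `qwen3-no-think` prompt template disables Qwen3’s “thinking” output, which keeps voice responses fast and direct.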

If everything starts up correctly, the server will be available at:

```
http://localhost:8080
```
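
Before pointing EchoKit at it, you can sanity-check the endpoint with a plain OpenAI-style chat request (the prompt here is just an example):

```bash
# Minimal OpenAI-compatible chat request against the local server.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-4B",
    "messages": [
      {"role": "user", "content": "Say hello in one short sentence."}
    ]
  }'
```

A JSON response with a `choices` array means the server is up and speaking the OpenAI protocol.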

## Connect EchoKit to Your Local LLM

Open your EchoKit server’s `config.toml` and update the LLM settings:

```toml
[llm]
llm_chat_url = "http://localhost:8080/v1/chat/completions"
api_key = "N/A"
model = "Qwen3-4B"
history = 5
```
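
Two details worth noting: the local LlamaEdge server doesn’t check API keys, so any placeholder value works for `api_key`, and `model` should match the `--model-name` you passed when starting the server.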

Save the file and restart your EchoKit server.

Next, pair your EchoKit device and connect it to your updated server.

Now try speaking to your device:

> “EchoKit, what do you think about running local models?”

Watch your terminal — you should see EchoKit sending requests to your local endpoint.

Your EchoKit is now fully powered by a local Qwen3-4B model.

Today we reached a major milestone: **EchoKit can now run entirely on your machine, with no external LLM provider required.**

---

This tutorial is only one small piece of what EchoKit can do. If you want to build your own voice AI device, try different LLMs, or run fully local models like Qwen — EchoKit gives you everything you need in one open-source kit.

Want to explore more or share what you’ve built?

* Join the **[EchoKit Discord](https://discord.gg/Fwe3zsT5g3)**
* Show us your custom models, latency tests, and experiments — the community is growing fast.

Ready to get your own EchoKit?

* **EchoKit Box →** [https://echokit.dev/echokit_box.html](https://echokit.dev/echokit_box.html)
* **EchoKit DIY Kit →** [https://echokit.dev/echokit_diy.html](https://echokit.dev/echokit_diy.html)

**Start building your own voice AI agent today.**