<h1 align="center">native-llm</h1>

<p align="center">
  <strong>The easiest way to run AI models locally.</strong>
</p>

---
## 🚀 Quick Start

```bash
npm install native-llm
```

```typescript
import { LLMEngine } from "native-llm"

const engine = new LLMEngine({ model: "gemma" })

const result = await engine.generate({
  prompt: "Explain quantum computing to a 5-year-old"
})

console.log(result.text)
```

**That's it.** The model downloads automatically and the GPU is detected automatically. It just works.
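> ♻️ **Reuse the engine:** keeping a single instance around avoids reloading the model between calls
> (an assumption worth verifying for your workload; the sketch below only uses the API shown above):

```typescript
import { LLMEngine } from "native-llm"

// Assumption: one engine instance keeps the model loaded and can serve many
// generate() calls, so only the first call pays the download/load cost.
const engine = new LLMEngine({ model: "gemma" })

const first = await engine.generate({ prompt: "Explain quantum computing to a 5-year-old" })
const followUp = await engine.generate({ prompt: "Now explain it to a college student" })

console.log(first.text)
console.log(followUp.text)
```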
---

## 🎯 Why native-llm?

**A friendly wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp) that handles the hard parts:**

| Without native-llm         | With native-llm         |
| -------------------------- | ----------------------- |
| Find GGUF model URLs       | `model: "gemma"`        |
| Configure HuggingFace auth | Auto from `HF_TOKEN`    |
| 20+ lines of setup         | 3 lines                 |
| Handle Qwen3 thinking mode | Automatic               |
| Research model benchmarks  | Curated recommendations |
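> 🔑 **HuggingFace auth:** if a model you request is gated on HuggingFace, create an access token at
> [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) and expose it as `HF_TOKEN`.
> Per the table above it is picked up automatically, so nothing changes in code. A minimal sketch
> (the token value is a placeholder):

```typescript
import { LLMEngine } from "native-llm"

// Start the process with the token in the environment, for example:
//   HF_TOKEN=hf_xxxxxxxxxxxx node app.js
// Assumption: native-llm reads process.env.HF_TOKEN when downloading gated models;
// no auth code is needed here.
const engine = new LLMEngine({ model: "gemma" })
const result = await engine.generate({ prompt: "Say hello" })
console.log(result.text)
```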
### Local vs Cloud

|             | ☁️ Cloud AI              | 🏠 native-llm        |
| ----------- | ------------------------ | -------------------- |
| **Cost**    | $0.001-$0.10 per query   | **Free forever**     |
| **Speed**   | 1-20 seconds             | **< 100ms**          |
| **Privacy** | Data sent to servers     | **100% local**       |
| **Limits**  | Rate limits & quotas     | **Unlimited**        |
| **Offline** | ❌ Requires internet      | ✅ **Works offline**  |
---

## 🎨 Models

### Simple Aliases

Use simple aliases — we handle the rest:

```typescript
new LLMEngine({ model: "gemma" })      // Best balance (default)
new LLMEngine({ model: "gemma-fast" }) // Maximum speed
new LLMEngine({ model: "qwen-coder" }) // Code generation
new LLMEngine({ model: "deepseek" })   // Complex reasoning
```
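Aliases are a convenience, not a requirement: any of the many GGUF models on HuggingFace can be used,
and a local GGUF file path should work as the `model` value too. A sketch (the path is a placeholder;
the custom-models section of the docs has the details):

```typescript
import { LLMEngine } from "native-llm"

// Placeholder path: point this at a GGUF file you have downloaded.
// Assumption: a file path is accepted wherever an alias would go.
const engine = new LLMEngine({ model: "/path/to/any-model.gguf" })
```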
### Smart Recommendations

```typescript
import { LLMEngine } from "native-llm"

// Get the right model for your use case
const codeModel = LLMEngine.getModelForUseCase("code")       // → qwen-2.5-coder-7b
const fastModel = LLMEngine.getModelForUseCase("fast")       // → gemma-3n-e2b
const qualityModel = LLMEngine.getModelForUseCase("quality") // → gemma-3-27b

// List all available models
const models = LLMEngine.listModels()
// → [{ id: "gemma-3n-e4b", name: "Gemma 3n E4B", size: "5 GB", ... }, ...]
```
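These helpers combine naturally with the constructor, assuming the returned id is accepted as the
`model` value (it matches the ids that `listModels()` reports):

```typescript
import { LLMEngine } from "native-llm"

// Assumption: the id returned by getModelForUseCase() can be passed directly as `model`.
const engine = new LLMEngine({ model: LLMEngine.getModelForUseCase("code") })

const result = await engine.generate({
  prompt: "Write a TypeScript function that reverses a string"
})
console.log(result.text)
```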
### Performance

Benchmarked on an Apple M1 Ultra with Metal GPU acceleration:

| Model                 | Size  | Speed        | Best For          |
| --------------------- | ----- | ------------ | ----------------- |
| 🚀 **Gemma 3n E2B**   | 3 GB  | **36 tok/s** | Maximum speed     |
| ⭐ **Gemma 3n E4B**    | 5 GB  | **18 tok/s** | Best balance      |
| 💻 **Qwen 2.5 Coder** | 5 GB  | **23 tok/s** | Code generation   |
| 🧠 **DeepSeek R1**    | 5 GB  | **9 tok/s**  | Complex reasoning |
| 👑 **Gemma 3 27B**    | 18 GB | **5 tok/s**  | Maximum quality   |
---

## ✨ Features

| Feature                | Description                                                 |
| ---------------------- | ----------------------------------------------------------- |
| 📦 **Zero Config**     | Models download automatically, GPU detected automatically   |
| 🎯 **Smart Defaults**  | Curated models, sensible parameters, thinking-mode handled  |
| 🔥 **Native Speed**    | Direct llama.cpp bindings — no Python, no subprocess        |
| 🍎 **Metal GPU**       | Full Apple Silicon acceleration out of the box              |
| 🖥️ **Cross-Platform**  | macOS, Linux, Windows with CUDA support                     |
| 🌊 **Streaming**       | Real-time token-by-token output                             |
| 📝 **TypeScript**      | Full type definitions included                              |
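> 🌊 **Streaming:** the exact streaming API is covered in the full documentation linked below. Purely
> as an illustration of the idea, a hypothetical callback-style call could look like this (the
> `onToken` option is not a confirmed signature):

```typescript
import { LLMEngine } from "native-llm"

const engine = new LLMEngine({ model: "gemma" })

// Hypothetical shape for illustration only; check the docs for the real streaming API.
await engine.generate({
  prompt: "Write a haiku about running models locally",
  onToken: (token: string) => process.stdout.write(token),
})
```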
---

## 📚 Documentation

**[→ Full Documentation](https://sebastian-software.github.io/native-llm/)** — Streaming, chat API,
custom models, and more.
<p align="center">
  <strong>MIT License</strong> · Made with ❤️ by <a href="https://sebastian-software.de">Sebastian Software</a>