
Commit f5a162c

feat: add convenience features and improve README
New features:

- LLMEngine.listModels() - list all curated models
- LLMEngine.getModelForUseCase(useCase) - smart recommendations
- 'gemma-fast' alias for maximum speed model

README improvements:

- New tagline: 'The easiest way to run AI models locally'
- Added 'Without vs With native-llm' comparison table
- Showcased new static methods
- Clearer value proposition as convenience layer
1 parent 2b05846 commit f5a162c
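
Taken together, the additions in this commit compose as follows. This is a minimal usage sketch based only on the API shown in the diffs below (the prompt string is just an illustration):

```typescript
import { LLMEngine } from "native-llm"

// Ask for a recommendation by use case, or use the new "gemma-fast" alias directly.
const modelId = LLMEngine.getModelForUseCase("code") // → "qwen-2.5-coder-7b"
const engine = new LLMEngine({ model: modelId })
const fastEngine = new LLMEngine({ model: "gemma-fast" }) // resolves to gemma-3n-e2b

// Enumerate the curated catalog, e.g. for a CLI picker or a docs table.
for (const m of LLMEngine.listModels()) {
  console.log(`${m.id}: ${m.name}`)
}

const result = await engine.generate({ prompt: "Write a binary search in TypeScript" })
console.log(result.text)
```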

4 files changed: 163 additions, 45 deletions

README.md

Lines changed: 57 additions & 45 deletions
@@ -5,7 +5,7 @@
 <h1 align="center">native-llm</h1>
 
 <p align="center">
-  <strong>Run AI models locally. No cloud. No limits. No cost.</strong>
+  <strong>The easiest way to run AI models locally.</strong>
 </p>
 
 <p align="center">
@@ -25,20 +25,6 @@
 
 ---
 
-## 🎯 Why native-llm?
-
-| | ☁️ Cloud AI | 🏠 native-llm |
-| ----------- | ------------------------ | -------------------- |
-| **Cost** | $0.001 - $0.10 per query | **Free forever** |
-| **Speed** | 1-20 seconds | **< 100ms** |
-| **Privacy** | Data sent to servers | **100% local** |
-| **Limits** | Rate limits & quotas | **Unlimited** |
-| **Offline** | ❌ Requires internet |**Works offline** |
-
-**The bottom line:** Local models now achieve **91% of GPT-5's quality** — at zero cost.
-
----
-
 ## 🚀 Quick Start
 
 ```bash
@@ -48,67 +34,93 @@ npm install native-llm
 ```typescript
 import { LLMEngine } from "native-llm"
 
-// That's it. One line to load a model.
 const engine = new LLMEngine({ model: "gemma" })
 
 const result = await engine.generate({
   prompt: "Explain quantum computing to a 5-year-old"
 })
 
 console.log(result.text)
-// → "Imagine you have a magical coin that can be heads AND tails at the same time..."
 ```
 
-Models download automatically on first use. No setup. No configuration. Just works.
+**That's it.** Model downloads automatically. GPU detected automatically. Just works.
 
 ---
 
-## ⚡ Performance
+## 🎯 Why native-llm?
 
-Benchmarked on **Apple M1 Ultra** with Metal GPU acceleration:
+**A friendly wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp) that handles the
+hard parts:**
 
-| Model | Size | Speed | Best For |
-| --------------------- | ----- | ------------ | ----------------- |
-| 🚀 **Gemma 3n E2B** | 3 GB | **36 tok/s** | Maximum speed |
-| **Gemma 3n E4B** | 5 GB | **18 tok/s** | Best balance |
-| 💻 **Qwen 2.5 Coder** | 5 GB | **23 tok/s** | Code generation |
-| 🧠 **DeepSeek R1** | 5 GB | **9 tok/s** | Complex reasoning |
-| 👑 **Gemma 3 27B** | 18 GB | **5 tok/s** | Maximum quality |
+| Without native-llm | With native-llm |
+| -------------------------- | ----------------------- |
+| Find GGUF model URLs | `model: "gemma"` |
+| Configure HuggingFace auth | Auto from `HF_TOKEN` |
+| 20+ lines of setup | 3 lines |
+| Handle Qwen3 thinking mode | Automatic |
+| Research model benchmarks | Curated recommendations |
 
-> 💡 **Our pick:** Start with `gemma-3n-e4b` — it's the sweet spot of quality and speed.
+### Local vs Cloud
+
+| | ☁️ Cloud AI | 🏠 native-llm |
+| ----------- | ------------------------ | -------------------- |
+| **Cost** | $0.001 - $0.10 per query | **Free forever** |
+| **Speed** | 1-20 seconds | **< 100ms** |
+| **Privacy** | Data sent to servers | **100% local** |
+| **Limits** | Rate limits & quotas | **Unlimited** |
+| **Offline** | ❌ Requires internet |**Works offline** |
 
 ---
 
 ## 🎨 Models
 
-Use simple aliases — we handle the rest:
+### Simple Aliases
 
 ```typescript
-new LLMEngine({ model: "gemma" }) // Fast & efficient
-new LLMEngine({ model: "gemma-large" }) // Maximum quality
+new LLMEngine({ model: "gemma" }) // Best balance (default)
+new LLMEngine({ model: "gemma-fast" }) // Maximum speed
 new LLMEngine({ model: "qwen-coder" }) // Code generation
-new LLMEngine({ model: "deepseek" }) // Chain-of-thought reasoning
-new LLMEngine({ model: "phi" }) // STEM & science
+new LLMEngine({ model: "deepseek" }) // Complex reasoning
 ```
 
-Or use any of the **1000+ GGUF models** on HuggingFace:
+### Smart Recommendations
 
 ```typescript
-new LLMEngine({ model: "/path/to/any-model.gguf" })
+import { LLMEngine } from "native-llm"
+
+// Get the right model for your use case
+const model = LLMEngine.getModelForUseCase("code") // → qwen-2.5-coder-7b
+const model = LLMEngine.getModelForUseCase("fast") // → gemma-3n-e2b
+const model = LLMEngine.getModelForUseCase("quality") // → gemma-3-27b
+
+// List all available models
+const models = LLMEngine.listModels()
+// → [{ id: "gemma-3n-e4b", name: "Gemma 3n E4B", size: "5 GB", ... }, ...]
 ```
 
+### Performance (M1 Ultra)
+
+| Model | Size | Speed | Best For |
+| --------------------- | ----- | ------------ | ----------------- |
+| 🚀 **Gemma 3n E2B** | 3 GB | **36 tok/s** | Maximum speed |
+|**Gemma 3n E4B** | 5 GB | **18 tok/s** | Best balance |
+| 💻 **Qwen 2.5 Coder** | 5 GB | **23 tok/s** | Code generation |
+| 🧠 **DeepSeek R1** | 5 GB | **9 tok/s** | Complex reasoning |
+| 👑 **Gemma 3 27B** | 18 GB | **5 tok/s** | Maximum quality |
+
 ---
 
 ## ✨ Features
 
-| Feature | Description |
-| --------------------- | ----------------------------------------------------------- |
-| 🔥 **Native Speed** | Direct N-API bindings to llama.cpp — no subprocess overhead |
-| 🍎 **Metal GPU** | Full Apple Silicon acceleration out of the box |
-| 🖥️ **Cross-Platform** | macOS, Linux, Windows — CUDA support for NVIDIA |
-| 📦 **Auto-Download** | Models fetched from HuggingFace automatically |
-| 🌊 **Streaming** | Real-time token-by-token output |
-| 📝 **TypeScript** | Full type definitions included |
+| Feature | Description |
+| --------------------- | ---------------------------------------------------------- |
+| 📦 **Zero Config** | Models download automatically, GPU detected automatically |
+| 🎯 **Smart Defaults** | Curated models, sensible parameters, thinking-mode handled |
+| 🔥 **Native Speed** | Direct llama.cpp bindings — no Python, no subprocess |
+| 🍎 **Metal GPU** | Full Apple Silicon acceleration out of the box |
+| 🖥️ **Cross-Platform** | macOS, Linux, Windows with CUDA support |
+| 🌊 **Streaming** | Real-time token-by-token output |
+| 📝 **TypeScript** | Full type definitions included |
 
 ---
 
@@ -126,8 +138,8 @@ Get yours in 30 seconds: [huggingface.co/settings/tokens](https://huggingface.co
 
 ## 📚 Documentation
 
-**[→ Full Documentation](https://sebastian-software.github.io/native-llm/)**Benchmarks, model
-comparison, streaming, chat API, and more.
+**[→ Full Documentation](https://sebastian-software.github.io/native-llm/)**Streaming, chat API,
+custom models, and more.
 
 <p align="center">
   <strong>MIT License</strong> · Made with ❤️ by <a href="https://sebastian-software.de">Sebastian Software</a>

src/engine.test.ts

Lines changed: 47 additions & 0 deletions
@@ -329,4 +329,51 @@ describe("Model resolution", () => {
     const info = engine.getModelInfo()
     expect(info).toEqual(MODELS["qwen-2.5-coder-7b"])
   })
+
+  it("should have gemma-fast alias point to gemma-3n-e2b", () => {
+    const engine = new LLMEngine({ model: "gemma-fast" })
+    const info = engine.getModelInfo()
+    expect(info).toEqual(MODELS["gemma-3n-e2b"])
+  })
+})
+
+describe("Static methods", () => {
+  describe("listModels", () => {
+    it("should return all models", () => {
+      const models = LLMEngine.listModels()
+      expect(models.length).toBeGreaterThan(0)
+      expect(models[0]).toHaveProperty("id")
+      expect(models[0]).toHaveProperty("name")
+      expect(models[0]).toHaveProperty("repo")
+    })
+
+    it("should include gemma-3n-e4b", () => {
+      const models = LLMEngine.listModels()
+      const gemma = models.find((m) => m.id === "gemma-3n-e4b")
+      expect(gemma).toBeDefined()
+      expect(gemma?.name).toBe("Gemma 3n E4B")
+    })
+  })
+
+  describe("getModelForUseCase", () => {
+    it("should return gemma-3n-e2b for fast", () => {
+      expect(LLMEngine.getModelForUseCase("fast")).toBe("gemma-3n-e2b")
+    })
+
+    it("should return gemma-3n-e4b for balanced", () => {
+      expect(LLMEngine.getModelForUseCase("balanced")).toBe("gemma-3n-e4b")
+    })
+
+    it("should return qwen-2.5-coder-7b for code", () => {
+      expect(LLMEngine.getModelForUseCase("code")).toBe("qwen-2.5-coder-7b")
+    })
+
+    it("should return deepseek-r1-14b for reasoning", () => {
+      expect(LLMEngine.getModelForUseCase("reasoning")).toBe("deepseek-r1-14b")
+    })
+
+    it("should return gemma-3-27b for quality", () => {
+      expect(LLMEngine.getModelForUseCase("quality")).toBe("gemma-3-27b")
+    })
+  })
 })

src/engine.ts

Lines changed: 58 additions & 0 deletions
@@ -438,4 +438,62 @@ export class LLMEngine {
     this.session = null
     this.llama = null
   }
+
+  // ============================================
+  // Static Methods
+  // ============================================
+
+  /**
+   * List all available curated models
+   *
+   * @returns Array of model information objects
+   *
+   * @example
+   * ```typescript
+   * const models = LLMEngine.listModels()
+   * models.forEach(m => console.log(`${m.id}: ${m.name} (${m.parameters})`))
+   * ```
+   */
+  static listModels(): ({ id: string } & (typeof MODELS)[ModelId])[] {
+    return Object.entries(MODELS).map(([id, info]) => ({
+      id,
+      ...info
+    }))
+  }
+
+  /**
+   * Get recommended model for a specific use case
+   *
+   * @param useCase - One of: fast, balanced, quality, edge, multilingual, reasoning, code, longContext
+   * @returns Model ID string
+   *
+   * @example
+   * ```typescript
+   * const modelId = LLMEngine.getModelForUseCase("code")
+   * const engine = new LLMEngine({ model: modelId })
+   * ```
+   */
+  static getModelForUseCase(
+    useCase:
+      | "fast"
+      | "balanced"
+      | "quality"
+      | "edge"
+      | "multilingual"
+      | "reasoning"
+      | "code"
+      | "longContext"
+  ): ModelId {
+    const recommendations: Record<string, ModelId> = {
+      fast: "gemma-3n-e2b",
+      balanced: "gemma-3n-e4b",
+      quality: "gemma-3-27b",
+      edge: "gemma-3n-e2b",
+      multilingual: "qwen3-8b",
+      reasoning: "deepseek-r1-14b",
+      code: "qwen-2.5-coder-7b",
+      longContext: "gemma-3-27b"
+    }
+    return recommendations[useCase] ?? "gemma-3n-e4b"
+  }
 }
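
One detail in the new `getModelForUseCase`: the parameter is typed as a closed union, yet the method still ends with `?? "gemma-3n-e4b"`. TypeScript callers cannot pass an unknown key, while plain-JavaScript callers silently fall back to the balanced default. A small sketch of that behavior (the `"translation"` key is deliberately not part of the union):

```typescript
import { LLMEngine } from "native-llm"

// TypeScript: the union catches unsupported use cases at compile time.
// LLMEngine.getModelForUseCase("translation") // ✗ compile error

// Untyped JavaScript (or a deliberate cast): the ?? fallback kicks in.
const id = (LLMEngine as any).getModelForUseCase("translation")
console.log(id) // → "gemma-3n-e4b"
```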

src/types.ts

Lines changed: 1 addition & 0 deletions
@@ -187,6 +187,7 @@ export type ModelInfo = (typeof MODELS)[ModelId]
 export const MODEL_ALIASES: Record<string, ModelId> = {
   // Gemma
   gemma: "gemma-3n-e4b",
+  "gemma-fast": "gemma-3n-e2b",
   "gemma-large": "gemma-3-27b",
 
   // GPT-OSS (experimental)
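
For context on where the new entry lands: `MODEL_ALIASES` is a plain string-to-`ModelId` map, so alias handling presumably amounts to a single lookup before the engine decides whether it was given a curated id, an alias, or a raw GGUF path. A hedged sketch of that lookup (`resolveModel` is a hypothetical name for illustration, not necessarily what engine.ts actually does):

```typescript
// Hypothetical resolution step, assuming MODEL_ALIASES and MODELS are
// importable from src/types.ts as shown in this diff; the real engine may differ.
import { MODEL_ALIASES, MODELS, type ModelId } from "./types"

function resolveModel(input: string): ModelId | string {
  if (input in MODEL_ALIASES) return MODEL_ALIASES[input] // "gemma-fast" → "gemma-3n-e2b"
  if (input in MODELS) return input as ModelId // already a curated id
  return input // otherwise treat as a local .gguf path or custom repo
}
```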
