Goal: Understanding, not production.
This repository teaches AI product development by building a real Communication Mirror agent - a tool that analyzes the emotional impact of messages. You'll learn:
- How to structure multi-step AI reasoning
- Prompt engineering that actually works
- Local LLM integration with node-llama-cpp
- Building without frameworks (so you understand what's happening)
Not included: Production concerns, scalability, frameworks, deployment
What you get: Deep understanding of how AI agents work under the hood
The Communication Mirror interface showing message analysis with intent, tone, impact, and alternative phrasings.
The application in its loading state while waiting for a response from the large language model.
The logs interface showing LLM requests, responses, and validation details for debugging.
This repository includes 10 comprehensive lessons that guide you through building an AI product from scratch. Follow them sequentially to build a complete understanding:
| Lesson | Title | What You'll Learn |
|---|---|---|
| 01 | What is an AI Product and Why This Project? | Understand what makes AI products different, the project's goals, and what's included/excluded |
| 02 | Understanding Local LLMs and GGUF Models | Learn about local LLMs, GGUF format, quantization, and how to load models with node-llama-cpp |
| 03 | How to Write Effective Prompts | Master prompt engineering with calibration rules, error handling, and techniques that actually work |
| 04 | JSON Schemas and Grammar Constraints | Enforce structured output using JSON schema grammars and validation layers |
| 05 | Breaking Complex Tasks into Steps | Decompose complex AI tasks into parallel analysis steps for efficient execution |
| 06 | Building REST APIs for AI Services | Design REST APIs that expose AI capabilities with proper error handling and middleware |
| 07 | Connecting UI to AI Backend | Build a React frontend that consumes AI APIs with proper state management |
| 08 | Managing Conversation Context | Maintain conversation context across interactions for context-aware analysis |
| 09 | Ensuring Quality Outputs | Validate LLM outputs, handle errors gracefully, and implement retry logic |
| 10 | What We Learned and What We Skipped | Reflect on design decisions, trade-offs, and what would be different in production |
Start with Lesson 01 to begin your learning journey.
Before getting started, ensure you have:
- Node.js >= 18.0.0 (check with `node --version`)
- npm (comes with Node.js)
- TypeScript knowledge: Basic understanding of types, interfaces, and async/await
- React knowledge: Familiarity with components, hooks, and JSX (for frontend)
- Prompt basics: You have at least heard of prompts
- System Requirements:
  - RAM:
    - Minimum: 4GB (for 1.7B Q4_K_M model)
    - Recommended: 8GB (for 4B Q6_K model)
    - Best: 12GB+ (for 8B Q6_K model)
  - Disk Space: 2-5GB for GGUF model files (depending on model size)
  - OS: macOS, Linux, or Windows (WSL recommended for Windows)
  - GPU: Optional (CPU-only works fine, GPU can speed up inference)
Important: RAM ≠ VRAM
The numbers above assume CPU-only inference.
If your system has no dedicated GPU (e.g. Intel integrated graphics), your available VRAM is usually very limited (often ~1–2 GB), even if you have plenty of system RAM.
In that case, larger models or large context sizes (e.g. 4096 tokens) can fail with errors like:
```
InsufficientMemoryError: context size too large for available VRAM
```

If this happens, use:
- smaller models (1.7B)
- lower quantization (Q4 / Q5)
- smaller context sizes (2048 or 1024)
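If you want to try these settings directly, here is a minimal sketch of loading a model with a reduced context size via node-llama-cpp (assuming its v3 `getLlama`/`loadModel`/`createContext` API; the model filename is just an example):

```ts
import { getLlama, LlamaChatSession } from "node-llama-cpp";

// Example model path - point this at whatever .gguf file you downloaded.
const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: "./backend/models/Qwen3-1.7B-Q6_K.gguf",
});

// A smaller context (1024-2048 tokens) needs far less memory than a 4096-token one.
const context = await model.createContext({ contextSize: 2048 });

const session = new LlamaChatSession({
  contextSequence: context.getSequence(),
});

console.log(await session.prompt("Summarize the tone of: 'We need to talk.'"));
```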
```bash
npm install
```

This installs dependencies for all workspaces (backend, frontend, shared).
The project uses local LLMs in GGUF format (not PyTorch, not safetensors). You need to download a compatible model file.
Tested and Recommended:
- Qwen3-1.7B-Q8_0 (~2GB file, ~2GB RAM) - Best for beginners
- Qwen3-4B-Instruct-2507-Q6_K (~3GB file, ~3GB RAM) - Best balance
- Qwen3-8B-Q6_K (~5GB file, ~5GB RAM) - Best quality
Other Compatible Models:
- Any Qwen3 model in GGUF format
- Any Llama 3 model in GGUF format
- Any Mistral model in GGUF format
- Any model compatible with `node-llama-cpp` (must be GGUF format)
Models That Won't Work:
- PyTorch models (`.bin`, `.pth`, `.pt`)
- Safetensors models (`.safetensors`)
- ONNX models (`.onnx`)
- Models not in GGUF format
Quantization reduces model size and RAM usage by using fewer bits per weight. Here's what the codes mean:
| Quantization | Quality | File Size | RAM Usage | Speed | Best For |
|---|---|---|---|---|---|
| Q8_0 | Highest | Largest | Highest | Fastest | Best quality, 8GB+ RAM |
| Q6_K | High | Medium | Medium | Fast | Recommended - good balance |
| Q5_K_M | Good | Medium | Medium | Fast | Good balance |
| Q4_K_M | Fair | Small | Low | Medium | Limited RAM (4-6GB) |
| Q3_K_M | Lower | Smaller | Lower | Slower | Very limited RAM |
| Q2_K | Lowest | Smallest | Lowest | Slowest | Minimal RAM only |
Recommendation: Start with Q6_K or Q8_0 for best results. Use Q4_K_M only if you have limited RAM.
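As a rough back-of-envelope check (my own approximation; real GGUF files also contain metadata and mixed-precision layers), file size scales with parameter count times bits per weight:

```ts
// Rough size estimate: parameters * bits-per-weight / 8 bits-per-byte.
// Real GGUF files differ slightly because K-quants mix precisions and add metadata.
function approxModelSizeGB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e9;
}

console.log(approxModelSizeGB(1.7e9, 8.5).toFixed(1)); // Q8_0, 1.7B -> ~1.8 GB
console.log(approxModelSizeGB(1.7e9, 6.6).toFixed(1)); // Q6_K, 1.7B -> ~1.4 GB
console.log(approxModelSizeGB(4.0e9, 6.6).toFixed(1)); // Q6_K, 4B   -> ~3.3 GB
```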
| Model Size | Minimum RAM | Recommended RAM | Disk Space | Inference Speed* |
|---|---|---|---|---|
| 1.7B (Q8_0) | 4GB | 6GB | 2GB | ~2-3 sec/analysis |
| 1.7B (Q6_K) | 3GB | 5GB | 1.5GB | ~2-3 sec/analysis |
| 4B (Q6_K) | 6GB | 8GB | 3GB | ~3-5 sec/analysis |
| 8B (Q6_K) | 8GB | 12GB | 5GB | ~5-8 sec/analysis |
*Speed varies by CPU, RAM speed, and model quantization
VRAM (GPU) Requirements:
- CPU-only (default): Uses system RAM, no GPU needed
- GPU acceleration: Optional, requires CUDA/ROCm compatible GPU
- To enable GPU: Set `gpuLayers` in `backend/lib/llm-service.ts` (advanced, not required)
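For reference, offloading layers to a GPU with node-llama-cpp is typically a single option on `loadModel`; the sketch below assumes the v3 API and an example layer count, and the exact spot in `backend/lib/llm-service.ts` may look different:

```ts
import { getLlama } from "node-llama-cpp";

const llama = await getLlama();

// gpuLayers controls how many model layers are offloaded to the GPU.
// Leaving it out (or using 0) keeps inference on the CPU; higher values need more VRAM.
const model = await llama.loadModel({
  modelPath: process.env.MODEL_PATH ?? "./models/model.gguf",
  gpuLayers: 16, // example value - tune to your GPU's VRAM
});
```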
Option 1: Hugging Face (Recommended)
- Go to Hugging Face Models
- Search for "Qwen3" and filter by "GGUF"
- Look for models from unsloth (most reliable quantizations)
- Download the `.gguf` file in Q6_K or Q8_0 quantization (not the entire repository)
Option 2: Unsloth AI's Models (Easiest)
- Go to UnSloth's Qwen3 Models
- Search for "Qwen3"
- Click on a model (e.g., "Qwen3-1.7B-Instruct-GGUF")
- Download the `.gguf` file from the "Files and versions" tab
- Choose quantization level (Q8_0, Q6_K, etc.)
Direct Download Links:
Using Browser:
- Click the download link above
- Wait for download to complete (may take several minutes for large files)
- Move the `.gguf` file to the `backend/models/` directory
Using Command Line (Linux/macOS):
```bash
# Create models directory
mkdir -p backend/models

# Download Qwen3-1.7B Q6_K (recommended)
cd backend/models
wget https://huggingface.co/unsloth/Qwen3-1.7B-GGUF/resolve/main/Qwen3-1.7B-Q6_K.gguf

# Or using curl
curl -L -o qwen3-1.7b-instruct.Q6_K.gguf \
  https://huggingface.co/unsloth/Qwen3-1.7B-GGUF/resolve/main/Qwen3-1.7B-Q6_K.gguf
```

Verify Download:

```bash
# Check file exists and size
ls -lh backend/models/*.gguf
# Should show a size matching the expected one (e.g., ~1.5GB for the Q6_K 1.7B model)
```

Create a `.env` file in the `backend/` directory (optional - defaults are provided):
```
# backend/.env
MODEL_PATH=./models/qwen3-1.7b-instruct.Q6_K.gguf  # Path to your GGUF model file
```

Important Notes:
- The `MODEL_PATH` should match the exact filename you downloaded
- File names may vary (e.g., `qwen3-1.7b-instruct.Q6_K.gguf` vs `Qwen3-1.7B-Q6_K.gguf`)
- Use the actual filename from your download, not the example above
- If `MODEL_PATH` is not set, it defaults to `./models/model.gguf` (which won't exist unless you rename your file)
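Conceptually, the backend resolves the model path roughly like this (a simplified sketch, not the literal code from `backend/lib/llm-service.ts`):

```ts
import path from "node:path";

// Fall back to ./models/model.gguf when MODEL_PATH is not set - which is why an
// unset MODEL_PATH only works if you rename your downloaded file to model.gguf.
const modelPath = path.resolve(
  process.cwd(),
  process.env.MODEL_PATH ?? "./models/model.gguf"
);

console.log(`Loading model from ${modelPath}`);
```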
Option A: Run both backend and frontend together:

```bash
npm run dev
```

Option B: Run separately:
```bash
# Terminal 1: Backend
cd backend
npm run dev

# Terminal 2: Frontend
cd frontend
npm run dev
```

- Frontend: http://localhost:5173 (Vite default port)
- Backend API: http://localhost:3001
- API Documentation: http://localhost:3001/api-docs
- Status Endpoint: http://localhost:3001/api/status (check model loading progress)
Note: The model loads in the background. Wait for it to finish loading before making analysis requests. Check the status endpoint to see loading progress.
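For example, you can poll the status endpoint before sending your first analysis request. The sketch below assumes the endpoint returns JSON with a `ready` flag; adjust the field name to the actual response:

```ts
// Poll /api/status until the model reports it has finished loading.
// The response shape here (a "ready" flag) is an assumption - check the actual endpoint.
async function waitForModel(baseUrl = "http://localhost:3001"): Promise<void> {
  for (;;) {
    const res = await fetch(`${baseUrl}/api/status`);
    const status = await res.json();
    console.log("model status:", status);
    if (status.ready) return;
    await new Promise((resolve) => setTimeout(resolve, 2000)); // retry every 2 seconds
  }
}

await waitForModel();
```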
Not sure which model to choose? Use this guide:
| Your RAM | Recommended Model | Download Link |
|---|---|---|
| 4-6GB | Qwen3-1.7B Q4_K_M | Download Q4_K_M |
| 6-8GB | Qwen3-1.7B Q6_K | Download Q6_K |
| 8GB+ | Qwen3-4B Q6_K | Download 4B Q6_K |
| 12GB+ | Qwen3-8B Q6_K | Download 8B Q6_K |
First time? Start with Qwen3-1.7B Q6_K - it's a good balance of quality and resource usage.
For detailed troubleshooting help, see TROUBLESHOOTING.md.
Quick fixes:
- Model not loading? Verify `MODEL_PATH` points to a valid `.gguf` file
- Out of memory? Try a smaller quantized model (Q4_K_M instead of Q6_K or Q8_0)
- Port already in use? Change `PORT` in `.env` or kill the process using that port
- Dependency issues? Run `npm run clean && npm install` to reinstall everything
- Validation errors? Check the `/api/logs` endpoint or the `backend/logs/` directory for LLM output
- Model still loading? Check the `/api/status` endpoint to see loading progress



