A Node.js server that hosts a local LLaMA AI model for mobile applications.
- Node.js 18+ installed on your PC
- At least 4GB RAM (8GB+ recommended for larger models)
- LLaMA model file (.gguf format)
1. **Clone/download the server code**

   ```bash
   # If using git
   git clone [your-repo-url] agent-server
   cd agent-server
   ```
2. **Install dependencies**

   ```bash
   npm install
   ```
3. **Run the setup script**

   ```bash
   npm run setup
   ```
4. **Download an AI model**

   Choose one based on your PC's capabilities.

   Lightweight (1-2GB RAM):

   ```bash
   # Download LLaMA 3.2 1B (fastest, good quality)
   curl -L -o models/llama-3.2-1b-q4.gguf \
     "https://huggingface.co/lmstudio-community/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf"
   ```

   Balanced (4-6GB RAM):

   ```bash
   # Download LLaMA 3.2 3B (better quality, slower)
   curl -L -o models/llama-3.2-3b-q4.gguf \
     "https://huggingface.co/lmstudio-community/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf"
   ```
5. **Configure the environment**

   Edit the `.env` file and update `ALLOWED_ORIGINS`:

   ```bash
   # Find your PC's IP address
   # Windows: ipconfig
   # Mac/Linux: ifconfig

   # Update ALLOWED_ORIGINS with your network IP
   ALLOWED_ORIGINS=exp://192.168.1.XXX:8081,http://localhost:8081
   ```
6. **Start the server**

   ```bash
   # Development mode (auto-restart on changes)
   npm run dev

   # Production mode
   npm start
   ```
- `GET /api/health` - Basic health check
- `GET /api/health/detailed` - Detailed system info
- `GET /api/health/model` - AI model status
- `POST /api/health/test` - Test AI response
- `POST /api/chat/message` - Send message, get AI response
- `POST /api/chat/stream` - Send message, get streaming response
- `GET /api/chat/suggestions` - Get conversation starters
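A streaming client for `/api/chat/stream` might look like the sketch below in Node 18+. This is an assumption about the wire format: it treats the stream as plain chunked text, while the actual server may use SSE or another framing.

```javascript
// Hypothetical streaming client for POST /api/chat/stream.
// Assumes the endpoint streams plain text chunks.
async function streamMessage(baseUrl, message, onChunk) {
  const res = await fetch(`${baseUrl}/api/chat/stream`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const decoder = new TextDecoder();
  // Node 18's fetch body is a web ReadableStream, which is async-iterable
  for await (const chunk of res.body) {
    onChunk(decoder.decode(chunk, { stream: true }));
  }
}
```

Each chunk is handed to `onChunk` as it arrives, so the app can render partial responses instead of waiting for the full reply.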
Send a chat message:

```bash
curl -X POST http://YOUR_PC_IP:3001/api/chat/message \
  -H "Content-Type: application/json" \
  -d '{"message": "Help me be more productive with my daily tasks"}'
```

Response:

```json
{
  "success": true,
  "response": "Here are some strategies to improve productivity...",
  "metadata": {
    "responseTime": "1250ms",
    "timestamp": "2025-07-14T10:30:00Z"
  }
}
```

| Variable | Description | Default |
|---|---|---|
| `PORT` | Server port | 3001 |
| `NODE_ENV` | Environment | development |
| `ALLOWED_ORIGINS` | CORS origins for mobile app | See `.env.example` |
| `AI_MAX_TOKENS` | Max AI response length | 150 |
| `AI_TEMPERATURE` | AI creativity (0.0-1.0) | 0.7 |
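The server's own config loader isn't shown in this README, but these variables would typically be read with the documented defaults along these lines (a sketch; only the variable names and defaults come from the table above):

```javascript
// Sketch of reading the documented variables with their defaults.
// The actual loader in this server may differ.
const config = {
  port: parseInt(process.env.PORT || '3001', 10),
  nodeEnv: process.env.NODE_ENV || 'development',
  allowedOrigins: (process.env.ALLOWED_ORIGINS || 'http://localhost:8081')
    .split(',')
    .map((origin) => origin.trim()),
  aiMaxTokens: parseInt(process.env.AI_MAX_TOKENS || '150', 10),
  aiTemperature: parseFloat(process.env.AI_TEMPERATURE || '0.7'),
};
```

Parsing numbers up front means the rest of the code never has to worry about env vars being strings.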
The server automatically finds models in:

- `./models/` directory
- `./assets/models/` directory

Supported formats:

- `.gguf` (recommended)
- `.bin` (legacy)
Minimum:

- Node.js 18+
- 2GB RAM
- 2GB disk space

Recommended:

- Node.js 20+
- 8GB RAM
- 5GB disk space
- SSD storage
The server includes:
- Rate limiting (10 requests/minute per IP for chat)
- CORS protection
- Request validation
- Helmet security headers
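The rate limit above is most likely implemented with standard Express middleware such as `express-rate-limit`, but the bookkeeping amounts to a fixed-window counter per IP, sketched here as a hypothetical helper:

```javascript
// Fixed-window per-IP rate limiter: allow `limit` requests per `windowMs`.
function createRateLimiter(limit = 10, windowMs = 60_000) {
  const hits = new Map(); // ip -> { count, windowStart }
  return function allow(ip, now = Date.now()) {
    const entry = hits.get(ip);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(ip, { count: 1, windowStart: now }); // fresh window for this IP
      return true;
    }
    entry.count += 1;
    return entry.count <= limit; // reject once the window's quota is spent
  };
}
```

With the documented defaults (10 requests per 60-second window), the 11th chat request from the same IP inside a window would be rejected.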
1. **Find your PC's IP address:**

   ```bash
   # Windows
   ipconfig

   # Mac/Linux
   ifconfig
   ```
2. **Update firewall settings:**

   - Windows: Allow Node.js through Windows Firewall
   - Mac: System Preferences > Security & Privacy > Firewall
   - Linux: Configure iptables/ufw
3. **Test connectivity:**

   ```bash
   # From another device on your network
   curl http://YOUR_PC_IP:3001/api/health
   ```
1. **Configure router port forwarding:**

   - Forward an external port (e.g., 8080) to internal port 3001
   - Point it to your PC's local IP
2. **Update DNS:**

   - Point your domain to your public IP
   - Update `ALLOWED_ORIGINS` in `.env`
3. **Consider security:**

   - Use HTTPS with SSL certificates
   - Implement authentication if needed
   - Monitor access logs
**Model not loading:**
- Check model file exists in correct directory
- Verify model file isn't corrupted
- Check available RAM
**Mobile app can't connect:**
- Verify PC IP address in mobile app
- Check firewall settings
- Confirm CORS origins in .env
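That last check matters because requests are only accepted from origins listed in `ALLOWED_ORIGINS`. The comparison boils down to the hypothetical helper below (the server most likely delegates this to the `cors` middleware):

```javascript
// Hypothetical origin check against the comma-separated ALLOWED_ORIGINS value.
function isAllowedOrigin(origin, allowedList) {
  return allowedList
    .split(',')
    .map((entry) => entry.trim())
    .includes(origin);
}
```

An Expo app's `exp://192.168.1.XXX:8081` origin must match the list exactly, which is why a changed LAN IP breaks connectivity until `.env` is updated.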
**Slow responses:**
- Try smaller model (1B instead of 3B)
- Check available RAM
- Monitor CPU usage
Check the console output for detailed error messages:

```bash
npm run dev  # Shows real-time logs
```

Your mobile app should connect to:

```
http://YOUR_PC_IP:3001
```
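From the app side, a request to the chat endpoint could look like this sketch, using `fetch` (available in React Native and Node 18+); `YOUR_PC_IP` is a placeholder and `sendMessage` is a hypothetical helper, not part of the server's API:

```javascript
// Hypothetical client call to POST /api/chat/message.
async function sendMessage(baseUrl, message, sessionId) {
  const res = await fetch(`${baseUrl}/api/chat/message`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      ...(sessionId ? { 'x-session-id': sessionId } : {}), // header is optional
    },
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// e.g. sendMessage('http://YOUR_PC_IP:3001', 'Hello').then(console.log);
```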
Required headers:

```json
{
  "Content-Type": "application/json",
  "x-session-id": "unique-session-id" // Optional
}
```

To update the server:

1. Pull the latest code
2. Run `npm install` to pick up new dependencies
3. Restart the server
MIT License - see LICENSE file for details
Need help? Check the troubleshooting section or create an issue in the repository.