Note: This app was built entirely with Cursor Composer using Anthropic's Claude 3.5 Sonnet, with GPT-4o for documentation generation. I expect some imperfections, but I'm happy to review PRs to make it better!
Note: this app is currently optimized for Apple Silicon.
A simple interactive benchmarking application for tracking local LLM inference and token usage.
- Real-time token counting
- Inference time tracking
- Chat history with metrics
- Transaction history
- Dark mode interface
- Support for local LLM models
1. Install dependencies:

   ```bash
   npm install
   ```
2. Download a GGUF model (example using Mistral 7B):

   ```bash
   # Create the models directory
   mkdir models

   # Download the model using aria2c (recommended for large files)
   aria2c -d models -x 16 -s 16 https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
   ```
3. Create a `.env` file:

   ```env
   MODEL_PATH=./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
   ```
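Once `MODEL_PATH` is set, the server should resolve and validate it before trying to load the model, so a bad path fails fast with a clear message. A minimal sketch, assuming the `resolveModelPath` helper name (it is illustrative, not part of the app's actual code):

```typescript
import { existsSync } from "fs";
import { resolve } from "path";

// Sketch: validate the MODEL_PATH environment variable at startup.
// `resolveModelPath` is a hypothetical helper, not the app's real API.
function resolveModelPath(envValue: string | undefined): string {
  if (!envValue) {
    throw new Error("MODEL_PATH is not set; add it to your .env file");
  }
  const absolute = resolve(envValue);
  if (!existsSync(absolute)) {
    throw new Error(`Model file not found at ${absolute}`);
  }
  return absolute;
}
```

Calling it as `resolveModelPath(process.env.MODEL_PATH)` once at startup keeps the "Model not found" failure mode out of the request path.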
4. Run the development server:

   ```bash
   npm run dev
   ```
- Enter your prompt in the text area
- Click "Run Benchmark"
- View real-time metrics:
  - Token generation
  - Inference speed
  - Response metrics
  - Transaction history
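The metrics above reduce to simple arithmetic over a token count and timestamps captured around each inference call. A hedged sketch, where `computeMetrics` and its field names are illustrative rather than the app's actual types:

```typescript
// Sketch: per-response benchmark metrics (names are assumptions).
interface BenchmarkMetrics {
  tokenCount: number;
  inferenceMs: number;
  tokensPerSecond: number;
}

function computeMetrics(
  tokenCount: number,
  startMs: number, // e.g. Date.now() before the inference call
  endMs: number    // e.g. Date.now() after the last token arrives
): BenchmarkMetrics {
  const inferenceMs = endMs - startMs;
  return {
    tokenCount,
    inferenceMs,
    // Guard against a zero-length interval to avoid dividing by zero.
    tokensPerSecond: inferenceMs > 0 ? (tokenCount / inferenceMs) * 1000 : 0,
  };
}
```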
- Node.js 18+
- 8GB+ RAM (depending on model size)
- M1/M2 Mac recommended for optimal performance
```
interactive-llm-benchmark/
├── src/
│   └── app/
│       ├── components/   # React components
│       ├── utils/        # Utility functions
│       ├── types/        # TypeScript types
│       └── api/          # API routes
├── models/               # LLM model files (gitignored)
├── .env                  # Environment variables
└── README.md             # This file
```
- Models are not included in the repository
- Build files and models are gitignored
- Metrics are tracked per transaction
- Supports streaming responses
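Streaming lets the UI surface tokens as they arrive instead of waiting for the full response. A sketch of draining a streamed response body, assuming each chunk roughly corresponds to one token (`consumeTokenStream` is a hypothetical helper, not the app's real API):

```typescript
// Sketch: read a streamed LLM response chunk by chunk, reporting each
// piece to a callback and returning the chunk count.
async function consumeTokenStream(
  stream: ReadableStream<Uint8Array>,
  onToken: (token: string) => void
): Promise<number> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let count = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    // { stream: true } handles multi-byte characters split across chunks.
    onToken(decoder.decode(value, { stream: true }));
    count++;
  }
  return count;
}
```

On the client, the same loop would run over `res.body` from a `fetch` to the benchmark API route.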
**Model not found:**
- Verify `MODEL_PATH` in `.env`
- Check that the `models` directory exists
- Ensure the model file is fully downloaded
**Performance issues:**
- Check available RAM
- Adjust context size in configuration
- Consider using a smaller model
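Context size is usually the biggest memory lever. The option name below follows node-llama-cpp's context options, but treat it as an assumption and verify against the version you have installed:

```typescript
// Sketch: trading context window for memory. `contextSize` follows
// node-llama-cpp's naming (an assumption; check your installed version's docs).
const contextOptions = {
  contextSize: 2048, // a smaller window uses less RAM than a 4096+ default
  batchSize: 256,    // smaller batches also reduce peak memory
};
```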
**Build errors:**
- Clear the `.next` directory
- Reinstall dependencies
- Update Next.js if needed