
LLM Performance Simulator

A comprehensive ecosystem for simulating and analyzing Large Language Model (LLM) performance across diverse hardware platforms. This repository contains two main projects that work together to provide accurate memory estimation and performance modeling for LLM deployment.

πŸ—οΈ Repository Structure

simulator/
├── BudSimulator/           # Full-stack web application for LLM analysis
│   ├── frontend/          # React TypeScript UI
│   ├── apis/              # FastAPI backend
│   └── Website/           # Streamlit dashboard
│
└── llm-memory-calculator/  # Core LLM performance modeling engine
    └── src/               # Python package with GenZ framework

📦 Projects Overview

1. llm-memory-calculator

The core engine that provides:

  • Memory Calculation: Precise memory requirements for any LLM architecture
  • Performance Modeling: Latency and throughput estimation
  • Architecture Support: Transformers, Mamba, Hybrid, Diffusion models
  • Hardware Abstraction: CPU and GPU performance modeling
  • Parallelism Strategies: Tensor, Pipeline, Data, and Expert parallelism

View llm-memory-calculator README

2. BudSimulator

A full-stack web application built on top of llm-memory-calculator:

  • Web Dashboard: Interactive React frontend
  • REST API: FastAPI backend with comprehensive endpoints
  • Database: SQLite with pre-populated hardware configs
  • Model Management: HuggingFace integration
  • Use Case Scenarios: Deployment planning tools

View BudSimulator README

🚀 Quick Start

Option 1: Full Stack Application (BudSimulator)

cd BudSimulator
python setup.py  # Automated setup script

This will install everything and launch the web application.

Option 2: Python Package Only (llm-memory-calculator)

cd llm-memory-calculator
pip install -e .

Then use in Python:

from llm_memory_calculator import estimate_memory

config = {
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "vocab_size": 32000
}
report = estimate_memory(config, seq_length=2048)
print(f"Total memory: {report.total_memory_gb:.2f} GB")

Option 3: Docker Deployment (Recommended for Production)

# Build and run with Docker
docker build -t budsimulator .
docker run -p 8000:8000 -p 3000:3000 budsimulator

# Or use docker-compose
docker-compose up --build

# For domain deployment (e.g., simulator.bud.studio)
./deploy-domain.sh

🎯 Use Cases

For ML Engineers

  • Estimate GPU memory requirements before training/deployment
  • Compare memory usage across different model architectures
  • Plan distributed training strategies

For Infrastructure Teams

  • Get hardware recommendations for specific models
  • Optimize resource allocation
  • Plan scaling strategies

For Researchers

  • Analyze performance characteristics of new architectures
  • Compare efficiency across different model designs
  • Benchmark hardware platforms

📊 Key Features

Memory Analysis

  • Weight memory calculation
  • KV cache estimation
  • Activation memory tracking
  • Gradient memory (for training)
  • Optimizer state sizing

Performance Metrics

  • TTFT (Time to First Token)
  • TPOT (Time Per Output Token)
  • Throughput (tokens/second)
  • Latency breakdown by component

Hardware Support

  • NVIDIA GPUs (A100, H100, etc.)
  • AMD GPUs (MI300X, etc.)
  • Google TPUs
  • Intel/AMD CPUs
  • Custom hardware configs

Model Architectures

  • Transformer models (GPT, LLaMA, etc.)
  • Mixture of Experts (MoE)
  • State Space Models (Mamba)
  • Hybrid architectures
  • Diffusion models

🔧 Architecture

System Design

┌─────────────────┐     ┌──────────────────┐
│   React Web UI  │────▶│  FastAPI Backend │
└─────────────────┘     └──────────────────┘
                                 │
                                 ▼
                    ┌────────────────────────┐
                    │ llm-memory-calculator  │
                    │   (Core Engine)        │
                    └────────────────────────┘
                                 │
                    ┌────────────┴───────────┐
                    ▼                        ▼
            ┌──────────────┐        ┌──────────────┐
            │ GenZ Models  │        │  Hardware    │
            │              │        │  Configs     │
            └──────────────┘        └──────────────┘

Technology Stack

  • Frontend: React, TypeScript, Tailwind CSS
  • Backend: FastAPI, SQLAlchemy, Pydantic
  • Core Engine: Python, NumPy, Pandas
  • Visualization: Plotly, Streamlit
  • Database: SQLite
  • Testing: Pytest, Jest

📖 Documentation

Detailed documentation lives in each project's own README: see the llm-memory-calculator and BudSimulator links above.

🧪 Examples

Calculate Memory for a LLaMA Model

from llm_memory_calculator import ModelMemoryCalculator

calculator = ModelMemoryCalculator()
config = {
    "model_type": "llama",
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "vocab_size": 32000
}

report = calculator.calculate_memory(
    config,
    seq_length=2048,
    batch_size=4,
    precision="fp16"
)

print(f"Total Memory: {report.total_memory_gb:.2f} GB")
print(f"Breakdown: {report.memory_breakdown}")

Performance Estimation

from llm_memory_calculator import estimate_prefill_performance

result = estimate_prefill_performance(
    model="llama2_7b",
    batch_size=1,
    input_tokens=2048,
    system_name="A100_80GB",
    tensor_parallel=2
)

print(f"Latency: {result['Latency']:.2f} ms")
print(f"Throughput: {result['Throughput']:.2f} tokens/s")

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Clone the repository
git clone https://github.com/BudEcosystem/simulator.git
cd simulator

# Install development dependencies
pip install -e llm-memory-calculator/
cd BudSimulator && pip install -r requirements.txt

Running Tests

# Test llm-memory-calculator
cd llm-memory-calculator
pytest tests/

# Test BudSimulator
cd BudSimulator
python comprehensive_api_test.py

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Built on the GenZ-LLM Analyzer framework
  • Inspired by MLPerf benchmarks
  • Hardware specs from official documentation
  • Model configs from HuggingFace

πŸ—ΊοΈ Roadmap

Near Term

  • Support for quantized models (GGUF, AWQ, GPTQ)
  • Cloud cost estimation
  • Inference server integration (vLLM, TGI)
  • Multi-node simulation

Long Term

  • Training performance estimation
  • Fine-tuning recommendations
  • Energy efficiency metrics
  • Real-time monitoring integration

Built with ❤️ by the Bud Ecosystem team
