Skip to content

DataBoySu/MyGPU

Repository files navigation

MyGPU logo

MyGPU: Lightweight GPU Management Utility: a compact nvidia-smi wrapper with an elegant web dashboard.

License Python Version Platform cuda 12.x

Gallery

Web Dashboard
CLI

Why use this?

  • Lightweight: Minimal resource footprint.
  • Flexible: Runs as a CLI tool, or a full-featured Web Dashboard.
  • Admin-Centric: Includes features like VRAM Enforcement (auto-kill processes exceeding limits) and Watchlists.
  • Developer-Friendly: Built-in benchmarking and stress-testing tools (GEMM, Particle Physics) to validate system stability.

Features

  • Real-time Monitoring:

    • Detailed GPU metrics (Utilization, VRAM, Power, Temp).
    • System metrics (CPU, RAM, etc.).
  • Admin & Enforcement:

    • VRAM Caps: Set hard limits on VRAM usage per GPU.
    • Auto-Termination: Automatically terminate processes that violate VRAM policies (Admin only).
    • Watchlists: Monitor specific PIDs or process names.
  • Benchmarking & Simulation:

    • Stress Testing: Configurable GEMM workloads to test thermal throttling and stability.
    • Visual Simulation: Interactive 3D particle physics simulation to visualize GPU load.

Roadmap & Future Work

Contributions are welcome! Main future points to cover would be:

  • Multi-GPU Support: Enhanced handling for multi-card setups and NVLink topologies.
  • Containerization: Official Docker support for easy deployment in containerized environments.
  • Remote Access: SSH tunneling integration and secure remote management.
  • Cross-Platform:
    • Linux Support (Ubuntu/Debian focus).
    • macOS Support (Apple Silicon monitoring).
  • Hardware Agnostic:
    • AMD ROCm support.
    • Intel Arc support.
  • Multi-Language Documentation: Supporting most popular GitHub languages.

See CONTRIBUTING.md for how to get involved.


Requirements

  • OS: Windows 10/11
  • Python: 3.10+
  • Hardware: NVIDIA GPU with installed drivers.
  • CUDA: Toolkit 12.x (Strictly required for Benchmarking/Simulation features).
    • Note: If CUDA 12.x is not detected, GPU-specific benchmarking features will be disabled.

Installation

The tool supports modular installation to fit your needs:

1. Minimal (CLI Only)

Best for headless servers or background monitoring.

  • Command-line interface.
  • Basic system/GPU metrics.

2. Standard (CLI + Web UI)

Best for most users.

  • Includes Web Dashboard.
  • REST API endpoints.
  • Real-time charts.
  • But no Simulation or benchmarking.

3. Full (Standard + Visualization)

Best for development and stress testing.

  • Includes Simulation.
  • PyTorch/CuPy dependencies for benchmarking.

Quick Start

  1. Download the latest release or clone the repo.
  2. Run Setup:
.\setup.ps1
  1. Launch:
# Start the web dashboard (Standard/Full)
python health_monitor.py web

# Start the CLI
python health_monitor.py cli

License

See LICENSE for details.