⚡ ScalePredict

Run a 2-min local benchmark → predict how long your AI job will take on cloud GPU. No guessing. No wasted money.

Premium soon: Tiny Transformer proxy for LLMs, better accuracy, and real cloud prices. Email for early access.


⭐ If this saved you from a wrong GPU choice — star the repo.


The Problem

You have 1 million images to process with AI.
You open AWS and see:

T4   GPU  →  $0.52/hr
V100 GPU  →  $1.80/hr  
A100 GPU  →  $3.20/hr

You don't know which one to pick.
You don't know how many hours you'll need.
You guess. You pay. Sometimes you're wrong.

"I chose V100 for a job that turned out to be too easy —
could have done it on T4 for half the price."

— Reddit user, r/learnmachinelearning

ScalePredict Update – March 2026

337 views, 30 testers, and 140 clones in the last 14 days.

People most often go straight to “Run a 2-min benchmark”; that is the strongest signal that the idea resonates.

User feedback (spot-on):

“ResNet-18 is good for regular models, but for transformers with long context the prediction will be less accurate.”

I agree 100%. That’s why I’m adding it as a known limitation in the documentation.

What’s coming soon:

  • Tiny Transformer proxy (nanoGPT-style) — specifically for LLM and long-context tasks
  • Long-context correction factor (quadratic attention)
  • Real-time cloud prices + recommendation “V100 or T4 is enough?”
  • Parameter-count fallback for quick checks

If you tested — please share:

  • What error did you get (predicted vs real)?

  • On what model/job (ResNet, Llama, diffusion…)?

Repo: https://github.com/Kretski/ScalePredict
Demo: https://scalepredict.streamlit.app/calculator

Thanks to everyone who tried it! ⚡

The Solution

Option A — Calculator (no install, 30 seconds):

Open scalepredict.streamlit.app/calculator, enter your data type, file count and model → see runtime instantly.

Option B — Full benchmark (2 minutes, more accurate):

python run_benchmark.py

Measures your actual machine. Then:

⚡ A100  →  0.4h   fastest
   V100  →  0.8h
   A10G  →  1.1h
   T4    →  2.3h

Look up the price yourself. Multiply. Done.
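The “multiply” step can be sketched in a few lines of Python. The prices and runtimes below are the illustrative figures from this README, not live quotes:

```python
# Illustrative hourly prices and predicted runtimes from the examples above.
# Always check your cloud provider's console for current rates.
prices_per_hr = {"T4": 0.52, "V100": 1.80, "A100": 3.20}
runtimes_hr = {"T4": 2.3, "V100": 0.8, "A100": 0.4}

# Total job cost per GPU, cheapest first.
costs = {gpu: prices_per_hr[gpu] * runtimes_hr[gpu] for gpu in prices_per_hr}
for gpu, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{gpu:5s} {runtimes_hr[gpu]:.1f} h  →  ${cost:.2f}")
```

Note how, for this example job, the fastest card (A100) is not the cheapest one. That trade-off is exactly the decision the prediction is meant to inform.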


Quick Start

# Install
pip install -r requirements.txt

# Step 1 — measure your machine (2 min)
python run_benchmark.py

# Step 2 — open dashboard
streamlit run scalepredict_app.py

Opens at http://localhost:8501


Tested on Real Hardware

All three machines ran the same run_benchmark.py — no simulated data.

| Machine | CPU/GPU | Throughput | W Score | Ratio vs Lenovo |
|---|---|---|---|---|
| Lenovo L14 (Ryzen 7 Pro) | AMD CPU | 58 img/s | +0.054 | 1.0x baseline |
| Fujitsu H710 (Sandy Bridge) | Intel CPU | 14 img/s | -0.165 | 4.8x slower |
| Xeon + Quadro M4000 | Intel + GPU | 639 img/s | +0.730 | 7.6x faster |

Cross-Machine Correlations

| Pair | Pearson r | Spearman ρ |
|---|---|---|
| Lenovo ↔ Fujitsu | 0.9977 | 1.0000 |
| Lenovo ↔ Xeon+GPU | 0.9971 | 1.0000 |
| Fujitsu ↔ Xeon+GPU | 0.9998 | 1.0000 |

Spearman ρ = 1.000 across all pairs — measured, not theoretical.
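Spearman ρ equals 1 whenever two machines rank the benchmark workloads identically, regardless of their absolute speeds. A minimal sketch of that check (the throughput numbers are illustrative, not the repo's raw data):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation for sequences without ties."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        out = [0] * len(vals)
        for rank, idx in enumerate(order):
            out[idx] = rank
        return out

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean = (n - 1) / 2
    # Both rank vectors are permutations of 0..n-1, so they share one variance.
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var

# Illustrative per-batch-size throughputs (img/s) for two machines.
lenovo = [12.0, 35.0, 52.0, 57.0, 58.0]
fujitsu = [3.0, 8.5, 12.5, 13.8, 14.0]
print(spearman_rho(lenovo, fujitsu))  # 1.0 — identical ordering
```

The Fujitsu is roughly 4x slower at every batch size, yet the ordering is identical, so ρ = 1.0 while Pearson r stays just below 1.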


How It Works

run_benchmark.py
  → measures latency across batch sizes [1, 8, 32, 64, 128]
  → removes GPU warmup outliers automatically
  → computes W score = Q·D - T
  → saves scalepredict_profile.json

scalepredict_app.py
  → reads your profile
  → applies k(t,d) scaling model
  → predicts runtime on T4 / V100 / A100 / A10G
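The measurement loop can be sketched as follows. This is not the actual run_benchmark.py; `run_batch` is a hypothetical stand-in for the real model call:

```python
import json
import statistics
import time

def profile_machine(run_batch, batch_sizes=(1, 8, 32, 64, 128), repeats=5):
    """Time run_batch(n) at each batch size, discarding the first (warmup) run."""
    profile = {}
    for bs in batch_sizes:
        timings = []
        for _ in range(repeats + 1):               # one extra run for warmup
            start = time.perf_counter()
            run_batch(bs)
            timings.append(time.perf_counter() - start)
        latency = statistics.median(timings[1:])   # drop the warmup outlier
        profile[bs] = {"latency_s": latency, "throughput": bs / latency}
    return profile

# Example: a dummy CPU-bound workload standing in for model inference.
profile = profile_machine(lambda bs: sum(i * i for i in range(bs * 1000)))
with open("scalepredict_profile.json", "w") as f:
    json.dump(profile, f, indent=2)
```

Taking the median after dropping the first run is one simple way to keep GPU warmup and other one-off spikes out of the profile.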

The k(t,d) Model

k(t,d) = k₀ · e^(−αt) · (1 + β/d)

t  = batch size
d  = latency proxy (ms × 1000)
k₀ = architecture constant

Not a lookup table. Not a heuristic.
An original cross-architecture scaling formula.
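Evaluated directly, the formula looks like this. The constants here are purely illustrative; the real k₀, α, β are fit per architecture:

```python
import math

def k(t, d, k0=1.0, alpha=0.01, beta=50.0):
    """k(t,d) = k0 * exp(-alpha * t) * (1 + beta / d)."""
    return k0 * math.exp(-alpha * t) * (1 + beta / d)

# Larger batch sizes (t) and larger latency proxies (d) both shrink k.
print(k(1, 1000))    # small batch
print(k(128, 1000))  # large batch → smaller scaling factor
```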


Known Limitations

  • Optimized for CNN inference (ResNet, YOLO, image classification)
  • Transformer models with long context may show different memory-access patterns; predictions are less accurate for sequences > 512 tokens
  • Prediction accuracy decreases for models with irregular memory access
  • GPU warmup outliers are removed automatically (first batch excluded)

Privacy

The scalepredict_profile.json contains:

  • CPU model name
  • RAM size
  • Core count
  • Benchmark results (latency, throughput)

No usernames. No location. No personal data.
Open it in any text editor to verify before uploading.


Files

ScalePredict/
├── run_benchmark.py      ← run this on your machine
├── scalepredict_app.py   ← Streamlit dashboard
├── calculator.py         ← simple calculator, no benchmark needed
├── requirements.txt      ← dependencies
└── README.md

Roadmap

- [x] CPU benchmark (Lenovo L14)
- [x] CPU benchmark (Fujitsu H710)
- [x] GPU benchmark (Xeon + Quadro M4000)
- [x] Streamlit dashboard
- [x] Simple calculator (no install)
- [x] r > 0.997 on all 3 machine pairs
- [x] Known limitations documented
- [x] Privacy notice
- [ ] Transformer workload support
- [ ] GCP / Azure pricing links
- [ ] arXiv preprint
- [ ] pip package
License

MIT — use freely.


3 machines. 3 real benchmarks. Spearman ρ = 1.000.
⭐ Star the repo if it helped you.
