Run a 2-min local benchmark → predict how long your AI job will take on a cloud GPU. No guessing. No wasted money.
Premium soon: Tiny Transformer proxy for LLMs, better accuracy, real cloud prices. Email for early access.
⭐ If this saved you from a wrong GPU choice — star the repo.
You have 1 million images to process with AI.
You open AWS and see:
T4 GPU → $0.52/hr
V100 GPU → $1.80/hr
A100 GPU → $3.20/hr
You don't know which one to pick.
You don't know how many hours you'll need.
You guess. You pay. Sometimes you're wrong.
"I chose V100 for a job that turned out to be too easy —
could have done it on T4 for half the price."
— Reddit user, r/learnmachinelearning
ScalePredict Update – March 2026
337 views, 30 testers and 140 clones in the last 14 days.
People most often go straight to “Run a 2-min benchmark” — this is the best signal that the idea resonates.
User feedback (spot on):
“ResNet-18 is good for regular models, but for transformers with long context the prediction will be less accurate.”
I agree 100%. That’s why I’m adding it as a known limitation in the documentation.
What’s coming soon:
- Tiny Transformer proxy (nanoGPT-style) — specifically for LLM and long-context tasks
- Long-context correction factor (quadratic attention)
- Real-time cloud prices + recommendation “V100 or T4 is enough?”
- Parameter-count fallback for quick checks
If you tested, please share:
- What error did you get (predicted vs real)?
- On what model/job (ResNet, Llama, diffusion…)?
Repo: https://github.com/Kretski/ScalePredict
Demo: https://scalepredict.streamlit.app/calculator
Thanks to everyone who tried! ⚡
Option A — Calculator (no install, 30 seconds):
Open scalepredict.streamlit.app/calculator, enter your data type, file count and model → see runtime instantly.
Option B — Full benchmark (2 minutes, more accurate):
Run python run_benchmark.py — it measures your actual machine. Then:
⚡ A100 → 0.4h (fastest)
V100 → 0.8h
A10G → 1.1h
T4 → 2.3h
Look up the price yourself. Multiply. Done.
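The final "multiply" step is simple arithmetic; a minimal sketch using the predicted runtimes above (the prices are illustrative examples, not live AWS quotes — check current rates):

```python
# Estimated cost = predicted runtime (hours) x on-demand price ($/hr).
# Prices are illustrative placeholders, not live cloud quotes.
PRICES = {"T4": 0.52, "V100": 1.80, "A100": 3.20, "A10G": 1.00}
RUNTIME_H = {"T4": 2.3, "V100": 0.8, "A100": 0.4, "A10G": 1.1}

def estimated_cost(gpu: str) -> float:
    """Multiply predicted hours by hourly price for one GPU type."""
    return round(RUNTIME_H[gpu] * PRICES[gpu], 2)

for gpu in PRICES:
    print(f"{gpu}: {RUNTIME_H[gpu]}h x ${PRICES[gpu]}/hr = ${estimated_cost(gpu)}")
```

Note that with these example prices the fastest GPU is not always the most expensive total — that is exactly the comparison the tool is meant to surface.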
```bash
# Install
pip install -r requirements.txt

# Step 1 — measure your machine (2 min)
python run_benchmark.py

# Step 2 — open dashboard
streamlit run scalepredict_app.py
```
The dashboard opens at http://localhost:8501.
All three machines ran the same run_benchmark.py — no simulated data.
| Machine | CPU/GPU | Throughput | W Score | Ratio vs Lenovo |
|---|---|---|---|---|
| Lenovo L14 (Ryzen 7 Pro) | AMD CPU | 58 img/s | +0.054 | 1.0x baseline |
| Fujitsu H710 (Sandy Bridge) | Intel CPU | 14 img/s | -0.165 | 4.8x slower |
| Xeon + Quadro M4000 | Intel + GPU | 639 img/s | +0.730 | 7.6x faster |
| Pair | Pearson r | Spearman ρ |
|---|---|---|
| Lenovo ↔ Fujitsu | 0.9977 | 1.0000 |
| Lenovo ↔ Xeon+GPU | 0.9971 | 1.0000 |
| Fujitsu ↔ Xeon+GPU | 0.9998 | 1.0000 |
Spearman ρ = 1.000 across all pairs — measured, not theoretical.
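Both correlation measures can be reproduced from any two benchmark profiles with no dependencies; a sketch (the per-batch-size throughput lists below are placeholders, not the actual measurements):

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson r: covariance normalized by both standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman rho: Pearson r computed on ranks (no ties assumed)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        return [order.index(i) for i in range(len(v))]
    return pearson(ranks(xs), ranks(ys))

# Placeholder per-batch-size throughputs (img/s) for two machines:
machine_a = [12.0, 35.0, 50.0, 56.0, 58.0]
machine_b = [3.0, 8.0, 12.0, 13.5, 14.0]
print(round(pearson(machine_a, machine_b), 4), round(spearman(machine_a, machine_b), 4))
```

Spearman ρ = 1.0 just means both machines rank the batch sizes identically — the claim the tables above are making.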
run_benchmark.py
→ measures latency across batch sizes [1, 8, 32, 64, 128]
→ removes GPU warmup outliers automatically
→ computes W score = Q·D - T
→ saves scalepredict_profile.json
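The measurement loop described above can be sketched as follows (the workload function and profile field names are assumptions for illustration, not the actual run_benchmark.py internals):

```python
import time

def measure(workload, batch_sizes=(1, 8, 32, 64, 128), repeats=5):
    """Time a workload per batch size, dropping the first (warmup) run."""
    profile = {}
    for b in batch_sizes:
        times = []
        for _ in range(repeats + 1):      # one extra run for warmup
            t0 = time.perf_counter()
            workload(b)
            times.append(time.perf_counter() - t0)
        times = times[1:]                 # discard the warmup outlier
        latency = sum(times) / len(times)
        profile[b] = {"latency_s": latency, "throughput": b / latency}
    return profile

# Toy CPU workload standing in for a model forward pass:
result = measure(lambda b: sum(i * i for i in range(b * 1000)))
print(result)
```

In the real script the result would be serialized to scalepredict_profile.json; here it is just printed.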
scalepredict_app.py
→ reads your profile
→ applies k(t,d) scaling model
→ predicts runtime on T4 / V100 / A100 / A10G
k(t,d) = k₀ · e^(−αt) · (1 + β/d)
t = batch size
d = latency proxy (ms × 1000)
k₀ = architecture constant
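Under those definitions the scaling factor translates directly to code; a sketch (the α, β and k₀ values here are placeholders — the real constants are fitted per architecture):

```python
import math

def k_scaling(t: float, d: float, k0: float = 1.0,
              alpha: float = 0.01, beta: float = 50.0) -> float:
    """k(t, d) = k0 * exp(-alpha * t) * (1 + beta / d)
    t = batch size, d = latency proxy (ms x 1000).
    alpha, beta, k0 are illustrative, not the fitted constants."""
    return k0 * math.exp(-alpha * t) * (1 + beta / d)

# Larger batches shrink the exponential term; a slower machine
# (larger d) shrinks the 1 + beta/d correction toward 1:
print(k_scaling(t=32, d=2000))
```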
Not a lookup table. Not a heuristic.
Original formula — cross-architecture scaling model.
- Optimized for CNN inference (ResNet, YOLO, image classification)
- Transformer models with long context may show different memory access patterns — prediction less accurate for sequences > 512 tokens
- Prediction accuracy decreases for models with irregular memory access
- GPU warmup outliers are removed automatically (first batch excluded)
The scalepredict_profile.json contains:
- CPU model name
- RAM size
- Core count
- Benchmark results (latency, throughput)
No usernames. No location. No personal data.
Open it in any text editor to verify before uploading.
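Beyond eyeballing the file, a quick scripted audit works too; a sketch (the field names in the sample dict are a guess at the real schema, not a spec):

```python
import json

SENSITIVE = {"username", "user", "hostname", "location", "ip"}

def audit(profile: dict) -> set:
    """Return any top-level keys that look like personal data."""
    return SENSITIVE & {k.lower() for k in profile}

# Example layout — field names are hypothetical:
sample = {
    "cpu_model": "AMD Ryzen 7 PRO 4750U",
    "ram_gb": 16,
    "cores": 8,
    "benchmark": {"batch_32": {"latency_s": 0.55, "throughput": 58}},
}
print("flagged fields:", audit(sample) or "none")

# For the real file:
# audit(json.load(open("scalepredict_profile.json")))
```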
ScalePredict/
├── run_benchmark.py ← run this on your machine
├── scalepredict_app.py ← Streamlit dashboard
├── calculator.py ← simple calculator, no benchmark needed
├── requirements.txt ← dependencies
└── README.md
Done:
- CPU benchmark (Lenovo L14)
- CPU benchmark (Fujitsu H710)
- GPU benchmark (Xeon + Quadro M4000)
- Streamlit dashboard
- Simple calculator (no install)
- r > 0.997 on all 3 machine pairs
- Known limitations documented
- Privacy notice
Coming next:
- Transformer workload support
- GCP / Azure pricing links
- arXiv preprint
- pip package
MIT — use freely.
3 machines. 3 real benchmarks. Spearman ρ = 1.000.
⭐ Star the repo if it helped you.