GPT‑style next‑token forecasting for crypto OHLC sequences. Tokens are produced by K‑Means over 5D [open, high, low, close, direction = close−open]. The model is a lean GPT‑2–like decoder upgraded with RoPE (relative positions), RMSNorm, and SwiGLU.
- A minimal, ready-to-train dataset for BTCUSDT:
  - `data/price/btcusdt.csv`
  - `data/norm/btcusdt.csv` (normalized OHLC plus `price_min`/`price_max` per row; reversible)
  - `data/tokens/btcusdt.csv`
- Tokenizer assets (generated by `src/data/train_tokenizer.py`):
  - `tokenizer/kmeans_centers.npy`, `kmeans_meta.json`, `kmeans_model.joblib`, `vocab.json`

With these files you can train immediately: no flags, just run the scripts below in order.
```bash
# 1) Create a fresh environment (recommended)
python -m venv .venv
# Windows
.\.venv\Scripts\Activate.ps1
# macOS/Linux
source .venv/bin/activate

# 2) Install dependencies
pip install -r requirements.txt

# 3) (GPU on Windows) Install a CUDA build of PyTorch (example: CUDA 12.6 wheels)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
```

Quick check:
```bash
python - <<'PY'
import torch
print("cuda.is_available:", torch.cuda.is_available())
if torch.cuda.is_available(): print("device:", torch.cuda.get_device_name(0))
PY
```

Use this if you want symbols beyond the included BTCUSDT sample.
```bash
# 1) Download raw OHLC
python src/data/get_price.py
# 2) Normalize with rolling window (persists price_min/price_max per row)
python src/data/get_norm.py
# 3) Train tokenizer (K-Means over [O,H,L,C,dir])
python src/data/train_tokenizer.py
# 4) Assign tokens
python src/data/get_tokens.py
```

Notes:
- Binance pair renames (e.g., `SUSDT`, `POLUSDT`) are already handled in the data scripts.
- Normalization is reversible because `price_min` and `price_max` are stored for each row (see the sketch below).
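For orientation, here is a minimal sketch of the rolling min-max step. Column names and the window length are assumptions for illustration; the real logic lives in `src/data/get_norm.py`.

```python
import pandas as pd

WINDOW = 512  # assumed rolling-window length

def normalize_ohlc(df: pd.DataFrame, window: int = WINDOW) -> pd.DataFrame:
    """Rolling min-max over trailing rows; persists the bounds per row.

    Only past rows (up to and including the current one) feed the window,
    so there is no look-ahead. Storing price_min/price_max per row makes
    the mapping exactly reversible.
    """
    out = df.copy()
    out["price_min"] = df["low"].rolling(window, min_periods=window).min()
    out["price_max"] = df["high"].rolling(window, min_periods=window).max()
    span = out["price_max"] - out["price_min"]
    span = span.mask(span == 0)  # avoid divide-by-zero on flat windows
    for col in ["open", "high", "low", "close"]:
        out[col] = (df[col] - out["price_min"]) / span
    return out.dropna()  # drop warmup rows without a full window
```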
```bash
python src/train/train.py
```

- Configuration lives in `src/train/config.py`. Defaults: `sequence_length=512`, `activation="swiglu"`, `norm="rmsnorm"`, and RoPE enabled in the model (see the sketch after this list).
- Checkpoints are written under `runs/.../checkpoints/` and auto-resume on re-run.
- CSV logs go to `logs/`.
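As a hypothetical sketch, the documented defaults look roughly like this dataclass. Field names are illustrative, not the repo's actual `config.py`; the values come from this README.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Documented defaults; field names are illustrative.
    sequence_length: int = 512
    activation: str = "swiglu"
    norm: str = "rmsnorm"
    rope: bool = True
    learning_rate: float = 3e-5   # peak LR, see training tips below
    warmup_steps: int = 1000
    dropout: float = 0.0          # regularization off initially
```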
```bash
python src/train/prediction_visualizer_gpt.py
```

- Loads a checkpoint, reconstructs the exact normalization and tokenization steps, and plots predicted vs. actual candles.
| Area | OpenAI GPT‑2 (pretrained architecture) | ProphitGPT | Rationale |
|---|---|---|---|
| Positional encoding | Absolute learned position embeddings added to tokens | RoPE (rotary relative) ✅ | Markets depend on relative distances in time. RoPE compares tokens by offset, reducing context drift and improving multi‑step consistency. |
| Normalization | LayerNorm | RMSNorm ✅ | Fewer ops, stable at our widths, slightly faster. |
| MLP activation | GELU | SwiGLU ✅ | Gated MLP improves expressivity per parameter; works well with RMSNorm/RoPE. |
| Attention | Causal self‑attention | Causal self‑attention (simplified) ✅ | Kept causal decoder; simplified internals and weight init for stability. |
| Embedding tying | Optional (varies by impl) | Tied ✅ | Fewer params; couples input/output spaces for token prediction. |
| Tokenization | BPE over text | K‑Means over 5D OHLC+dir ✅ | Deterministic candle tokens that capture range/body shape. |
| Objective | Next‑token CE | Next‑token CE (optional tiny z‑loss) ✅ | Same core objective; minimal regularization. |
| Positional length sensitivity | Brittle outside trained absolute indices | More robust with RoPE ✅ | Relative encoding reduces brittleness when sliding windows. |
✅ = used now
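To make the RMSNorm and SwiGLU rows concrete, here is a minimal PyTorch sketch of both blocks. These are the standard formulations, not code copied from this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """LayerNorm without mean-centering or bias: x / rms(x) * g."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """Gated MLP: silu(x W_gate) * (x W_up), projected back to dim."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```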
- Absolute positions force the model to learn fixed indices. When the window slides, the same pattern appears at a new index and must effectively be relearned.
- RoPE rotates queries/keys by a phase proportional to relative distance, so the same pattern displaced in time looks the same to attention.
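A compact sketch of the rotation itself (standard RoPE; the function name and pairing convention are illustrative, not this repo's implementation):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary embeddings to queries or keys of shape (..., seq, dim).

    Channel pairs are rotated by an angle proportional to position, so the
    q.k dot product depends only on the relative offset between tokens.
    Assumes dim is even.
    """
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(half, device=x.device, dtype=x.dtype) / half)
    pos = torch.arange(seq, device=x.device, dtype=x.dtype)
    cos, sin = (pos[:, None] * freqs).cos(), (pos[:, None] * freqs).sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```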
- Tokens: nearest-center index from K-Means over `[open, high, low, close, direction]` (float32). Direction encodes candle body sign/size, letting a smaller vocab represent candles well.
- Normalization: rolling min-max (stochastic window) with per-row `price_min`/`price_max` persisted for exact reversibility and zero look-ahead.
- Splits: date-based boundaries are set in `config.py` (`train_val_boundary`, `val_start`, `val_end`, `test_start`).
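A sketch of the assignment step using the persisted K-Means artifacts. The real flow lives in `src/data/get_tokens.py`; the feature order and dtype below follow the description above.

```python
import joblib
import numpy as np

# Persisted by src/data/train_tokenizer.py
kmeans = joblib.load("tokenizer/kmeans_model.joblib")

def candles_to_tokens(norm_ohlc: np.ndarray) -> np.ndarray:
    """norm_ohlc: (n, 4) array of normalized [open, high, low, close] rows."""
    direction = norm_ohlc[:, 3] - norm_ohlc[:, 0]            # close - open
    features = np.column_stack([norm_ohlc, direction]).astype(np.float32)
    return kmeans.predict(features)                          # nearest-center ids
```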
When the model outputs a token id `k` for the next step:

1. Look up the prototype (normalized) candle. `vocab.json` maps each token id to a prototype with average normalized OHLC plus a string direction label (`"bullish"`, `"bearish"`, or `"neutral"`). Example: `{ "k": { "open": 0.41, "high": 0.55, "low": 0.33, "close": 0.47, "direction": "bullish" } }`. Use `open`/`high`/`low`/`close` from the prototype. The `direction` field in `vocab.json` is for readability and metrics; it is not used in denormalization. If you need the numeric direction feature (`close - open`) for research, read it from the 5th column of `kmeans_centers.npy`.

2. Choose the denorm window. For step `t → t+1`, use the latest available row's `price_min_t` and `price_max_t` (the same window the model saw for the input ending at `t`). This avoids look-ahead.

3. Denormalize each OHLC field. Let `Δ = price_max_t − price_min_t` and convert normalized `x_norm ∈ [0,1]` to price via `x_price = price_min_t + x_norm * Δ`. Apply this to the prototype's open/high/low/close to get a full price-candle forecast.

4. Optional: probability-weighted decoding. If you have the full softmax over tokens, form an expected normalized candle by weighting each prototype's OHLC by its probability, then apply the same denorm step. This yields a smoother forecast than argmax sampling. A sketch of both decode paths follows this list.
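The full decode path as a sketch. It assumes `vocab.json` is keyed by token-id strings as in the example above, and that `probs` is the model's softmax over the vocabulary; function names are illustrative.

```python
import json
import numpy as np

with open("tokenizer/vocab.json") as f:
    vocab = json.load(f)

FIELDS = ["open", "high", "low", "close"]
# (vocab_size, 4) matrix of prototype normalized OHLC, in token-id order
protos = np.array([[vocab[str(k)][f] for f in FIELDS] for k in range(len(vocab))])

def decode_argmax(token_id: int, price_min_t: float, price_max_t: float) -> dict:
    """Steps 1-3: prototype lookup, then linear denorm with the step-t window."""
    delta = price_max_t - price_min_t
    return {f: price_min_t + vocab[str(token_id)][f] * delta for f in FIELDS}

def decode_expected(probs: np.ndarray, price_min_t: float, price_max_t: float) -> dict:
    """Step 4: expectation over prototypes under the softmax, then denorm."""
    expected = probs @ protos                 # (4,) expected normalized candle
    delta = price_max_t - price_min_t
    return dict(zip(FIELDS, price_min_t + expected * delta))
```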
- `sequence_length = 512` to match the visualizer and data windows.
- Warmup: 1000 steps, then cosine decay with a floor; LR `3e-5` for stability (see the schedule sketch below).
- Regularization off initially (dropouts 0) to validate the loop, then tune.
- Mixed precision is opt-in; defaults to fp32 for easier debugging.
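The documented schedule (1000-step warmup, cosine decay to a floor, peak LR `3e-5`) fits in a few lines. The floor fraction here is an assumed value, not taken from the repo:

```python
import math

PEAK_LR = 3e-5
WARMUP_STEPS = 1000
FLOOR_FRAC = 0.1  # assumed floor: 10% of peak

def lr_at(step: int, total_steps: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS               # linear warmup
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return PEAK_LR * (FLOOR_FRAC + (1.0 - FLOOR_FRAC) * cosine)  # decay to floor
```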
- GPT‑2 (OpenAI): decoder‑only LM with next‑token objective. We keep the objective and causal attention; we replace absolute positions with RoPE and use RMSNorm+SwiGLU.
- Chronos‑T5 (Amazon): tokenized time‑series forecasting (univariate input). Here we tokenize OHLC+direction to match how traders reason about ranges/bodies.
Credit: GPT‑2 (OpenAI) and Chronos‑T5 informed the design choices here.
- ProphitGPT is a research project and does not provide financial advice. The model’s forecasts are experimental and not guaranteed to be accurate. Past performance, backtests, or examples do not guarantee future results.
- Use at your own risk. You are solely responsible for any decisions made using this software and any data it produces. The authors and contributors assume no responsibility and are not liable for any losses or damages (direct, indirect, incidental, consequential, or otherwise) arising from use of this project.
- This software is provided “as is” without warranty of any kind, express or implied. Do your own research, employ proper risk management, and comply with all applicable laws, regulations, and exchange/broker terms.
See LICENSE.
