GPT‑style next‑token forecasting for crypto OHLC sequences. Tokens are produced by K‑Means over 5D [open, high, low, close, direction = close−open]. The model is a lean GPT‑2–like decoder upgraded with RoPE (relative positions), RMSNorm, and SwiGLU.
- A minimal, ready-to-train dataset for BTCUSDT:
  - `data/price/btcusdt.csv`
  - `data/norm/btcusdt.csv` (normalized OHLC plus `price_min`/`price_max` per row; reversible)
  - `data/tokens/btcusdt.csv`
- Tokenizer assets (generated by `src/data/train_tokenizer.py`):
  - `tokenizer/kmeans_centers.npy`, `kmeans_meta.json`, `kmeans_model.joblib`, `vocab.json`

With these files you can train immediately: no flags, just run the scripts below in order.
```bash
# 1) Create a fresh environment (recommended)
python -m venv .venv
# Windows
.\.venv\Scripts\Activate.ps1
# macOS/Linux
source .venv/bin/activate

# 2) Install dependencies
pip install -r requirements.txt

# 3) (GPU on Windows) Install a CUDA build of PyTorch (example: CUDA 12.6 wheels)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
```

Quick check:
```bash
python - <<'PY'
import torch
print("cuda.is_available:", torch.cuda.is_available())
if torch.cuda.is_available(): print("device:", torch.cuda.get_device_name(0))
PY
```

Use this if you want symbols beyond the included BTCUSDT sample.
```bash
# 1) Download raw OHLC
python src/data/get_price.py
# 2) Normalize with rolling window (persists price_min/price_max per row)
python src/data/get_norm.py
# 3) Train tokenizer (K-Means over [O,H,L,C,dir])
python src/data/train_tokenizer.py
# 4) Assign tokens
python src/data/get_tokens.py
```

Notes:
- Binance pair renames (e.g., `SUSDT`, `POLUSDT`) are already handled in the data scripts.
- Normalization is reversible because `price_min` and `price_max` are stored for each row (see the sketch below).
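For orientation, here is a minimal sketch of the rolling min-max step. Column names and the window length are assumptions for illustration; the real logic lives in `src/data/get_norm.py`.

```python
import pandas as pd

WINDOW = 512  # assumed rolling-window length

def normalize_ohlc(df: pd.DataFrame, window: int = WINDOW) -> pd.DataFrame:
    """Rolling min-max over trailing rows; persists the bounds per row.

    Only past rows (up to and including the current one) feed the window,
    so there is no look-ahead. Storing price_min/price_max per row makes
    the mapping exactly reversible.
    """
    out = df.copy()
    out["price_min"] = df["low"].rolling(window, min_periods=window).min()
    out["price_max"] = df["high"].rolling(window, min_periods=window).max()
    span = out["price_max"] - out["price_min"]
    span = span.mask(span == 0)  # avoid divide-by-zero on flat windows
    for col in ["open", "high", "low", "close"]:
        out[col] = (df[col] - out["price_min"]) / span
    return out.dropna()  # drop warmup rows without a full window
```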
```bash
python src/train/train.py
```

- Configuration lives in `src/train/config.py`. Defaults: `sequence_length=512`, `activation="swiglu"`, `norm="rmsnorm"`, and RoPE enabled in the model (see the sketch after this list).
- Checkpoints are written under `runs/.../checkpoints/` and auto-resume on re-run.
- CSV logs go to `logs/`.
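As a hypothetical sketch, the documented defaults look roughly like this dataclass. Field names are illustrative, not the repo's actual `config.py`; the values come from this README.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Documented defaults; field names are illustrative.
    sequence_length: int = 512
    activation: str = "swiglu"
    norm: str = "rmsnorm"
    rope: bool = True
    learning_rate: float = 3e-5   # peak LR, see training tips below
    warmup_steps: int = 1000
    dropout: float = 0.0          # regularization off initially
```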
```bash
python src/train/prediction_visualizer_gpt.py
```

- Loads a checkpoint, reconstructs the exact normalization and tokenization steps, and plots predicted vs. actual candles.
| Area | OpenAI GPT‑2 (pretrained architecture) | ProphitGPT | Rationale |
|---|---|---|---|
| Positional encoding | Absolute learned position embeddings added to tokens | RoPE (rotary relative) ✅ | Markets depend on relative distances in time. RoPE compares tokens by offset, reducing context drift and improving multi‑step consistency. |
| Normalization | LayerNorm | RMSNorm ✅ | Fewer ops, stable at our widths, slightly faster. |
| MLP activation | GELU | SwiGLU ✅ | Gated MLP improves expressivity per parameter; works well with RMSNorm/RoPE. |
| Attention | Causal self‑attention | Causal self‑attention (simplified) ✅ | Kept causal decoder; simplified internals and weight init for stability. |
| Embedding tying | Optional (varies by impl) | Tied ✅ | Fewer params; couples input/output spaces for token prediction. |
| Tokenization | BPE over text | K‑Means over 5D OHLC+dir ✅ | Deterministic candle tokens that capture range/body shape. |
| Objective | Next‑token CE | Next‑token CE (optional tiny z‑loss) ✅ | Same core objective; minimal regularization. |
| Positional length sensitivity | Brittle outside trained absolute indices | More robust with RoPE ✅ | Relative encoding reduces brittleness when sliding windows. |
✅ = used now
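To make the RMSNorm and SwiGLU rows concrete, here is a minimal PyTorch sketch of both blocks. These are the standard formulations, not code copied from this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """LayerNorm without mean-centering or bias: x / rms(x) * g."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """Gated MLP: silu(x W_gate) * (x W_up), projected back to dim."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```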
- Absolute positions force the model to learn fixed indices. When the window slides, the same pattern appears at a new index and must effectively be relearned.
- RoPE rotates queries/keys by a phase proportional to relative distance, so the same pattern displaced in time looks the same to attention.
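A compact sketch of the rotation itself (standard RoPE; the function name and pairing convention are illustrative, not this repo's implementation):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary embeddings to queries or keys of shape (..., seq, dim).

    Channel pairs are rotated by an angle proportional to position, so the
    q.k dot product depends only on the relative offset between tokens.
    Assumes dim is even.
    """
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(half, device=x.device, dtype=x.dtype) / half)
    pos = torch.arange(seq, device=x.device, dtype=x.dtype)
    cos, sin = (pos[:, None] * freqs).cos(), (pos[:, None] * freqs).sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```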
- Tokens: nearest-center index from K-Means over `[open, high, low, close, direction]` (float32). Direction encodes candle body sign/size, letting a smaller vocab represent candles well.
- Normalization: rolling min-max (stochastic window) with per-row `price_min`/`price_max` persisted for exact reversibility and zero look-ahead.
- Splits: date-based boundaries are set in `config.py` (`train_val_boundary`, `val_start`, `val_end`, `test_start`).
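A sketch of the assignment step using the persisted K-Means artifacts. The real flow lives in `src/data/get_tokens.py`; the feature order and dtype below follow the description above.

```python
import joblib
import numpy as np

# Persisted by src/data/train_tokenizer.py
kmeans = joblib.load("tokenizer/kmeans_model.joblib")

def candles_to_tokens(norm_ohlc: np.ndarray) -> np.ndarray:
    """norm_ohlc: (n, 4) array of normalized [open, high, low, close] rows."""
    direction = norm_ohlc[:, 3] - norm_ohlc[:, 0]            # close - open
    features = np.column_stack([norm_ohlc, direction]).astype(np.float32)
    return kmeans.predict(features)                          # nearest-center ids
```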
When the model outputs a token id `k` for the next step:

1. Look up the prototype (normalized) candle. `vocab.json` maps each token id to a prototype with average normalized OHLC plus a string direction label (`"bullish"`, `"bearish"`, or `"neutral"`). Example: `{ "k": { "open": 0.41, "high": 0.55, "low": 0.33, "close": 0.47, "direction": "bullish" } }`. Use `open`/`high`/`low`/`close` from the prototype. The `direction` field in `vocab.json` is for readability and metrics; it is not used in denormalization. If you need the numeric direction feature (`close - open`) for research, read it from the 5th column of `kmeans_centers.npy`.

2. Choose the denorm window. For step `t → t+1`, use the latest available row's `price_min_t` and `price_max_t` (the same window the model saw for the input ending at `t`). This avoids look-ahead.

3. Denormalize each OHLC field. Let `Δ = price_max_t − price_min_t` and convert normalized `x_norm ∈ [0,1]` to price via `x_price = price_min_t + x_norm * Δ`. Apply this to the prototype's open/high/low/close to get a full price-candle forecast.

4. Optional: probability-weighted decoding. If you have the full softmax over tokens, form an expected normalized candle by weighting each prototype's OHLC by its probability, then apply the same denorm step. This yields a smoother forecast than argmax sampling. A sketch of both decode paths follows this list.
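The full decode path as a sketch. It assumes `vocab.json` is keyed by token-id strings as in the example above, and that `probs` is the model's softmax over the vocabulary; function names are illustrative.

```python
import json
import numpy as np

with open("tokenizer/vocab.json") as f:
    vocab = json.load(f)

FIELDS = ["open", "high", "low", "close"]
# (vocab_size, 4) matrix of prototype normalized OHLC, in token-id order
protos = np.array([[vocab[str(k)][f] for f in FIELDS] for k in range(len(vocab))])

def decode_argmax(token_id: int, price_min_t: float, price_max_t: float) -> dict:
    """Steps 1-3: prototype lookup, then linear denorm with the step-t window."""
    delta = price_max_t - price_min_t
    return {f: price_min_t + vocab[str(token_id)][f] * delta for f in FIELDS}

def decode_expected(probs: np.ndarray, price_min_t: float, price_max_t: float) -> dict:
    """Step 4: expectation over prototypes under the softmax, then denorm."""
    expected = probs @ protos                 # (4,) expected normalized candle
    delta = price_max_t - price_min_t
    return dict(zip(FIELDS, price_min_t + expected * delta))
```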
- `sequence_length = 512` to match the visualizer and data windows.
- Warmup: 1000 steps, then cosine decay with a floor; LR `3e-5` for stability (see the schedule sketch below).
- Regularization off initially (dropouts 0) to validate the loop, then tune.
- Mixed precision is opt-in; defaults to fp32 for easier debugging.
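The documented schedule (1000-step warmup, cosine decay to a floor, peak LR `3e-5`) fits in a few lines. The floor fraction here is an assumed value, not taken from the repo:

```python
import math

PEAK_LR = 3e-5
WARMUP_STEPS = 1000
FLOOR_FRAC = 0.1  # assumed floor: 10% of peak

def lr_at(step: int, total_steps: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS               # linear warmup
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return PEAK_LR * (FLOOR_FRAC + (1.0 - FLOOR_FRAC) * cosine)  # decay to floor
```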
- GPT‑2 (OpenAI): decoder‑only LM with next‑token objective. We keep the objective and causal attention; we replace absolute positions with RoPE and use RMSNorm+SwiGLU.
- Chronos‑T5 (Amazon): tokenized time‑series forecasting (univariate input). Here we tokenize OHLC+direction to match how traders reason about ranges/bodies.
Credit: GPT‑2 (OpenAI) and Chronos‑T5 informed the design choices here.
- ProphitGPT is a research project and does not provide financial advice. The model’s forecasts are experimental and not guaranteed to be accurate. Past performance, backtests, or examples do not guarantee future results.
- Use at your own risk. You are solely responsible for any decisions made using this software and any data it produces. The authors and contributors assume no responsibility and are not liable for any losses or damages (direct, indirect, incidental, consequential, or otherwise) arising from use of this project.
- This software is provided “as is” without warranty of any kind, express or implied. Do your own research, employ proper risk management, and comply with all applicable laws, regulations, and exchange/broker terms.
See LICENSE.
