ProphitGPT

GPT‑style next‑token forecasting for crypto OHLC sequences. Tokens are produced by K‑Means over 5D [open, high, low, close, direction = close−open]. The model is a lean GPT‑2–like decoder upgraded with RoPE (relative positions), RMSNorm, and SwiGLU.

[Image: example_prophit]


What’s included

  • A minimal, ready‑to‑train dataset for BTCUSDT:
    • data/price/btcusdt.csv
    • data/norm/btcusdt.csv (normalized OHLC plus price_min/price_max per row; reversible)
    • data/tokens/btcusdt.csv
  • Tokenizer assets (generated by src/data/train_tokenizer.py):
    tokenizer/kmeans_centers.npy, kmeans_meta.json, kmeans_model.joblib, vocab.json

With these files you can train immediately; no flags are needed, just run the scripts below in order.


Setup

# 1) Create a fresh environment (recommended)
python -m venv .venv
# Windows
.\.venv\Scripts\Activate.ps1
# macOS/Linux
source .venv/bin/activate

# 2) Install dependencies
pip install -r requirements.txt

# 3) (GPU on Windows) Install a CUDA build of PyTorch (example: CUDA 12.6 wheels)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Quick check:

python -c "import torch; print('cuda.is_available:', torch.cuda.is_available()); print('device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'cpu')"

Run order (no CLI flags; run these files in sequence)

A) (Optional) Build more data

Use this if you want symbols beyond the included BTCUSDT sample.

# 1) Download raw OHLC
python src/data/get_price.py

# 2) Normalize with rolling window (persists price_min/price_max per row)
python src/data/get_norm.py

# 3) Train tokenizer (K‑Means over [O,H,L,C,dir])
python src/data/train_tokenizer.py

# 4) Assign tokens
python src/data/get_tokens.py

Notes

  • Binance pair renames (e.g., SUSDT, POLUSDT) are already handled in the data scripts.
  • Normalization is reversible because price_min and price_max are stored for each row.
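
A minimal sketch of this rolling min–max normalization, assuming standard OHLC column names and a 512‑row window; the real logic lives in src/data/get_norm.py and may differ:

import pandas as pd

def normalize_rolling(df: pd.DataFrame, window: int = 512) -> pd.DataFrame:
    # Column names and window length are assumptions; see src/data/get_norm.py.
    # Rolling stats use only the current and past rows, so there is no look-ahead.
    out = df.copy()
    out["price_min"] = df["low"].rolling(window).min()
    out["price_max"] = df["high"].rolling(window).max()
    span = (out["price_max"] - out["price_min"]).clip(lower=1e-9)
    for col in ("open", "high", "low", "close"):
        out[col] = (df[col] - out["price_min"]) / span
    return out.dropna()  # rows before the first full window lack stats

# Reversal per row: x_price = price_min + x_norm * (price_max - price_min)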

B) Train

python src/train/train.py

  • Configuration lives in src/train/config.py. Defaults: sequence_length=512, activation="swiglu", norm="rmsnorm", and RoPE enabled in the model.
  • Checkpoints are written under runs/.../checkpoints/ and auto‑resume on re‑run.
  • CSV logs go to logs/.

C) Test / visualize

python src/train/prediction_visualizer_gpt.py

  • Loads a checkpoint, reconstructs the exact normalization and tokenization steps, and plots predicted vs actual candles.

Architectural changes vs OpenAI GPT‑2

| Area | OpenAI GPT‑2 (pretrained architecture) | ProphitGPT | Rationale |
| --- | --- | --- | --- |
| Positional encoding | Absolute learned position embeddings added to tokens | RoPE (rotary relative) | Markets depend on relative distances in time. RoPE compares tokens by offset, reducing context drift and improving multi‑step consistency. |
| Normalization | LayerNorm | RMSNorm | Fewer ops, stable at our widths, slightly faster. |
| MLP activation | GELU | SwiGLU | Gated MLP improves expressivity per parameter; works well with RMSNorm/RoPE. |
| Attention | Causal self‑attention | Causal self‑attention (simplified) ✅ | Kept causal decoder; simplified internals and weight init for stability. |
| Embedding tying | Optional (varies by impl) | Tied | Fewer params; couples input/output spaces for token prediction. |
| Tokenization | BPE over text | K‑Means over 5D OHLC+dir | Deterministic candle tokens that capture range/body shape. |
| Objective | Next‑token CE | Next‑token CE (optional tiny z‑loss) ✅ | Same core objective; minimal regularization. |
| Positional length sensitivity | Brittle outside trained absolute indices | More robust with RoPE ✅ | Relative encoding reduces brittleness when sliding windows. |

✅ = used now
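
For concreteness, a minimal sketch of the RMSNorm and SwiGLU blocks named in the table; module names and dimensions are illustrative, not the project's actual classes:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by the reciprocal root-mean-square over the last dimension
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # Gated MLP: silu(gate) elementwise-multiplies the up projection
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))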

Why RoPE instead of absolute positions

  • Absolute positions force learning of fixed indices. When the window slides, the same pattern appears at a new index.
  • RoPE rotates queries/keys by a phase proportional to relative distance, so the same pattern displaced in time looks the same to attention.
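
A toy illustration of that rotation applied to a (seq_len, dim) tensor of queries or keys; the base frequency and channel-pairing scheme follow the common RoPE convention and are assumptions, not the project's exact code:

import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, dim) with even dim. Channel pairs (2i, 2i+1) are rotated by
    # an angle that grows linearly with position, so the q.k dot product after
    # rotation depends only on the relative offset between positions.
    seq_len, dim = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)          # (seq_len, 1)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)  # (dim/2,)
    ang = pos * freqs                                                      # (seq_len, dim/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out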

Data and labels

  • Tokens: nearest‑center index from K‑Means over [open, high, low, close, direction] (float32). Direction encodes candle body sign/size, letting a smaller vocab represent candles well.
  • Normalization: rolling min–max (stochastic window) with per‑row price_min/price_max persisted for exact reversibility and zero look‑ahead.
  • Splits: date‑based boundaries are set in config.py (train_val_boundary, val_start, val_end, test_start).
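
A minimal sketch of this tokenization, assuming normalized OHLC column names; the real pipeline is src/data/train_tokenizer.py and src/data/get_tokens.py, and the vocabulary size here is an assumption:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

norm = pd.read_csv("data/norm/btcusdt.csv")                 # normalized OHLC in [0, 1]
ohlc = norm[["open", "high", "low", "close"]].to_numpy(dtype=np.float32)
direction = ohlc[:, 3] - ohlc[:, 0]                         # 5th feature: close - open
X = np.column_stack([ohlc, direction])

km = KMeans(n_clusters=1024, n_init=10, random_state=0).fit(X)  # vocab size assumed
tokens = km.predict(X)                                      # nearest-center index per candle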

From token → normalized candle → price candle

When the model outputs a token id k for the next step (a combined code sketch follows these steps):

  1. Lookup prototype (normalized) candle
    vocab.json maps each token id to a prototype with average normalized OHLC plus a string direction label ("bullish", "bearish", or "neutral"). Example:

    {
      "k": {
        "open": 0.41,
        "high": 0.55,
        "low": 0.33,
        "close": 0.47,
        "direction": "bullish"
      }
    }

    Use open/high/low/close from the prototype. The direction field in vocab.json is for readability and metrics; it is not used in denormalization.
    If you need the numeric direction feature (close - open) for research, read it from the 5th column of kmeans_centers.npy.

  2. Choose the denorm window
    For step t → t+1, use the latest available row’s price_min_t and price_max_t (the same window the model saw for the input ending at t). This avoids look‑ahead.

  3. Denormalize each OHLC field
    Let Δ = price_max_t − price_min_t. Convert normalized x_norm ∈ [0,1] to price x_price via:

    x_price = price_min_t + x_norm * Δ
    

    Do this for open/high/low/close from the prototype to get a full price candle forecast.

  4. Optional: probability‑weighted decoding
    If you have the full softmax over tokens, form an expected normalized candle by weighting each prototype OHLC by its probability, then apply the same denorm step. This yields a smoother forecast than argmax sampling.


Notes on training defaults

  • sequence_length = 512 to match the visualizer and data windows.
  • Warmup: 1000 steps, then cosine decay with floor; LR 3e‑5 for stability.
  • Regularization off initially (dropouts 0) to validate the loop, then tune.
  • Mixed precision is opt‑in; defaults to fp32 for easier debugging.
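
As a sketch of that schedule (the total step count and decay floor here are assumptions; the authoritative values live in src/train/config.py):

import math

def lr_at(step, base_lr=3e-5, warmup=1000, total_steps=100_000, floor_frac=0.1):
    # Linear warmup for `warmup` steps, then cosine decay to floor_frac * base_lr
    if step < warmup:
        return base_lr * step / warmup
    t = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    cosine = 0.5 * (1.0 + math.cos(math.pi * t))
    return base_lr * (floor_frac + (1.0 - floor_frac) * cosine)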

Background and related work

  • GPT‑2 (OpenAI): decoder‑only LM with next‑token objective. We keep the objective and causal attention; we replace absolute positions with RoPE and use RMSNorm+SwiGLU.
  • Chronos‑T5 (Amazon): tokenized time‑series forecasting (univariate input). Here we tokenize OHLC+direction to match how traders reason about ranges/bodies.

Credit: GPT‑2 (OpenAI) and Chronos‑T5 informed the design choices here.


Disclaimer

  • ProphitGPT is a research project and does not provide financial advice. The model’s forecasts are experimental and not guaranteed to be accurate. Past performance, backtests, or examples do not guarantee future results.
  • Use at your own risk. You are solely responsible for any decisions made using this software and any data it produces. The authors and contributors assume no responsibility and are not liable for any losses or damages (direct, indirect, incidental, consequential, or otherwise) arising from use of this project.
  • This software is provided “as is” without warranty of any kind, express or implied. Do your own research, employ proper risk management, and comply with all applicable laws, regulations, and exchange/broker terms.

License

See LICENSE.
