
Releases: snowfluke/paperium

v2.1.0: Entry Price, UX Improvements & Performance Boost

31 Dec 17:12


What's New

Entry Price Support

  • Limit Order Entry Pricing: XGBoost now calculates an entry price for limit orders, set 1x ATR below the current price
  • Market vs Limit Orders: Order type is selected automatically from model confidence (≥85% = MARKET, <85% = LIMIT)
  • Entry Price Display: New Entry column in signals table showing recommended entry price with discount percentage
  • Accurate Allocation: Position sizing and SL/TP calculations now based on entry price, not current price
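A minimal sketch of sizing a position off the entry price instead of the current price. The helper name `plan_position`, the 3% stop-loss/take-profit defaults (borrowed from the v2.0.0 TBL barrier), and the 100-share lot size (the IDX board lot) are assumptions for illustration, not values taken from Paperium's code.

```python
from dataclasses import dataclass

LOT_SIZE = 100  # assumption: IDX board lot of 100 shares


@dataclass
class Plan:
    shares: int
    stop_loss: float
    take_profit: float
    est_profit: float  # Rupiah gained if TP is hit
    est_loss: float    # Rupiah lost if SL is hit (negative)


def plan_position(allocation: float, entry_price: float,
                  sl_pct: float = 0.03, tp_pct: float = 0.03) -> Plan:
    """Hypothetical helper: size a position and its SL/TP from the entry price."""
    # Round down to whole lots so the order never exceeds the allocation.
    lots = int(allocation // (entry_price * LOT_SIZE))
    shares = lots * LOT_SIZE
    stop_loss = entry_price * (1 - sl_pct)
    take_profit = entry_price * (1 + tp_pct)
    return Plan(
        shares=shares,
        stop_loss=stop_loss,
        take_profit=take_profit,
        est_profit=shares * (take_profit - entry_price),
        est_loss=shares * (stop_loss - entry_price),
    )


# Same allocation: a limit entry at 1,000 vs buying at a current price of 1,030.
print(plan_position(10_000_000, entry_price=1_000))
print(plan_position(10_000_000, entry_price=1_030))
```

Sizing from the entry price means the share count and the SL/TP levels match the order that is actually placed, rather than a price the limit order may never fill at.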

Display Improvements

  • Cleaner Table Layout: Separated SL, TP, Trail, Est Profit, and Est Loss into individual columns
  • Full Rupiah Format: Changed Est P/L from confusing millions notation (+0.17M) to full Rupiah (+Rp 170,000)
  • Removed Shares Column: Simplified the display, since the allocation amount matters more than the share count
  • Max Hold Days Reminder: Added configuration display showing 5-day time stop limit
  • Better Allocation Summary: Removed the hardcoded "(3%)" text; the summary now shows the actual computed percentages
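A small sketch of the full-Rupiah format shown above. `format_rupiah` is a hypothetical helper; it rounds to whole Rupiah and uses comma thousands separators to match the "+Rp 170,000" example in these notes.

```python
def format_rupiah(amount: float) -> str:
    """Format a signed amount as full Rupiah, e.g. +Rp 170,000 (hypothetical helper)."""
    sign = "+" if amount >= 0 else "-"
    # The "," format spec inserts comma thousands separators.
    return f"{sign}Rp {abs(round(amount)):,}"


print(format_rupiah(170_000))    # +Rp 170,000
print(format_rupiah(-52_500.4))  # -Rp 52,500
```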

Performance Optimizations

  • Batch Database Loading: Reduced from ~80 individual queries per day to 1 batch query (10-12x faster)
  • Skip Held Positions: No longer rescans stocks already in the portfolio during signal scanning
  • Faster Backtest: Ensemble backtest now completes in ~5-10 minutes instead of 50+ minutes

Progress Tracking

  • Real-time Stats: Replaced misleading time estimates with useful metrics (active positions, closed trades)
  • Current Date Display: Shows which date is being processed during backtest
  • Accurate Progress: Better visibility into backtest execution state

Technical Details

Modified Files

  • `ml/xgb_inference.py`: Added entry_pct calculation (0.0 for MARKET, 1.0x ATR for LIMIT)
  • `scripts/signals.py`: Entry price display, separated columns, full Rupiah format, max hold days
  • `scripts/eval_ensemble.py`: Batch loading, skip held tickers, real-time progress stats

Configuration

  • Max Hold Days: 5 days (from `config.ml.tbl_horizon`)
  • Entry Price Logic:
    • High confidence (≥85%): Enter at market price
    • Moderate confidence (<85%): Enter 1x ATR below current price
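A sketch of the entry rule as configured above, assuming confidence is a probability in [0, 1] and the ATR is already computed. `compute_entry` and `Entry` are hypothetical names; the 0.85 threshold and the 1x ATR offset come from these notes.

```python
from typing import NamedTuple

CONFIDENCE_THRESHOLD = 0.85  # >= 85% confidence -> MARKET order
LIMIT_ATR_MULTIPLE = 1.0     # LIMIT orders enter 1x ATR below the current price


class Entry(NamedTuple):
    order_type: str
    entry_price: float
    discount_pct: float  # how far below the current price the entry sits, in %


def compute_entry(current_price: float, atr: float, confidence: float) -> Entry:
    """Sketch of the MARKET/LIMIT entry rule (hypothetical helper)."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return Entry("MARKET", current_price, 0.0)
    entry_price = current_price - LIMIT_ATR_MULTIPLE * atr
    discount_pct = (current_price - entry_price) / current_price * 100
    return Entry("LIMIT", entry_price, discount_pct)


print(compute_entry(1_000, atr=25, confidence=0.90))  # MARKET at 1,000
print(compute_entry(1_000, atr=25, confidence=0.70))  # LIMIT at 975 (2.5% below)
```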

Upgrade Notes

All changes are backward compatible. Existing XGBoost models from v2.0.0 will work with this release.

Full Changelog

Entry Price

  • Calculate entry price based on order type (MARKET/LIMIT)
  • Use entry price for position sizing in signals and backtest
  • Display entry price with percentage discount in table

Display

  • Separate SL/TP/Trail into individual columns with percentages
  • Separate Est Profit/Est Loss into individual columns
  • Change format from "+0.17M" to "+Rp 170,000" for Indonesian users
  • Remove Shares column for cleaner table
  • Add "Max Hold Days: 5 days (time stop)" to configuration display
  • Fix allocation summary labels (remove hardcoded "3%")

Performance

  • Add `load_all_tickers_batch()` for single-query batch loading
  • Filter out already-held tickers in `scan_signals()`
  • Optimize database queries with window functions
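A sketch of the single-query approach using SQLite's `ROW_NUMBER()` window function (available since SQLite 3.25). Only the name `load_all_tickers_batch` comes from this changelog; the `prices` table, its columns, and the signature shown here are assumptions. The second helper illustrates skipping tickers that are already held, as `scan_signals()` now does.

```python
import sqlite3
from collections import defaultdict


def load_all_tickers_batch(conn: sqlite3.Connection, lookback: int) -> dict[str, list[tuple]]:
    """Fetch the last `lookback` rows for every ticker in one query (sketch)."""
    # ROW_NUMBER() numbers each ticker's rows from newest to oldest, so one
    # query replaces a separate SELECT per ticker.
    query = """
        SELECT ticker, date, close FROM (
            SELECT ticker, date, close,
                   ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY date DESC) AS rn
            FROM prices
        )
        WHERE rn <= ?
        ORDER BY ticker, date
    """
    history: dict[str, list[tuple]] = defaultdict(list)
    for ticker, date, close in conn.execute(query, (lookback,)):
        history[ticker].append((date, close))
    return dict(history)


def tickers_to_scan(history: dict, held: set[str]) -> list[str]:
    """Skip tickers already held in the portfolio."""
    return sorted(t for t in history if t not in held)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ticker TEXT, date TEXT, close REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("BBCA", "2024-01-0%d" % d, 9000 + d) for d in range(1, 6)]
    + [("TLKM", "2024-01-0%d" % d, 4000 + d) for d in range(1, 6)],
)
history = load_all_tickers_batch(conn, lookback=3)
print(history["BBCA"])  # the three most recent BBCA rows, oldest first
print(tickers_to_scan(history, held={"BBCA"}))  # ['TLKM']
```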

Progress

  • Replace `TimeRemainingColumn` with real-time stats display
  • Show "Positions: X | Trades: Y" during backtest
  • Show current date being processed

Full Diff: v2.0.0...v2.1.0

Paperium v2.0.0 - LSTM Deep Learning Edition

31 Dec 12:55


Major architectural transition from traditional ML to Deep Learning for IHSG quantitative trading.

Overview

Paperium v2 represents a complete paradigm shift in our approach to stock prediction:

  • From: XGBoost + Hand-crafted Technical Indicators (RSI, MACD, Bollinger Bands)
  • To: PyTorch LSTM + Raw OHLCV Sequences

This change is based on the hypothesis that neural networks can learn better feature representations directly from raw price data than human-engineered indicators.

Architecture Changes

Model

  • New: 2-layer LSTM (Hidden Size: 8)
    • Input: 100-day sequences of raw OHLCV data
    • Output: 3-class classification (Loss/Neutral/Profit)
  • Old: XGBoost with 20+ hand-crafted features
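A minimal PyTorch sketch of the model described above. The class name, the five input features (one per OHLCV column), and feeding only the last time step's hidden state into a linear head are assumptions; the notes only state the layer count, hidden size, window length, and the three output classes.

```python
import torch
from torch import nn


class LSTMClassifier(nn.Module):
    """2-layer LSTM over OHLCV sequences -> 3 TBL classes (Loss/Neutral/Profit)."""

    def __init__(self, n_features: int = 5, hidden_size: int = 8,
                 num_layers: int = 2, n_classes: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers=num_layers,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 100, 5). Classify from the hidden state at the last day.
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # raw logits, one per class


model = LSTMClassifier()
logits = model(torch.randn(64, 100, 5))  # one batch of 100-day windows
probs = torch.softmax(logits, dim=-1)    # class probabilities per sample
print(logits.shape)  # torch.Size([64, 3])
```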

Labeling System

  • New: Triple Barrier Method (TBL)
    • ±3% price barriers with 5-day holding horizon
    • Path-dependent classification
    • Class 0: Hit stop-loss (-3%)
    • Class 1: Time expired (neutral)
    • Class 2: Hit take-profit (+3%)
  • Old: Simple close-to-close returns
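A sketch of the path-dependent labeling rule above. It assumes the barriers are checked against daily closes only (an implementation could instead use intraday highs and lows), and `triple_barrier_label` is a hypothetical name.

```python
def triple_barrier_label(closes: list[float], start: int,
                         barrier: float = 0.03, horizon: int = 5) -> int:
    """Label a trade entered at closes[start].

    0 = stop-loss hit first, 1 = time expired, 2 = take-profit hit first.
    Callers should skip entries without a full `horizon` of future data.
    """
    entry = closes[start]
    upper = entry * (1 + barrier)
    lower = entry * (1 - barrier)
    # Walk forward day by day: the first barrier touched decides the label.
    for price in closes[start + 1 : start + 1 + horizon]:
        if price >= upper:
            return 2
        if price <= lower:
            return 0
    return 1


print(triple_barrier_label([100, 101, 99, 104, 90, 100], start=0))  # 2
```

The example returns 2 even though the price later falls to 90: the +3% barrier is touched first, which is what makes the labels path-dependent rather than based on the final return.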

Feature Engineering

  • New: Raw OHLCV normalization only
    • Price: Normalized relative to first day of window
    • Volume: Log-normalized
  • Old: 20+ technical indicators (RSI, MACD, ATR, BB, OBV, etc.)
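A sketch of cutting rolling windows and normalizing each one as described. Two details are assumptions: the reference price is the first day's close, and volume is scaled with `log1p` so zero-volume days stay finite. The helper names are hypothetical.

```python
import math
from typing import Iterator

Row = tuple[float, float, float, float, float]  # (open, high, low, close, volume)


def normalize_window(window: list[Row]) -> list[Row]:
    """Scale prices by the window's first close and log-scale volume."""
    base = window[0][3]  # assumption: the first day's close is the reference
    return [
        (o / base, hi / base, lo / base, c / base, math.log1p(v))
        for o, hi, lo, c, v in window
    ]


def rolling_windows(rows: list[Row], size: int = 100) -> Iterator[list[Row]]:
    """Yield every consecutive `size`-day window, each normalized on its own."""
    for start in range(len(rows) - size + 1):
        yield normalize_window(rows[start : start + size])


rows = [(100, 102, 99, 100, 1_000), (101, 103, 100, 102, 0), (102, 104, 101, 103, 500)]
windows = list(rolling_windows(rows, size=2))
print(len(windows))   # 2
print(windows[0][1])  # (1.01, 1.03, 1.0, 1.02, 0.0)
```

Normalizing per window makes stocks trading at 100 and at 10,000 look alike to the model, which only sees relative moves.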

New Features

Signal Generation & Allocation

  • Confidence-Weighted Capital Allocation: Higher confidence signals automatically receive proportionally larger allocations
    • Formula: allocation_i = total_capital × (confidence_i / sum_of_confidences)
  • Blacklist Filtering: Automatically excludes 72 illiquid/suspended stocks
  • Flexible Output Modes: Show all signals or only allocated positions with P/L estimates
  • Live Data Fetching: --fetch-latest flag to pull current market data from Yahoo Finance
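The allocation formula above as a short sketch. `allocate` is a hypothetical helper; applying `num_stock` first and normalizing over the kept signals only (so the allocations still sum to the total capital) is an assumption about how the two options combine.

```python
from typing import Optional


def allocate(confidences: dict[str, float], total_capital: float,
             num_stock: Optional[int] = None) -> dict[str, float]:
    """Split total_capital in proportion to each signal's confidence."""
    # Rank by confidence so that num_stock keeps the strongest signals.
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    if num_stock is not None:
        ranked = ranked[:num_stock]
    total_conf = sum(conf for _, conf in ranked)
    if total_conf <= 0:
        return {}
    # allocation_i = total_capital * (confidence_i / sum_of_confidences)
    return {ticker: total_capital * conf / total_conf for ticker, conf in ranked}


signals = {"BBCA": 0.90, "TLKM": 0.60, "ASII": 0.50}
print(allocate(signals, total_capital=10_000_000, num_stock=2))  # BBCA 60%, TLKM 40%
```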

Performance Optimizations

  • Sequence Caching: 45x speedup on training data preparation
    • First run: ~45 seconds (957 tickers)
    • Subsequent runs: ~1 second (cache hit)
    • Intelligent cache invalidation based on DB version and config changes
  • Batch Progress Tracking: Real-time updates every 10 batches during training
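The notes do not describe the cache format. A minimal sketch under assumptions: the cache file is keyed by a SHA-256 hash of a database version string plus the configuration, so changing either one yields a new key and forces a rebuild. `cache_key`, `load_or_build`, and the pickle file layout are hypothetical.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def cache_key(db_version: str, config: dict) -> str:
    """Hash everything that affects the sequences; any change gives a new key."""
    payload = json.dumps({"db": db_version, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def load_or_build(cache_dir: Path, db_version: str, config: dict, build):
    """Return cached sequences if the key matches, else build and cache them."""
    path = cache_dir / f"sequences_{cache_key(db_version, config)}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())  # cache hit: fast path
    data = build()  # slow path, e.g. windowing and labeling every ticker
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(data))
    return data


cfg = {"window_size": 100, "tbl_horizon": 5, "tbl_barrier": 3.0}
with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(Path(tmp), "db-v1", cfg, lambda: ["slow build"])
    second = load_or_build(Path(tmp), "db-v1", cfg, lambda: ["never called"])
print(first == second)  # True: the second call is a cache hit
```

Serializing the config with `sort_keys=True` keeps the key stable regardless of dict ordering, so only real changes invalidate the cache.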

User Experience

  • Timestamped Logging: All scripts now show [MM:SS | +Δs] timestamps
  • Training Dashboard: Live batch-level progress, e.g. 127/200 (63%)
  • Fresh vs Retrain: Choose to start new model or continue from checkpoint
  • Interactive CLI: Rich terminal UI with tables, panels, and progress bars
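utils/logger.py is not shown in these notes. This hypothetical sketch produces the [MM:SS | +Δs] prefix, where MM:SS is the total elapsed time and +Δs is the time since the previous message; the clock is injectable so the formatting can be checked deterministically.

```python
import time


class StepLogger:
    """Prefix messages with [MM:SS | +Δs] (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = self._last = clock()

    def format(self, message: str) -> str:
        now = self._clock()
        minutes, seconds = divmod(int(now - self._start), 60)
        delta = now - self._last  # time since the previous message
        self._last = now
        return f"[{minutes:02d}:{seconds:02d} | +{delta:.1f}s] {message}"

    def log(self, message: str) -> None:
        print(self.format(message))


logger = StepLogger()
logger.log("Loading sequences")
logger.log("Training started")
```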

Breaking Changes

Removed Features

  • Portfolio management system (max positions, owned stocks, slots)
  • All technical indicator calculations
  • XGBoost model and dependencies
  • morning_signals.py (replaced by signals.py)

File Changes

  • morning_signals.py → signals.py
  • Removed: ml/ensemble.py, ml/meta_labeling.py, portfolio/, strategy/
  • Added: utils/logger.py for timestamped logging

API Changes

  • signals.py now accepts:
    • --capital <amount> (optional): Total capital to allocate
    • --num-stock <n> (optional): Number of stocks to buy
    • --fetch-latest (optional): Fetch current market data
  • train.py now supports:
    • --retrain: Continue from best_lstm.pt
    • --epochs <n>: Custom epoch count

Migration Guide

For Users of Gen 4 (XGBoost Version)

The XGBoost-based system is still available in the Git history. To access it:

git checkout 388d491  # Last XGBoost commit

Upgrading to v2

  1. Pull latest code: git pull origin main
  2. Install dependencies: uv sync
  3. Train new LSTM model: python run.py → Option 2
  4. Generate signals: python run.py → Option 1

Performance Metrics

Based on the 2024-01-01 to 2025-09-30 backtest:

  • Model Accuracy: ~60% on validation set
  • Win Rate: 50-55% (signals that hit +3% target)
  • Training Time: ~5-10 minutes (with cache)
  • Inference Speed: ~5-10 seconds for full universe

Technical Details

Dependencies

  • PyTorch 2.x (MPS/CUDA/CPU auto-detection)
  • Rich library for terminal UI
  • scikit-learn for metrics
  • yfinance for data fetching

Data Pipeline

  1. Fetch daily OHLCV from Yahoo Finance
  2. Store in SQLite with indexing
  3. Generate 100-day rolling sequences
  4. Apply Triple Barrier Labeling
  5. Train LSTM with early stopping
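The notes do not state the early-stopping criterion. A common choice, assumed here, is patience on validation loss: training stops once the loss fails to improve for `patience` consecutive epochs, and each new best is where a checkpoint such as best_lstm.pt would be written. The class name and the patience value are hypothetical.

```python
class EarlyStopping:
    """Stop when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True if it is a new best."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.9, 0.8, 0.85, 0.82, 0.7]):
    if stopper.step(loss):
        # A real training loop would save the checkpoint here.
        print(f"epoch {epoch}: new best val loss {loss}")
    if stopper.should_stop:
        print(f"stopping after epoch {epoch}")
        break
```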

Configuration

All parameters are defined in config.py:

  • Window size: 100 days
  • TBL horizon: 5 days
  • TBL barrier: 3.0%
  • Batch size: 64
  • Learning rate: 0.001
  • Hidden size: 8 (2 layers)
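Only the attribute path `config.ml.tbl_horizon` is confirmed, by the v2.1.0 notes. The sketch below shows one way the listed values could be grouped in config.py; every other field name and the dataclass layout are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MLConfig:
    window_size: int = 100      # days per input sequence
    tbl_horizon: int = 5        # TBL horizon; v2.1.0 also uses it as max hold days
    tbl_barrier: float = 3.0    # percent, for both take-profit and stop-loss
    batch_size: int = 64
    learning_rate: float = 0.001
    hidden_size: int = 8
    num_layers: int = 2


@dataclass(frozen=True)
class Config:
    ml: MLConfig = field(default_factory=MLConfig)


config = Config()
print(config.ml.tbl_horizon)  # 5
```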

Research Reference

This implementation is inspired by research on Triple Barrier Labeling and meta-labeling:
https://arxiv.org/pdf/2504.02249v1

Contributors

Special thanks to all contributors who helped test and refine this major release.


Full Changelog: v1.0.0...v2.0.0