End-to-end pipeline for forecasting Bitcoin’s next-day close. The project ingests market data, engineers technical + macro features, trains a small ensemble of classical models, and surfaces predictions in both a CLI and a Streamlit dashboard (with optional LLM commentary).
- Multi-source ingestion: OHLCV via ccxt, Google News sentiment with optional `kk08/CryptoBERT`, and US Treasury yield curves (Fiscal Data API).
- Consistent feature store: rolling stats, RSI, volatility, momentum, interest rate pivots, and day-ahead sentiment alignment.
- Model ensemble: Linear Regression, Ridge, Random Forest, and XGBoost with shared scaler + feature-list artifacts.
- Streamlit dashboard: run forecasts, trigger fresh ingestion, compare model outputs, chart price history in UTC or Pacific time, and request Gemini commentary with citations.
- Reproducible CLI + notebooks: train/predict scripts, analysis notebooks, and timestamped `data/` + `models/` directories for every run.

- Quantitative performance: validation across multiple models using Mean Absolute Error (MAE)

- Interpretability: examines model feature influence using SHAP

- Model Fit: Visual comparison showing how the best-performing models track the actual historical close price.
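As an illustration of the technical features mentioned above, RSI over closing prices can be sketched in a few lines. This is a simple-average variant for clarity; the repo's `helpers/feature_engineering.py` may differ (e.g. Wilder smoothing or pandas rolling windows):

```python
def rsi(closes, period=14):
    """Relative Strength Index over a list of closes (simple-average variant)."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    gains, losses = [], []
    for prev, curr in zip(closes, closes[1:]):
        change = curr - prev
        gains.append(max(change, 0.0))   # upward moves
        losses.append(max(-change, 0.0))  # downward moves (as positives)
    avg_gain = sum(gains[-period:]) / period
    avg_loss = sum(losses[-period:]) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window: maximally overbought
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A strictly rising window yields 100, a strictly falling one yields 0, with typical values oscillating between the classic 30/70 thresholds.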

```bash
conda env create -f environment.yml
conda activate crypto_env
# or
python -m venv .venv
.venv\Scripts\activate  # Windows
pip install -r requirements.txt
```

Create `.env` in the repo root:

```
GEMINI_API_KEY=your-gemini-key
```

Gemini is only required if you want AI commentary inside Streamlit.
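The repo may load the key with `python-dotenv` or similar; as an illustration of what that involves, here is a minimal stdlib-only `.env` loader (`load_env` is a hypothetical helper, not from the repo):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Minimal .env loader: put KEY=value lines into os.environ.

    Skips blanks and comments; does not override variables already set.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

With this in place, code that needs Gemini can simply check `os.environ.get("GEMINI_API_KEY")` and disable commentary when it is absent.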
```bash
streamlit run streamlit_app.py
```

Key workflow:
- Pick a saved `models/` run; paths auto-populate.
- Choose an existing `data/` snapshot or toggle “Ingest fresh data (~10-15 min)” to pull a new 20-day window.
- Adjust the history slider and timezone (UTC or Pacific with DST) for the chart + preview table.
- Run the forecast. Results persist in-session so you can tweak settings without re-running the pipeline.
- (Optional) Toggle AI prediction and commentary to call Gemini; citations are appended automatically via Google Search tools.

Outputs include:
- Summary metrics (latest close, ensemble average, model spread).
- Chart with historical closes plus the Linear Regression prediction marker.
- Downloadable prediction table and feature preview with both UTC and PST timestamps.
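The summary metrics above are simple reductions over the per-model predictions; a sketch of how they could be computed (hypothetical helper, assuming predictions arrive as a model-name-to-price dict):

```python
def summarize_predictions(preds: dict) -> dict:
    """Dashboard-style summary: ensemble average and model spread.

    `preds` maps model name (e.g. "mlr", "ridge", "rf", "xgb")
    to its next-day close prediction.
    """
    values = list(preds.values())
    return {
        "ensemble_average": sum(values) / len(values),  # mean across models
        "model_spread": max(values) - min(values),      # disagreement width
    }
```

A wide spread signals that the models disagree and the ensemble average should be read with more caution.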
The CLI mirrors the dashboard but is script-friendly.
```bash
python main.py train --ingest --save-models \
  --lookback-days 1095 --hours 26280 --ir-lookback-days 1095
```

```bash
python main.py predict --models-dir models/20251112_170018 --ingest
# refer to train_with_training_data.ipynb
```

| Path | Purpose |
|---|---|
| `main.py` | CLI entrypoint (train, predict, forecast) |
| `streamlit_app.py` | Dashboard with ingestion toggle, charting, downloads, Gemini commentary |
| `main_scripts/train.py` | Preprocess, engineer features, train/evaluate, persist artifacts |
| `main_scripts/test.py` | Feature rebuild + model inference |
| `helpers/data_ingestation.py` | Independent quant/news/interest ingestion commands |
| `helpers/feature_engineering.py` | RSI, lags, rolling stats, volatility, momentum features |
| `helpers/llm_support.py` | Prompt builder + citation helper for Gemini |
| `helpers/queries.py` | Default news/search query lists |
| `analysis/*.ipynb` | Evaluation visuals (`plots_for_readers.ipynb`, `true_vs_predicted.ipynb`, etc.) |
| `standalone_training/` | Sandbox scripts/notebooks for reproducible experiments |
| `data/{timestamp}/` | Saved CSV snapshots (quant, sentiment, interest) |
| `models/{timestamp}/` | Serialized models (mlr, ridge, rf, xgb), scaler, feature list |
Every ingest run creates:

```
data/{YYYYMMDD_HHMMSS}/quant/quant_bitcoin_test_*.csv
data/{YYYYMMDD_HHMMSS}/sentiment/google_news_sentiment_*.csv
data/{YYYYMMDD_HHMMSS}/interest/interest_rates_test_*.csv
```

Training saves to:

```
models/{YYYYMMDD_HHMMSS}/mlr_model.joblib
models/{YYYYMMDD_HHMMSS}/ridge_model.joblib
models/{YYYYMMDD_HHMMSS}/rf_model.joblib
models/{YYYYMMDD_HHMMSS}/xgb_model.joblib
models/{YYYYMMDD_HHMMSS}/scaler.joblib
models/{YYYYMMDD_HHMMSS}/feature_list.joblib
```
The Streamlit app and CLI auto-discover these directories, so keeping timestamped folders untouched preserves reproducibility.
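Because the directory names follow `YYYYMMDD_HHMMSS`, the newest run sorts last lexicographically, so auto-discovery can be as simple as a `max` over directory names. A sketch of the idea (hypothetical helper; the app's actual discovery logic may differ):

```python
from pathlib import Path
from typing import Optional

def latest_run(root: str = "models") -> Optional[Path]:
    """Return the newest timestamped run directory under `root`.

    YYYYMMDD_HHMMSS names sort chronologically as plain strings,
    so the lexicographic max is the most recent run.
    """
    runs = [p for p in Path(root).iterdir() if p.is_dir()]
    return max(runs, default=None, key=lambda p: p.name)
```

This is also why renaming or editing the timestamped folders breaks reproducibility: the discovery relies on the naming scheme.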
- Sentiment model missing: if `kk08/CryptoBERT` cannot be loaded, sentiment defaults to `0.0`; warnings are logged.
- Feature guard: predictions need at least 15 historical rows after preprocessing; otherwise the pipeline aborts early.
- Gemini errors: ensure `GEMINI_API_KEY` is set; the dashboard gracefully disables commentary when unavailable.
- Widget deprecations: the app already uses the new `width` parameter (replacing `use_container_width`) to stay compatible with Streamlit >= 1.40.
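The graceful-degradation pattern behind the sentiment fallback can be sketched as follows (`make_sentiment_scorer` is a hypothetical helper, not the repo's actual loader):

```python
import logging

logger = logging.getLogger(__name__)

def make_sentiment_scorer(load_model):
    """Return a text -> score function, falling back to neutral 0.0.

    `load_model` would, in practice, try to load something like
    kk08/CryptoBERT via transformers; any load failure is logged
    and replaced by a constant neutral scorer.
    """
    try:
        model = load_model()
    except Exception as exc:
        logger.warning("sentiment model unavailable, defaulting to 0.0: %s", exc)
        return lambda text: 0.0  # neutral sentiment for every headline
    return lambda text: model(text)
```

Keeping the fallback inside a factory like this means downstream feature code never has to special-case a missing model.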
- `analysis/plots_for_readers.ipynb` – curated visuals for presentations/blog posts.
- `analysis/true_vs_predicted.ipynb` – compares model forecasts vs actual closes.
- `next_day_forecast_llm.ipynb` – scripted forecast run with LLM prompt generation.
- `standalone_training/train_with_training_data.ipynb` – reproducible training experiments outside the CLI.
- Expand model zoo (CatBoost/LightGBM, LSTM or Transformer for Hourly/Minute Predictions).
- Deploy Streamlit as a scheduled Cloud Run/Spaces app.
- Add automated backtesting metrics and alerting hooks.
`helpers/llm_support.py`:
- `get_prompt(predictions, yesterdays_close)`: structured forecasting prompt for the LLM.
- `add_citations(response)`: attaches citation links if metadata is available.
