samluxenberg1/end-to-end-mlb-betting

README in progress...

📌 Project Title

End-to-End MLB Moneyline Betting System

📖 Table of Contents

  1. Overview
  2. Project Structure
  3. Data Sources
  4. ETL Pipeline
  5. Database Schema
  6. Data Validation & Quality Checks
  7. Modeling Pipeline
  8. How to Run
  9. Environment Setup
  10. Future Work
  11. Acknowledgments
  12. License

🧠 Overview

  • What does the project do? This project aims to build a production-level (or close to production-level) system to support MLB moneyline betting. The goal is for it to be fully self-contained, covering data storage, ETL/ELT, modeling, backtesting, monitoring, and deployment. The initial version is planned for completion within two months, so I don't expect the modeling to be optimal; the aim is a "good enough" model while standing up the deployment side. Additional model refinement can come later.

This is a personal project that I plan to test with "skin in the game", but I also expect the insights from this work to be useful for teaching (I teach "Data Science for Sports" at the GW School of Business).

Another aim of this project is to gain additional experience in Docker, ETL/ELT, and deployment to supplement my current role as a data scientist.

🗂️ Project Structure

end-to-end-mlb-betting/
├── data/
│   ├── raw_games.csv
│   ├── raw_team_stats.csv
│   └── raw_player_stats.csv
├── dags/
│   └── airflow_dag.py
├── db/
│   ├── schema.sql
│   └── data_validation_checks.sql
├── docker/
├── etl/
│   ├── utils.py
│   ├── extract_games.py
│   ├── extract_team_stats.py
│   ├── extract_player_stats.py
│   ├── extract_odds.py
│   ├── load_to_db.py
│   ├── transform_games_clean.py
│   └── update_all_data.py
├── models/
│   ├── evaluate.py
│   └── train_model.py
├── serving/
│   ├── api.py
│   └── Dockerfile
├── tests/
│   └── test_features.py
├── tracking/
│   └── mlflow_config
└── validate/
    └── run_data_checks.py

⚾ Data Sources

  • MLBStats API
  • See documentation here: ...

🔁 ETL Pipeline

  • Break into steps: extract, transform, load
    • Note that to get updated data, run python -m src.etl.update_all_data
  • Briefly describe each stage, CLI args, file outputs.
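As a rough illustration of how the extract, transform, and load steps chain together, here is a minimal sketch. It is not the code in etl/: the function names, the record fields, and the in-memory "table" are all hypothetical stand-ins, and the extract step returns hard-coded rows instead of calling a real data source.

```python
from datetime import date


def extract_games(day: date) -> list[dict]:
    """Hypothetical extract step: a real version would query the data source."""
    return [
        {"game_id": 1, "date": day.isoformat(), "home": "NYY", "away": "BOS",
         "home_score": "5", "away_score": "3"},
        # A game that has not been played yet has no scores.
        {"game_id": 2, "date": day.isoformat(), "home": "LAD", "away": "SF",
         "home_score": None, "away_score": None},
    ]


def transform_games(rows: list[dict]) -> list[dict]:
    """Drop unfinished games, cast scores to int, and derive a home_win flag."""
    clean = []
    for row in rows:
        if row["home_score"] is None or row["away_score"] is None:
            continue
        home, away = int(row["home_score"]), int(row["away_score"])
        clean.append({**row, "home_score": home, "away_score": away,
                      "home_win": home > away})
    return clean


def load_games(rows: list[dict], table: dict) -> int:
    """Upsert rows by game_id into an in-memory dict; return the rows written."""
    for row in rows:
        table[row["game_id"]] = row
    return len(rows)


def update_all(day: date, table: dict) -> int:
    """Run extract -> transform -> load for one day."""
    return load_games(transform_games(extract_games(day)), table)


if __name__ == "__main__":
    games_table: dict = {}
    written = update_all(date(2024, 4, 1), games_table)
    print(f"rows written: {written}")
```

Keying the load step on game_id makes re-running the same day idempotent, which is one reasonable property to want from a daily update job.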

🧱 Database Schema

Diagrams and/or descriptions of your tables: games, team stats, player stats, etc.

✅ Data Validation & Quality Checks

Describe the validation logic, examples of checks, and how to run them
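As a rough sketch of the idea (not the contents of db/data_validation_checks.sql or validate/run_data_checks.py), each check can be a SQL query that counts offending rows, with zero meaning the check passes. The games table, its columns, and the three checks below are assumptions for illustration, and SQLite is used only so the example runs without a database server.

```python
import sqlite3

# Hypothetical checks: each query returns the number of rows breaking a rule.
CHECKS = {
    "duplicate_game_ids": (
        "SELECT COUNT(*) FROM ("
        "SELECT game_id FROM games GROUP BY game_id HAVING COUNT(*) > 1)"
    ),
    "negative_scores": (
        "SELECT COUNT(*) FROM games WHERE home_score < 0 OR away_score < 0"
    ),
    "same_team_both_sides": (
        "SELECT COUNT(*) FROM games WHERE home_team = away_team"
    ),
}


def run_checks(conn: sqlite3.Connection) -> dict[str, int]:
    """Run every check and map its name to the count of offending rows."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with deliberately bad rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (game_id INTEGER, home_team TEXT, away_team TEXT, "
        "home_score INTEGER, away_score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO games VALUES (?, ?, ?, ?, ?)",
        [
            (1, "NYY", "BOS", 5, 3),
            (1, "NYY", "BOS", 5, 3),  # duplicate game_id
            (2, "LAD", "LAD", 2, 1),  # same team on both sides
        ],
    )
    return conn


if __name__ == "__main__":
    for name, bad_rows in run_checks(demo_connection()).items():
        status = "PASS" if bad_rows == 0 else "FAIL"
        print(f"{status} {name}: {bad_rows} offending row(s)")
```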

📈 Modeling Pipeline

Overview of modeling workflow: features, target variable, cross-validation approach
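Because games are ordered in time, one option for cross-validation is a walk-forward split, where each fold trains on earlier games and tests on later ones. The sketch below is only an assumption about how that could look, not the approach used in models/; the function name and its parameters are hypothetical.

```python
from typing import Iterator


def walk_forward_splits(
    dates: list[str], n_folds: int, min_train: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs over games sorted by ISO date.

    The training window grows with each fold and always ends where the
    test window starts. Games sharing a date can straddle a boundary;
    split on unique dates instead if that matters.
    """
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    fold_size = (len(order) - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough games for the requested folds")
    for k in range(n_folds):
        start = min_train + k * fold_size
        # The last fold absorbs any leftover games.
        stop = start + fold_size if k < n_folds - 1 else len(order)
        yield order[:start], order[start:stop]


if __name__ == "__main__":
    game_dates = [f"2024-04-{day:02d}" for day in range(1, 11)]
    for train_idx, test_idx in walk_forward_splits(game_dates, n_folds=3, min_train=4):
        print(len(train_idx), "train games ->", len(test_idx), "test games")
```

Splitting this way keeps information from later games out of the training set, which a random shuffle would not guarantee.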

🚀 How to Run

Commands to run:

  • historical extract
  • load to DB
  • validation
  • modeling

Include CLI examples and notes.

🛠️ Environment Setup

Required packages, Python version, virtualenv/conda, Docker (if used)

🔮 Future Work

Ideas for extending the pipeline or improving the model (e.g., lineup changes, player absences)

🙏 Acknowledgments

Credits to data providers, libraries, or collaborators

📄 License

If public, state license type.
