This project implements a Monte Carlo simulation of baseball games to estimate each team's win probability in a matchup between the Chicago Cubs and the Chicago White Sox. The simulation uses player statistics (batting average, on-base percentage, and slugging percentage for batters; ERA for pitchers) loaded from CSV data files. It runs a large number of simulated games (e.g., 10,000) and aggregates the results into win percentages and score distributions.
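To make the Monte Carlo idea concrete, here is a minimal sketch under loud assumptions. It is not the model in baseball_simulation.py: it uses a single hypothetical team-level on-base percentage per side and a toy "walks only" rule, in which a run scores only when a batter reaches base with the bases already loaded. The function names, the OBP values, and the seed are all illustrative.

```python
import random


def simulate_half_inning(obp, rng):
    """Toy model: each plate appearance reaches base with probability `obp`,
    otherwise it is an out. Runners advance only when forced, so the first
    three baserunners load the bases and every later one scores a run."""
    outs = 0
    reached = 0
    while outs < 3:
        if rng.random() < obp:
            reached += 1
        else:
            outs += 1
    return max(0, reached - 3)


def simulate_game(obp_a, obp_b, rng, innings=9):
    """Play nine innings, then extra innings until the tie is broken."""
    score_a = sum(simulate_half_inning(obp_a, rng) for _ in range(innings))
    score_b = sum(simulate_half_inning(obp_b, rng) for _ in range(innings))
    while score_a == score_b:
        score_a += simulate_half_inning(obp_a, rng)
        score_b += simulate_half_inning(obp_b, rng)
    return score_a, score_b


def estimate_win_pct(obp_a, obp_b, n_games=10_000, seed=42):
    """Return team A's win percentage over `n_games` simulated games."""
    if not (0 < obp_a < 1 and 0 < obp_b < 1):
        raise ValueError("OBP values must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed so repeated runs are reproducible
    wins_a = 0
    for _ in range(n_games):
        score_a, score_b = simulate_game(obp_a, obp_b, rng)
        if score_a > score_b:
            wins_a += 1
    return 100 * wins_a / n_games


if __name__ == "__main__":
    # Hypothetical OBP values, not taken from the project's CSV files.
    print(f"Team A wins {estimate_win_pct(0.33, 0.31):.1f}% of simulated games")
```

Because no game ends in a tie, team B's win percentage in this sketch is simply 100 minus team A's. A fuller model would also use batting average, slugging percentage, and pitcher ERA, as the project does.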
The primary outputs include:
- Text-based win percentages for each team.
- A comparative bar chart showing the score distribution for both teams.
- A line chart illustrating the cumulative win percentage for each team as more simulations are run.
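All three outputs above can be derived from the per-game scores. The sketch below shows one way to compute them, assuming the scores are already available as two equal-length lists. The function name summarize and the dictionary keys are hypothetical, and plotting is left out so the example depends only on the standard library.

```python
from collections import Counter
from itertools import accumulate


def summarize(scores_a, scores_b):
    """Compute win percentages, score distributions, and cumulative win
    percentages from per-game scores of two teams."""
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("score lists must be non-empty and of equal length")

    # 1 where a team won the game, 0 otherwise (a tie counts for neither).
    wins_a = [1 if a > b else 0 for a, b in zip(scores_a, scores_b)]
    wins_b = [1 if b > a else 0 for a, b in zip(scores_a, scores_b)]

    # Running win percentage after 1, 2, ..., n games.
    cum_a = [100 * w / i for i, w in enumerate(accumulate(wins_a), start=1)]
    cum_b = [100 * w / i for i, w in enumerate(accumulate(wins_b), start=1)]

    return {
        "win_pct_a": cum_a[-1],            # final value = overall win percentage
        "win_pct_b": cum_b[-1],
        "score_dist_a": Counter(scores_a),  # runs scored -> number of games
        "score_dist_b": Counter(scores_b),
        "cum_win_pct_a": cum_a,
        "cum_win_pct_b": cum_b,
    }
```

The score distributions map directly onto the bars of the comparative bar chart, and the cumulative lists are the y-values of the line chart, with the game index on the x-axis.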
This project was developed as part of the "Discussion 8 | Monte Carlo Simulation" assignment, emphasizing code organization, data management, and project management practices, including collaboration with AI (Gemini).
The repository is organized as follows:

    baseball-simulation-project/
        README.md (This file)
        documentation_and_project_management_artifacts/
            Activity_List.md
            Functional_Specs.md
            Product_Backlog.md
            Roadmap.md
            Status_Log.md
            Work_Breakdown_Structure.md
        functional_code/
            baseball_simulation.py
        prepared_data/
            cubs_standard_batting_clean.csv
            cubs_standard_pitching_clean.csv
            whitesox_standard_batting_clean.csv
            whitesox_standard_pitching_clean.csv
        results/
            comparative_score_distribution.png
            cumulative_win_percentage.png

- /functional_code/: Contains the Python script (baseball_simulation.py) that runs the simulation.
- /prepared_data/: Contains the cleaned CSV data files of batting and pitching statistics used by the simulation. These files were sourced from the pitch-by-pitch-pro repository.
- /documentation_and_project_management_artifacts/: Houses all project planning and documentation. See the list of artifacts below for details.
- /results/: Stores the visual outputs (plots) generated by the simulation.
- README.md: Provides an overview of the project (this file).
- Prerequisites: Ensure you have Python installed, along with the pandas, matplotlib, and numpy libraries:
    pip install pandas matplotlib numpy
- Clone the repository (if you haven't already):
    git clone https://github.com/hongyu-liao/baseball-simulation-project
    cd baseball-simulation-project
- Ensure Data is Present: The prepared_data folder should contain the four necessary CSV files:
  - cubs_standard_batting_clean.csv
  - cubs_standard_pitching_clean.csv
  - whitesox_standard_batting_clean.csv
  - whitesox_standard_pitching_clean.csv
- Run the Simulation: Navigate to the functional_code directory and execute the Python script:
    cd functional_code
    python baseball_simulation.py
- View Results:
  - Text output showing win percentages will be printed to the console.
  - Image files (comparative_score_distribution.png and cumulative_win_percentage.png) will be saved in a results directory. By default this directory is created inside functional_code if it does not already exist there; to write to the root results/ folder instead, adjust the save path in the script.
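The setup steps above can be verified with a small pre-flight check before running the simulation. The following is a sketch, not part of baseball_simulation.py: the helper names (missing_packages, missing_data_files, results_dir_for) are hypothetical, and results_dir_for assumes the script lives in functional_code/ directly under the repository root.

```python
from importlib.util import find_spec
from pathlib import Path

REQUIRED_PACKAGES = ["pandas", "matplotlib", "numpy"]
REQUIRED_FILES = [
    "cubs_standard_batting_clean.csv",
    "cubs_standard_pitching_clean.csv",
    "whitesox_standard_batting_clean.csv",
    "whitesox_standard_pitching_clean.csv",
]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level packages that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


def missing_data_files(data_dir):
    """Return the required CSV files that are absent from `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


def results_dir_for(script_path):
    """Return <repo root>/results for a script in <repo root>/functional_code/,
    creating the directory if needed."""
    repo_root = Path(script_path).resolve().parent.parent
    results_dir = repo_root / "results"
    results_dir.mkdir(parents=True, exist_ok=True)
    return results_dir


if __name__ == "__main__":
    # Assumes this check is run from the repository root.
    print("Missing packages:", missing_packages() or "none")
    print("Missing data files:", missing_data_files("prepared_data") or "none")
```

Inside the simulation script, a call such as results_dir_for(__file__) would give a path that points at the root results/ folder regardless of the current working directory; a plot could then be saved with, for example, plt.savefig(results_dir / "cumulative_win_percentage.png").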
Detailed project management artifacts provide insight into the planning, execution, and collaboration involved in this project.
- Functional Specifications: Describes user stories, requirements, and acceptance criteria.
- Work Breakdown Structure (WBS): Outlines the major tasks and subtasks.
- Product Backlog: Lists prioritized features and tasks.
- Status Log: Tracks project progress, decisions, and issues over time.
- Activity List: Provides a timestamped log of development activities.
- Roadmap: Shows a high-level timeline of milestones.
Generative AI (Gemini) was utilized as a co-developer in this project to assist with:
- Initial code structure and boilerplate generation.
- Drafting and refining project management documentation.
- Brainstorming solutions and debugging code.
- Generating code snippets for specific functionalities (e.g., plotting).
Details of this collaboration are reflected in the Status Log and Activity List.
Group Members:
- Hongyu Liao
- Yiwei Li
- Ziyang Huang