PDDhillon/premier-league-prediction-model


premier-league-prediction-model (WIP)

Football is a simple sport, but the variables that affect a game are complex. A team's history of goals for and against may not tell the full story of how it won or lost a game. With statistics such as xG and possession, and intangible effects such as home advantage, predicting a match is a hard problem. The goal of this project is to define the architecture of a neural network and train it to perform binary classification, accurately predicting the outcomes of football matches. This will culminate in running the trained model on the current Premier League season to predict results and the final table standings.

Frameworks used: pandas, BeautifulSoup, PyTorch

Web Scraping

A neural network is only as powerful as the data fed into it. The first task was to create a web scraper to retrieve the potential features that would help train the neural network. Thankfully, fbref provides comprehensive statistics on historical football matches across a multitude of competitions. Using requests and BeautifulSoup in tandem, each page's HTML was parsed and filtered down to the HTML table holding a single team's data for a single season. These tables were then concatenated into a pandas dataframe, providing the training match data. This can be seen in the web_scraping folder, specifically scrape_match_data.ipynb and matches.csv. Initially, only the last 3 seasons of data were used, purely for speed of training on the initial architecture of the model.
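The parse-and-concatenate step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the inline `SAMPLE_HTML`, the table id `matchlogs`, the column names, and the helper `table_to_frame` are all hypothetical stand-ins, and a real run would obtain the HTML with `requests.get(url).text` instead of a hard-coded string.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for one team's season page; in practice this string
# would come from requests.get(url).text.
SAMPLE_HTML = """
<html><body>
<table id="other"><tr><th>x</th></tr><tr><td>1</td></tr></table>
<table id="matchlogs">
  <thead>
    <tr><th>Date</th><th>Venue</th><th>Opponent</th><th>GF</th><th>GA</th></tr>
  </thead>
  <tbody>
    <tr><th>2023-08-12</th><td>Home</td><td>Team B</td><td>2</td><td>1</td></tr>
    <tr><th>2023-08-19</th><td>Away</td><td>Team C</td><td>0</td><td>0</td></tr>
  </tbody>
</table>
</body></html>
"""


def table_to_frame(html, table_id, team, season):
    """Extract one HTML table and return it as a DataFrame tagged with team and season."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id {table_id!r} found")

    # Column names come from the header row.
    header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]

    # Each body row may mix <th> and <td> cells, so collect both in document order.
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find("tbody").find_all("tr")
    ]

    frame = pd.DataFrame(rows, columns=header)
    frame["Team"] = team
    frame["Season"] = season
    return frame


# One frame per (team, season) page, concatenated into a single match table.
frames = [
    table_to_frame(SAMPLE_HTML, "matchlogs", "Team A", "2022-2023"),
    table_to_frame(SAMPLE_HTML, "matchlogs", "Team A", "2023-2024"),
]
matches = pd.concat(frames, ignore_index=True)
print(matches)
```

Cell values come out as strings at this stage; converting them to numeric types is left to the pre-processing step.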

Data pre-processing

Once the data was retrieved, initial Exploratory Data Analysis was performed to ascertain potential features that could be used to train the neural network. The initial 6 features decided on were: Venue Code (Home/Away), Opponent Code (a categorical numeric code for each team), the hour of day the match was played, the day of the week the match was played, goals for, and goals against. These 6 data points were chosen as a baseline set of features from which to create an initial architecture. Using pandas, the data was converted to numeric form, as required by the neural network. This can be found in the match_prediction_model folder, specifically prediction.ipynb.
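As a rough sketch of how such a conversion might look in pandas, the example below encodes a small made-up frame. The column names (`date`, `time`, `venue`, `opponent`, `gf`, `ga`, `result`), the helper `encode_features`, and the choice of "win vs. not win" as the binary target are assumptions for illustration; the notebook may name and define these differently.

```python
import pandas as pd

# Made-up rows standing in for matches.csv; column names are assumptions.
matches = pd.DataFrame(
    {
        "date": ["2023-08-12", "2023-08-19", "2023-08-26"],
        "time": ["15:00", "17:30", "12:30"],
        "venue": ["Home", "Away", "Home"],
        "opponent": ["Team B", "Team C", "Team B"],
        "gf": [2, 0, 1],
        "ga": [1, 0, 3],
        "result": ["W", "D", "L"],
    }
)


def encode_features(df):
    """Return a copy of df with numeric feature columns added."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])

    # Categorical codes are assigned in sorted order of the labels,
    # so "Away" -> 0 and "Home" -> 1 here.
    out["venue_code"] = out["venue"].astype("category").cat.codes
    out["opp_code"] = out["opponent"].astype("category").cat.codes

    # Kick-off hour taken from an "HH:MM" string.
    out["hour"] = out["time"].str.split(":").str[0].astype(int)

    # Day of week as an integer, Monday = 0 ... Sunday = 6.
    out["day_code"] = out["date"].dt.dayofweek

    # Assumed binary target: 1 for a win, 0 otherwise.
    out["target"] = (out["result"] == "W").astype(int)
    return out


encoded = encode_features(matches)
feature_columns = ["venue_code", "opp_code", "hour", "day_code", "gf", "ga"]
print(encoded[feature_columns + ["target"]])
```

The six columns in `feature_columns` mirror the baseline features listed above, and every one of them is an integer, so the frame can be converted to a tensor without further encoding.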
