This is the project webpage for MSCS 5610 Data Mining (Spring 2019) Team GottaGoFast.
For this project we are exploring the Formula 1 racing dataset available from http://ergast.com/mrd/. The Ergast Developer API is an experimental web service that provides a historical record of motor racing data for non-commercial purposes. The API provides data for the Formula One series from the beginning of the world championships in 1950. The data was originally gathered and published to the public domain by Chris Newell. Formula One (also Formula 1 or F1, and officially the FIA Formula One World Championship) is the highest class of single-seat auto racing sanctioned by the Fédération Internationale de l'Automobile (FIA). The FIA Formula One World Championship has been one of the premier forms of racing around the world since its inaugural season in 1950.
This dataset covers the 1950 through 2018 seasons and consists of tables describing race results, constructor results, constructors, drivers, lap times, pit stops, qualifying results, and more, as defined in the schema at http://ergast.com/schemas/f1db_schema.txt
This page will be updated as the project proceeds. The repository contains the original data download, and additional folders cover the various stages of the data mining process as applied to this dataset.
The exploratory data analysis Jupyter notebook can be seen under the "Reports and Reading" folder or at: https://github.com/dvermagithub/GottaGoFast/blob/master/Reports%20and%20Reading/F1%20Exploratory%20Analysis%20Full%20Data.ipynb
The midway progress report is located under the "Reports and Reading" folder or at: https://github.com/dvermagithub/GottaGoFast/blob/master/Reports%20and%20Reading/Midway_Report_GottaGoFast-41409.docx
Classification analysis, which included Linear Regression, Decision Tree (Entropy), Decision Tree (Gini), Naive Bayes, MLP and KNN, was performed on the data. Clustering analysis was also performed. The results are located as follows:
Classification using various algorithms (by Deepak) - https://github.com/dvermagithub/GottaGoFast/blob/master/04-%20Clustering%20and%20Classification%20Analysis/F1%20-%20Supervised%20Learning.ipynb
KNN Classification (by Lezeh) - https://github.com/dvermagithub/GottaGoFast/blob/master/04-%20Clustering%20and%20Classification%20Analysis/KNN%20Classification%20Model%20with%20results_full.ipynb
Clustering (by Zack) - https://github.com/dvermagithub/GottaGoFast/blob/master/04-%20Clustering%20and%20Classification%20Analysis/F1%20Clustering%20Analysis%20with%20results_full_4v3.ipynb
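The two decision tree variants listed above differ only in the impurity measure used to choose splits: entropy (information gain) or Gini impurity. The sketch below illustrates both measures in plain Python; it is not the code from the team's notebooks, and the functions `entropy`, `gini` and `split_gain` and the example labels are hypothetical. In a library such as scikit-learn the same choice is typically made through `DecisionTreeClassifier(criterion="entropy")` or `criterion="gini"`, assuming that is how the notebooks configured their trees.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(parent, left, right, impurity):
    """Impurity decrease when `parent` is split into `left` and `right`.

    Each child's impurity is weighted by its share of the parent's samples;
    a tree picks the split with the largest decrease.
    """
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


if __name__ == "__main__":
    # Hypothetical labels, e.g. whether a driver finished on the podium.
    parent = ["podium", "podium", "other", "other"]
    left, right = ["podium", "podium"], ["other", "other"]
    print("entropy gain:", split_gain(parent, left, right, entropy))
    print("gini gain:", split_gain(parent, left, right, gini))
```

Both criteria reach their maximum on an evenly mixed node and zero on a pure one, so they usually rank splits similarly; Gini avoids the logarithm and is slightly cheaper to compute.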
The final presentation, which outlines the observations and conclusions, is included in the folder named "05- Final Presentation and Report" along with the final project report.