Idea: build machine learning models with different algorithms (ANN with deep learning, ANN, decision trees, and random forest) and compare their results depending on the amount of data used.
Application of TensorFlow to the HIGGS data. Based on:
Run step by step:
- Clone repository
- Download the HIGGS data (HIGGS.csv.gz) and put it in the ./data/ directory. You can use downloadHiggs.sh or download the file manually.
- Run main.py with 'python main.py low' or 'python main.py high'. 'low' selects the low-level features of the dataset (21 features), 'high' selects the high-level features (7 features).
- Optionally run 'python plot_all.py' to plot everything into a PDF file.
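
The 'low' and 'high' options in the run step can be pictured as a column selection. The sketch below assumes the column layout given in the HIGGS dataset description (label first, then 21 low-level features, then 7 high-level features). The names FEATURE_COLUMNS and select_columns are hypothetical and are not taken from main.py, which may select features differently.

```python
# Column layout of HIGGS.csv per the dataset description:
# column 0 is the class label, columns 1-21 are the low-level features,
# columns 22-28 are the high-level features.
LABEL_COLUMN = 0
FEATURE_COLUMNS = {
    "low": list(range(1, 22)),    # 21 low-level features
    "high": list(range(22, 29)),  # 7 high-level features
}


def select_columns(row, level):
    """Split one parsed CSV row into (label, features) for 'low' or 'high'."""
    if level not in FEATURE_COLUMNS:
        raise ValueError(f"level must be 'low' or 'high', got {level!r}")
    label = row[LABEL_COLUMN]
    features = [row[i] for i in FEATURE_COLUMNS[level]]
    return label, features
```

Given a row with 29 values, select_columns(row, "low") returns the label and a list of 21 features, and select_columns(row, "high") returns the label and a list of 7 features.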
Configuration.py contains the 'HIGGS_FRACS' array. It lists the fractions of the HIGGS data that are used for learning and evaluation. Each run of main.py loops over the HIGGS_FRACS array. On the first run for a given fraction, the corresponding subset of the HIGGS dataset is created and saved in data as a .npy file. As a consequence, the first run for a given fraction requires more RAM.
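
The create-once, reuse-later behaviour could look roughly like the sketch below. It is not the code from main.py: the function name load_or_create_subset, the load_full callable, the file naming scheme and the choice of taking the first rows are all assumptions made for illustration.

```python
import os

import numpy as np


def load_or_create_subset(frac, data_dir, load_full):
    """Return a fraction of the dataset, caching it as a .npy file."""
    # Hypothetical file name; the repository may name its subsets differently.
    path = os.path.join(data_dir, f"higgs_subset_{frac}.npy")
    if os.path.exists(path):
        # Later runs: only the smaller cached subset is read.
        return np.load(path)
    # First run: the full dataset must be in memory to cut the subset,
    # which is why this run needs more RAM.
    full = load_full()
    n_rows = int(len(full) * frac)
    # Assumption: take the first n_rows; a real script might sample instead.
    subset = np.ascontiguousarray(full[:n_rows])
    os.makedirs(data_dir, exist_ok=True)
    np.save(path, subset)
    return subset
```

Passing the loader as a callable means the full dataset is only read when no cached subset exists for that fraction.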
Results are saved in results_low and results_high. The script saves dictionaries with the results and also plots each result separately. To plot everything into one file for side-by-side comparison, run 'python plot_all.py'. It uses the HIGGS_FRACS array from Configuration.py.
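
For a side-by-side comparison, per-fraction results usually have to be regrouped per model first. The sketch below shows one way to do that. It assumes a result layout of {fraction: {model name: score}}, which is hypothetical: the dictionaries actually saved by main.py may be structured differently, and collect_by_model is not a function from this repository.

```python
def collect_by_model(results_by_frac):
    """Pivot {fraction: {model: score}} into {model: [(fraction, score), ...]}.

    Fractions are visited in ascending order, so each model's list is
    ready to be drawn as one curve of score versus data fraction.
    """
    by_model = {}
    for frac in sorted(results_by_frac):
        for model, score in results_by_frac[frac].items():
            by_model.setdefault(model, []).append((frac, score))
    return by_model
```

Each entry of the returned dictionary then corresponds to one line in a combined plot.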