UNM CS 429/529 Machine Learning Project 1: Decision Trees
Details about this project can be found on Kaggle
NOTE: This code works with either Python 2 or Python 3.
The main entry point for this project is dtree.py. Use the -h flag from any command to see help:
>>> python dtree.py -h
usage: dtree.py [-h] [--training-data TRAINING_DATA]
[--testing-data TESTING_DATA]
[--classification-file CLASSIFICATION_FILE] [--gini-index]
[--alpha ALPHA]
Creates decision trees based on a given data file.
optional arguments:
-h, --help show this help message and exit
--training-data TRAINING_DATA
Path to the training data file.
--testing-data TESTING_DATA
Path to the test data file.
--classification-file CLASSIFICATION_FILE
Path to the classification file to write the results
of the testing data.
--gini-index Whether or not to use gini-index instead of
information gain.
--alpha ALPHA Alpha to use in chi-squared calculations.
Create a decision tree for a given data file:
python dtree.py --training-data data/training.csv --testing-data data/testing.csv
By default, information gain is used with an alpha value of 0.05. You can change either setting from the command line, to use the Gini index or a different alpha value.
To use the Gini index, add the --gini-index flag to any command:
python dtree.py --training-data data/training.csv --testing-data data/testing.csv --gini-index
To use an alpha value other than 0.05, specify it with the --alpha parameter:
python dtree.py --training-data data/training.csv --testing-data data/testing.csv --alpha 0.01
You can also use the --gini-index and --alpha flags together:
python dtree.py --training-data data/training.csv --testing-data data/testing.csv --alpha 0.1 --gini-index
This module's documentation is compiled by Sphinx and located in the docs/ directory. Sphinx first needs to be installed into a virtualenv.
First, initialize the virtualenv:
virtualenv .venv
Next, activate the virtualenv in your current shell:
source .venv/bin/activate
Now, install the Python requirements:
pip install -r requirements.txt
You can deactivate the virtualenv with the following command; however, make sure the virtualenv is active when you build the documentation:
deactivate
To build the documentation, run the Makefile with the virtualenv active:
source .venv/bin/activate
make docs
Once the documentation is built, it can be viewed in your browser by running the open-docs.py script:
python open-docs.py
- Implement ID3
- Implement Gini Index
- Implement Information Gain
- Implement Chi-Square testing in ID3
- Write up final report
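For reference, the two split criteria (information gain and Gini index) and the chi-square test that --alpha controls can be sketched as below. This is a minimal illustration using the standard textbook formulas, not the code in dtree.py. The function names (entropy, gini, split_by, gain, chi_square) are hypothetical, and the sketch assumes categorical attribute values and non-empty label lists.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = float(len(labels))
    return -sum((n / total) * log(n / total, 2)
                for n in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    total = float(len(labels))
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_by(values, labels):
    """Group the class labels by the attribute value of each row."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return groups


def gain(values, labels, impurity=entropy):
    """Impurity reduction from splitting on one attribute.

    With impurity=entropy this is information gain; with impurity=gini
    it is the Gini-based equivalent.
    """
    total = float(len(labels))
    remainder = sum(len(group) / total * impurity(group)
                    for group in split_by(values, labels).values())
    return impurity(labels) - remainder


def chi_square(values, labels):
    """Chi-square statistic of a split: sum of (observed - expected)^2 / expected."""
    total = float(len(labels))
    class_counts = Counter(labels)
    stat = 0.0
    for group in split_by(values, labels).values():
        observed = Counter(group)
        for cls, count in class_counts.items():
            # Expected count if the attribute were independent of the class.
            expected = len(group) * count / total
            stat += (observed[cls] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: one attribute that separates the classes perfectly.
values = ["a", "a", "b", "b"]
labels = ["x", "x", "y", "y"]
info_gain = gain(values, labels)        # entropy-based criterion
gini_gain = gain(values, labels, gini)  # Gini-based criterion
statistic = chi_square(values, labels)
```

In a chi-square stopping rule, a split is kept only when the statistic exceeds the critical value for the chosen alpha, with (number of attribute values - 1) * (number of classes - 1) degrees of freedom. For the toy data above there is one degree of freedom, and the critical value at alpha 0.05 is about 3.841. A library such as SciPy can supply critical values for other settings.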