GitHub - avhn/deu-bil3003-ps2: Decision tree generation with CART algorithm (Introduction to Data Mining / Problem set 2)

Introduction to Data Mining - Problem set 2

The goal is to generate and prune binary-split classification trees with an implementation of the CART algorithm. See the problem set description.
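As a rough illustration of how a binary split can be scored, the sketch below computes the impurity reduction of one candidate split. It assumes Gini impurity, which is the usual choice for CART; this README does not state which measure the project uses. The function names `gini` and `split_gain` and the four toy records are hypothetical and are not taken from the repository.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(rows, labels, predicate):
    """Impurity reduction of the binary split induced by predicate(row)."""
    left = [y for x, y in zip(rows, labels) if predicate(x)]
    right = [y for x, y in zip(rows, labels) if not predicate(x)]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Toy data, made up for illustration only.
rows = [
    {"credit_amount": 1000.0},
    {"credit_amount": 1500.0},
    {"credit_amount": 9000.0},
    {"credit_amount": 12000.0},
]
labels = ["good", "good", "bad", "bad"]

gain = split_gain(rows, labels, lambda r: r["credit_amount"] <= 7882.0)
print(gain)
```

On this toy data the split separates the two classes perfectly, so the gain equals the impurity of the parent node.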

Technologies used in this project:

Python 3.8 (also runs with 3.7)
GitLab CI
Git

Running

To run, use the main.py file; there are no dependencies:

$ python3.8 main.py

Output

# Decision Tree #
(credit_history in {delayed previously, existing paid, critical/other existing credit})
├(T)─ (credit_amount <= 7882.0)
│     ├(T)─ (credit_history in {delayed previously, existing paid})
│     │     ├(T)─ (property_magnitude in {real estate})
│     │     │     ├(T)─ (credit_amount <= 1768.0)
│     │     │     │     ├(T)─ good
│     │     │     │     └(F)─ (age <= 21.0)
│     │     │     │           ├(T)─ bad
│     │     │     │           └(F)─ good
│     │     │     └(F)─ good
│     │     └(F)─ (age <= 34.0)
│     │           ├(T)─ (employment in {1<=X<4, 4<=X<7})
│     │           │     ├(T)─ good
│     │           │     └(F)─ (credit_amount <= 2578.0)
│     │           │           ├(T)─ (age <= 28.0)
│     │           │           │     ├(T)─ good
│     │           │           │     └(F)─ bad
│     │           │           └(F)─ (employment in {<1, unemployed})
│     │           │                 ├(T)─ (property_magnitude in {real estate, no known property})
│     │           │                 │     ├(T)─ good
│     │           │                 │     └(F)─ bad
│     │           │                 └(F)─ good
│     │           └(F)─ good
│     └(F)─ bad
└(F)─ bad


# Test Result #
Accuracy: 0.72
TP rate: 0.7345132743362832
TN rate: 0.5833333333333334
TP count: 166
TN count: 14
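The reported figures can be cross-checked against each other. The sketch below assumes that each rate is its count divided by some denominator and that the two denominators together cover the whole test set; the README does not define the rates, so this is one possible reading rather than the project's own computation.

```python
# Values copied from the test result above.
tp_count, tn_count = 166, 14
tp_rate, tn_rate = 0.7345132743362832, 0.5833333333333334

# Assumption: rate = count / denominator, so denominator = count / rate.
tp_denominator = round(tp_count / tp_rate)
tn_denominator = round(tn_count / tn_rate)

# Assumption: the two denominators partition the test set.
total = tp_denominator + tn_denominator
accuracy = (tp_count + tn_count) / total

print(tp_denominator, tn_denominator, total, accuracy)
```

Under these assumptions the implied denominators are 226 and 24, which gives 250 test records and reproduces the reported accuracy of 0.72.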

Notes

Parsing aggregates records into a set, which is an unordered data structure. Therefore, if several best splits have the same gain, different executions may produce different trees.
It is assumed that each row of the .csv file gives the class label as its last value.
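Both notes can be illustrated with a short sketch. It is not the repository's parser: `parse_records` and `pick_best` are hypothetical names, the sample rows and gain values are made up, and a header row, if the real files have one, is not handled. The sketch keeps records in a list so that file order is preserved, takes the last value of each row as the class label, and breaks ties between equal-gain splits by comparing their descriptions, so the choice does not depend on set iteration order (which can change between runs because Python randomizes string hashes per process by default).

```python
import csv
import io


def parse_records(csv_text):
    """Return (features, label) pairs in file order; the label is the last value."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        *features, label = (value.strip() for value in row)
        records.append((tuple(features), label))
    return records


def pick_best(candidates):
    """Pick the highest-gain (gain, description) pair.

    Ties on gain are broken by the description string, so the result is
    the same regardless of the order in which candidates are iterated.
    """
    return min(candidates, key=lambda c: (-c[0], c[1]))


# Made-up sample rows: two feature values followed by the class label.
sample = "existing paid,1169,good\ndelayed previously,5951,bad\n"
records = parse_records(sample)

# Made-up candidate splits; two of them share the best gain.
best = pick_best({
    (0.12, "credit_amount <= 7882.0"),
    (0.12, "age <= 34.0"),
    (0.05, "age <= 21.0"),
})
print(records)
print(best)
```

Note that a list, unlike a set, also keeps duplicate records, which changes the class counts a split sees if the data contains identical rows.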
