Animal-adoption-EDA-ML---Capstone-project

Dataset (thank you!): https://www.kaggle.com/jinbonnie/animal-data/code

Problem statement

Shelters need to maximise adoption rates so that they are not completely dependent on government funding, and to improve the quality of life of animals and pet owners.
This is based on our label: Adopted, Adopted and Returned, and Not Adopted.

Goal

To assist shelters in identifying which features to take note of when trying to improve adoption rates.

EDA (exploratory data analysis) interesting points:

Side note: The most notable issue with this dataset is that it has no numerical/continuous data for me to manipulate, so everything must be converted into numbers.

Black cats and dogs have the highest abandonment and adoption rates
Cats and dogs are the animals most commonly abandoned
Adoption rate is not time sensitive
The most abandoned breed is the domestic shorthair, for both cats and dogs (so much so that it can be condensed into its own new column)

Data pre-processing

Data cleaning

Missing values

Missing values in deceaseddate, returneddate and identichip indicate that the animal is alive, not returned and not chipped respectively, so these are kept
Rows with missing values in any other column are dropped entirely
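
A minimal sketch of this rule with pandas. The column names deceaseddate, returneddate and identichip come from this README; breedname, the flag column names and all values are invented for illustration, and the actual notebook may handle this differently.

```python
import pandas as pd

# Toy frame: values are invented, breedname is a hypothetical extra column.
df = pd.DataFrame({
    "deceaseddate": [None, "2019-03-01", None],
    "returneddate": ["2018-07-10", None, None],
    "identichip": ["0A1234", None, "0B5678"],
    "breedname": ["Domestic Short Hair", None, "Labrador"],
})

# In these columns a missing value is informative, so convert each one
# into a 0/1 flag instead of dropping the row.
flag_sources = {
    "deceaseddate": "is_deceased",
    "returneddate": "was_returned",
    "identichip": "is_chipped",
}
for source, flag in flag_sources.items():
    df[flag] = df[source].notna().astype(int)

# Any other column with a missing value causes the whole row to be dropped.
other_columns = [c for c in df.columns if c not in flag_sources]
cleaned = df.dropna(subset=other_columns).reset_index(drop=True)
print(cleaned[["is_deceased", "was_returned", "is_chipped", "breedname"]])
```

Only the row with a missing breedname is dropped here; the rows with missing dates or chip values survive as flags.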

Data engineering

Data needs to be in numbers for the machine to read, and these numbers should also make sense in the real world.
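
As an illustration of numbers that still make sense, one option is one-hot encoding for categories that have no natural order, so the model does not read a ranking into them. This is a sketch only: the column names speciesname and basecolour and the values are assumptions, and the notebook may use a different encoding.

```python
import pandas as pd

# Toy frame with two unordered categorical columns (names are assumed).
df = pd.DataFrame({
    "speciesname": ["Cat", "Dog", "Cat"],
    "basecolour": ["Black", "White", "Black"],
})

# One 0/1 column per category, so "Cat" is not treated as smaller or
# larger than "Dog" the way an integer code would suggest.
encoded = pd.get_dummies(df, columns=["speciesname", "basecolour"], dtype=int)
print(encoded)
```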

Feature engineering

Total time spent in shelter

Total time spent in shelter is calculated from the intake date and the movement date
Negative durations occur when, for same-day dates, the newer and older dates are reversed. This is easily solved with the abs() function
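
The calculation can be sketched as follows. The column names intakedate and movementdate and the sample timestamps are assumptions for illustration.

```python
import pandas as pd

# Toy records; the second row has a movement earlier than its intake
# on the same day, which produces a negative duration.
df = pd.DataFrame({
    "intakedate": ["2018-01-05 00:00:00", "2018-02-10 15:00:00", "2018-03-01 00:00:00"],
    "movementdate": ["2018-01-20 00:00:00", "2018-02-10 09:00:00", "2018-04-15 00:00:00"],
})

intake = pd.to_datetime(df["intakedate"])
movement = pd.to_datetime(df["movementdate"])

# abs() flips the reversed same-day pair into a positive duration.
duration = (movement - intake).abs()
df["days_in_shelter"] = duration.dt.total_seconds() / 86400
print(df)
```

Before applying abs() to a whole column, it may be worth checking that the negative values really are small same-day differences, since abs() would also hide larger date errors.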

Adoption rate

Obtained by dividing the current number of adopted animals by the total number of animals so far. The caveat is that the first few entries will be very skewed and jumpy on the graph
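
A small sketch of a running adoption rate, assuming a 0/1 adopted column and records ordered by movementdate (both names are hypothetical). The expanding mean is the count of adopted animals so far divided by the number of animals so far.

```python
import pandas as pd

# Toy records, deliberately out of order to show the sort step.
df = pd.DataFrame({
    "movementdate": pd.to_datetime(
        ["2018-01-03", "2018-01-01", "2018-01-02", "2018-01-04"]
    ),
    "adopted": [1, 1, 0, 1],
}).sort_values("movementdate").reset_index(drop=True)

# Cumulative adopted / cumulative total, row by row.
df["adoption_rate"] = df["adopted"].expanding().mean()
print(df)
```

The first rows swing between 1.0 and 0.5 on a single record each, which is the jumpiness mentioned above; the curve only settles once enough animals have been counted.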

Example of features with data set errors

animalage

The dataset column is riddled with:
- strings where days, weeks & months need to be converted into years
  - Regular expressions are used to achieve this
- outliers that do not make sense. Data at the extreme ends can affect certain ML models
  - it is better to keep the data's standard deviation (SD) relatively low
  - Here I chose to replace the outlier ages (100-year-old cats/dogs) with the googled average maximum age of a cat/dog
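
Both steps can be sketched as below. The exact string format of animalage is an assumption (number followed by a unit, possibly repeated), and the maximum ages are placeholder numbers, not the values used in the project.

```python
import re

# Conversion factors from each unit to years (approximate for weeks/days).
UNIT_IN_YEARS = {"year": 1.0, "month": 1 / 12, "week": 1 / 52, "day": 1 / 365}

# Matches pairs such as "2 years", "6 months", "3 weeks", "10 days".
AGE_PART = re.compile(r"(\d+(?:\.\d+)?)\s*(year|month|week|day)s?", re.IGNORECASE)

# Placeholder caps in years; substitute the researched values.
MAX_AGE_YEARS = {"Cat": 20.0, "Dog": 15.0}


def age_in_years(text):
    """Sum every number/unit pair in the string; None if nothing matches."""
    parts = AGE_PART.findall(text)
    if not parts:
        return None
    return sum(float(number) * UNIT_IN_YEARS[unit.lower()] for number, unit in parts)


def cap_age(years, species):
    """Replace implausibly large ages with the species maximum."""
    return min(years, MAX_AGE_YEARS[species])


for raw, species in [("2 years 6 months", "Cat"), ("26 weeks", "Dog"), ("100 years", "Cat")]:
    print(raw, "->", cap_age(age_in_years(raw), species))
```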

Before and after data-preprocessing

Label manipulation before training models

SMOTETomek was chosen: it oversamples the minority labels with SMOTE and then removes Tomek links (nearest-neighbour pairs from different classes), so that the resampled data is not too cluttered

ML models used

Various machine learning algorithms and libraries were used to find the best model

ML EDA

MIP plot: we can see that the top 4 features that affected the model the most were:
- returned reason
- time spent in shelter
- chipped or not
- animal's age
Ultimately the Support Vector Machine model was the best:
- 100% recall on the Adopted & Not Adopted labels
- 100% precision on the Adopted and Returned label
- ROC AUC, one-vs-one (OVO; shown as micro on the graph): 98%
- Log loss is very low at 0.119, whereas a log loss of 1.1 is what I get when I randomly label the data myself without any model.
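
One way to see where a baseline near 1.1 can come from: if each of the three labels is guessed with equal probability 1/3, the log loss is -ln(1/3), regardless of the true label. This is a sketch under that assumption and may not be exactly how the baseline in this project was produced.

```python
import math


def uniform_guess_log_loss(n_classes):
    """Log loss when every class is predicted with probability 1/n_classes."""
    return -math.log(1.0 / n_classes)


baseline = uniform_guess_log_loss(3)  # three labels in this project
model_log_loss = 0.119                # value reported above

print(f"uniform-guess baseline: {baseline:.4f}")  # about 1.0986
print(f"reported model log loss: {model_log_loss}")
```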

Folders and files

Dataset_(Lim Kiat Hao)_Capstone DS106.csv
LICENSE
Project_Presentation_(Lim Kiat Hao)_CAPSTONE PROJECT (DS 106).pdf
Project_Proposal_(Lim Kiat Hao)_Capstone DS106.pdf
Project_Script_(Lim Kiat Hao)_Capstone DS106 - EDA portion of ML pt2 (2).zip
Project_Script_(Lim Kiat Hao)_Capstone DS106 - EDA portion of ML.ipynb
Project_Script_(Lim Kiat Hao)_Capstone DS106 - ML portion of ML pt3.ipynb
README.md
