A DataCamp project
- Overview
- Project Structure
- Data Source(s)
- Installation
- Usage
- Conclusions
- Technologies Used
- Contributing
- License
- Contact
This project uses Python to clean and analyze three datasets supplied by a software developer: animal activity, animal health, and owner information. The goal is to merge them into a single dataset that the development team can use and to provide some basic insights from the data.
This is a portfolio project created to demonstrate my proficiency in data cleaning, analysis, and visualization, as well as creating functions to support that workflow using Python. It highlights my ability to work with real-world datasets, derive meaningful insights, and communicate results clearly through code and visualizations.
└── 📁pet-software-developer-pipeline
    └── 📁assets
        ├── image.png
    └── 📁code
        └── 📁utilities
            ├── __init__.py
            ├── config.py
            ├── features.py
            ├── processes.py
            ├── visuals.py
        ├── notebook.ipynb
    └── 📁data
        ├── cleaned.csv
        ├── pet_activities.csv
        ├── pet_health.csv
        ├── users.csv
    └── 📁products
        └── 📁images
            ├── Activity counts (non-health).jpg
            ├── Average monthly activity counts by owner age group.jpg
            ├── Average monthly activity counts by pet type.jpg
            ├── Distribution of activity counts per month by pet type.jpg
            ├── Distribution of activity counts per month.jpg
            ├── Distribution of health visits by pet type.jpg
            ├── Distribution of health visits for all pet types.jpg
            ├── Distribution of monthly health visits by pet type.jpg
            ├── Distribution of monthly health visits for all pet types.jpg
            ├── Distribution of time between health visits - Annual Checkup (owner age group, pet type).jpg
            ├── Monthly average activity count by owner age group and pet type.jpg
            ├── Pet counts by owner age group.jpg
            ├── Pet proportions by owner age group.jpg
            ├── Unique pet counts.jpg
        ├── report.md
    ├── .gitignore
    ├── LICENSE
    ├── README.md
    └── requirements.txt
- File(s): pet_activities.csv, pet_health.csv, users.csv
- Source: DataCamp
- Description: Data on pet activities, pet health, and pet owners.
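As a rough illustration of how three files like these might be combined into one dataset, here is a minimal pandas sketch. It is not the code in the notebook: the tiny inline tables stand in for the real CSV files, and every column name (`pet_id`, `activity_type`, `visit_type`, `owner_age_group`, and so on) as well as the `merge_pet_data` helper is a hypothetical chosen for the example.

```python
import pandas as pd

# Hypothetical miniature stand-ins for pet_activities.csv, pet_health.csv,
# and users.csv. All column names are assumptions, not the real schema.
activities = pd.DataFrame({
    "pet_id": [1, 1, 2],
    "date": ["2024-01-03", "2024-01-09", "2024-01-05"],
    "activity_type": ["Walking", "Playing", "Resting"],
})
health = pd.DataFrame({
    "pet_id": [1, 2],
    "date": ["2024-02-01", "2024-02-10"],
    "visit_type": ["Annual Checkup", "Vaccination"],
})
users = pd.DataFrame({
    "owner_id": [10, 11],
    "pet_id": [1, 2],
    "owner_age_group": ["18-25", "26-35"],
    "pet_type": ["Dog", "Cat"],
})


def merge_pet_data(activities, health, users):
    """Stack activity and health events, then attach owner information."""
    # Give both event tables a shared 'event' column so they can be stacked.
    act = activities.rename(columns={"activity_type": "event"}).assign(source="activity")
    hlt = health.rename(columns={"visit_type": "event"}).assign(source="health")
    events = pd.concat([act, hlt], ignore_index=True)
    events["date"] = pd.to_datetime(events["date"])
    # A left join keeps every event even if an owner record is missing.
    return events.merge(users, on="pet_id", how="left")


merged = merge_pet_data(activities, health, users)
print(merged.sort_values(["pet_id", "date"]).to_string(index=False))
```

With the real files, the inline tables would be replaced by `pd.read_csv` calls on the three CSVs in the data directory, and the join keys adjusted to whatever columns those files actually share.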
- Python 3.11+
- pip (Python package manager)
Clone the repository:
git clone https://github.com/kozmik-moore/pet-software-developer-pipeline.git
cd pet-software-developer-pipeline
Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the required packages:
pip install -r requirements.txt
Start the Jupyter server:
jupyter notebook
Open and run notebooks from the /code directory to explore the data and generate visualizations.
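To give a sense of the kind of aggregation that could sit behind a chart such as "Average monthly activity counts by pet type", here is a small pandas sketch. It is an illustration under assumptions, not the notebook's actual code: the sample rows, the column names, and the `average_monthly_counts` helper are all hypothetical.

```python
import pandas as pd

# Hypothetical activity rows; column names are assumptions for illustration.
events = pd.DataFrame({
    "pet_id": [1, 1, 1, 2, 2, 3],
    "pet_type": ["Dog", "Dog", "Dog", "Cat", "Cat", "Dog"],
    "date": pd.to_datetime([
        "2024-01-03", "2024-01-09", "2024-02-11",
        "2024-01-05", "2024-02-07", "2024-01-20",
    ]),
})


def average_monthly_counts(events):
    """Average number of activities per pet per month, grouped by pet type."""
    # Count activities for each pet in each calendar month.
    monthly = (
        events
        .assign(month=events["date"].dt.to_period("M"))
        .groupby(["pet_type", "pet_id", "month"])
        .size()
        .rename("activity_count")
        .reset_index()
    )
    # Average those per-pet monthly counts within each pet type.
    return monthly.groupby("pet_type")["activity_count"].mean()


avg = average_monthly_counts(events)
print(avg)
```

The resulting Series could then be passed to a plotting call, for example seaborn's `barplot`, to produce a chart.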
See the full visual report in /products/report.md.
- Python 3.11+
- pandas – for data manipulation
- seaborn – for statistical data visualization
- matplotlib – for low-level plotting
- Jupyter Notebook – for interactive analysis
Contributions are welcome. To contribute:
- Fork the repository
- Create a new branch (git checkout -b feature-branch)
- Make your changes
- Commit your changes (git commit -m "Add feature")
- Push to your branch (git push origin feature-branch)
- Open a pull request
This project is licensed under the MIT License. See the LICENSE file for details.
Kozmik Moore
Email: [email protected]
GitHub: @kozmik-moore
LinkedIn: @kozmik-moore