azizp128/data-science-projects

📊 Data Analysis

| Project | Description |
| --- | --- |
| Financial Management Dashboard | A modern, interactive dashboard for analyzing and visualizing financial data, built with Streamlit and Plotly. It helps businesses and analysts gain actionable insights into revenue, expenses, profit margins, and more. |
| Talent Match Intelligence | A data-driven, AI-powered application for identifying high-potential employees based on competency and psychometric benchmarks. |
| Supplement Sales Analysis | A sales analysis report evaluating the effectiveness of promotional discounts across regions and store types. Using historical sales data, the report identifies the key factors driving promotional success, giving WOMart actionable insights to optimize campaign strategies and improve revenue performance. |
| Telco Customer Churn Analysis | An analytical report examining customer churn patterns in the telecommunications industry. Using historical customer data, the analysis identifies the key factors driving churn versus retention, providing actionable insights to design targeted retention strategies, improve service offerings, and support long-term revenue growth. |
| Sales Report Business Intelligence Analyst | A business intelligence project that analyzes transactional sales data from multiple cities and product categories to uncover revenue patterns, identify top-performing markets, and highlight underperforming areas. By combining SQL-based data extraction with an interactive dashboard, it enables management to make data-driven decisions, optimize marketing strategies, improve inventory allocation, and maximize overall sales performance. |
| A/B Testing the Effect of Campaign Frequency on Deposit Account Opening | An A/B testing analysis that evaluates the impact of campaign frequency on deposit account openings in a banking context. Using a bank dataset, the report examines whether higher campaign exposure influences customer conversion rates, providing evidence-based insights to optimize marketing strategies and improve customer acquisition. |
| YouTube Channel Dashboard Report | A data visualization dashboard for monitoring YouTube channel performance, focusing on engagement metrics and audience growth trends. It serves content strategists, marketing teams, and channel managers by highlighting key performance indicators, regional engagement distribution, and content effectiveness, enabling data-driven decisions to optimize growth and audience retention. |
| Reddit Post Time Analysis (Israel-Palestine Conflict) | An analytical report exploring posting activity in Reddit discussions of the Israel-Palestine conflict. Using Tableau visualizations, the project identifies the most popular posting hours to infer potential author locations. Findings are compiled and published in a GitBook blog, providing accessible insights into posting behavior and regional engagement patterns. |
| History of Nobel Prize Winners | A data analysis project using Nobel Prize laureate data from Kaggle to explore historical trends and patterns. The analysis visualizes winner demographics, prize categories, and country distributions, uncovering insights into representation, repeat winners, and global shifts in scientific recognition over time. |
| NYC Public School Test Result Scores | A data analysis project examining SAT performance across New York City public schools using the schools.csv dataset. The analysis explores reading, math, and writing scores to uncover patterns in student achievement and answer key questions relevant to policymakers, educators, researchers, and parents, supporting data-driven decisions in education. |
| Crime in Los Angeles | A data analysis project using crime records from the Los Angeles Open Data portal to uncover patterns in criminal activity across the city. The analysis provides actionable insights to support the LAPD in resource allocation, helping identify crime hotspots, temporal trends, and behavioral patterns to improve public safety strategies. |
| Airbnb Market Trends | A data analysis project exploring New York City’s Airbnb market by integrating datasets from multiple file formats (.csv, .tsv, .xlsx). The analysis uncovers patterns in listings, pricing, and availability, providing insights into market dynamics and lodging trends in one of the world’s most visited cities. |
| Netflix Movies | An exploratory data analysis project using a Netflix dataset to investigate trends in Netflix’s 1990s movie catalog. The analysis uncovers patterns in genres, release years, and movie attributes, providing insights for a production company focused on nostalgic content strategies. |
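
The A/B testing project listed above compares deposit conversion rates between campaign-frequency groups. As a rough illustration only, one common way to run such a comparison is a two-proportion z-test. The repository listing does not state which test the project uses, so the function name and the counts in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate.

    conv_a / conv_b are the numbers of conversions, n_a / n_b the group sizes.
    All names here are illustrative, not taken from the project.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference between groups
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: low-frequency group A vs. high-frequency group B
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would suggest the difference in conversion rates is unlikely under the null hypothesis; it does not by itself show that campaign frequency caused the difference.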

🤖 Machine Learning

| Project | Description |
| --- | --- |
| Hiring Decision Prediction | A machine learning prediction system for assessing candidate suitability in the recruitment process. It uses historical applicant data to predict hiring outcomes, helping streamline decision-making and improve the accuracy of candidate selection. |
| Customer Personality Segmentation | A machine learning customer segmentation system that analyzes demographic, financial, and behavioral data to group customers into clusters, enabling businesses to optimize marketing strategies, boost retention, and drive revenue growth. |
| Customer Clicked Ads Classification | A machine learning advertisement click prediction system that analyzes demographic, behavioral, and online activity data to classify customers as likely or unlikely to click on ads, enabling businesses to optimize targeting strategies, reduce marketing costs, improve conversion rates, and maximize campaign profitability. |
| Employee Attrition Prediction | A machine learning employee attrition prediction system that analyzes demographic, performance, and engagement data to identify employees at risk of resigning, enabling organizations to act proactively, improve retention strategies, reduce turnover costs, and foster a more stable and productive workforce. |
| Telco Customer Churn Prediction | A machine learning predictive model for identifying customers at high risk of churn in the telecommunications industry. Using historical customer data, the model applies classification techniques and evaluates performance with F1-score, Precision, Recall, and ROC AUC. This enables proactive, data-driven retention strategies, helping the company reduce churn and strengthen customer loyalty. |
| YouTube Channel Views Prediction | A machine learning regression project that predicts YouTube video views from metadata and engagement statistics. The work includes data cleaning, feature engineering, and model experimentation with Linear Regression, Random Forest, and XGBoost. Model performance is evaluated with RMSE, MAE, and R² score, while feature importance analysis provides insights into the key factors influencing video popularity. |
| Categorical Feature Encoding Challenge II | A machine learning project from the Kaggle ‘Categorical Feature Encoding Challenge II’, focused on handling diverse categorical variables including binary, nominal, ordinal, and cyclical features. The challenge involves experimenting with encoding techniques, addressing missing data, and exploring feature interactions to evaluate model performance across different algorithms. |
| Predicting Movie Rental Durations | A machine learning regression project developed for a DVD rental company to predict rental durations from customer and transaction features. By experimenting with multiple regression models and targeting a mean squared error (MSE) of 3 or less, the project supports efficient inventory planning and improved operational decision-making. |
| Categorical Feature Encoding Challenge | Kaggle challenge on categorical encoding methods. |
| Credit Card Approvals | A machine learning project that automates the prediction of credit card application approvals, replicating processes used by commercial banks. Using applicant financial and demographic attributes, the model streamlines decision-making by reducing manual review, minimizing errors, and improving efficiency in evaluating creditworthiness. |
| Income Prediction | A machine learning project using the UCI Adult Income dataset to predict whether an individual earns over $50K annually based on demographic and employment features. The pipeline integrates preprocessing and model training with XGBoost, supporting both single-input and batch CSV predictions. Beyond model development, the project demonstrates deployment through Flask (Heroku) and an interactive Streamlit application. |
| Customer Analytics: Preparing Data | A data preparation project focused on optimizing large-scale customer datasets for efficient machine learning modeling. Using a customer dataset from Training Data Ltd., the project applies data cleaning and transformation techniques to improve storage efficiency and readiness for predictive modeling, enabling faster execution without compromising dataset integrity. |
| Car Purchase Prediction | A machine learning application built with XGBoost to predict car purchase likelihood based on age, gender, and annual income. Deployed via Streamlit, the app recommends suitable cars for positive predictions or suggests career paths to improve income for negative predictions, providing data-driven decision support for potential buyers. |
| Titanic Survival Prediction | An experiment in machine learning model deployment using the Django framework on Heroku. The project features a simple Logistic Regression model that predicts the survival of passengers aboard the Titanic. |
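
The regression projects listed above report MSE, RMSE, MAE, and R². For readers unfamiliar with these metrics, the following is a minimal sketch of how they are defined, written in plain Python. It is not the projects' own evaluation code, and the function name and sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # R² compares the residual sum of squares with the variance around the mean
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Hypothetical rental durations (days) and model predictions
metrics = regression_metrics([3, 5, 7, 4], [2.5, 5.5, 6.0, 4.5])
print(metrics)
```

A target such as "MSE of 3 or less" would then be checked by comparing `metrics["mse"]` against the threshold on held-out data.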

📝 NLP

| Project | Description |
| --- | --- |
| IndoBERT Emotion Prediction | A deep learning-based emotion classification model using IndoBERT, trained on Twitter data to predict six emotions (anger, sadness, happiness, love, fear, disgust), and deployed via Hugging Face API with Streamlit. |
| IndoBERT Javanese Sentiment Analysis | A deep learning-based sentiment analysis model using IndoBERT, trained on Javanese Ngoko Lugu tweets, deployed via Hugging Face API and Streamlit. |

About

List of all Data Science projects I have worked on (and published).