Welcome to my portfolio, where I list the projects I have completed outside of work.
Don't hesitate to contact me on LinkedIn if you would like more information about any of these projects.
Current Python level: Advanced
Skills: Machine Learning, Deep Learning, Data analysis, Data cleaning, Data visualization, Natural Language Processing (NLP), Computer Vision (CV).
| Project Name | Area | Description | Libraries / Packages |
|---|---|---|---|
| ClassifyOps - an MLOps Text Classification Project | MLOps, CI/CD, Orchestration, DataOps, Machine Learning, Natural Language Processing | Deployment of a text classification model using modern MLOps practices | Airflow, Airbyte, Feast, BigQuery, DBT, DVC, Pre-commit, nltk, scikit-learn, great-expectations, FastAPI, MLflow, Optuna |
| Designing a Flight Booking Chatbot | Machine Learning, Natural Language Processing, AI | Development of a responsive chatbot using the Microsoft Bot Framework SDK. | Microsoft BotFramework, Azure LUIS, Azure Monitor, Opencensus, pandas, numpy, matplotlib |
| Image Segmentation Model | Machine Learning, Computer Vision, Image Segmentation | Design of an image segmentation API, deployed with Flask, as part of a self-driving vehicle system. | TensorFlow, Keras, Flask, Scikit-Image, Albumentations, OpenCV, SqueezeNet, pandas, numpy, matplotlib |
| Yelp Reviews Image Classification and Sentiment Analysis | Image Classification, Natural Language Processing, Topic Identification, Clustering | The goal of this project was to classify Yelp review images and to identify topics of dissatisfaction among negative comments. | TensorFlow, Keras, Scikit-Learn, Vader Sentiment, BERTopic, HDBSCAN, spaCy, UMAP-learn, OpenCV, category encoders, plotly, pandas, numpy, matplotlib |
| Borrower Scoring Algorithm | Credit Scoring, Machine Learning, Data Preprocessing, Class Imbalance, Hyperparameter Optimization | Creation of a Scoring Algorithm for a Credit Company from a dirty and imbalanced dataset. | Scikit-learn, Boruta-python, Scikit-Optimize, Imbalanced-learn, Category-encoders, pandas, numpy, matplotlib, seaborn |
| Twitter Sentiment Analysis | Natural Language Processing, Deep Learning, Sentiment Analysis | In this project, I developed three tiers of sentiment analysis tools to classify Twitter posts. | TensorFlow, Keras, BERT, Azure Machine Learning Studio, Scikit-learn, spaCy, pandas, numpy, matplotlib |
| E-Commerce Customer Segmentation | Clustering, Hyperparameter Optimization, Data Cleaning | Segmenting a customer database to identify actionable characteristics that can be used in a marketing campaign. | Scikit-learn, geopy, clusteval, yellowbrick, pandas, numpy, matplotlib, seaborn |
| Building an Article Recommendation Application | Recommendation Model, Collaborative Filtering, Machine Learning | Creating a recommendation model deployed in a smartphone application for a content creation website. | TensorFlow, Keras, Scikit-learn, Surprise, UMAP-learn, pandas, numpy, matplotlib, seaborn |
| Detecting Counterfeit Notes | Machine Learning, Logistic Regression | The goal of this project was to use a database of genuine and fake bank notes to build an algorithm to detect counterfeit notes. | pandas, numpy, scikit-learn, seaborn, scipy, statsmodels, matplotlib |
| Predicting a Child's Income | Machine Learning, Stochastic Gradient Descent, Linear Regression | In this project, which is based on scientific research, I used data from the World Bank DataBank to build a model that predicts a child's income from the parents' income and country-level data (mean income and Gini index). | pandas, sklearn, seaborn, scipy, statsmodels, matplotlib, tqdm |
| Project Name | Area | Description | Libraries / Packages |
|---|---|---|---|
| Analyzing Food Data | Data Cleaning, Data Analysis, Data Visualization | Analyzing open source food data to identify health trends | pandas, numpy, plotly, scipy, seaborn, matplotlib |
| Paris Tree Data Analysis | Data Analysis, Data Visualization | Analyzing Paris tree data to identify trends | pandas, numpy, scikit-learn, scipy, seaborn, matplotlib |
| Analyzing Sales Data (link to notebook) | Data Analysis, Data Visualization | Analyzing company sales data to highlight areas for improvement and identify potential new markets | pandas, numpy, sklearn, seaborn, scipy, matplotlib |
| Conducting a Market Analysis (link to notebook) | Hierarchical Clustering, Principal Component Analysis (PCA) | This project applied clustering methods to data gathered from the FAO website in order to identify candidate countries for the expansion of a chicken export company. | pandas, numpy, scikit-learn, seaborn, scipy, statsmodels, matplotlib |
| Conducting a Public Health Study (link to notebook) | Data Analysis, Data Visualization | In this project, I used FAO data to analyze the widespread malnutrition epidemic and communicated my analysis in a PowerPoint presentation. | pandas, numpy, matplotlib |
Current SQL level: Advanced
Databases used: MySQL, PostgreSQL, BigQuery, Apache.
Disclaimer: Most of my SQL experience comes from my work as a Data Analyst for the French Navy. For security reasons, I cannot disclose the code for those projects.
| Project Name | Description | SQL Level |
|---|---|---|
| Cyclistic Bike Share Analysis | This is my version of the capstone project for Google's Data Analytics Course, where I use open source data to provide insights about a bike sharing company. SQL was used to extract the most relevant data from a 2 GB dataset for further analysis in R. | Intermediate |
| Improving a Media Strategy | This was the first project of my Bachelor's degree. Its goal was to increase the audience of a media company, and it showcases my ability to write beginner-level SQL queries. | Beginner |
Tableau level: Intermediate
| Project Name (Link to Tableau Public) | Description |
|---|---|
| Dental Pharma KPI Dashboard (link to Tableau Public dashboard) | The goal of this project was to create a dashboard for a pharmaceutical company displaying 3 KPIs: costs, delays, and completed deliverables. One of the requirements was interactive thresholds that allow the company's leaders to identify the countries that do not meet the KPI thresholds. |
| Cyclistic Bike Share Analysis Viz (link to Tableau Public dashboard) | This is my version of the capstone project for Google's Data Analytics Course, where I use open source data to provide insights about a bike sharing company. Tableau was used during this project to produce heatmaps of the most frequently used stations. |
R level: Beginner
Packages used: Tidyverse (dplyr, ggplot), Seaborn, bigrquery
| Project Name | Area | Description | Packages |
|---|---|---|---|
| Cyclistic Bike Share Analysis | Data Analysis, Data visualization | This is my version of the capstone project for Google's Data Analytics Course, where I use open source data to provide insights about a bike sharing company. R was used to perform data analysis during this project. | tidyverse, bigrquery, wk, dplyr, seaborn, ggplot |