Hello! I'm a passionate Data Scientist and Machine Learning Engineer currently pursuing a Master's in Machine Learning Engineering (Pos-Tech / FIAP). I build end-to-end ML systems, production pipelines, generative AI applications, and scalable data solutions — from experimentation and model validation to MLOps, feature stores, and deployment.
I enjoy breaking things to understand them deeply (skew, leakage, latency, drift...) and documenting the fixes. Open-source learner & builder.
- Languages: Python, C#, C++, SQL, Scala, DAX, M Language
- ML/DL Frameworks: Scikit-learn, TensorFlow, PyTorch, Keras, XGBoost, NumPy, Pandas, Polars, CatBoost...
- MLOps & Data Engineering: Feast (Feature Store), MLflow, Databricks (PySpark), PyDeequ, Airflow/Prefect/Dagster/n8n, Docker, CI/CD (GitHub Actions), ETL/ELT (Requests, Selenium, APIs), Data Governance/Auditing/Validation (Evidently, Great Expectations-style)
- LLMs & Generative AI: LangChain, LlamaIndex, CrewAI (agents), OpenAI/Gemini/Claude, RAG pipelines, fine-tuning, vector DBs (MongoDB/Neo4j), prompt engineering, LLMOps (RAGAS, Langfuse)
- Computer Vision & Advanced ML: OpenCV, Reinforcement Learning (Gymnasium), Time Series (LSTM), CNNs/RNNs, Clustering/Classification/Regression
- Data Viz & BI: Power BI, Matplotlib, Seaborn, Plotly, Streamlit, GeoPandas, MS Excel, Google Sheets
- Cloud & Databases: Azure, AWS (SageMaker, etc.), Google Cloud (BigQuery), Heroku, MySQL, PostgreSQL, MongoDB, Neo4j
- Other: Jupyter/VS Code/Anaconda, CUDA (GPU), Linux, Git/GitHub, Bitrix/CRM/Trello, Agile/Scrum, Web Scraping, Data Integration
Check out my full portfolio below — 115+ public repos!
- School Lag Predictor – Master's Thesis — Predictive model for educational delay/lag
- LSTM Stock Price Prediction – Time Series Project — End-to-end forecasting pipeline
- Tech Challenge ML Engineer – Full Multi-Phase Portfolio — Complete end-to-end ML engineering case
- Tech Challenge Phase 3 — Advanced implementation stage
- Deep Learning Coursework — Neural network experiments
- Machine Learning Engineer Coursework — Core MLE topics & assignments
- Feature Store Crash Course – Feast Hands-On (Fraud Detection) — Skew, leakage, latency, duplication, governance
- CD4ML – Continuous Delivery for ML Workshop — Continuous Intelligence & CD4ML practices
- MLOps – ML & APIs — Early MLOps exploration
- Godot + RAG Project — LLM-powered features in Godot engine
- LLM Fine-Tuning Experiments — Fine-tuning workflows
- LlamaIndex + CrewAI Multi-Agent System — Agent orchestration
- LlamaIndex Vector DB RAG Pipeline — Standard RAG implementation
- LlamaIndex Introduction — Getting started with LlamaIndex
- LangChain LLM Development — LangChain-based apps
- Lunar Lander v3 – Reinforcement Learning — Policy gradients / DQN-style solving
- Databricks Notebooks & Pipelines Collection — Modern Databricks workflows
- Databricks Azure Pipeline (Scala) — Azure-integrated pipeline
- Databricks MLlib Recommendation Pipeline — Collaborative filtering
- PyDeequ – Data Quality Checks — Great Expectations-style validation
- Data Profiler & Privacy Techniques — Profiling + anonymization
- Metadata Extraction in Python — File/data metadata handling
- Various ETL Pipelines — MongoDB, MySQL, API ingestion
- API Requests Pipeline — Web-to-storage flows
- Basic Data Pipelines — Foundational ETL
- Databricks Intro & File Types — Onboarding + formats
- Databricks Data Analysis — Exploratory analysis
- AWS Data Pipeline Exploration — AWS basics
- Motion Detection – OpenCV — Real-time motion tracker
- License Plate OCR — Text detection on plates
- OpenCV OCR Basics — Text extraction
- Face Analysis & Classification – OpenCV — Face detection/recognition
- Hand Tracking — Gesture/hand detection
- Twitter Image Recognition – Computer Vision API — API-based image labeling
- PyTorch – Zero to Mastery Course — Full deep learning with PyTorch
- Scikit-Learn – Regression, Classification, Clustering — Core algorithms
- Advanced Classification & ML Types — Multi-label, imbalanced, etc.
- Recommendation Systems Intro — Collaborative & content-based
- Multilabel Text Classification – NLP — Multi-context NLP
- Decision Trees Deep Dive — Tree-based models
- Credit Scoring Model — Risk modeling
- Hyperparameter Optimization — Grid/random search
- Model Validation Techniques — Cross-validation, metrics
- Clustering – k-means, DBSCAN, Mean Shift — Unsupervised basics
- High-Dimensional Data Handling — Dimensionality reduction
- Statistics with Python – Full Series — Probability, hypothesis testing, correlation
- Pandas Advanced & IO — Data wrangling mastery
- Data Visualization – Matplotlib & Seaborn — Plotting essentials
- GeoPandas – Geospatial Data — Maps & spatial analysis
- Time Series – COVID-19 Analysis — Forecasting basics
- Word2Vec & Embeddings — NLP embeddings
- NLP Intro – Regex & Sentiment — Text basics
- Convolutional & Recurrent Networks – PyTorch — CNNs & RNNs
- And many more foundational notebooks (1–87 series, Alura/Alura-like courses)...
- Streamlit Course Projects — Interactive dashboards
- ChatGPT / OpenAI API Experiments — Early LLM integrations
- Web Scraping with Python — Data collection
- Plotly 3D Globe Map — Interactive geo viz
- MetaTrader 5 – Python Integration — Trading analysis
- JSON to CSV Converter — Data format utils
- Bitrix24 CRM Download Script — API extraction
(Profile config, tests, early experiments, niche or non-technical repos)
- GitHub Profile Config
- Medium Repo Test
- Olist Dataset Exploration
- Financial Markets Analysis
- Iceland Renewable Energy
- Renewable Energy General
- Heritage Foundation Data
- Italian Words List
- Tankard Lyrics Beer Theme
- Python Dashboards (private-turned-public?)
- ...and the remaining course and experiment repos not listed above (many of the numbered 1–87 ones cover fundamentals)
- Medium: https://igorcomune.medium.com/
- LinkedIn: https://www.linkedin.com/in/igor-comune/
Last updated: 12 Feb 2026
