This repository organizes my machine learning studies, with a focus on practical applications. The goal is to learn how to solve real-world problems efficiently, prioritizing the techniques that deliver the most impact.
- Data exploration using pandas, matplotlib, and seaborn X
- Cleaning: handling missing values, duplicates, and outliers X
- Normalization and standardization X
- Encoding categorical variables (one-hot, label encoding)
- Train/validation/test split X
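The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not a fixed recipe: the toy DataFrame, its column names (`age`, `income`, `city`, `target`) and the 60/20/20 split ratio are all assumptions made up for the example. The point it tries to show is that imputation, scaling and encoding are fitted on the training split only and then reused on validation and test data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset: names and values are invented for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n).astype(float),
    "income": rng.normal(50_000, 15_000, size=n),
    "city": rng.choice(["A", "B", "C"], size=n),
    "target": rng.integers(0, 2, size=n),
})
df.loc[::10, "income"] = np.nan  # inject some missing values

# Cleaning: remove exact duplicate rows.
df = df.drop_duplicates()

X = df.drop(columns="target")
y = df["target"]

# Roughly 60/20/20: carve out the test set first, then split the remainder.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0, stratify=y_trainval
)

# Numeric columns: median imputation, then standardization.
# Categorical column: one-hot encoding.
preprocess = ColumnTransformer(
    [
        ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
         ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Fit on the training split only, to avoid leaking validation/test statistics.
X_train_t = preprocess.fit_transform(X_train)
X_val_t = preprocess.transform(X_val)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_val_t.shape, X_test_t.shape)
```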
- Logistic Regression — simple classification X
- Random Forest — robust, good for tabular data X
- XGBoost / LightGBM — strong performance on structured data X
- 👉 Compare baselines vs ensembles, using cross-validation
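One possible way to run the baseline-vs-ensemble comparison is sketched below. It assumes a synthetic dataset from `make_classification` and ROC-AUC as the score; both are choices made for the example. XGBoost or LightGBM classifiers could be added to the `models` dictionary in the same way, since they expose a scikit-learn compatible interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only so the sketch is self-contained.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

models = {
    # Baseline: scaled logistic regression.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Ensemble: random forest.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation; each fold refits the whole pipeline.
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: ROC-AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Comparing the mean together with the spread across folds gives a fairer picture than a single train/test split.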
- Metrics: accuracy, precision, recall, F1-score, ROC-AUC X
- Choose metrics according to the problem type X
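To make the metric definitions concrete, here is a small pure-Python sketch that computes them from the confusion counts of a binary problem. The function name `classification_metrics` and the example labels are made up for illustration; in practice `sklearn.metrics` provides the same quantities.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Imbalanced toy example: 3 positives out of 10 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example accuracy is 0.7 while recall is only about 0.33, which illustrates why accuracy alone can mislead on imbalanced data.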
- Techniques: regularization, cross-validation, early stopping
- Focus: train well and generalize to unseen data
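Early stopping, one of the techniques listed above, can be illustrated without any framework. The sketch below assumes a list of per-epoch validation losses is already available; the helper name `early_stopping_epoch` and the loss values are hypothetical. It stops once the validation loss has not improved for `patience` consecutive epochs.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has failed to improve
    by more than `min_delta` for `patience` epochs in a row.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran for all epochs.
    return len(val_losses) - 1, best_epoch


# Made-up curve: validation loss improves, then starts rising (overfitting).
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stop at epoch {stop_epoch}, restore weights from epoch {best_epoch}")
```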
- Tools: GridSearchCV, RandomizedSearchCV, Optuna
- Apply to RandomForest / XGBoost / LightGBM ✅
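As a rough sketch of hyperparameter tuning with one of the tools above, the following uses `RandomizedSearchCV` on a random forest. The synthetic dataset, the search space and the F1 scoring choice are assumptions for the example, not recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, used only so the sketch is self-contained.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# Illustrative search space; real ranges depend on the dataset.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 3, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # number of sampled configurations
    cv=3,            # 3-fold cross-validation per configuration
    scoring="f1",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`GridSearchCV` takes the same estimator with a `param_grid` argument and tries every combination instead of sampling `n_iter` of them.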
- Neurons, layers, activations, loss functions, optimizers ✅
- Backpropagation (high-level understanding) ✅
- Overfitting/underfitting in neural nets ✅
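The fundamentals above (a neuron, an activation, a loss, and a gradient step) can be seen together in a single-neuron example written in plain Python. This is a deliberately tiny sketch on made-up data; it relies on the standard result that for a sigmoid output with binary cross-entropy, the gradient of the loss with respect to the pre-activation is `prediction - label`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy data, invented for illustration: label is 1 when x is positive.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0   # one weight and one bias: a single neuron
lr = 0.5          # learning rate for plain gradient descent
losses = []

for step in range(100):
    # Forward pass: linear combination followed by the sigmoid activation.
    preds = [sigmoid(w * x + b) for x in xs]

    # Binary cross-entropy loss, averaged over the samples.
    loss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, preds)
    ) / len(xs)
    losses.append(loss)

    # Backward pass: dL/dz = p - y, then chain rule to w and b.
    grad_w = sum((p - y) * x for x, y, p in zip(xs, ys, preds)) / len(xs)
    grad_b = sum(p - y for y, p in zip(ys, preds)) / len(xs)

    # Optimizer step.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}, w={w:.2f}, b={b:.2f}")
```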
- TensorFlow (Keras API)
- Building dense feedforward networks ✅
- Using callbacks (early stopping, checkpointing) ✅
- Training on structured/tabular data ✅
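A minimal Keras sketch of the points above might look like this. The random tabular data, layer sizes and checkpoint file name are assumptions for illustration only; the example just shows a dense feedforward network trained with early stopping and checkpointing callbacks.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Made-up tabular data: 8 numeric features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Small dense feedforward network.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hypothetical checkpoint location in a temporary directory.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.keras")
callbacks = [
    # Stop when validation loss has not improved for 3 epochs.
    keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True
    ),
    # Keep only the best model seen so far on disk.
    keras.callbacks.ModelCheckpoint(
        checkpoint_path, monitor="val_loss", save_best_only=True
    ),
]

history = model.fit(
    X, y,
    validation_split=0.2,
    epochs=30,
    batch_size=32,
    callbacks=callbacks,
    verbose=0,
)
print(f"trained for {len(history.history['loss'])} epochs")
```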
- PyTorch
- Manual training loops vs high-level APIs (PyTorch Lightning/fastai)
- Understanding tensors & autograd
- Implementing a basic neural net from scratch
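The PyTorch items above can be sketched in a few lines: first autograd on a single tensor, then a manual training loop for a small network. The random data, layer sizes and learning rate are assumptions chosen only to keep the example short.

```python
import torch
from torch import nn

# Autograd in one step: d(a^2)/da at a = 3 is 6.
a = torch.tensor(3.0, requires_grad=True)
(a ** 2).backward()
print(a.grad)

# Made-up data: 4 features, binary target.
torch.manual_seed(0)
X = torch.randn(200, 4)
y = (X[:, 0] - X[:, 1] > 0).float().unsqueeze(1)

# Basic feedforward net; the final layer outputs raw logits.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

losses = []
for epoch in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    logits = model(X)            # forward pass
    loss = loss_fn(logits, y)    # compute the loss
    loss.backward()              # backpropagation via autograd
    optimizer.step()             # update the parameters
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

High-level wrappers hide exactly these five lines inside the loop, which is why writing them by hand once is a useful exercise.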
👉 Project: train the same simple NN in both frameworks and compare the coding styles
- Vectorization: TF-IDF, word embeddings, BERT embeddings
- Classification tasks (reviews, spam detection)
- Try both approaches: TF-IDF with scikit-learn, and learned embeddings with PyTorch/TensorFlow
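For the scikit-learn side of the text classification tasks above, a minimal TF-IDF pipeline could look like the sketch below. The eight example messages and their spam labels are invented for illustration and far too few for a real model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = not spam.
texts = [
    "win a free prize now",
    "free money click here",
    "claim your free reward today",
    "urgent offer win cash now",
    "meeting moved to tomorrow morning",
    "can you review my pull request",
    "lunch at noon on friday",
    "notes from the project meeting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorization and classification chained into one estimator, so the
# vocabulary is learned only from the data passed to fit().
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

new_messages = ["free cash prize", "project meeting tomorrow"]
predictions = clf.predict(new_messages)
for message, label in zip(new_messages, predictions):
    print(f"{message!r} -> {'spam' if label == 1 else 'not spam'}")
```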
- Collect and prepare data
- Split into train/validation/test sets
- Train ML or DL models
- Evaluate and adjust
- Deploy (pickle/joblib for ML, SavedModel/TorchScript for DL)
- Serve via FastAPI/Flask
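The train, evaluate and persist part of this workflow can be sketched end to end with scikit-learn and joblib. The Iris dataset, the model choice and the file name `model.joblib` are assumptions for the example. Serving is not shown; a FastAPI or Flask app would typically load the saved file once at startup and call `predict` inside a request handler.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Collect and split the data (Iris used as a stand-in dataset).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Train and evaluate.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {test_accuracy:.3f}")

# Persist the trained model to disk (hypothetical path in a temp directory).
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)

# Later, e.g. inside a serving process: load the model and predict.
loaded_model = joblib.load(model_path)
same_predictions = (loaded_model.predict(X_test) == model.predict(X_test)).all()
print(f"loaded model reproduces predictions: {same_predictions}")
```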
- Very deep neural nets (CNNs/RNNs) before mastering the tabular basics
- Reinforcement learning, GANs, diffusion models (too advanced for now)
- Heavy MLOps tools (Kubeflow, Airflow)
📂 Next Repo Section:
/deep_learning_basics → notebooks comparing TensorFlow and PyTorch on the same problems.
Clone the repository and follow the notebooks and scripts in the /notebooks folder for exercises and mini-projects.