Instructor: Shyam Rajagopalan
Institute: Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru
Term: January – April 2025
This repository contains lab notes, exercises, and Python implementations for the Machine Learning Laboratory (BDBP-207) course. It covers machine learning from foundational concepts to advanced topics, including supervised and unsupervised learning, ensemble models, support vector machines, Bayesian learning, and explainable AI.
- Lab 1: Functions and Derivatives
- Lab 2: scikit-learn Basics – California Housing Dataset
- Lab 3: Linear Regression
- Lab 4: Gradient Descent and Normal Equations
- Lab 5: Logistic Regression and SGD
- Lab 6: k-Fold Cross-Validation and Model Selection
- Lab 7: Data Preprocessing – Normalization and Standardization
- Lab 8: Regularization and Encoding for Categorical Data
- Lab 9: Decision Trees with scikit-learn
- Lab 10: Decision Tree Components – Entropy, Information Gain
- Lab 11: Decision Tree Classifier from Scratch
- Lab 12: Decision Tree Regressor from Scratch
- Lab 13: Bagging and Random Forest
- Lab 14: AdaBoost
- Lab 15: Gradient Boosting
- Lab 16: XGBoost
- Lab 17: Kernel Methods – Feature Mapping and Kernel Trick
- Lab 18: RBF Kernel and Support Vector Machines
- Lab 19: Evaluation Metrics – Accuracy, ROC, AUC
- Lab 20: Unsupervised Learning – PCA and Clustering
- Lab 21: K-Means Clustering from Scratch
- Lab 22: Hierarchical Clustering
- Lab 23: Generative Models and Joint Probability
- Lab 24: Bayesian Learning – Naive Bayes and Inference
- Lab 25: Multiclass Classification – CIFAR-10
- Lab 26: Explainable AI with SHAP
- Lab 27: Linear Algebra and Optimization Theory
- Lab 28: Real-World Case Studies – Genomics and DNA/RNA Data
- Lab 29: Project Implementation and Presentation
- Implementation of ML models from scratch and using libraries
- Supervised and unsupervised learning techniques
- Gradient descent and optimization
- Model evaluation and cross-validation
- Decision trees and ensemble learning
- Support vector machines and kernel methods
- Bayesian models and inference
- Data preprocessing and encoding
- Explainable AI tools such as SHAP
- Working with biological datasets
- load_iris
- load_digits
- load_wine
- load_breast_cancer
- load_diabetes
- fetch_california_housing
- fetch_20newsgroups
- fetch_openml
- Gene expression data: https://archive.ics.uci.edu/dataset/401/gene+expression+cancer+rna+seq
- SMS spam detection: https://www.kaggle.com/datasets/vishakhdapat/sms-spam-detection-dataset
- Breast cancer: https://raw.githubusercontent.com/jbrownlee/Datasets/master/breast-cancer.csv
- Twitter sentiment: https://www.kaggle.com/code/langkilde/linear-svm-classification-of-sentiment-in-tweets
- Python programming
- Linear algebra and calculus basics
- Familiarity with pandas, NumPy, matplotlib
- Jupyter Notebooks or any Python IDE
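
With these prerequisites in place, a from-scratch exercise such as Lab 10 (entropy and information gain) might start from a sketch like the one below. It is not taken from the lab notebooks: the function names `entropy` and `information_gain` and the toy labels are assumptions made for illustration, and it uses only the Python standard library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(labels, groups):
    """Entropy reduction obtained by splitting `labels` into `groups`."""
    total = len(labels)
    # Weight each child's entropy by the fraction of samples it receives.
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Toy labels (made up for illustration): 4 positives and 4 negatives.
parent = ["yes"] * 4 + ["no"] * 4
# Hypothetical split that separates the two classes perfectly.
perfect_split = [["yes"] * 4, ["no"] * 4]
# Hypothetical split that leaves each branch as mixed as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print("parent entropy:", entropy(parent))
print("gain (perfect split):", information_gain(parent, perfect_split))
print("gain (useless split):", information_gain(parent, useless_split))
```

A decision tree learner would compute this gain for every candidate split and keep the one with the highest value. The perfect split recovers the full parent entropy, while the useless split gains nothing.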
Maintained by:
Sharmishta G
Big Data Biology Program, IBAB
GitHub: https://github.com/SharmishtaGanesh14
- ISLP: An Introduction to Statistical Learning with Applications in Python
- Pattern Recognition and Machine Learning by C. Bishop
- scikit-learn documentation: https://scikit-learn.org/stable/
- SHAP: https://shap.readthedocs.io/en/latest/