BDBP-207: Machine Learning Laboratory

Instructor: Shyam Rajagopalan
Institute: Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru
Term: January – April 2025

Overview

This repository contains lab notes, exercises, and Python implementations for the Machine Learning Laboratory (BDBP-207) course. The material ranges from foundational concepts to advanced topics in machine learning, including supervised and unsupervised learning, ensemble models, support vector machines, Bayesian learning, and explainable AI.

Lab Modules

Foundations

Lab 1: Functions and Derivatives
Lab 2: scikit-learn Basics – California Housing Dataset
Lab 3: Linear Regression
Lab 4: Gradient Descent and Normal Equations

Classification and Evaluation

Lab 5: Logistic Regression and SGD
Lab 6: k-Fold Cross-Validation and Model Selection
Lab 7: Data Preprocessing – Normalization and Standardization
Lab 8: Regularization and Encoding for Categorical Data

Decision Trees and Ensembles

Lab 9: Decision Trees with scikit-learn
Lab 10: Decision Tree Components – Entropy, Information Gain
Lab 11: Decision Tree Classifier from Scratch
Lab 12: Decision Tree Regressor from Scratch
Lab 13: Bagging and Random Forest
Lab 14: AdaBoost
Lab 15: Gradient Boosting
Lab 16: XGBoost

Advanced Topics

Lab 17: Kernel Methods – Feature Mapping and Kernel Trick
Lab 18: RBF Kernel and Support Vector Machines
Lab 19: Evaluation Metrics – Accuracy, ROC, AUC
Lab 20: Unsupervised Learning – PCA and Clustering
Lab 21: K-Means Clustering from Scratch
Lab 22: Hierarchical Clustering
Lab 23: Generative Models and Joint Probability
Lab 24: Bayesian Learning – Naive Bayes and Inference
Lab 25: Multiclass Classification – CIFAR-10
Lab 26: Explainable AI with SHAP
Lab 27: Linear Algebra and Optimization Theory
Lab 28: Real-World Case Studies – Genomics and DNA/RNA Data
Lab 29: Project Implementation and Presentation

Key Skills Acquired

Implementation of ML models from scratch and using libraries
Supervised and unsupervised learning techniques
Gradient descent and optimization
Model evaluation and cross-validation
Decision trees and ensemble learning
Support vector machines and kernel methods
Bayesian models and inference
Data preprocessing and encoding
Explainable AI tools such as SHAP
Working with biological datasets

Datasets Used

scikit-learn Datasets

load_iris
load_digits
load_wine
load_breast_cancer
load_diabetes
fetch_california_housing
fetch_20newsgroups
fetch_openml
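
As a minimal sketch of how the bundled loaders listed above are typically used: the choice of load_iris, the scaling-plus-logistic-regression pipeline, and 5-fold cross-validation are illustrative assumptions, not taken from the lab notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the features and labels as NumPy arrays (150 samples, 4 features).
X, y = load_iris(return_X_y=True)

# Standardize the features, then fit a logistic regression classifier.
# Putting both steps in one pipeline keeps scaling inside each CV fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Estimate accuracy with 5-fold cross-validation.
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean accuracy over 5 folds: {scores.mean():.3f}")
```

The load_* functions read data that ships with scikit-learn, while the fetch_* functions download their data on first use and therefore need network access.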

External Datasets

Gene expression data: https://archive.ics.uci.edu/dataset/401/gene+expression+cancer+rna+seq
SMS spam detection: https://www.kaggle.com/datasets/vishakhdapat/sms-spam-detection-dataset
Breast cancer: https://raw.githubusercontent.com/jbrownlee/Datasets/master/breast-cancer.csv
Twitter sentiment: https://www.kaggle.com/code/langkilde/linear-svm-classification-of-sentiment-in-tweets

Prerequisites

Python programming
Linear algebra and calculus basics
Familiarity with pandas, NumPy, matplotlib
Jupyter Notebooks or any Python IDE

Maintainers

Maintained by:
Sharmishta G, Big Data Biology Program, IBAB
GitHub: https://github.com/SharmishtaGanesh14

References

ISLP: An Introduction to Statistical Learning with Applications in Python
Pattern Recognition and Machine Learning by C. Bishop
scikit-learn documentation: https://scikit-learn.org/stable/
SHAP: https://shap.readthedocs.io/en/latest/
