I build scalable machine learning systems with a focus on data pipelines, training infrastructure, and real-world ML applications.
- Working on scalable training pipelines for Simulation-Based Inference (SBI)
- Exploring memory-efficient data handling for large-scale ML workloads
- Building end-to-end ML systems and data-driven applications
- Machine Learning Systems
- Data Engineering
- Applied Machine Learning
- System Design for ML
- Data Pipeline Design
  - Handling large, disk-backed datasets
  - Data cleaning, transformation, and feature engineering
  - Designing flexible ingestion pipelines for structured data
- Scalable Training Systems
  - Batch-wise data processing using PyTorch
  - Avoiding full dataset materialization
  - Efficient integration of data pipelines into training loops
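
As a rough illustration of batch-wise processing without full materialization, the sketch below streams a CSV file from disk in fixed-size batches using only the Python standard library. It is a hypothetical example: `iter_batches`, `demo`, and the `samples.csv` layout are made up for illustration and are not taken from any specific project. In a PyTorch setting the same generator could be wrapped in a `torch.utils.data.IterableDataset`.

```python
import csv
import itertools
import os
import tempfile
from typing import Iterator, List


def iter_batches(path: str, batch_size: int) -> Iterator[List[List[float]]]:
    """Yield batches of numeric rows from a CSV file, one batch at a time.

    Only `batch_size` rows are held in memory at once, so the file can be
    much larger than available RAM.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to exist)
        while True:
            # islice pulls at most batch_size rows from the lazy reader
            rows = itertools.islice(reader, batch_size)
            batch = [[float(value) for value in row] for row in rows]
            if not batch:
                return
            yield batch


def demo() -> List[int]:
    """Write a small CSV, stream it in batches, and return the batch sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples.csv")  # hypothetical file name
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["x", "y"])
            for i in range(10):
                writer.writerow([i, i * 2])
        return [len(batch) for batch in iter_batches(path, batch_size=4)]


if __name__ == "__main__":
    print(demo())
```

The generator keeps the training loop unaware of file size: the loop simply iterates over `iter_batches(...)`, and the last batch may be smaller than `batch_size`.
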
- Statistical & Analytical Thinking
  - Exploratory Data Analysis (EDA)
  - Statistical reasoning and data-driven insights
  - Feature analysis and model evaluation
- ML System Design
  - Designing modular and maintainable systems
  - API-based ML workflows
  - Trade-offs between performance, memory, and scalability

- Languages: Python • SQL • C/C++ (basics)
- Machine Learning & AI: PyTorch • Scikit-learn • NumPy • Pandas
- Data Science & Analysis: Matplotlib • Seaborn • Exploratory Data Analysis (EDA) • Statistical Analysis
- Data Engineering & Pipelines: Data Cleaning • Feature Engineering • Data Transformation • ETL Concepts
- Backend & APIs: FastAPI • REST APIs
- Databases: MySQL • PostgreSQL
- Tools & Workflow: Git • GitHub • Jupyter Notebook • VS Code • Linux (basics)

- Contributed to open-source projects and actively explore large codebases
- Experienced in understanding and improving existing systems
