Pratyusha Mukherjee Pratyusha-DS13

👋 Hi, I’m Pratyusha

I build scalable machine learning systems with a focus on data pipelines, training infrastructure, and real-world ML applications.

Data Pipeline Design
- Handling large, disk-backed datasets
- Data cleaning, transformation, and feature engineering
- Designing flexible ingestion pipelines for structured data

Scalable Training Systems
- Batch-wise data processing using PyTorch
- Avoiding full dataset materialization
- Efficient integration of data pipelines into training loops
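
As a rough illustration of batch-wise processing without full dataset materialization, here is a minimal sketch that streams fixed-size batches from a CSV file on disk. It uses only the Python standard library so it runs anywhere; the names `iter_batches` and `write_demo_csv` are hypothetical, and the all-numeric CSV layout with a header row is an assumption. In a PyTorch setup, the same generator logic would typically sit inside the `__iter__` method of a `torch.utils.data.IterableDataset`.

```python
import csv
import tempfile
from itertools import islice
from typing import Iterator, List


def iter_batches(path: str, batch_size: int) -> Iterator[List[List[float]]]:
    """Yield batches of numeric rows from a CSV file, one batch at a time.

    Only `batch_size` rows are held in memory at once; the file itself
    is never loaded in full.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (assumed to be present)
        while True:
            # islice pulls at most batch_size rows from the open reader
            batch = [[float(x) for x in row] for row in islice(reader, batch_size)]
            if not batch:
                return
            yield batch


def write_demo_csv(n_rows: int) -> str:
    """Write a small demo CSV with columns x and y = 2x; return its path."""
    tmp = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
    with tmp:
        writer = csv.writer(tmp)
        writer.writerow(["x", "y"])
        for i in range(n_rows):
            writer.writerow([i, 2 * i])
    return tmp.name


if __name__ == "__main__":
    path = write_demo_csv(10)
    for batch in iter_batches(path, batch_size=4):
        # A training loop would convert `batch` to a tensor and run a step here.
        print(len(batch), batch[0])
```

Because the generator holds one batch at a time, peak memory depends on `batch_size` rather than on the size of the file, which is the trade-off the points above describe.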

Statistical & Analytical Thinking
- Exploratory Data Analysis (EDA)
- Statistical reasoning and data-driven insights
- Feature analysis and model evaluation

ML System Design
- Designing modular and maintainable systems
- API-based ML workflows
- Trade-offs between performance, memory, and scalability

Languages
Python • SQL • C/C++ (basics)

Machine Learning & AI
PyTorch • Scikit-learn • NumPy • Pandas

Data Science & Analysis
Matplotlib • Seaborn • Exploratory Data Analysis (EDA) • Statistical Analysis

Data Engineering & Pipelines
Data Cleaning • Feature Engineering • Data Transformation • ETL Concepts

Backend & APIs
FastAPI • REST APIs

Databases
MySQL • PostgreSQL

Tools & Workflow
Git • GitHub • Jupyter Notebook • VS Code • Linux (basics)