Pratyusha-DS13/README.md

👋 Hi, I’m Pratyusha

I build scalable machine learning systems with a focus on data pipelines, training infrastructure, and real-world ML applications.


🚀 Currently

  • Working on scalable training pipelines for Simulation-Based Inference (SBI)
  • Exploring memory-efficient data handling for large-scale ML workloads
  • Building end-to-end ML systems and data-driven applications

🧠 Core Interests

  • Machine Learning Systems
  • Data Engineering
  • Applied Machine Learning
  • System Design for ML

⚙️ What I Work On

  • Data Pipeline Design

    • Handling large, disk-backed datasets
    • Data cleaning, transformation, and feature engineering
    • Designing flexible ingestion pipelines for structured data
  • Scalable Training Systems

    • Batch-wise data processing using PyTorch
    • Avoiding full dataset materialization
    • Efficient integration of data pipelines into training loops
  • Statistical & Analytical Thinking

    • Exploratory Data Analysis (EDA)
    • Statistical reasoning and data-driven insights
    • Feature analysis and model evaluation
  • ML System Design

    • Designing modular and maintainable systems
    • API-based ML workflows
    • Trade-offs between performance, memory, and scalability

🛠️ Tech Stack

Languages
Python • SQL • C/C++ (basics)

Machine Learning & AI
PyTorch • Scikit-learn • NumPy • Pandas

Data Science & Analysis
Matplotlib • Seaborn • Exploratory Data Analysis (EDA) • Statistical Analysis

Data Engineering & Pipelines
Data Cleaning • Feature Engineering • Data Transformation • ETL Concepts

Backend & APIs
FastAPI • REST APIs

Databases
MySQL • PostgreSQL

Tools & Workflow
Git • GitHub • Jupyter Notebook • VS Code • Linux (basics)


🌐 Open Source

  • Contributed to open-source projects; actively exploring large codebases
  • Experienced in understanding and improving existing systems

📫 Contact

pratyushamukherjee2005@gmail.com

Pinned

  1. Plant-disease-classifier (Public)

    Jupyter Notebook

  2. Rag_assisstant (Public)

    Python

  3. Radiology_report_simplifier (Public)

    Jupyter Notebook