Applied ML researcher/engineer. I build scalable LLM systems, agentic AI, and robotics, bridging research → production.
- LLM Pre-training & Fine-tuning: Distributed pipelines (DeepSpeed, FSDP, SMP, LoRA/PEFT, CPU offload) for 7B–70B models; cut training time from 57h to 5.6h across multi-GPU/multi-node setups, benchmarked and scaled for production.
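As a hedged illustration of why LoRA (mentioned above) shrinks the fine-tuning footprint: instead of updating a full d×k weight matrix, only two low-rank factors B (d×r) and A (r×k) are trained. The 4096×4096 projection and rank 16 below are hypothetical example values, not figures from the work itself:

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a LoRA update dW = B @ A,
    with B of shape (d, r) and A of shape (r, k)."""
    return d * r + r * k

# Hypothetical 4096x4096 attention projection at rank 16:
full = 4096 * 4096                            # full fine-tune: 16,777,216 params
lora = lora_trainable_params(4096, 4096, 16)  # LoRA: 131,072 params
assert full // lora == 128                    # ~0.8% of the full matrix
```

The same arithmetic is why LoRA adapters are cheap to store and swap at serving time.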
- Inference Optimization: Triton + TensorRT + vLLM with fused attention, multi-LoRA adapters, and quantization → 70% latency reduction at 80+ QPS in production-grade deployments.
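A minimal sketch of one technique named above, symmetric per-tensor int8 quantization, in pure Python with illustrative weights (not tied to any specific deployment here); real systems would use TensorRT or vLLM kernels for this:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization:
    map floats in [-amax, amax] onto integers in [-127, 127]."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127 if amax else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

weights = [0.42, -1.3, 0.07, 0.9, -0.55]   # hypothetical values
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# round-trip error is bounded by half a quantization step
assert all(abs(w - r) <= scale / 2 + 1e-12 for w, r in zip(weights, recovered))
```

Shrinking weights to 8-bit codes cuts memory traffic, which is where much of the latency reduction comes from.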
- Embodied & Agentic AI: Modular exoskeleton (LLM + vision + speech), demoed at ICML 2025. Multi-agent orchestration for GenAI campaigns and a designer assistant with RAG, query rewriting, and hallucination detection.
- Mechanistic Interpretability: Crosscoders (sparse autoencoders) to probe LLM instruction tuning; open-source pipeline on Hugging Face.
- NP-hard / HPC Projects: Brick Maestro, a Lego assembly optimizer built on HPC with AWS ParallelCluster; presented at AWS re:Invent and the AWS Paris Summit.
- Foundations: Deep learning for face anti-spoofing (thesis), TA at NIT Rourkela, algorithm optimization (Karatsuba + quad Itoh–Tsujii) at DRDO.
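For the Karatsuba part of the DRDO work above, a minimal sketch of the classic divide-and-conquer trick (the quad Itoh–Tsujii inversion is omitted; the actual optimized implementation is not shown here):

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply x and y with Karatsuba's method:
    3 recursive multiplications instead of 4."""
    if x < 10 or y < 10:  # base case: small operand
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    a = karatsuba(x_hi, y_hi)                # high parts
    b = karatsuba(x_lo, y_lo)                # low parts
    c = karatsuba(x_hi + x_lo, y_hi + y_lo)  # cross term
    # (x_hi*2^h + x_lo) * (y_hi*2^h + y_lo)
    return (a << (2 * half)) + ((c - a - b) << half) + b
```

The recursion reduces multiplication from O(n^2) to roughly O(n^1.585) in the operand size, which is why it pays off for the large-integer arithmetic used in cryptographic field operations.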
- Training: PyTorch, DeepSpeed, FSDP, SMP, LoRA/PEFT, Multi-GPU/Node
- Inference: TensorRT-LLM, vLLM, SGLang, LoRA adapters, Quantization & Distillation
- Infra: AWS (SageMaker, EKS, HyperPod, ParallelCluster), Docker, Prometheus, Grafana
- Other: HPC, distributed LLM scaling, agentic AI