Staff ML Engineer · Meta
Building agentic AI systems, LLM eval infrastructure, and XAI pipelines at billion-user scale. Current focus: making AI systems that can explain themselves, fail safely, and be trusted in production.
- Agentic AI — multi-step agent architectures, tool-use, planning, and safety guardrails at scale
- LLM Eval Infrastructure — consistency, hallucination detection, factual grounding, response drift
- LLM Inference Infrastructure — high-throughput model serving, torch.compile optimisation, KV cache efficiency, production latency SLAs
- MLOps & Observability — drift detection, model monitoring, evaluation pipelines, contributor to evidentlyai/evidently
- Explainable AI (XAI) — decision explainability hooks, counterfactual reasoning, causal attribution
- Security ML — real-time risk scoring, access intelligence, anomaly detection at billion-event scale
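The drift-detection work above (and the KL-divergence drift score in evidently) reduces to comparing a production distribution against a reference one. A minimal, dependency-free sketch — the data, bins, and threshold are toy values for illustration, not anything from a real pipeline:

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) over two aligned discrete probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def histogram(samples, bins):
    """Normalized histogram of categorical samples over a fixed bin order."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(b, 0) / total for b in bins]

# Toy reference vs. production label distributions.
bins = ["low", "medium", "high"]
reference = histogram(["low"] * 70 + ["medium"] * 25 + ["high"] * 5, bins)
production = histogram(["low"] * 40 + ["medium"] * 35 + ["high"] * 25, bins)

drift_score = kl_divergence(production, reference)
DRIFT_THRESHOLD = 0.1  # illustrative; tune per metric and dataset
print(f"KL drift score: {drift_score:.3f} -> "
      f"{'DRIFT' if drift_score > DRIFT_THRESHOLD else 'ok'}")
```

In production you would compute the histograms per monitoring window and alert when the score crosses a calibrated threshold, rather than hard-coding one.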
Identifies 7 failure modes in production agentic AI systems and introduces PAEF (Production Agentic Evaluation Framework), validated through four controlled experiments. Reference implementation: llm-eval-toolkit.
My repos
| Repo | Description |
|---|---|
| llm-eval-toolkit | Production-grade framework for evaluating LLM agent outputs — consistency, grounding, hallucination, drift |
| agentic-safety-patterns | Pattern library for safe agentic systems — circuit breakers, explainability hooks, rollback, audit logging |
| retrieval-ranking-eval | Dense retrieval + cross-encoder reranking pipeline benchmarked on BEIR datasets — NDCG@K, Recall@K, MRR |
| QuantumAI-IntradayRiskDemo | Intraday risk pipeline: LSTM volatility forecasting + quantum-inspired QUBO/D-Wave portfolio optimisation |
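The NDCG@K metric reported by retrieval-ranking-eval can be computed from graded relevance in a few lines. This is the standard textbook formulation, not code from the repo:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@K: DCG of the system ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: graded relevance of retrieved documents, in ranked order.
ranking = [3, 2, 0, 1, 0]
print(f"NDCG@5 = {ndcg_at_k(ranking, 5):.4f}")
```

A perfectly ordered ranking scores 1.0; swapping a relevant document below an irrelevant one lowers the score, which is what makes NDCG@K useful for comparing rerankers.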
Upstream contributions
| Repo | What |
|---|---|
| evidentlyai/evidently | Merged PR #1318 — ROUGE score descriptor (rouge1/2/L, F/P/R variants, 737 lines, 31 tests) |
| evidentlyai/evidently | Merged PR — KL-divergence drift score bug fix |
| vllm-project/vllm | PR #41381 open — torch.compile config hash typing cleanups + cache_key_factors debug expansion |
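The general idea behind compile-cache keys (hash every factor that affects whether a compiled artifact is still valid) can be sketched as follows. This is an illustrative pattern only, not vLLM's actual implementation — `cache_key` and its inputs are invented for the example:

```python
import hashlib
import json

def cache_key(factors: dict) -> str:
    """Deterministic cache key from config factors.

    Canonical JSON (sorted keys, fixed separators) makes the hash stable
    across runs and dict orderings; SHA-256 makes collisions negligible.
    """
    canonical = json.dumps(factors, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Same factors, any insertion order -> same key; any changed factor -> new key.
a = cache_key({"dtype": "bf16", "max_seq_len": 4096, "compile_mode": "default"})
b = cache_key({"compile_mode": "default", "max_seq_len": 4096, "dtype": "bf16"})
c = cache_key({"dtype": "fp16", "max_seq_len": 4096, "compile_mode": "default"})
print(a == b, a == c)
```

The debugging angle is the same in any such system: logging the individual factors alongside the final hash lets you see *which* factor invalidated a cache entry instead of just observing a miss.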

