AI Engineer | Software Engineer | LLM & MLOps Researcher
Research Assistant @ Micro Electronics Research Lab (MERL)
LFX'25 Mentee @ RISC-V International
Open-source contributor focused on production-ready AI systems, LLM evaluation, and reproducible ML pipelines.
I'm an AI Engineer and Researcher working at the intersection of LLMs, MLOps, and open-source systems.
My work focuses on building reliable, testable, and deployment-ready AI pipelines rather than experimental-only models.
- LLM evaluation & benchmarking (functional, syntactic, adversarial)
- Hallucination mitigation in private LLMs using GAN-based approaches
- Reproducible ML pipelines with CI/CD, logging, and SLA-aware validation
- RISC-V data & tooling for machine-readable specifications and verification
I care deeply about making AI systems trustworthy in production.
Research Assistant @ Micro Electronics Research Lab (MERL)
Working on LLM evaluation pipelines, benchmarking frameworks, and RISC-V-related tooling
LFX'25 Mentee @ RISC-V International
Contributing to machine-readable RISC-V specifications, schemas, and CI validation pipelines
Python · Scala · Verilog · Java · Shell · JavaScript · HTML · CSS
PyTorch · TensorFlow · Hugging Face Transformers · GANs · LLM Evaluation
NumPy · Pandas · Scikit-learn
CI/CD · Docker · REST/gRPC · Logging & Monitoring · Reproducible Pipelines
Git · GitHub Actions · Linux · pytest
JSON · YAML · MySQL
https://github.com/merledu/ai4org
- Built a privacy-first ML pipeline to detect and mitigate hallucinations in private LLMs
- Designed a GAN-style generator/discriminator for hallucination detection
- End-to-end pipeline: ingestion → validation → reproducible training → containerized inference
- Integrated CI/CD, automated testing, and monitoring for production readiness
Designed for enterprise and on-prem LLM deployments where reliability matters.
https://github.com/merledu/ArcheV
- Engineered a reproducible LLM benchmarking framework
- Standardized JSON I/O and CI-driven evaluation pipelines
- Validates functional and syntactic correctness to support deployment decisions
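To give a rough idea of what "functional and syntactic correctness" with JSON I/O can look like, here is a minimal sketch. It is not taken from the ArcheV codebase: the record fields (`id`, `completion`, `entry_point`, `cases`), the function names, and the choice of Python as the language under test are all assumptions made for illustration.

```python
import ast
import json


def check_syntax(source: str) -> bool:
    """Syntactic check: does the generated code parse at all?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def check_function(source: str, entry_point: str, cases: list) -> bool:
    """Functional check: does the generated function pass every test case?"""
    namespace = {}
    try:
        # Assumption: real model output would be executed in a sandbox, not in-process.
        exec(source, namespace)
        func = namespace[entry_point]
        return all(func(*case["args"]) == case["expected"] for case in cases)
    except Exception:
        return False


def evaluate(record_json: str) -> str:
    """Read one JSON record (hypothetical schema) and return a JSON verdict."""
    record = json.loads(record_json)
    source = record["completion"]
    syntactic = check_syntax(source)
    # Only run the functional check when the code parses.
    functional = syntactic and check_function(
        source, record["entry_point"], record["cases"]
    )
    return json.dumps(
        {"id": record["id"], "syntactic": syntactic, "functional": functional}
    )
```

Keeping both input and output as JSON means a CI job can feed records in, collect verdicts, and fail the build when the pass rate drops below a chosen threshold.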
https://github.com/riscv-software-src/riscv-unified-db
- Maintained versioned YAML/JSON schemas for RISC-V tooling
- Implemented CI validation to ensure data integrity and observability
- Improved downstream reliability for tooling and ML pipelines
- Linux Foundation Mentorship Program (LFX) 2025
- Research Assistant at MERL
- Improved LLM benchmarking reliability by ~25%
- Hands-on experience with LLMs, GANs, MLOps, and CI/CD
- Contributor to open-source and research-grade tooling
- LinkedIn: https://linkedin.com/in/shehroz-kashif
- Email: [email protected]
If you find my work useful, feel free to star a repository.
Open to collaborations in AI, LLMs, MLOps, and open-source systems.


