Shehroz Kashif (Shehrozkashif)

👋 Hi, I'm Shehroz Kashif

AI Engineer | Software Engineer | LLM & MLOps Researcher
Research Assistant @ Micro Electronics Research Lab (MERL)
LFX’25 Mentee @ RISC-V International

Open-source contributor focused on production-ready AI systems, LLM evaluation, and reproducible ML pipelines.

🚀 About Me

I’m an AI Engineer and Researcher working at the intersection of LLMs, MLOps, and open-source systems.
My work focuses on building reliable, testable, and deployment-ready AI pipelines rather than purely experimental models.

🔍 Current Focus

🧠 LLM evaluation & benchmarking (functional, syntactic, adversarial)
🛡️ Hallucination mitigation in private LLMs using GAN-based approaches
⚙️ Reproducible ML pipelines with CI/CD, logging, and SLA-aware validation
📊 RISC-V data & tooling for machine-readable specifications and verification

💡 I care deeply about making AI systems trustworthy in production.

🧠 Roles & Affiliations

🔹 Research Assistant — Micro Electronics Research Lab (MERL)
Working on LLM evaluation pipelines, benchmarking frameworks, and RISC-V-related tooling
🔹 LFX’25 Mentee — RISC-V International
Contributing to machine-readable RISC-V specifications, schemas, and CI validation pipelines

🧰 Tech Stack

🔤 Languages

Python · Scala · Verilog · Java · Shell · JavaScript · HTML · CSS

🧠 AI / ML

PyTorch · TensorFlow · Hugging Face Transformers · GANs · LLM Evaluation
NumPy · Pandas · Scikit-learn

⚙️ MLOps & Engineering

CI/CD · Docker · REST/gRPC · Logging & Monitoring · Reproducible Pipelines
Git · GitHub Actions · Linux · pytest

🧾 Data & Config

JSON · YAML · MySQL

💡 Featured Projects

🛡️ AI4org — GAN-based Hallucination Mitigation for Private LLMs

🔗 https://github.com/merledu/ai4org

Built a privacy-first ML pipeline to detect and mitigate hallucinations in private LLMs
Designed a GAN-style generator/discriminator for hallucination detection
End-to-end pipeline: ingestion → validation → reproducible training → containerized inference
Integrated CI/CD, automated testing, and monitoring for production readiness

📌 Designed for enterprise and on-prem LLM deployments where reliability matters.

🔬 ArcheV — LLM Benchmark Suite

🔗 https://github.com/merledu/ArcheV

Engineered a reproducible LLM benchmarking framework
Standardized JSON I/O and CI-driven evaluation pipelines
Validates functional and syntactic correctness to support deployment decisions

📘 RISC-V Unified Database

🔗 https://github.com/riscv-software-src/riscv-unified-db

Maintained versioned YAML/JSON schemas for RISC-V tooling
Implemented CI validation to enforce data integrity and improve observability
Improved downstream reliability for tooling and ML pipelines

🏆 Highlights & Achievements

🎓 Linux Foundation Mentorship Program (LFX) 2025
🧪 Research Assistant at MERL
📊 Improved LLM benchmarking reliability by ~25%
🧠 Hands-on experience with LLMs, GANs, MLOps, and CI/CD
📝 Contributor to open-source and research-grade tooling

📫 Connect With Me

💼 LinkedIn: https://linkedin.com/in/shehroz-kashif
📧 Email: [email protected]

⭐ If you find my work useful, feel free to star a repository.
🤝 Open to collaborations in AI, LLMs, MLOps, and open-source systems.