Aiden Zhao yongkangzhao

👋 Hi, I'm Aiden Zhao

A Machine Learning Engineer shaping the future of AI, one dataset at a time.

I'm a Machine Learning & AI Engineer specializing in Foundation Models, AI Agents, and large-scale training systems. I operate on a core belief that has become the foundation of my work:

The quality of data defines the ceiling of a model's potential, while the algorithm determines how close we get to that ceiling. It is paramount to get the data right, and that is where I dedicate my expertise.

Currently, as a Senior Data Scientist at Capital One, my primary focus is specialized dataset curation for training foundation language models. I architect and implement the strategies that govern the data our models learn from, ensuring that from the very beginning they are built on quality, relevance, and integrity.

🚀 Career Highlights

Throughout my career, I've focused on creating tangible impact by bridging the gap between research and production.

🏆 1st Place Winner, ElevenLabs Hackathon: As the AI Engineer for a four-person team, co-developed 'Hugo Tour Guide,' an AI travel companion that won first prize. The project provides users with personalized, location-aware travel experiences by planning routes and answering cultural and historical questions on the go.
💡 Architect of 'EZ-Career' at Microsoft AI Agents Hackathon: Designed and co-developed 'EZ-Career,' a multi-agent autonomous system that automates the job application process. The project used Playwright for browser control and Retrieval-Augmented Generation (RAG) for factual consistency, giving me hands-on experience with practical, real-world agentic workflows.
Enhanced AI Agent Capabilities at TIFIN: As a Senior Machine Learning Engineer, I enhanced conversational AI agents, fine-tuned open-source models to boost performance while reducing latency, and designed tools to expand agent functionality.
Engineered 10x Training Efficiency: At Megagon Labs, I improved model training efficiency more than 10-fold by developing pretraining and fine-tuning scripts built on FSDP, LoRA, and BF16.
Developed AI Agent for Thousands of Users: As a Research Assistant at Georgia Tech, I developed and enhanced "Jill Watson," an AI Agent for Adult Learning deployed in multiple schools and used by thousands of users.
Pioneered Agentic Frameworks: I designed and implemented an agent-based task management framework using models like GPT, LLaMA, and Mixtral to facilitate complex dataset generation and information extraction.

🔬 What I'm Currently Exploring

My intellectual curiosity keeps me constantly learning. Right now, I'm diving deep into:

The Frontier of Model Efficiency with Tiny LLMs: I'm incredibly excited by the potential of small, hyper-efficient models. With my NVIDIA RTX PRO 6000 Blackwell 96GB GPU, I'm actively experimenting with aggressive FP4 quantization. My goal is to develop powerful, task-specific models with a minimal memory footprint, unlocking blazing-fast inference speeds suitable for real-time production environments.
Complex Document QA: Designing processes to help AI agents extract and answer questions from complex documents, enhancing information retrieval and decision-making for users.
Synthetic Data Generation: Establishing baseline methods and frameworks for generating high-quality synthetic question-and-answer pairs to augment training datasets.
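
To make the FP4 idea above concrete, here is a minimal sketch of simulated ("fake") quantization to the 4-bit E2M1 value grid, using only the Python standard library. It is an illustration under stated assumptions, not the exact recipe I use: the block size of 16, the plain-float absmax scale, and the function names `quantize_block` and `quantize` are all choices made for this example. Production formats store the per-block scale in a compact type and run on hardware kernels.

```python
# Magnitudes representable by a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_MAGNITUDES[-1]


def quantize_block(values):
    """Fake-quantize one block of floats to FP4 with a shared absmax scale.

    Returns (scale, dequantized_values). Keeping the scale as a plain Python
    float is a simplification made for this sketch.
    """
    absmax = max((abs(v) for v in values), default=0.0)
    if absmax == 0.0:
        return 1.0, [0.0 for _ in values]
    scale = absmax / FP4_MAX  # map the largest magnitude onto 6.0
    dequantized = []
    for v in values:
        target = abs(v) / scale
        # Round to the nearest representable magnitude on the FP4 grid.
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        dequantized.append((nearest if v >= 0 else -nearest) * scale)
    return scale, dequantized


def quantize(values, block_size=16):
    """Fake-quantize a flat list block by block (block_size is an assumption)."""
    out = []
    for start in range(0, len(values), block_size):
        _, deq = quantize_block(values[start:start + block_size])
        out.extend(deq)
    return out


# Hypothetical weights, for illustration only.
weights = [0.02, -0.75, 0.31, 1.5, -0.09, 0.6]
print(quantize(weights))
```

Each block shares one scale, so a single outlier coarsens the grid for its neighbors. Smaller blocks reduce that effect at the cost of storing more scales.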

🛠️ My Tech Stack

I am proficient with a wide array of tools and technologies across the full MLOps lifecycle.

Languages & Core Libraries

ML Frameworks & Libraries

Modeling & Optimization

Cloud, DevOps & Infra

Data Stack & Tools

Graph & Retrieval

🖥️ My Homelab: A Personal AI Supercomputer

Beyond my professional work, I've designed and built a high-performance homelab from the ground up. It serves as my personal cloud and sandbox for pushing the limits of AI/ML development, allowing for experimentation that rivals enterprise-grade environments. The entire system is architected for maximum throughput, from networking to storage to compute.

Core Workstation Specifications

Component	Specification
CPU	AMD Ryzen™ Threadripper™ PRO 7965WX (24 Cores / 48 Threads)
GPU	NVIDIA RTX PRO 6000 Blackwell 96GB GDDR7 VRAM
RAM	256 GB Kingston FURY Renegade Pro DDR5 ECC Registered (5600MT/s)
Storage (OS & Hot Data)	2TB Kingston FURY Renegade PCIe 5.0 NVMe M.2 SSD (Up to 14.8 GB/s)
Motherboard	ASRock WRX90 WS EVO
Power Supply	Seasonic PRIME 1600W 80+ Platinum (ATX 3.0)
Cooling	Thermaltake AW420 AIO Liquid Cooler
Chassis	Fractal Design Meshify 3 XL

Network & Storage Fabric

🚀 High-Throughput Compute: The core of the lab is the workstation, pairing the massive parallel processing power of a Threadripper PRO with the cutting-edge NVIDIA RTX PRO 6000 GPU. Built on the Blackwell architecture, it features 96GB of GDDR7 memory, fifth-generation Tensor Cores, and fourth-generation RT Cores, providing unparalleled acceleration for training, inference, and complex data visualization.
🗄️ Centralized Storage Backbone: A 1U rackmount server running TrueNAS SCALE acts as the central data hub. It's configured with a 96TB ZFS RAID-Z2 array, which tolerates up to two simultaneous drive failures and checksums data for integrity, and it connects to the workstation and network core through a 10GbE switch.
🌐 Multi-Gig Network Core: The network is managed by a Ubiquiti Dream Machine Pro. A dedicated 10GbE switch provides a high-speed data lane between the workstation and the NAS, eliminating I/O bottlenecks. A separate 2.5GbE PoE switch powers Wi-Fi 7 (U7 Pro) access points, ensuring general network access doesn't interfere with high-performance tasks.
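
As a back-of-the-envelope check on the storage and network paths described above, the sketch below estimates ideal transfer times. It uses peak figures only: the 10 Gbit/s line rate of 10GbE and the SSD's quoted "up to 14.8 GB/s" rating. The 500 GB dataset size and the helper name `transfer_seconds` are hypothetical, chosen for illustration. Real throughput will be lower because of protocol overhead, ZFS pool performance, and file-size mix.

```python
def transfer_seconds(size_gb, throughput_gb_per_s):
    """Ideal time to move size_gb gigabytes at a sustained throughput."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_per_s


# Peak figures: 10 Gbit/s divided by 8 bits per byte, and the SSD's
# quoted "up to" sequential rate from the spec table.
TEN_GBE_GB_PER_S = 10 / 8    # 1.25 GB/s, ignoring protocol overhead
NVME_GEN5_GB_PER_S = 14.8

DATASET_GB = 500  # hypothetical dataset size, for illustration only

for name, rate in (("10GbE NAS", TEN_GBE_GB_PER_S),
                   ("PCIe 5.0 SSD", NVME_GEN5_GB_PER_S)):
    print(f"{name}: {transfer_seconds(DATASET_GB, rate):.0f} s")
```

Under these ideal assumptions the network path takes roughly 400 seconds and the local SSD about 34 seconds, which is why hot data lives on the NVMe drive while the NAS serves as the bulk store.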

What This Architecture Unlocks

This end-to-end, high-performance environment allows me to:

Explore novel quantization techniques like FP4 on state-of-the-art hardware, powered by next-gen Tensor Cores.
Fine-tune larger models that would be impractical on consumer-grade hardware, thanks to the GPU's 96GB of VRAM.
Rapidly iterate on data-intensive workflows, with fast dataset loading from the PCIe 5.0 SSD (up to 14.8 GB/s) and the 10GbE-connected NAS.
Run stable, long-duration experiments with confidence, backed by ECC memory and an enterprise-grade power and cooling solution.
Self-host a complete suite of MLOps tools, from local vector databases to model registries, creating a fully integrated, private research cloud.
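
To put the 96GB and FP4 points above in rough numbers, here is a small sketch of weight-only memory arithmetic. It counts the weights alone and ignores activations, KV cache, optimizer state, and quantization scale overhead, so real limits are tighter. The 70-billion-parameter model and the helper names `weight_gb` and `max_params_billions` are hypothetical and used only for illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "fp4": 0.5}
VRAM_GB = 96  # from the workstation spec table


def weight_gb(num_params, dtype):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def max_params_billions(vram_gb, dtype):
    """Largest parameter count (in billions) whose weights alone fit in vram_gb."""
    return vram_gb / BYTES_PER_PARAM[dtype]


MODEL_PARAMS = 70e9  # hypothetical model size, for illustration only

for dtype in ("bf16", "fp8", "fp4"):
    needed = weight_gb(MODEL_PARAMS, dtype)
    verdict = "fits" if needed <= VRAM_GB else "does not fit"
    print(f"{dtype}: {needed:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

By this estimate, a 70B-parameter model needs about 140 GB for BF16 weights but about 35 GB at FP4, which is the gap that makes low-bit quantization attractive on a single GPU.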

📊 My GitHub Stats

📜 Certifications

AWS Certified Cloud Practitioner
Tableau Desktop Certified Associate
Hack Together: AI Agents Hackathon Badge

📫 Let's Connect!

I'm always open to discussing new ideas, collaborating on projects, or just chatting about the future of AI. Feel free to reach out!
