I'm an MLOps / Infrastructure engineer focused on LLM inference and GPU serving.
Currently finishing my M.S. in CS at USC. Built a two-level Kubernetes dispatcher for a heterogeneous 700-GPU cluster: session-sticky routing, gRPC state sync, and sub-GPU partitioning with HAMi and Kueue. Also worked on CUDA kernel optimization and distributed training pipelines.
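The session-sticky routing idea can be sketched with a consistent-hash ring: a session id always lands on the same pod until the pod set changes. This is a minimal illustrative toy (all names like `StickyRouter` and the pod ids are made up here, not the actual dispatcher code):

```python
import bisect
import hashlib

class StickyRouter:
    """Toy consistent-hash ring: a session id always maps to the same pod
    until the pod set changes. Illustrative only, not the real dispatcher."""

    def __init__(self, pods, vnodes=64):
        # Each pod gets several virtual nodes on the ring for smoother balance.
        self._ring = []  # sorted list of (hash, pod)
        for pod in pods:
            for v in range(vnodes):
                h = int(hashlib.md5(f"{pod}:{v}".encode()).hexdigest(), 16)
                self._ring.append((h, pod))
        self._ring.sort()

    def route(self, session_id):
        # Walk clockwise from the session's hash to the next virtual node.
        h = int(hashlib.md5(session_id.encode()).hexdigest(), 16)
        i = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[i][1]

router = StickyRouter(["gpu-pod-0", "gpu-pod-1", "gpu-pod-2"])
assert router.route("sess-42") == router.route("sess-42")  # sticky
```

In the real system the second level would consult synced state (via gRPC) before falling back to the ring, but the stickiness property is the same.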
Interested in the systems side of AI: how inference actually runs at scale, where the bottlenecks are, and how to push utilization without blowing up memory. Lately I've been digging into vLLM and SGLang internals, and I'm looking to contribute more there.
Stack · Python · C++ · CUDA · PyTorch · SGLang · Kubernetes · AWS · vLLM · JAX · GCP

