abi2024

Pinned

DeepSeek-V3-Architecture-Implementation (Public)

Converts SmolLM2-135M to the DeepSeek-V3 architecture with Multi-head Latent Attention (MLHA) and Mixture-of-Experts (MoE) layers.

Python

UNet-Segmentation (Public)

A modular U-Net implementation in PyTorch, built from scratch. Features a dynamic Model Factory to benchmark architectural variations (pooling vs. strided convolution for downsampling, transposed convolution vs. upsampling for the decoder) and loss functions…

Python

smollm2-135-implementation (Public)

Complete from-scratch implementation of SmolLM2-135M, reverse-engineered from the pretrained model.

Jupyter Notebook

FlashMoE-Serve (Public)

High-performance MoE inference engine. Features fused OpenAI Triton kernels, continuous batching, and NF4 quantization. +72% throughput on RTX 3060.

Python