nikhiledm97/TheGEMMCoreProject

TheGEMMCoreProject

(Deprecated) SystemVerilog implementations of GEMM operations in the style of Nvidia's SIMT CUDA cores, Nvidia's hybrid-precision Tensor Cores, and the systolic-array MXU in Google's TPU.

Note: Although these modules perform the same operations, they do not emulate the actual microarchitecture that executes CUDA Core, Tensor Core, or MXU instructions. Think of this as an introductory educational repo for floating-point (FP) arithmetic digital design. You could, however, use these modules as a quick stand-in for, say, a prototype FPU in your FPGA design.

If you're interested in going deeper, I highly recommend checking out my work on the TFR floating-point RTL backend for the Vortex GPGPU's Tensor Core Unit (TCU) extension. It is a significantly more researched, optimized, and realistic microarchitecture implementation.

Tensor Core Versions

TensorCore v0: Volta Architecture [FP16MUL FP32ADD]

Volta Tensor Core Architecture Diagram
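
As a rough software reference for what the [FP16MUL FP32ADD] label means, here is a minimal Python sketch. It is not taken from this repo's RTL, and the names `to_fp16`, `to_fp32`, and `dot_fp16mul_fp32add` are hypothetical. It assumes the inputs are rounded to FP16, each product is kept exactly, and the running sum is rounded to FP32 after every add. Real Tensor Cores may order and round their accumulations differently.

```python
import struct

def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value (ties to even).
    Note: struct raises OverflowError if x is too large for FP16."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_fp16mul_fp32add(a, b, c=0.0):
    """Dot product with FP16 operands and an FP32 accumulator (assumed scheme)."""
    acc = to_fp32(c)
    for x, y in zip(a, b):
        # Two 11-bit significands multiply to at most 22 bits,
        # so the product is exact in both FP32 and Python's double.
        prod = to_fp16(x) * to_fp16(y)
        acc = to_fp32(acc + prod)  # one FP32 rounding per accumulation step
    return acc

# FP16 alone cannot hold 2049 (spacing is 2 at that magnitude),
# but the FP32 accumulator can.
print(to_fp16(2049.0))                                # 2048.0
print(dot_fp16mul_fp32add([2048.0, 1.0], [1.0, 1.0])) # 2049.0
```

The last two lines show why the accumulator is wider than the multiplier inputs: small contributions that would be lost in FP16 survive in the FP32 sum.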

TensorCore v1: Ampere Architecture [TF32MUL FP32ADD / BF16MUL FP32ADD] + Fine-Grained Structured Sparsity

Ampere Tensor Core Architecture Diagram
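
To make the Ampere formats and the 2:4 sparsity pattern concrete, here is a small Python sketch under stated assumptions. TF32 keeps FP32's 8-bit exponent with 10 explicit mantissa bits, and BF16 keeps it with 7. The sketch models both by rounding away the low bits of an FP32 encoding (round to nearest, ties to even; hardware may truncate instead). Fine-grained structured sparsity is modeled as keeping the 2 largest-magnitude values in each group of 4, plus their positions. All function names are hypothetical and none of this is taken from the repo's RTL.

```python
import struct

def _f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def _bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def round_low_bits(x: float, drop: int) -> float:
    """Round an FP32 value to nearest-even after dropping `drop` mantissa bits.
    NaN inputs are not handled specially in this sketch."""
    bits = _f32_bits(x)
    lsb = (bits >> drop) & 1                       # lowest bit that is kept
    bits = (bits + (1 << (drop - 1)) - 1 + lsb) & 0xFFFFFFFF
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF        # clear the dropped bits
    return _bits_f32(bits)

def to_tf32(x: float) -> float:
    return round_low_bits(x, 13)   # 23 - 10 mantissa bits

def to_bf16(x: float) -> float:
    return round_low_bits(x, 16)   # 23 - 7 mantissa bits

def compress_2_of_4(row):
    """Keep the 2 largest-magnitude entries of every 4; return values and in-group indices."""
    assert len(row) % 4 == 0
    vals, idxs = [], []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        for i in keep:
            vals.append(group[i])
            idxs.append(i)         # fits in 2 bits of metadata per kept value
    return vals, idxs

def sparse_dot(vals, idxs, dense):
    """Dot product of a 2:4-compressed row with a dense vector."""
    acc = 0.0
    for k, (v, i) in enumerate(zip(vals, idxs)):
        acc += v * dense[(k // 2) * 4 + i]   # index selects the matching dense operand
    return acc

vals, idxs = compress_2_of_4([0.0, 3.0, 0.0, -1.0, 2.0, 0.0, 0.0, 5.0])
print(vals, idxs)                                  # [3.0, -1.0, 2.0, 5.0] [1, 3, 0, 3]
print(sparse_dot(vals, idxs, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))  # 52.0
```

The compressed form halves the number of multiplies per row; the stored indices play the role of a multiplexer select that picks which dense operand each kept value meets.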

TensorCore v2: Hopper Architecture [FP8(E5M2/E4M3)MUL FP16ADD]

Hopper Tensor Core Architecture Diagram
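
For readers new to FP8, the sketch below decodes the two encodings named in the [FP8(E5M2/E4M3)MUL FP16ADD] label and runs a dot product with an FP16 accumulator. It assumes the commonly used FP8 definitions: E5M2 has bias 15 with IEEE-style infinities and NaNs, while E4M3 has bias 7, no infinities, and a single NaN mantissa pattern (maximum finite value 448). The function names are hypothetical, the per-step FP16 rounding is an assumption, and none of this is taken from the repo's RTL.

```python
import struct

def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int, ieee_specials: bool) -> float:
    """Decode one FP8 byte laid out as sign | exponent | mantissa."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    if ieee_specials and exp == exp_max:
        return sign * float("inf") if man == 0 else float("nan")
    if not ieee_specials and exp == exp_max and man == (1 << man_bits) - 1:
        return float("nan")        # E4M3: only the all-ones pattern is NaN
    if exp == 0:                   # subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, 5, 2, 15, True)

def decode_e4m3(byte: int) -> float:
    return decode_fp8(byte, 4, 3, 7, False)

def to_fp16(x: float) -> float:
    """Round to nearest binary16; struct raises OverflowError instead of returning inf."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dot_fp8mul_fp16add(a_bytes, b_bytes, decode):
    """Dot product of FP8-encoded vectors with an FP16 accumulator (assumed scheme)."""
    acc = 0.0
    for x, y in zip(a_bytes, b_bytes):
        acc = to_fp16(acc + decode(x) * decode(y))  # FP8 products are exact in double
    return acc

print(decode_e4m3(0x7E))   # 448.0, largest finite E4M3 value
print(decode_e5m2(0x7B))   # 57344.0, largest finite E5M2 value
print(dot_fp8mul_fp16add([0x38, 0x40], [0x40, 0x40], decode_e4m3))  # 1*2 + 2*2 = 6.0
```

E4M3 trades range for one extra mantissa bit, and E5M2 does the opposite, which is visible in the two maximum values printed above.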
