- GPU kernel optimization and low-level performance engineering
- Model quantization and precision-efficient inference
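As a flavor of the quantization work listed above, here is a minimal, hypothetical sketch of symmetric per-tensor int8 quantization (not code from any pinned repository): weights are mapped to int8 via a single scale, and dequantization recovers them to within half a quantization step.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# reconstruction error is bounded by half a quantization step (scale / 2)
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-8
```

Per-channel scales and asymmetric zero-points are common refinements of this per-tensor scheme, trading a little metadata for lower error.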
GPU poor guy | MLSys | Open to work
- Boston (UTC -05:00)
- https://ramshankar07.github.io/portfoliov3/index.html
- in/ramshankarb
- https://ramshankar07.substack.com/
Pinned
- CUDA-llama3.1-inference — Public. A CUDA implementation of the Llama 3.1 open models. (Cuda, 2 stars)
- qwen600-ROCm-inference — Public, forked from yassa9/qwen600. A static, suckless, single-batch Qwen3-0.6B mini inference engine. (C++, 1 star)
- Fintech-Data-Processing-ETL-Platform — Public. Assignment 02 for the DAMG7245 coursework (Spring 2025). (Jupyter Notebook)
- Parallelizing-Text-to-Image-Generation — Public. Explores the feasibility and performance characteristics of using multiple CPUs versus GPUs for preprocessing tasks in text-to-image generation pipelines, comparing speedup, efficiency, and cost to id… (Jupyter Notebook)