---
title: "Supporting Thrust API in Clad"
layout: post
excerpt: "This summer, I am working on adding support for the Thrust API in Clad, enabling automatic differentiation of GPU-accelerated code. This work bridges the gap between high-performance CUDA parallelism and source-to-source AD transformation."
sitemap: false
author: Abdelrhman Elrawy
permalink: blogs/gsoc25_/
banner_image: /images/blog/gsoc-banner.png
date: 2025-05-18
tags: gsoc llvm clang automatic-differentiation gpu cuda thrust
---

## About Me

Hi! I’m Abdelrhman Elrawy, a graduate student in Applied Computing specializing in Machine Learning and Parallel Programming. I’ll be working on enabling **Thrust API support in Clad**, bringing GPU-accelerated parallel computing to the world of automatic differentiation.

## Project Description

[Clad](https://github.com/vgvassilev/clad) is a Clang-based tool for source-to-source automatic differentiation (AD). It enables gradient computations by transforming C++ code at compile time.
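
For readers new to Clad, here is a minimal sketch of its existing interface, assuming Clad is loaded as a Clang plugin:

```cpp
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

double square(double x) { return x * x; }

int main() {
    // Clad generates the derivative function at compile time.
    auto d_square = clad::differentiate(square, "x");
    printf("%f\n", d_square.execute(3.0)); // prints 6.000000
}
```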

However, many scientific and machine learning applications leverage **NVIDIA’s Thrust**, a C++ parallel algorithms library for GPUs, and Clad currently cannot differentiate through Thrust constructs. This limits Clad’s usability in high-performance CUDA code.

My project addresses this gap by enabling Clad to:

- Recognize and handle Thrust primitives like `thrust::transform` and `thrust::reduce` (see the example below)
- Implement **custom pullback/pushforward rules** for GPU kernels
- Ensure gradients maintain **parallel performance and correctness**
- Benchmark and validate derivatives in real-world ML and HPC use cases
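
For concreteness, here is a hypothetical example of the kind of function in scope (the name `sum_of_squares` is mine, not project code):

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

// A GPU-parallel sum of squares: the kind of function Clad cannot
// currently differentiate, because its data flow goes through
// thrust::transform and thrust::reduce rather than plain loops.
double sum_of_squares(const thrust::device_vector<double>& x) {
    thrust::device_vector<double> sq(x.size());
    thrust::transform(x.begin(), x.end(), sq.begin(),
                      thrust::square<double>());
    return thrust::reduce(sq.begin(), sq.end(), 0.0,
                          thrust::plus<double>());
}
```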

## Technical Approach

The project begins with a **proof-of-concept**: manually writing derivatives for common Thrust operations like `transform` and `reduce`, and comparing them against finite-difference approximations to validate correctness.
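
As a sketch of that validation workflow (the helper names and step size are my own choices, not final project code), the hand-written gradient of the `sum_of_squares` example above is `2 * x[i]`, which can be checked elementwise against a central difference:

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <cstddef>

// Hand-written gradient of sum_of_squares (defined earlier):
// d/dx_i of sum(x^2) is 2 * x_i, computed in parallel on the device.
void sum_of_squares_grad(const thrust::device_vector<double>& x,
                         thrust::device_vector<double>& dx) {
    using namespace thrust::placeholders;
    thrust::transform(x.begin(), x.end(), dx.begin(), 2.0 * _1);
}

// Central finite difference in coordinate i, used as the reference.
double finite_diff(thrust::device_vector<double> x, std::size_t i,
                   double h = 1e-6) {
    x[i] = x[i] + h;
    double f_plus = sum_of_squares(x);
    x[i] = x[i] - 2 * h;
    double f_minus = sum_of_squares(x);
    return (f_plus - f_minus) / (2 * h);
}
```

Agreement within a small tolerance on every coordinate gives confidence that the analytical rule is correct before it is wired into Clad.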

Following that, I’ll integrate custom differentiation logic inside Clad (a sketch of the underlying mechanism follows this list), building:
- A `ThrustBuiltins.h` header for recognizing Thrust calls
- Visitor pattern extensions in Clad’s AST traversal (e.g., `VisitCallExpr`)
- GPU-compatible derivative utilities (e.g., CUDA-aware `thrust::fill`, `transform`)
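
Clad already provides a hook for hand-written reverse-mode rules: a `<name>_pullback` function in the `clad::custom_derivatives` namespace. The sketch below shows that existing convention on a plain scalar function (`logistic` is my illustrative example, not project code); designing the analogous rules for Thrust calls operating on device vectors is the core of this work.

```cpp
#include <cmath>

double logistic(double x) { return 1.0 / (1.0 + std::exp(-x)); }

namespace clad {
namespace custom_derivatives {
// Reverse-mode rule Clad uses instead of differentiating the body:
// given the seed _d_y, accumulate the adjoint of x into *_d_x.
void logistic_pullback(double x, double _d_y, double* _d_x) {
    double s = 1.0 / (1.0 + std::exp(-x));
    *_d_x += _d_y * s * (1.0 - s); // d(logistic)/dx = s * (1 - s)
}
} // namespace custom_derivatives
} // namespace clad
```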

I’ll also implement **unit tests**, real-world **mini-apps** (e.g., neural networks), and **benchmarks** to validate and demonstrate this feature.

## Expected Outcomes

By the end of GSoC 2025, this project aims to:
- Enable Clad to differentiate through key Thrust primitives while preserving GPU execution
- Provide documentation and tutorials for GPU-based automatic differentiation
- Contribute a robust test suite and benchmarks to the Clad ecosystem

## Related Links

- [Clad GitHub](https://github.com/vgvassilev/clad)
- [Project description](https://hepsoftwarefoundation.org/gsoc/2025/proposal_Clad-ThrustAPI.html)
- [My GitHub](https://github.com/a-elrawy)