- name: "CompilerResearchCon 2025 (day 2)"
  date: 2025-11-13 15:00:00 +0200
  time_cest: "15:00"
  connect: "[Link to zoom](https://princeton.zoom.us/j/97915651167?pwd=MXJ1T2lhc3Z5QWlYbUFnMTZYQlNRdz09)"
  label: crcon25_part_2
  agenda:
    - title: "Implementing Debugging Support for xeus-cpp"
      speaker:
        name: "Abhinav Kumar"
      time_cest: "15:00 - 15:20"
      description: |
        This proposal outlines integrating debugging into the xeus-cpp kernel
        for Jupyter using LLDB and its Debug Adapter Protocol (lldb-dap).
        Modeled after xeus-python, it leverages LLDB’s Clang and JIT debugging
        support to enable breakpoints, variable inspection, and step-through
        execution. The modular design ensures compatibility with Jupyter’s
        frontend, enhancing interactive C++ development in notebooks.

        This project achieved DAP integration with xeus-cpp: users can now use
        JupyterLab’s debugger panel to debug C++ JIT code. Setting and hitting
        breakpoints and stepping into and out of functions are supported in
        xeus-cpp. Additionally, during this project I refactored the
        out-of-process JIT execution, which was a major part of implementing
        the debugger.

      # slides: /assets/presentations/...

    - title: "Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
      speaker:
        name: "Maksym Andriichuk"
      time_cest: "15:20 - 15:40"
      description: |
        Clad is a Clang plugin designed to provide automatic differentiation
        (AD) for C++ mathematical functions. It generates derivative code by
        modifying the abstract syntax tree (AST) using LLVM compiler features.
        Because it has access to a rich program representation, the Clang AST,
        it can perform advanced program optimization by implementing more
        sophisticated analyses.

        The project optimized code that contains potential data races,
        significantly speeding up execution. Thread Safety Analysis is a
        static analysis that detects possible data races, making it possible
        to drop unnecessary atomic operations in the Clad-produced code.
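
        As a minimal illustration of the effect (hypothetical code, not
        Clad's actual output): once the analysis proves that each GPU thread
        updates a distinct adjoint slot, the conservative atomic accumulation
        can become a plain store.

        ```cpp
        // Hypothetical reverse-pass accumulation in a CUDA kernel.
        __global__ void kernel_grad(double* _d_x, const double* _d_y) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          // Conservative code must assume concurrent writes to _d_x[i]:
          // atomicAdd(&_d_x[i], _d_y[i]);
          // If the analysis proves each thread touches a unique i, the
          // atomic operation can be dropped:
          _d_x[i] += _d_y[i];
        }
        ```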

      # slides: /assets/presentations/...

    - title: "Enable automatic differentiation of OpenMP programs with Clad"
      speaker:
        name: "Jiayang Li"
      time_cest: "15:40 - 16:00"
      description: |
        This project extends Clad, a Clang-based automatic differentiation
        tool for C++, to support OpenMP programs. It enables Clad to parse and
        differentiate functions with OpenMP directives, thereby enabling
        gradient computation in multi-threaded environments.

        The project added Clad support for both forward and reverse mode
        differentiation of common OpenMP directives (parallel, parallel for)
        and clauses (private, firstprivate, lastprivate, shared, atomic,
        reduction) by implementing OpenMP-related AST parsing and designing
        corresponding differentiation strategies. Additional contributions
        include example applications and comprehensive tests.
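
        A minimal usage sketch (illustrative, not from the talk), assuming a
        Clad build with this OpenMP support: reverse-mode differentiation of
        a parallel reduction loop.

        ```cpp
        #include "clad/Differentiator/Differentiator.h"

        double dot(double* x, double* y, int n) {
          double sum = 0.0;
          // Parallel reduction; compile with -fopenmp and the Clad plugin.
          #pragma omp parallel for reduction(+ : sum)
          for (int i = 0; i < n; ++i)
            sum += x[i] * y[i];
          return sum;
        }

        int main() {
          auto grad = clad::gradient(dot, "x, y"); // d(dot)/dx and d(dot)/dy
          double x[3] = {1, 2, 3}, y[3] = {4, 5, 6};
          double dx[3] = {0, 0, 0}, dy[3] = {0, 0, 0};
          grad.execute(x, y, 3, dx, dy); // expect dx == y and dy == x
        }
        ```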

      # slides: /assets/presentations/...

    - title: "Using ROOT in the field of Genome Sequencing"
      speaker:
        name: "Aditya Pandey"
      time_cest: "16:00 - 16:20"
      description: |
        The project extends ROOT, CERN's petabyte-scale data processing
        framework, to address the critical challenge of managing genomic data,
        which can reach up to 200 GB per human genome. By leveraging ROOT's
        big-data expertise and introducing the next-generation RNTuple
        columnar storage format, optimized here specifically for genomic
        sequences, the project eliminates the traditional trade-off between
        compression efficiency and access speed in bioinformatics.

        The project achieved comprehensive genomic data support by validating
        GeneROOT baseline performance benchmarks against the BAM/SAM formats
        and implementing an RNTuple-based RAM (ROOT Alignment Maps) format
        with full SAM/BAM field support and smart reference management. The
        new format demonstrates 23.5% smaller file sizes compared to CRAM
        while delivering 1.9x faster large-region queries and 3.2x faster
        full-chromosome scans, and reduces FASTQ compression output from
        14.2 GB to 6.8 GB. We also developed chromosome-based file splitting
        for large genome files so that per-chromosome data can be extracted.
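
        A minimal sketch of the storage idea (field names are hypothetical;
        the actual RAM schema is defined by GeneROOT): writing SAM-like
        alignment records into an RNTuple, one entry per aligned read.

        ```cpp
        #include <ROOT/RNTupleModel.hxx>
        #include <ROOT/RNTupleWriter.hxx>
        #include <cstdint>
        #include <string>

        int main() {
          // Columnar schema for a few SAM-like fields (hypothetical subset);
          // depending on the ROOT version these classes may live outside
          // the Experimental namespace.
          auto model = ROOT::Experimental::RNTupleModel::Create();
          auto qname = model->MakeField<std::string>("QNAME"); // read name
          auto pos   = model->MakeField<std::int64_t>("POS");  // position
          auto seq   = model->MakeField<std::string>("SEQ");   // read bases
          auto writer = ROOT::Experimental::RNTupleWriter::Recreate(
              std::move(model), "RAM", "reads.root");
          *qname = "read_0001"; *pos = 10468; *seq = "ACGTACGT";
          writer->Fill(); // append one alignment record
        }
        ```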

      # slides: /assets/presentations/...

- name: "CompilerResearchCon 2025 (day 1)"
  date: 2025-10-30 15:00:00 +0200
  time_cest: "15:00"
  connect: "[Link to zoom](https://princeton.zoom.us/j/97915651167?pwd=MXJ1T2lhc3Z5QWlYbUFnMTZYQlNRdz09)"
  label: crcon25_part_1
  agenda:
    - title: "CARTopiaX: an Agent-Based Simulation of CAR T-Cell Therapy Built on BioDynaMo"
      speaker:
        name: "Salvador de la Torre Gonzalez"
      time_cest: "15:00 - 15:20"
      description: |
        CAR T-cell therapy is a form of cancer immunotherapy that engineers a
        patient’s T cells to recognize and eliminate malignant cells. Although
        highly effective in leukemias and other hematological cancers, this
        therapy faces significant challenges in solid tumors due to the
        complex and heterogeneous tumor microenvironment. CARTopiaX is an
        advanced agent-based model developed to address this challenge, using
        the mathematical framework proposed in the Nature paper “In silico
        study of heterogeneous tumour-derived organoid response to CAR T-cell
        therapy” and successfully replicating its core results. Built on
        BioDynaMo, a high-performance, open-source platform for large-scale
        and modular biological modeling, CARTopiaX enables detailed
        exploration of complex biological interactions, hypothesis testing,
        and data-driven discovery within solid tumor microenvironments.

        The project achieved major milestones, including simulations that run
        more than twice as fast as the previous model, allowing rapid scenario
        exploration and robust hypothesis validation; high-quality,
        well-structured, and maintainable C++ code developed following modern
        software engineering principles; and a scalable, modular, and
        extensible architecture that fosters collaboration, customization, and
        the continuous evolution of an open-source ecosystem. Altogether, this
        work represents a meaningful advancement in computational biology,
        providing researchers with a powerful tool to investigate CAR T-cell
        dynamics in solid tumors and accelerating scientific discovery while
        reducing the time and cost associated with experimental wet-lab
        research.

      # slides: /assets/presentations/...

    - title: "Efficient LLM Training in C++ via Compiler-Level Autodiff with Clad"
      speaker:
        name: "Rohan Timmaraju"
      time_cest: "15:20 - 15:40"
      description: |
        The computational demands of Large Language Model (LLM) training are
        often constrained by the performance of Python frameworks. This
        project tackles these bottlenecks by developing a high-performance LLM
        training pipeline in C++ using Clad, a Clang plugin for compiler-level
        automatic differentiation. The core of this work involved creating
        cladtorch, a new C++ tensor library with a PyTorch-style API designed
        for compatibility with Clad's differentiation capabilities. This
        library provides a user-friendly interface for building and training
        neural networks while enabling Clad to automatically generate gradient
        computations for backpropagation.

        Throughout the project, I successfully developed two distinct LLM
        training implementations. The first, using the cladtorch library,
        established a functional and flexible framework for Clad-driven AD. To
        further push performance boundaries, I then developed a second, highly
        optimized implementation inspired by llm.c, which utilizes
        pre-allocated memory buffers and custom kernels. This optimized
        C-style approach, when benchmarked for GPT-2 training on a
        multithreaded CPU, outperformed the equivalent PyTorch implementation.
        This work demonstrates the viability and performance benefits of
        compiler-based AD for deep learning in C++ and provides a strong
        foundation for future hardware acceleration, such as porting the
        implementation to CUDA.

      # slides: /assets/presentations/...

    - title: "Implement and improve an efficient, layered tape with prefetching capabilities"
      speaker:
        name: "Aditi Milind Joshi"
      time_cest: "15:40 - 16:00"
      description: |
        Clad relies on a tape data structure to store intermediate values
        during reverse mode differentiation. This project focuses on enhancing
        the core tape implementation in Clad to make it more efficient and
        scalable. Key deliverables include replacing the existing dynamic
        array-based tape with a slab allocation approach and small buffer
        optimization, enabling multilayer storage, and introducing thread
        safety to support concurrent access.

        The current implementation replaces the dynamic array with a
        slab-based structure and a small static buffer, eliminating costly
        reallocations. Thread-safe access functions have been added through a
        mutex locking mechanism, ensuring safe parallel tape operations.
        Ongoing work includes developing a multilayer tape system with
        offloading capabilities, which will allow only the most recent slabs
        to remain in memory.
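
        A minimal sketch of the idea (illustrative; Clad's actual tape is
        more elaborate): slabs are allocated on demand so growth never copies
        existing values, a small inline buffer serves short tapes, and a
        mutex guards concurrent access.

        ```cpp
        #include <cstddef>
        #include <memory>
        #include <mutex>
        #include <vector>

        template <typename T, std::size_t SmallN = 64, std::size_t SlabN = 1024>
        class Tape {
          T small_[SmallN];                         // small buffer optimization
          std::vector<std::unique_ptr<T[]>> slabs_; // slabs allocated on demand
          std::size_t size_ = 0;
          std::mutex m_;

        public:
          void push(const T& v) {
            std::lock_guard<std::mutex> lock(m_);   // thread-safe access
            if (size_ < SmallN) { small_[size_++] = v; return; }
            std::size_t i = size_ - SmallN;
            if (i % SlabN == 0) // current slab full: add one, never reallocate
              slabs_.push_back(std::make_unique<T[]>(SlabN));
            slabs_[i / SlabN][i % SlabN] = v;
            ++size_;
          }
          T pop() { // precondition: tape is not empty
            std::lock_guard<std::mutex> lock(m_);
            --size_;
            return size_ < SmallN
                       ? small_[size_]
                       : slabs_[(size_ - SmallN) / SlabN][(size_ - SmallN) % SlabN];
          }
        };
        ```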

      # slides: /assets/presentations/...

    - title: "Support usage of Thrust API in Clad"
      speaker:
        name: "Abdelrhman Elrawy"
      time_cest: "16:00 - 16:20"
      description: |
        This project integrates NVIDIA's Thrust library into Clad, a
        Clang-based automatic differentiation tool for C++. By extending
        Clad's source-to-source transformation engine to recognize and
        differentiate Thrust parallel algorithms, the project enables
        automatic gradient generation for GPU-accelerated scientific computing
        and machine learning applications.

        The project achieved Thrust support in Clad by implementing custom
        derivatives for core algorithms, including thrust::reduce,
        thrust::transform, thrust::transform_reduce, thrust::inner_product,
        thrust::copy, scan operations (inclusive/exclusive),
        thrust::adjacent_difference, and sorting primitives. Additional
        contributions include support for Thrust data containers such as
        thrust::device_vector, generic functor handling for transformations,
        demonstration applications, and comprehensive unit tests.
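
        A hypothetical usage sketch (not from the talk), assuming the custom
        derivative for thrust::inner_product covers this pattern and that
        Clad appends pointer adjoints for the vector parameters:

        ```cpp
        #include "clad/Differentiator/Differentiator.h"
        #include <thrust/device_vector.h>
        #include <thrust/inner_product.h>

        // Dot product over device memory, differentiated through the
        // project's custom derivative for thrust::inner_product.
        double dot(thrust::device_vector<double>& x,
                   thrust::device_vector<double>& y) {
          return thrust::inner_product(x.begin(), x.end(), y.begin(), 0.0);
        }

        int main() {
          thrust::device_vector<double> x(3, 1.0), y(3, 2.0);
          thrust::device_vector<double> dx(3, 0.0), dy(3, 0.0);
          auto grad = clad::gradient(dot); // reverse-mode gradient via Clad
          grad.execute(x, y, &dx, &dy);    // expect dx == y and dy == x
        }
        ```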

      # slides: /assets/presentations/...